Posted to dev@flink.apache.org by Piotr Nowojski <pn...@apache.org> on 2022/05/03 10:24:45 UTC

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Hi,

Sorry for chipping in so late, but I was OoO for the last two weeks.
Regarding the interfaces, I would actually be against adding those methods
to the base interfaces for the reasons mentioned above. Doing so would clog
the base interface for new users with tons of methods that they do not need,
do not understand and do not know what to do with. Moreover, decorative
interfaces solve a problem when a feature requires two or more methods
to be implemented at the same time. If we have all of the methods with
default implementations in the base interface, the API doesn't give the user
any clue which set of methods is required to be implemented at the same
time.
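
A minimal sketch of that pitfall, assuming a hypothetical all-in-one interface.
The names AllInOneReader, pollNext, pauseSplits and resumeSplits are made up
for illustration and are not the FLIP-217 or Flink API. The reader below
overrides only one method of the pair, and the compiler stays silent:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: none of these types exist in Flink. */
final class DefaultMethodsSketch {

    /** Hypothetical base interface that carries an optional feature as no-op defaults. */
    interface AllInOneReader {
        String pollNext();

        default void pauseSplits(Collection<String> splitIds) {
            // no-op by default
        }

        default void resumeSplits(Collection<String> splitIds) {
            // no-op by default
        }
    }

    /** Overrides pauseSplits but forgets resumeSplits; this still compiles. */
    static final class HalfImplementedReader implements AllInOneReader {
        final Set<String> paused = new HashSet<>();

        @Override
        public String pollNext() {
            return "record";
        }

        @Override
        public void pauseSplits(Collection<String> splitIds) {
            paused.addAll(splitIds);
        }
        // resumeSplits is inherited as a no-op, so paused splits are never released.
    }

    public static void main(String[] args) {
        HalfImplementedReader reader = new HalfImplementedReader();
        reader.pauseSplits(Arrays.asList("split-0"));
        reader.resumeSplits(Arrays.asList("split-0"));
        // The split is still marked as paused because resumeSplits did nothing.
        System.out.println("still paused: " + reader.paused);
    }
}
```

Nothing in the interface tells the author that the two methods belong together,
so the mistake only shows up at runtime.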

> a) I feel the biggest drawback of decorative interfaces is which interface
> they can decorate and which combinations of multiple decorative interfaces
> are valid. (...)
> In the future, if there is a new feature added
> (e.g. sorted or pre-partitioned data aware), are we going to create another
> interface of SplitReader such as SortedSplitReader or PrePartitionedAware?
> Can they be combined? So I think the additional decorative interface like
> withSplitsAlignment actually increases the understanding cost of users
> because they have to know what decorative interfaces are there, which
> interface they can decorate and which combinations of the decorative
> interfaces are valid and which are not. Ideally we want to avoid that.

I'm not sure I understand how embedding default methods in the base
interface solves the problem of what can and cannot be combined. If there are
two conflicting features, having decorative interfaces that cannot be
mixed together actually makes much more sense to me than having them
all in one base class. How would you allow users to implement only one of
those two features?

To reiterate: yes, there are drawbacks:
- how can a user discover what decorators are there?
- how can a user know where a decorator can be applied?

However, those are drawbacks mostly for power users, and they can be
mitigated by the documentation, for example by listing all of the decorators
with a detailed explanation both in the docs and in the Javadocs. More
experienced users will be able to deal with those issues more easily, as they
will already have some basic understanding of Flink. Also, a user who has a
problem to solve will search for a potential solution anyway, and while
doing that they are very likely to discover the decorator they need in the
docs.

The advantages are:
- clean and easy to implement base interface
- all of the methods from a single feature are grouped in a single
decorator interface, together with their dedicated Javadoc. It's also
easier to search for help using the decorator name
- if an optional feature requires two methods to be implemented at once,
a decorator can guarantee that
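
As a rough sketch of these points, assuming hypothetical names (RecordReader,
SupportsPausingSplits, pauseSplits, resumeSplits, canPauseSplits) that are not
the actual FLIP-217 interfaces: the base interface stays at one method, the
optional pair lives in one decorator, and a class that opts in does not compile
unless it implements both methods.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: none of these types exist in Flink. */
final class DecoratorSketch {

    /** Hypothetical minimal base interface that every new user implements. */
    interface RecordReader {
        String pollNext();
    }

    /**
     * Hypothetical decorative interface. Both methods are abstract, so a
     * concrete class that opts in must implement the pair together.
     */
    interface SupportsPausingSplits {
        void pauseSplits(Collection<String> splitIds);

        void resumeSplits(Collection<String> splitIds);
    }

    /** A beginner's reader: one method, no knowledge of pausing needed. */
    static final class BasicReader implements RecordReader {
        @Override
        public String pollNext() {
            return "record";
        }
    }

    /** A reader that opts in to the optional feature. */
    static final class PausingReader implements RecordReader, SupportsPausingSplits {
        final Set<String> paused = new HashSet<>();

        @Override
        public String pollNext() {
            return "record";
        }

        @Override
        public void pauseSplits(Collection<String> splitIds) {
            paused.addAll(splitIds);
        }

        @Override
        public void resumeSplits(Collection<String> splitIds) {
            paused.removeAll(splitIds);
        }
    }

    /** Framework-side check: the declared type says whether the feature exists. */
    static boolean canPauseSplits(RecordReader reader) {
        return reader instanceof SupportsPausingSplits;
    }

    public static void main(String[] args) {
        PausingReader reader = new PausingReader();
        reader.pauseSplits(Arrays.asList("split-0", "split-1"));
        reader.resumeSplits(Arrays.asList("split-0"));
        System.out.println("basic reader can pause: " + canPauseSplits(new BasicReader()));
        System.out.println("pausing reader can pause: " + canPauseSplits(reader));
        System.out.println("currently paused: " + reader.paused);
    }
}
```

The price is the one named above: a user has to know that the decorator exists
and which interface it applies to.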

Here I would strongly lean towards making life easier for new users,
lowering the entry barrier, at the (imo) slight expense of power users.

Best,
Piotrek


On Tue, 26 Apr 2022 at 15:32, Becket Qin <be...@gmail.com> wrote:

> Thanks for the reply Sebastian and Dawid,
>
> I think Sebastian has a good summary. This is a really helpful discussion.
>
> Thinking a bit more, I feel that it might still be better to add the
> supportsXXX() method in the Source rather than SourceReader.
>
> Generally speaking, what we are trying to do here is to let the Flink
> framework know what the Source is capable of. In this FLIP, it happens to
> be the capability that only involves SourceReader. But in the future, it is
> possible that another functionality involves both the SplitEnumerator and
> SourceReader. In that case, following the current approach, we should put
> the "supportsXXX()" method in both SplitEnumerator and SourceReader.
> Because if we only put this in the SourceReader, then the JM would have to
> create a SourceReader in order to know whether this feature is supported,
> which is a little ugly. But if we put the "supportsXXX()" method in the
> Source, we will break the "symmetric" design because this FLIP chose a
> different way.
>
> This is also why I think supportsXXX() method seems a good thing to have,
> because when there are a few interfaces / methods that are expected to be
> implemented at the same time in order to deliver a feature, it is always
> good to have a single source of truth to tell the framework what to do, so
> the framework can do consistent things in different parts.
>
> @Sebastian Mattheis <se...@ververica.com>
>
> Regarding interface flavor b), i.e. AlignedSourceReader +
> AlignedSplitReader, what I feel awkward about is that we are essentially
> expecting almost all the SourceReader implementations to extend
> SourceReaderBase, which effectively makes the SourceReader interface
> without the pausing support useless. So this indicates that public
> decorative interfaces (or sub-interfaces for the same purpose) only
> make sense if the original interface is also expected to be used.
> Otherwise, it seems to make more sense to add the method to the original
> interface itself.
>
> Cheers,
>
> Jiangjie (Becket) Qin
>
>
>
>
> On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <dw...@apache.org>
> wrote:
>
> > Thanks @Sebastian for the nice summary.
> >
> > I think most of your points aligned with the suggestions I made to the
> > FLIP, while you were writing your reply (I believe we hit enter nearly at
> > the same time ;) )
> >
> > Two points after we synced offline
> >
> > 1. I also changed supportsWatermarksSplitAlignment to
> > supportsPausingSplits to express the general capability of pausing.
> >
> > 2. As for whether we should introduce PausingSourceReader/
> > PausingSplitReader (option b) or just add the methods (option c), I
> > suggest simply adding the two methods, as I felt this is the approach
> > Becket much prefers and others do not object to. Unless there is
> > opposition, let's go with option c.
> >
> > Best,
> >
> > Dawid
> > On 26/04/2022 10:06, Sebastian Mattheis wrote:
> >
> > Hi folks,
> >
> > Sorry for being a bit silent. Many thanks for all the input and
> > suggestions. As I'm a bit new, I needed some time to catch up and
> > structure (for myself) the discussion and I wanted to find a way to
> > structure the conclusions. (Also because I had the feeling that some
> > concerns got lost in the discussion.) This is my attempt and please
> > correct me if something is wrong or misunderstood. I tried to collect and
> > assemble the opinions, suggestions, and conclusions (to the best of my
> > knowledge):
> >
> > # Top A: Should split alignment (pause/resume behavior) be a general
> > capability?
> >
> > I personally don't see any reason not to have it as a general capability
> > because the alignSplits method is actually independent of the
> > watermarks. If we agree here to have it as a general capability, we should
> > also agree on the right wording. Does "alignSplits(splitsToResume,
> > splitsToPause)" refer to what is then actually meant? (I see it as okay. I
> > don't have any better idea whilst Arvid suggested "pauseOrResumeSplits".)
> >
> > # Top B: Should it be possible to enable/disable split alignment?
> >
> > I would personally not disable the split alignment on the source reader
> > side because if split alignment is used for some other use case (see A) it
> > could have nasty side effects on other/future use cases. Instead, I would
> > disable "watermark split alignment" where I think it should disable the
> > watermark-dependent trigger for split alignment.
> >
> > # Top C: Should we add a supportsX method?
> >
> > I find it difficult to define the scope of a supportsX method w.r.t.
> > the following questions: a) Where is it used? and b) What is the expected
> > output? To b), it's not straightforward to provide a meaningful output,
> > e.g., if SourceReader supports split alignment but SplitReader does not.
> > This is because with the current implementation, we can determine whether
> > split alignment is fully supported only during runtime and specifically
> > only when calling alignSplits down the call hierarchy to the actual
> > SplitReaders.
> >
> > Therefore, I would suggest to either raise an error or warning if the
> > alignment is called but not supported at some point. I know we should
> > carefully think about when this could be the case because we don't want to
> > flood anybody with such warnings. However, warnings could be an indicator
> > for the user that, for the watermark split alignment use case, split
> > reading is imbalanced, with the conclusion to either disable the trigger
> > for watermark split alignment (see Top B) or to use/implement a source and
> > reader that fully supports split alignment.
> >
> > # Top D: How to design interfaces?
> >
> > Thanks for structuring the discussion with the various possibilities
> > (a-d). From the discussion and emails, I would like to summarize the
> > following requirements:
> > - Interfaces should be consistent ("symmetric"), i.e., similar semantics
> > should have similar interfaces with similar usage.
> > - Make explicit which implementations implement interfaces/support
> > behavior.
> > - Make clear what are default implementations and how to implement
> > interfaces with desired behavior.
> >
> > This is a simplified view of the relations between relevant classes of the
> > PoC implementation:
> >
> > SourceReader (Public) <|-- SourceReaderBase (Internal) <|-- .. <|--
> > MySourceReader
> >
> > MySourceReader <>-- SplitFetcherManager (Internal) <>-- SplitFetcher
> > (Internal) <>-- SplitReader (Public) <|-- MySplitReader
> >
> > (A <|-- B: B inherits from A; A <>-- B: A "has a" B)
> >
> > Note that SourceReaderBase and SplitFetcherManager implement most of the
> > "logic" for split alignment just because we wanted to implement split
> > alignment and wanted it to be available as kind of a default. As a
> > consequence, we have a "default implementation" for SourceReader that
> > implements the actual logic for split alignment. For that reason, I find it
> > very confusing to have a NOOP default implementation in the interface for
> > the SourceReader. As a consequence, interface strategy c) is difficult
> > because this would require NOOP default implementations in the public
> > interfaces of SourceReader and SplitReader. This is the same for strategy
> > d) because it would require a NOOP default implementation in the
> > SourceReader. Further, as Dawid described, the method signatures of
> > alignSplits for SourceReader and SplitReader differ and it would be
> > extremely difficult to make the signatures the same (with even potential
> > performance impact because of additional lookups of split ids). Therefore,
> > having a symmetric decorative interface as of strategy a) is actually not
> > possible and having two decorative interfaces with different method
> > signatures is confusing. My conclusion is that we are best with strategy b)
> > which means to have specializing sub-interfaces that inherit from the
> > parent interface:
> > SourceReader <|-- AlignedSourceReader, SplitReader <|-- AlignedSplitReader
> > With this option, I'm not 100% sure what the implications are and if this
> > could get nasty. I would suggest that Dawid and I just try to implement and
> > see if we like it. :)
> >
> > # Summary
> >
> > In conclusion, please let me know your perspectives. Please correct me if
> > something is wrong or if I misunderstood something. My perspective would
> > be:
> >
> > Top A: Yes
> > Top B: Yes (but disable watermark trigger for split alignment)
> > Top C: No
> > Top D: b)
> >
> > Best,
> > Sebastian
> >
> > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <dwysakowicz@apache.org>
> > wrote:
> >
> >> @Arvid:
> >>
> >> While I also like Becket's capability approach, I fear that it doesn't
> >> work for this particular use case: Sources can always be aligned
> >> cross-task and this is just about intra-task alignment. So it's plausible
> >> to put sources into an alignment group even though they do not use any of
> >> the presented API of FLIP-217. They should just issue a warning, if they
> >> handle multiple splits (see motivation section).
> >>
> >> Yes, but the "supportXXX" method would be for telling if it supports that
> >> intra-task alignment. Cross-task alignment would always be supported.
> >>
> >> I updated interfaces to what I believe to be closest to a consensus
> >> between all participants. Do you mind taking a look?
> >>
> >> @Sebastian Do you mind addressing the nits?
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> On 25/04/2022 13:39, Arvid Heise wrote:
> >>
> >> Thanks for pushing this effort.
> >>
> >> I'd actually be in favor of 1b). I fully agree that decorator interfaces
> >> should be avoided but I'm also not a big fan of overloading the base
> >> interfaces (they are hard to implement as is). The usual feedback to
> >> Source-related interfaces is always that they are overwhelming and too
> >> hard to implement. However, I'd also not oppose 1c) as scattered
> >> interfaces also have drawbacks. I'd just dislike 1a) and 1d).
> >> While I also like Becket's capability approach, I fear that it doesn't
> >> work for this particular use case: Sources can always be aligned
> >> cross-task and this is just about intra-task alignment. So it's plausible
> >> to put sources into an alignment group even though they do not use any of
> >> the presented API of FLIP-217. They should just issue a warning, if they
> >> handle multiple splits (see motivation section).
> >>
> >> I think renaming alignSplits to facilitate future use cases makes sense
> >> but then all interfaces (if 1c) is chosen) should be adjusted accordingly.
> >> AlignedSourceReader could be PausingSourceReader and I'd go for
> >> pauseOrResumeSplits (Becket's proposal afaik). We could also split it into
> >> pauseSplit and resumeSplit. While pauseOrResumeSplits may allow Sources to
> >> just use 1 instead of 2 library calls (as written in the Javadoc), both
> >> Kafka and Pulsar can't use it and I'm not sure if there is a system that
> >> can.
> >>
> >> Some nits for the FLIP:
> >> - Please replace "stop" with "pause".
> >> - Not sure if it's worth it in the capability section: Sources that adopt
> >> this interface cannot be used in earlier versions. So it feels like we are
> >> only forward compatible (old sources can be used after the change); but I
> >> guess this holds for any API addition.
> >> - You might want to add what happens when all splits are paused.
> >> - You may want to describe how the 3 flavors of SourceReaderBase interact
> >> with the interface.
> >> - I'm not sure if it makes sense to include Kafka and Pulsar in the FLIP.
> >> For me, this is rather immediate follow-up work. (could be in the same
> >> umbrella ticket)
> >>
> >> Best,
> >>
> >> Arvid
> >>
> >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz <dwysakowicz@apache.org>
> >> wrote:
> >>
> >>
> >> a) "MySourceReader implements SourceReader, WithSplitsAlignment", along
> >> with "MySplitReader implements SplitReader, WithSplitsAlignment", or
> >> b) "MySourceReader implements AlignedSourceReader" and "MySplitReader
> >> implements AlignedSplitReader", or
> >> c) "MySourceReader implements SourceReader" and "MySplitReader implements
> >> SplitReader".
> >>
> >> I think the latest proposal according to Dawid would be:
> >> d) "MySourceReader implements SourceReader" and "MySplitReader implements
> >> AlignedSplitReader".
> >> I am fine with this API, although personally speaking I think it is
> >> simpler to just add a new method to the split reader with default impl.
> >>
> >>
> >> I think that is a good idea to have it aligned as much as possible. I'd be
> >> +1 for your option c). We can merge AlignedSplitReader with SplitReader.
> >> We will update the FLIP shortly.
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> On 25/04/2022 12:43, Becket Qin wrote:
> >>
> >> Thanks for the comment, Jark.
> >>
> >> 3. Interface/Method Name.
> >>
> >> Can the interface be used to align other things in the future? For
> >> example, align read speed, I have seen users requesting global rate
> >> limits. This feature may also need an interface like this.
> >> If we don't plan to extend this interface to support align other things, I
> >> suggest explicitly declaring the purpose of the methods, such as
> >> `alignWatermarksForSplits` instead of `alignSplits`.
> >>
> >> This is a good point. Naming wise, it would usually be more extensible to
> >> just describe what the method actually does, instead of assuming the
> >> purpose of doing this. For example, in this case, pauseOrResumeSplits()
> >> would be more extensible because this can be used for any kind of flow
> >> control, be it watermark alignment or simple rate limiting.
> >>
> >> 4. Interface or Method.
> >>
> >> I don't have a strong opinion on this. I think they have their own
> >> advantages.
> >> In Flink SQL, we heavily use Interfaces for extending abilities
> >> (SupportsXxxx) for TableSource/TableSink, and I prefer Interfaces rather
> >> than methods in this case. When you have a bunch of abilities and each
> >> ability has more than one method, Interfaces can help to organize them
> >> and make it clear to users which methods they need to implement when they
> >> want to have an ability.
> >>
> >> I am OK with decorative interfaces if this is a general design pattern in
> >> the other components in Flink. But it looks like the current API proposal
> >> is not symmetric.
> >>
> >> The current proposal is essentially "MySourceReader implements
> >> SourceReader, WithSplitsAlignment", along with "MySplitReader implements
> >> AlignedSplitsReader".
> >>
> >> Should we make the API symmetric? I'd consider any one of the following as
> >> symmetric.
> >>
> >> a) "MySourceReader implements SourceReader, WithSplitsAlignment", along
> >> with "MySplitReader implements SplitReader, WithSplitsAlignment", or
> >> b) "MySourceReader implements AlignedSourceReader" and "MySplitReader
> >> implements AlignedSplitReader", or
> >> c) "MySourceReader implements SourceReader" and "MySplitReader implements
> >> SplitReader".
> >>
> >> I think the latest proposal according to Dawid would be:
> >> d) "MySourceReader implements SourceReader" and "MySplitReader implements
> >> AlignedSplitReader".
> >> I am fine with this API, although personally speaking I think it is
> >> simpler to just add a new method to the split reader with default impl.
> >>
> >> @Dawid Wysakowicz <dw...@apache.org>, thanks for the reply.
> >>
> >> Having said that, as I don't have a preference and I agree most of the
> >> sources will support the alignment I am fine following your suggestion to
> >> have the SourceReader extending from WithWatermarksSplitsAlignment, but
> >> would put the "supportsXXX" there, not in the Source to keep the two
> >> methods together.
> >>
> >> One benefit of having the "supportsXXX" in Source is that this allows some
> >> compile time check. For example, if a user enabled watermark alignment
> >> while it is not supported by the Source, an exception can be thrown at
> >> compile time. It seems in general useful. That said, I agree that API
> >> cleanliness wise it is better to put the two methods together.
> >>
> >> Thanks,
> >>
> >> Jiangjie (Becket) Qin
> >>
> >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <im...@gmail.com> wrote:
> >>
> >>
> >> Thanks Dawid for the reminder on FLIP-182. Sorry I did miss it.
> >> I don't have other concerns then.
> >>
> >> Best,
> >> Jark
> >>
> >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz <dw...@apache.org>
> >> wrote:
> >>
> >>
> >> @Jark:
> >>
> >> 1. Will the framework always align with watermarks when the source
> >> implements the interface?
> >> I'm afraid not every case needs watermark alignment even if Kafka
> >> implements the interface, and this will affect the throughput somehow.
> >> I agree with Becket we may need a `supportSplitsAlignment()` method for
> >> users to configure the source to enable/disable the alignment.
> >>
> >> 2. How does the framework calculate maxDesiredWatermark?
> >> I think the algorithm of maxDesiredWatermark will greatly affect
> >> throughput if the reader is constantly switching between pause and
> >> resume. Can users configure the alignment offset?
> >>
> >>
> >> This is covered in the previous FLIP[1] which has been already implemented
> >> in 1.15. In short, it must be enabled with the watermark strategy which
> >> also configures drift and update interval.
> >>
> >> If we don't plan to extend this interface to support align other things, I
> >> suggest explicitly declaring the purpose of the methods, such as
> >> `alignWatermarksForSplits` instead of `alignSplits`.
> >>
> >>
> >> Sure let's rename it.
> >>
> >> @Becket:
> >>
> >> I understand your point. On the other hand, putting all methods in the
> >> base interface, even with "supportsXXX" methods for enabling certain
> >> features, makes the entry threshold for writing a new source higher.
> >> Instead of focusing on the basic and required properties of the Source,
> >> the person implementing a source must bother with and figure out what all
> >> of the extra features are about and how to deal with them. It also makes
> >> it harder to organize methods in coupled groups as Jark said.
> >>
> >> Having said that, as I don't have a preference and I agree most of the
> >> sources will support the alignment I am fine following your suggestion to
> >> have the SourceReader extending from WithWatermarksSplitsAlignment, but
> >> would put the "supportsXXX" there, not in the Source to keep the two
> >> methods together.
> >>
> >> Lastly, I agree it is really unfortunate the "alignSplits" methods differ
> >> slightly for SourceReader and SplitReader. The reason for that is
> >> SourceReaderBase deals only with SplitIds, whereas SplitReader needs the
> >> actual splits to pause them. I found the discrepancy acceptable for the
> >> sake of simplifying changes significantly, especially as they would highly
> >> likely impact performance as we would have to perform additional lookups.
> >> Moreover the SplitReader is a secondary interface.
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
> >>
> >> On 24/04/2022 17:15, Jark Wu wrote:
> >>
> >> Thanks for the effort, Dawid and Sebastian!
> >>
> >> I just have some minor questions (maybe I missed something).
> >>
> >> 1. Will the framework always align with watermarks when the source
> >> implements the interface?
> >> I'm afraid not every case needs watermark alignment even if Kafka
> >> implements the interface, and this will affect the throughput somehow.
> >> I agree with Becket we may need a `supportSplitsAlignment()` method for
> >> users to configure the source to enable/disable the alignment.
> >>
> >> 2. How does the framework calculate maxDesiredWatermark?
> >> I think the algorithm of maxDesiredWatermark will greatly affect
> >> throughput if the reader is constantly switching between pause and
> >> resume. Can users configure the alignment offset?
> >>
> >> 3. Interface/Method Name.
> >> Can the interface be used to align other things in the future? For
> >> example, align read speed, I have seen users requesting global rate
> >> limits. This feature may also need an interface like this.
> >> If we don't plan to extend this interface to support align other things, I
> >> suggest explicitly declaring the purpose of the methods, such as
> >> `alignWatermarksForSplits` instead of `alignSplits`.
> >>
> >> 4. Interface or Method.
> >> I don't have a strong opinion on this. I think they have their own
> >> advantages.
> >> In Flink SQL, we heavily use Interfaces for extending abilities
> >> (SupportsXxxx) for TableSource/TableSink, and I prefer Interfaces rather
> >> than methods in this case. When you have a bunch of abilities and each
> >> ability has more than one method, Interfaces can help to organize them
> >> and make it clear to users which methods they need to implement when they
> >> want to have an ability.
> >>
> >>
> >> Best,
> >> Jark
> >>
> >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <be...@gmail.com> wrote:
> >>
> >> Hi Dawid,
> >>
> >> Thanks for the explanation. Apologies that I somehow misread a bunch of
> >> "align" and thought they were "assign".
> >>
> >> Regarding 1, by default implementation, I was thinking of the default
> >> no-op implementation. I am a little worried about the proliferation of
> >> decorative interfaces. I think the most important thing about interfaces
> >> is that they are easy to understand. In this case, I prefer adding a new
> >> method to the existing interface for the following reasons:
> >>
> >> a) I feel the biggest drawback of decorative interfaces is which
> >> interface they can decorate and which combinations of multiple decorative
> >> interfaces are valid. In the current FLIP, the withSplitsAlignment
> >> interface is only applicable to the SourceReader which means it can't
> >> decorate any other interface. From an interface design perspective, a
> >> natural question is why not let "AlignedSplitReader" extend
> >> "withSplitsAlignment"? And it is also natural to assume that a split
> >> reader implementing both SplitReader and WithSplitAlignment would work,
> >> because a source reader implementing SourceReader and withSplitsAlignment
> >> works. So why isn't there an interface of AlignedSourceReader? In the
> >> future, if there is a new feature added (e.g. sorted or pre-partitioned
> >> data aware), are we going to create another interface of SplitReader such
> >> as SortedSplitReader or PrePartitionedAware? Can they be combined? So I
> >> think the additional decorative interface like withSplitsAlignment
> >> actually increases the understanding cost of users because they have to
> >> know what decorative interfaces are there, which interface they can
> >> decorate and which combinations of the decorative interfaces are valid
> >> and which are not. Ideally we want to avoid that. To be clear, I am not
> >> opposing having an interface of withSplitsAlignment, it is completely OK
> >> to have it as an internal interface and let SourceReader and SplitReader
> >> both extend it.
> >>
> >> b) Adding a new method to the SourceReader with a default implementation
> >> of no-op would help avoid logic branching in the source logic, especially
> >> given that we agree that the vast majority of the SourceReader
> >> implementations, if not all, would just extend from the SourceReaderBase.
> >> That means adding a new method to the interface would effectively give
> >> the same user experience, but simpler.
> >>
> >> c) A related design principle that may be worth discussing is how do we
> >> let the Source implementations tell Flink what capability is supported
> >> and what is not. Personally speaking I feel the most intuitive place to
> >> me is in the Source itself, because that is the entrance of the entire
> >> Source connector logic.
> >>
> >> Based on the above thoughts, I am wondering if the following interface
> >> would be easier to understand by the users.
> >>
> >> - Change "withSplitsAlignment" to internal interface, let both
> >> SourceReader and SplitReader extend from it, with a default no-op
> >> implementation.
> >> - Add a new method "boolean supportSplitsAlignment()" to the Source
> >> interface, with a default implementation returning false. Sources that
> >> have implemented the alignment logic can change this to return true, and
> >> override the alignSplits() methods in the SourceReader / SplitReader if
> >> needed.
> >> - In the future, if a new optional feature is going to be added to the
> >> Source, and that feature requires the awareness from Flink, we can add
> >> more such methods to the Source.
> >>
> >> What do you think?
> >>
> >> Thanks,
> >>
> >> Jiangjie (Becket) Qin
> >>
> >>
> >>
> >>
> >>
> >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dw...@apache.org>
> >> wrote:
> >>
> >>
> >> @Konstantin:
> >>
> >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
> >>
> >> Correct, as far as I know though, those are the only sources which
> >> consume concurrently from multiple splits and thus alignment applies.
> >>
> >> @Thomas:
> >>
> >> I wonder if "supporting" split alignment in SourceReaderBase and then
> >> doing nothing if the split reader does not implement AlignedSplitReader
> >> could be misleading? Perhaps WithSplitsAlignment can instead be added to
> >> the specific source reader (i.e. KafkaSourceReader) to make it explicit
> >> that the source actually supports it.
> >>
> >> I understand your concern. Hmm, I think we could actually do that. Given
> >> the actual implementation of the SourceReaderBase#alignSplits is rather
> >> short (just a forward to the corresponding method of SplitFetcher), we
> >> could reimplement it in the actual source implementations. This solution
> >> has a downside though. Authors of new sources would have to do two
> >> things: extend from AlignedSplitReader and implement WithSplitsAlignment,
> >> instead of just extending AlignedSplitReader. I would be fine with such a
> >> tradeoff though. What do others think?
> >>
> >> @Steven:
> >>
> >> For this part from the motivation section, is it accurate? Let's assume
> >> one source task consumes from 3 partitions and one of the partitions is
> >> significantly slower. In this situation, watermark for this source task
> >> won't hold back as it is reading recent data from other two Kafka
> >> partitions. As a result, it won't hold back the overall watermark. I
> >> thought the problem is that we may have late data for this slow
> >> partition.
> >>
> >> It will hold back the watermark. Watermark of an operator is the minimum
> >> of watermarks of all splits[1]
> >>
> >> I have another question about the restart. Say split alignment is
> >> triggered. Checkpoint is completed. Job failed and restored from the last
> >> checkpoint. Because alignment decision is not checkpointed, initially
> >> alignment won't be enforced until we get a cycle of watermark aggregation
> >> and propagation, right? Not saying this corner is a problem. Just want to
> >> understand it more.
> >>
> >> Your understanding is correct.
> >>
> >> @Becket:
> >>
> >> 1. I think watermark alignment is sort of a general use case, so should we
> >> just add the related methods to SourceReader directly instead of
> >> introducing the new interface of WithSplitAssignment? We can provide
> >> default implementations, so backwards compatibility won't be an issue.
> >>
> >> I don't think we can provide a default implementation. How would we do
> >> that? Would it be just a no-op? Is it better than having an opt-in
> >> interface? The default implementation would have to be added exclusively
> >> in a *Public* SourceReader interface. By the way notice SourceReaderBase
> >> does extend from WithSplitsAlignment, so effectively all implementations
> >> do handle the alignment case. To be honest I think it is impossible to
> >> implement the SourceReader interface directly by end users.
> >>
> >> 2. As you mentioned, the SplitReader interface probably also needs some
> >> change to support throttling at the split granularity. Can you add that
> >> interface change into the public interface section as well?
> >>
> >> It has been added from the beginning. See *AlignedSplitReader.*
> >>
> >> 3. Nit, can we avoid using the method name assignSplits here, given that it
> >> is not actually changing the split assignments? It seems something like
> >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
> >>
> >> The method's called *alignSplits*, not assign. Do you still prefer a
> >> different name for that? Personally, I am open to suggestions here.
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> >>
> >> On 22/04/2022 05:59, Becket Qin wrote:
> >>
> >> Thanks for driving the effort, Sebastian. I think the motivation makes a
> >> lot of sense. Just a few suggestions / questions.
> >>
> >> 1. I think watermark alignment is sort of a general use case, so should we
> >> just add the related methods to SourceReader directly instead of
> >> introducing the new interface of WithSplitAssignment? We can provide
> >> default implementations, so backwards compatibility won't be an issue.
> >>
> >> 2. As you mentioned, the SplitReader interface probably also needs some
> >> change to support throttling at the split granularity. Can you add that
> >> interface change into the public interface section as well?
> >>
> >> 3. Nit, can we avoid using the method name assignSplits here, given that it
> >> is not actually changing the split assignments? It seems something like
> >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
> >>
> >> Thanks,
> >>
> >> Jiangjie (Becket) Qin
> >>
> >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <st...@gmail.com> wrote:
> >>
> >> However, a single source operator may read data from multiple
> >> splits/partitions, e.g., multiple Kafka partitions, such that even with
> >> watermark alignment the source operator may need to buffer excessive amount
> >> of data if one split emits data faster than another.
> >>
> >> For this part from the motivation section, is it accurate? Let's assume one
> >> source task consumes from 3 partitions and one of the partition is
> >> significantly slower. In this situation, watermark for this source task
> >> won't hold back as it is reading recent data from other two Kafka
> >> partitions. As a result, it won't hold back the overall watermark. I
> >> thought the problem is that we may have late data for this slow partition.
> >>
> >> I have another question about the restart. Say split alignment is
> >> triggered. checkpoint is completed. job failed and restored from the last
> >> checkpoint. because alignment decision is not checkpointed, initially
> >> alignment won't be enforced until we get a cycle of watermark aggregation
> >> and propagation, right? Not saying this corner is a problem. Just want to
> >> understand it more.
> >>
> >>
> >>
> >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <th...@apache.org> wrote:
> >>
> >> Thanks for working on this!
> >>
> >> I wonder if "supporting" split alignment in SourceReaderBase and then doing
> >> nothing if the split reader does not implement AlignedSplitReader could be
> >> misleading? Perhaps WithSplitsAlignment can instead be added to the
> >> specific source reader (i.e. KafkaSourceReader) to make it explicit that
> >> the source actually supports it.
> >>
> >> Thanks,
> >> Thomas
> >>
> >>
> >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <kn...@apache.org> wrote:
> >>
> >>
> >> Hi Sebastian, Hi Dawid,
> >>
> >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
> >>
> >> +1 in general. I believe it is valuable to complete the watermark aligned
> >> story with this FLIP.
> >>
> >> Cheers,
> >>
> >> Konstantin
> >>
> >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dw...@apache.org> wrote:
> >>
> >>
> >> To be explicit, having worked on it, I support it ;) I think we can
> >> start a vote thread soonish, as there are no concerns so far.
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
> >>
> >> Dear Flink developers,
> >>
> >> I would like to open a discussion on FLIP 217 [1] for an extension of
> >> Watermark Alignment to perform alignment also in SplitReaders. To do so,
> >> SplitReaders must be able to suspend and resume reading from split sources
> >> where the SourceOperator coordinates and controls suspend and resume. To
> >> gather information about current watermarks of the SplitReaders, we extend
> >> the internal WatermarkOutputMultiplexer and report watermarks to the
> >> SourceOperator.
> >>
> >> There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised and
> >> reworked by Dawid Wysakowicz (he did most of the work) and me. The changes
> >> are backwards compatible in a way that if affected components do not
> >> support split alignment the behavior is as before.
> >>
> >> Best,
> >> Sebastian
> >>
> >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> >>
> >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
> >>
> >> --
> >>
> >> Konstantin Knauf
> >> https://twitter.com/snntrable
> >> https://github.com/knaufk
> >>
> >>
>
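
The exchange quoted above about whether a slow partition holds back the
watermark can be illustrated with a small Java sketch. It assumes the
operator watermark is the minimum over the per-split watermarks (as Dawid
states) and that a split should be paused once its watermark exceeds an
allowed maximum, taken here as the minimum plus a drift. The class and
method names are hypothetical and are not part of Flink or of the FLIP.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: none of these names are Flink classes.
final class SplitAlignmentSketch {

    /** The operator watermark is the minimum over all split watermarks (map must be non-empty). */
    static long operatorWatermark(Map<String, Long> splitWatermarks) {
        return Collections.min(splitWatermarks.values());
    }

    /** Returns the ids of splits whose watermark is beyond the allowed maximum. */
    static List<String> splitsToPause(Map<String, Long> splitWatermarks, long maxAllowedWatermark) {
        List<String> toPause = new ArrayList<>();
        for (Map.Entry<String, Long> entry : splitWatermarks.entrySet()) {
            if (entry.getValue() > maxAllowedWatermark) {
                toPause.add(entry.getKey());
            }
        }
        return toPause;
    }

    public static void main(String[] args) {
        // Three partitions read by one source task; partition-2 is the slow one.
        Map<String, Long> watermarks = new LinkedHashMap<>();
        watermarks.put("partition-0", 1_000L);
        watermarks.put("partition-1", 1_050L);
        watermarks.put("partition-2", 200L);

        long min = operatorWatermark(watermarks);
        long assumedMaxDrift = 500L; // hypothetical drift value
        System.out.println("operator watermark: " + min);
        System.out.println("splits to pause: " + splitsToPause(watermarks, min + assumedMaxDrift));
    }
}
```

In this sketch the slow partition determines the operator watermark, while
the two fast partitions are the candidates for pausing, which is the
split-level behavior the FLIP discussion is about.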

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Thomas Weise <th...@apache.org>.
Hi Sebastian,

Thank you for updating the FLIP page. It looks good and I think you
can start a VOTE.

Thomas

On Tue, Jul 26, 2022 at 10:57 AM Sebastian Mattheis
<se...@ververica.com> wrote:
>
> Hi everybody,
>
> I have updated FLIP-217 [1] and have implemented the respective changes in
> [2]. Please review. If there are no concerns, I would initiate the voting
> on Thursday.
>
> Best regards,
> Sebastian
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> [2] https://github.com/smattheis/flink/tree/flip-217-split-wm-alignment
>
> On Mon, Jul 25, 2022 at 9:19 AM Piotr Nowojski <pn...@apache.org> wrote:
>
> > Thanks for the update Sebastian :)
> >
> > Best,
> > Piotrek
> >
> > pon., 25 lip 2022 o 08:12 Sebastian Mattheis <se...@ververica.com>
> > napisał(a):
> >
> >> Hi everybody,
> >>
> >> Last week I discussed the semantics and an implementation strategy of the
> >> configuration parameter with Piotr and did the implementation and some
> >> tests this weekend.
> >>
> >> A short summary of what I discussed and recapped with Piotr:
> >>
> >>    - The configuration parameter allows (and tolerates) the use of
> >>    `SourceReader`s that do not implement `pauseOrResumeSplits` method. (The
> >>    exception is ignored in `SourceOperator`.)
> >>    - The configuration parameter allows (and tolerates) the use of
> >>    `SourceSplitReader`s that do not implement `pauseOrResumeSplits` method.
> >>    (The exception is ignored in the `PauseResumeSplitsTask` of the
> >>    `SplitFetcher`.)
> >>
> >> In particular, consider a `SourceReader` with two `SplitReader`s where one
> >> does not implement `pauseOrResumeSplits` and the other does. The parameter
> >> will allow the use of the one that doesn't and will, nevertheless, still
> >> attempt to pause/resume the other. (Consequently, if the one that doesn't
> >> support pause is ahead, it simply cannot be paused, but if the other is
> >> ahead, it will be paused until watermarks are aligned.)
> >>
> >> There is one flaw that I don't really like but which I accept as a result of
> >> the discussion and which I will add/update in the FLIP:
> >> If there is any other mechanism (i.e., one other than watermark alignment)
> >> that attempts to pause or resume `SplitReader`s, it will have side effects
> >> and potentially unexpected behavior if one or more `SplitReader`s do not
> >> implement `pauseOrResumeSplits` and the user set the configuration
> >> parameter to allow/tolerate it for split-level watermark alignment. (The
> >> reason is simply that we cannot differentiate which mechanism attempts to
> >> pause/resume, i.e., whether it is used for watermark alignment or something
> >> else.) Given that this configuration parameter is supposed to be an
> >> intermediate fallback, this is acceptable to me, but it should be changed at
> >> the latest when some other mechanism uses pauseOrResumeSplits.
> >>
> >> As for the parameter naming, I have implemented it the following way
> >> (reason: There exists a parameter `pipeline.auto-watermark-interval`.):
> >>
> >> pipeline.watermark-alignment.allow-unaligned-source-splits (default:
> >> false)
> >>
> >> Status: I have implemented the configuration parameter (and an IT case).
> >> I still need to update the FLIP and will ping you (tomorrow or so) when I'm
> >> done with that. Please check/review my description from above if you see
> >> any problems with that.
> >>
> >> Thanks a lot and regards,
> >> Sebastian
> >>
> >>
> >> On Wed, Jul 20, 2022 at 11:24 PM Thomas Weise <th...@apache.org> wrote:
> >>
> >>> Hi Sebastian,
> >>>
> >>> Thank you for updating the FLIP and sorry for my delayed response. As
> >>> Piotr pointed out, we would need to incorporate the fallback flag into
> >>> the design to reflect the outcome of the previous discussion.
> >>>
> >>> Based on the current FLIP and as detailed by Becket, the
> >>> SourceOperator coordinates the alignment. It is responsible for the
> >>> pause/resume decision and knows how many splits are assigned.
> >>> Therefore shouldn't it have all the information needed to efficiently
> >>> handle the case of UnsupportedOperationException thrown by a reader?
> >>>
> >>> Although the fallback requires some extra implementation effort, I
> >>> think that is more than offset by not surprising users and offering a
> >>> smoother migration path. Yes, the flag is a temporary feature that
> >>> will become obsolete in perhaps 2-3 releases (can we please also
> >>> include that into the FLIP?). But since it would be just a
> >>> configuration property that can be ignored at that point (for which
> >>> there is precedent), no code change will be forced on users.
> >>>
> >>> As for the property name, perhaps the following would be even more
> >>> descriptive?
> >>>
> >>> coarse.grained.wm.alignment.fallback.enabled
> >>>
> >>> Thanks!
> >>> Thomas
> >>>
> >>>
> >>> On Wed, Jul 13, 2022 at 10:59 AM Becket Qin <be...@gmail.com>
> >>> wrote:
> >>> >
> >>> > Thanks for the explanation, Sebastian. I understand your concern now.
> >>> >
> >>> > 1. About the major concern. Personally I'd consider the coarse grained
> >>> watermark alignment as a special case for backward compatibility. In the
> >>> future, if for whatever reason we want to pause a split and that is not
> >>> supported, it seems the only thing that makes sense is throwing an
> >>> exception, instead of pausing the entire source reader. Regarding this
> >>> FLIP, if the logic that determines which split should be paused is in the
> >>> SourceOperator, the SourceOperator actually knows the reason why it pauses
> >>> a split. It also knows whether there are more than one split assigned to
> >>> the source reader. So it can just fallback to the coarse grained watermark
> >>> alignment, without affecting other reasons of pausing a split, right? And
> >>> in the future, if there are more purposes for pausing / resuming a split,
> >>> the SourceOperator still needs to understand each of the reasons in order
> >>> to resume the splits after all the pausing conditions are no longer met.
> >>> >
> >>> > 2. Naming wise, would "coarse.grained.watermark.alignment.enabled"
> >>> address your concern?
> >>> >
> >>> > The only concern I have for Option A is that people may not be able to
> >>> benefit from split level WM alignment until all the sources they need have
> >>> that implemented. This seems unnecessarily delaying the adoption of a new
> >>> feature, which looks like a more substantive downside compared with the
> >>> "coarse.grained.wm.alignment.enabled" option.
> >>> >
> >>> > BTW, the SourceOperator doesn't need to invoke the
> >>> pauseOrResumeSplit() method and catch the UnsupportedOperation every time.
> >>> A flag can be set so it doesn't attempt to pause the split after the first
> >>> time it sees the exception.
> >>> >
> >>> >
> >>> > Thanks,
> >>> >
> >>> > Jiangjie (Becket) Qin
> >>> >
> >>> >
> >>> >
> >>> > On Wed, Jul 13, 2022 at 5:11 PM Sebastian Mattheis <
> >>> sebastian@ververica.com> wrote:
> >>> >>
> >>> >> Hi Becket, Hi Thomas, Hi Piotrek,
> >>> >>
> >>> >> Thanks for the feedback. I would like to highlight some concerns:
> >>> >>
> >>> >> Major: A configuration parameter like "allow coarse grained
> >>> alignment" defines a semantic that mixes two contexts conditionally as
> >>> follows: "ignore incapability to pause splits in SourceReader/SplitReader"
> >>> IF (conditional) we "allow coarse grained watermark alignment". At the same
> >>> time we said that there is no way to check the capability of
> >>> SourceReader/SplitReader to pause/resume other than observing a
> >>> UnsupportedOperationException during runtime such that we cannot disable
> >>> the trigger for watermark split alignment in the SourceOperator. Instead,
> >>> we can only ignore the incapability of SourceReader/SplitReader during
> >>> execution of a pause/resume attempt which, consequently, requires to check
> >>> the "allow coarse grained alignment " parameter value (to implement the
> >>> conditional semantic). However, during this execution we actually don't
> >>> know whether the attempt was executed for the purpose of watermark
> >>> alignment or for some other purpose such that the check actually depends on
> >>> who triggered the pause/resume attempt and hides the exception potentially
> >>> unexpectedly for some other use case. Of course, currently there is no
> >>> other purpose and, hence, no other trigger than watermark alignment.
> >>> However, this breaks, in my perspective, the idea of having
> >>> pauseOrResumeSplits (re)usable for other use cases.
> >>> >> Minor: I'm not aware of any configuration parameter in the format
> >>> like `allow.*` as you suggested with
> >>> `allow.coarse.grained.watermark.alignment`. Would that still be okay to do?
> >>> >>
> >>> >> As we have agreed to not have a "supportsPausableSplits" method
> >>> because of potential inconsistencies between return value of this method
> >>> and the actual implementation (and also the difficulty to have a meaningful
> >>> return value where the support actually depends on SourceReader AND the
> >>> assigned SplitReaders), I don't want to bring up the discussion about the
> >>> "supportsPauseableSplits" method again. Instead, I see the following
> >>> options:
> >>> >>
> >>> >> Option A: I would drop the idea of "allow coarse grained alignment"
> >>> semantic of the parameter but implement a parameter to "enable/disable
> >>> split watermark alignment" which we can easily use in the SourceOperator to
> >>> disable the trigger of split alignment. This is indeed more static and less
> >>> flexible, because it disables split alignment unconditionally, but it is
> >>> "context-decoupled" and more straight-forward to use. This would also
> >>> address the use case of disabling split alignment for the purpose of
> >>> runtime behavior evaluation, as mentioned by Thomas (if I remember
> >>> correctly.) I would implement the parameter with a default where watermark
> >>> split alignment is enabled which requires users to check their application
> >>> when upgrading to 1.16 if a) there is a source that reads from multiple
> >>> splits and b), if yes, all splits of that source support pause/resume. If
> >>> a) yes and b) no, the user must take action to disable watermark split
> >>> alignment (which disables the trigger of split alignment only for the
> >>> purpose).
> >>> >>
> >>> >> Option B: If we ignore my concern, I would simply check the "allow
> >>> coarse grained watermark alignment" parameter value on every attempt to
> >>> execute pause/resume in the SourceReader and in the SplitReader and will
> >>> not throw UnsupportedOperationException if the parameter value is set to
> >>> true.
> >>> >>
> >>> >> Please note that the parameter is also used only for some kind of
> >>> migration phase. Therefore, I wonder if we need to overcomplicate things.
> >>> >>
> >>> >> @Piotrek, @Becket, @Thomas: I would recommend/favour option A. Please
> >>> let me know your feedback and/or concerns as soon as possible, if possible.
> >>> :)
> >>> >>
> >>> >> Regards,
> >>> >> Sebastian
> >>> >>
> >>> >>
> >>> >> On Wed, Jul 13, 2022 at 9:37 AM Becket Qin <be...@gmail.com>
> >>> wrote:
> >>> >>>
> >>> >>> Hi Sebastian,
> >>> >>>
> >>> >>> Thanks for updating the FLIP wiki.
> >>> >>>
> >>> >>> Just to double confirm, I was thinking of a configuration like
> >>> "allow.coarse.grained.watermark.alignment". This will allow the coarse
> >>> grained watermark alignment as a fallback instead of bubbling up an
> >>> exception if split pausing is not supported in some Sources in a Flink job.
> >>> And this will only affect the Sources that do not support split pausing,
> >>> but not the Sources that have split pausing supported.
> >>> >>>
> >>> >>> This seems slightly different from a <knob> enables / disables split
> >>> alignment. This sounds like a global thing, and it seems not necessary to
> >>> disable the split alignment, as long as the coarse grained alignment can be
> >>> a fallback.
> >>> >>>
> >>> >>> Thanks,
> >>> >>>
> >>> >>> Jiangjie (Becket) Qin
> >>> >>>
> >>> >>> On Wed, Jul 13, 2022 at 2:46 PM Sebastian Mattheis <
> >>> sebastian@ververica.com> wrote:
> >>> >>>>
> >>> >>>> Hi Piotrek,
> >>> >>>>
> >>> >>>> Sorry I've read it and forgot it when I was ripping out the
> >>> supportsPauseOrResume method again. Thanks for pointing that out. I will
> >>> add it as follows: The <knob> enables/disables split alignment in the
> >>> SourceOperator where the default is that split alignment is enabled. (And I
> >>> will add the note: "In future releases, the <knob> may be ignored such that
> >>> split alignment is always enabled.")
> >>> >>>>
> >>> >>>> Cheers,
> >>> >>>> Sebastian
> >>> >>>>
> >>> >>>> On Tue, Jul 12, 2022 at 11:14 PM Piotr Nowojski <
> >>> pnowojski@apache.org> wrote:
> >>> >>>>>
> >>> >>>>> Hi Sebastian,
> >>> >>>>>
> >>> >>>>> Thanks for picking this up.
> >>> >>>>>
> >>> >>>>> > 5. There is NO configuration option to enable watermark
> >>> alignment of
> >>> >>>>> splits or disable pause/resume capabilities.
> >>> >>>>>
> >>> >>>>> Isn't this contradicting what we actually agreed on?
> >>> >>>>>
> >>> >>>>> > we are planning to have a configuration based way to revert to
> >>> the
> >>> >>>>> previous behavior
> >>> >>>>>
> >>> >>>>> I think what we agreed in the last couple of emails was to add a
> >>> >>>>> configuration toggle, that would allow Flink 1.15 users, that are
> >>> using
> >>> >>>>> watermark alignment with multiple splits per source operator, to
> >>> continue
> >>> >>>>> using it with the old 1.15 semantic, even if their source doesn't
> >>> support
> >>> >>>>> pausing/resuming splits. It seems to me like the current FLIP and
> >>> >>>>> implementation proposal would always throw an exception in that
> >>> case?
> >>> >>>>>
> >>> >>>>> Best,
> >>> >>>>> Piotrek
> >>> >>>>>
> >>> >>>>> wt., 12 lip 2022 o 10:18 Sebastian Mattheis <
> >>> sebastian@ververica.com>
> >>> >>>>> napisał(a):
> >>> >>>>>
> >>> >>>>> > Hi all,
> >>> >>>>> >
> >>> >>>>> > I have updated FLIP-217 [1] to the proposed specification and
> >>> adapted the
> >>> >>>>> > current implementation [2] respectively.
> >>> >>>>> >
> >>> >>>>> > This means both, FLIP and implementation, are ready for review
> >>> from my
> >>> >>>>> > side. (I would revise commit history and messages for the final
> >>> PR but left
> >>> >>>>> > it as is for now and the records of this discussion.)
> >>> >>>>> >
> >>> >>>>> > 1. Please review the updated version of FLIP-217 [1]. If there
> >>> are no
> >>> >>>>> > further concerns, I would initiate the voting.
> >>> >>>>> > (2. If you want to speed up things, please also have a look into
> >>> the
> >>> >>>>> > updated implementation [2].)
> >>> >>>>> >
> >>> >>>>> > Please consider the following updated specification in the
> >>> current status
> >>> >>>>> > of FLIP-217 where the essence is as follows:
> >>> >>>>> >
> >>> >>>>> > 1. A method pauseOrResumeSplits is added to SourceReader with
> >>> default
> >>> >>>>> > implementation that throws UnsupportedOperationException.
> >>> >>>>> > 2. A method pauseOrResumeSplits is added to SplitReader with
> >>> default
> >>> >>>>> > implementation that throws UnsupportedOperationException.
> >>> >>>>> > 3. SourceOperator initiates split alignment only if more than
> >>> one split is
> >>> >>>>> > assigned to the source (and, of course, only if
> >>> withSplitAlignment is used).
> >>> >>>>> > 4. There is NO "supportsPauseOrResumeSplits" method at any place
> >>> (to
> >>> >>>>> > indicate if the implementation supports pause/resume
> >>> capabilities).
> >>> >>>>> > 5. There is NO configuration option to enable watermark
> >>> alignment of
> >>> >>>>> > splits or disable pause/resume capabilities.
> >>> >>>>> >
> >>> >>>>> > *Note:* If the SourceReader or some SplitReader do not override
> >>> >>>>> > pauseOrResumeSplits but it is required in the application, an
> >>> exception is
> >>> >>>>> > thrown at runtime when a split alignment attempt is executed
> >>> (not during
> >>> >>>>> > startup or any time earlier).
> >>> >>>>> >
> >>> >>>>> > Also, I have revised the compatibility/migration section to
> >>> describe a bit
> >>> >>>>> > of a rationale for the default implementation with exception
> >>> throwing
> >>> >>>>> > behavior.
> >>> >>>>> >
> >>> >>>>> > Regards,
> >>> >>>>> > Sebastian
> >>> >>>>> >
> >>> >>>>> > [1]
> >>> >>>>> >
> >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> >>> >>>>> > [2]
> >>> https://github.com/smattheis/flink/tree/flip-217-split-wm-alignment
> >>> >>>>> >
> >>> >>>>> > On Mon, Jul 4, 2022 at 3:59 AM Thomas Weise <th...@apache.org>
> >>> wrote:
> >>> >>>>> >
> >>> >>>>> >> Hi,
> >>> >>>>> >>
> >>> >>>>> >> Thank you Becket and Piotr for ironing out the "case 2"
> >>> behavior.
> >>> >>>>> >> Strictly speaking we are introducing a regression by allowing an
> >>> >>>>> >> exception to bubble up that did not exist in the previous
> >>> release,
> >>> >>>>> >> regardless how suboptimal the behavior may be. However, given
> >>> that the
> >>> >>>>> >> feature is still experimental and we are planning to have a
> >>> >>>>> >> configuration based way to revert to the previous behavior, I
> >>> think
> >>> >>>>> >> this is a good solution.
> >>> >>>>> >>
> >>> >>>>> >> +1 from my side
> >>> >>>>> >>
> >>> >>>>> >> Thomas
> >>> >>>>> >>
> >>> >>>>> >> On Wed, Jun 29, 2022 at 2:43 PM Piotr Nowojski <
> >>> pnowojski@apache.org>
> >>> >>>>> >> wrote:
> >>> >>>>> >> >
> >>> >>>>> >> > +1 :)
> >>> >>>>> >> >
> >>> >>>>> >> > śr., 29 cze 2022 o 17:23 Becket Qin <be...@gmail.com>
> >>> napisał(a):
> >>> >>>>> >> >
> >>> >>>>> >> > >  Thanks for the explanation, Piotr.
> >>> >>>>> >> > >
> >>> >>>>> >> > > So it looks like we have a conclusion here.
> >>> >>>>> >> > >
> >>> >>>>> >> > > 1. Regarding the supportsPausingSplits() method, I feel it
> >>> brings more
> >>> >>>>> >> > > confusion while the benefit is marginal, so I prefer not
> >>> having that
> >>> >>>>> >> if
> >>> >>>>> >> > > possible. It would be good to also hear @Thomas Weise <
> >>> thw@apache.org
> >>> >>>>> >> >'s
> >>> >>>>> >> > > opinion as he mentioned some concern earlier.
> >>> >>>>> >> > > 2. Let's add the feature knob then. In the future we can
> >>> simply
> >>> >>>>> >> ignore the
> >>> >>>>> >> > > configuration when deprecating it.
> >>> >>>>> >> > >
> >>> >>>>> >> > > Thanks,
> >>> >>>>> >> > >
> >>> >>>>> >> > > Jiangjie (Becket) Qin
> >>> >>>>> >> > >
> >>> >>>>> >> > > On Wed, Jun 29, 2022 at 10:19 PM Piotr Nowojski <
> >>> pnowojski@apache.org
> >>> >>>>> >> >
> >>> >>>>> >> > > wrote:
> >>> >>>>> >> > >
> >>> >>>>> >> > > > Hi,
> >>> >>>>> >> > > >
> >>> >>>>> >> > > > I mean I'm fine with throwing an exception by default in
> >>> Flink 1.16
> >>> >>>>> >> in
> >>> >>>>> >> > > the
> >>> >>>>> >> > > > "Case 2", but I think we need to provide a way to
> >>> workaround it for
> >>> >>>>> >> > > example
> >>> >>>>> >> > > > via a feature toggle, if it's an easy thing to do. And it
> >>> seems to
> >>> >>>>> >> be a
> >>> >>>>> >> > > > simple thing.
> >>> >>>>> >> > > >
> >>> >>>>> >> > > > However this is orthogonal to the
> >>> `supportsPausingSplits()` issue. I
> >>> >>>>> >> > > don't
> >>> >>>>> >> > > > have a big preference whether
> >>> >>>>> >> > > >   a) the exception should originate on JM, using `default
> >>> boolean
> >>> >>>>> >> > > > supportsPausingSplits() { return false; }` (as currently
> >>> proposed
> >>> >>>>> >> in the
> >>> >>>>> >> > > > FLIP),
> >>> >>>>> >> > > >   b) or on the TM from `pauseOrResumeSplits()` throwing
> >>> >>>>> >> > > > `UnsupportedOperationException` as you are proposing.
> >>> >>>>> >> > > >
> >>> >>>>> >> > > > a) fails earlier, so it's more user friendly from this
> >>> perspective,
> >>> >>>>> >> but
> >>> >>>>> >> > > it
> >>> >>>>> >> > > > provides more possibilities for bugs/inconsistencies for
> >>> connector
> >>> >>>>> >> > > > developers, since `supportsPausingSplits()` would have to
> >>> be kept
> >>> >>>>> >> in sync
> >>> >>>>> >> > > > with `pauseOrResumeSplits()`.
> >>> >>>>> >> > > >
> >>> >>>>> >> > > > Best,
> >>> >>>>> >> > > > Piotrek
> >>> >>>>> >> > > >
> >>> >>>>> >> > > > śr., 29 cze 2022 o 15:27 Becket Qin <becket.qin@gmail.com
> >>> >
> >>> >>>>> >> napisał(a):
> >>> >>>>> >> > > >
> >>> >>>>> >> > > > > Hi Piotr,
> >>> >>>>> >> > > > >
> >>> >>>>> >> > > > > Just to make sure we are on the same page. There are
> >>> two cases
> >>> >>>>> >> for the
> >>> >>>>> >> > > > > existing FLIP-182 users:
> >>> >>>>> >> > > > >
> >>> >>>>> >> > > > > Case 1: Each source reader only has one split assigned.
> >>> This is
> >>> >>>>> >> the
> >>> >>>>> >> > > > > targeted case for FLIP-182.
> >>> >>>>> >> > > > > Case 2: Each source reader has multiple splits
> >>> assigned. This is
> >>> >>>>> >> the
> >>> >>>>> >> > > > flaky
> >>> >>>>> >> > > > > case that may or may not work.
> >>> >>>>> >> > > > >
> >>> >>>>> >> > > > > With solution 1, the users of case 1 won't be impacted.
> >>> The users
> >>> >>>>> >> in
> >>> >>>>> >> > > > case 2
> >>> >>>>> >> > > > > will receive an exception which they won't get at the
> >>> moment.
> >>> >>>>> >> > > > >
> >>> >>>>> >> > > > > Do you mean we should not throw an exception in case 2?
> >>> >>>>> >> Personally I
> >>> >>>>> >> > > feel
> >>> >>>>> >> > > > > that is OK and could have been done in FLIP-182 itself
> >>> because
> >>> >>>>> >> it's
> >>> >>>>> >> > > not a
> >>> >>>>> >> > > > > designed use case. As a user I may see a big variation
> >>> of the job
> >>> >>>>> >> state
> >>> >>>>> >> > > > > sizes from time to time and I am not able to rely on
> >>> this feature
> >>> >>>>> >> to
> >>> >>>>> >> > > plan
> >>> >>>>> >> > > > > my resources and uphold the SLA.
> >>> >>>>> >> > > > >
> >>> >>>>> >> > > > > That said, if you have a strong opinion on this, I am
> >>> fine with
> >>> >>>>> >> having
> >>> >>>>> >> > > > the
> >>> >>>>> >> > > > > configuration like
> >>> "allow.coarse-grained.watermark.alignment"
> >>> >>>>> >> with the
> >>> >>>>> >> > > > > default value set to false, given that a configuration
> >>> is much
> >>> >>>>> >> easier
> >>> >>>>> >> > > to
> >>> >>>>> >> > > > > deprecate than a method.
> >>> >>>>> >> > > > >
> >>> >>>>> >> > > > > Thanks,
> >>> >>>>> >> > > > >
> >>> >>>>> >> > > > > Jiangjie (Becket) Qin
> >>> >>>>> >> > > > >
> >>> >>>>> >> > > > >
> >>> >>>>> >>
> >>> >>>>> >
> >>>
> >>
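
The behavior discussed in the quoted thread (a `pauseOrResumeSplits` default
that throws UnsupportedOperationException, a configuration flag that
tolerates the exception, and Becket's suggestion to stop retrying after the
first failure) can be sketched roughly as follows. This is a simplified
stand-in written for illustration: the interface, the reader classes and the
coordinator class are hypothetical and are not the actual Flink
`SourceReader` or `SourceOperator`, and the boolean field only mirrors the
meaning of `pipeline.watermark-alignment.allow-unaligned-source-splits`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a reader interface with an opt-in pause/resume method.
interface PausableReader {
    // The default mirrors the proposal: readers that do not override it throw.
    default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        throw new UnsupportedOperationException("pauseOrResumeSplits is not implemented");
    }
}

// A reader written before the feature existed; it inherits the throwing default.
final class LegacyReader implements PausableReader {}

// A reader that supports pausing and tracks which splits are currently paused.
final class AlignedReader implements PausableReader {
    final List<String> paused = new ArrayList<>();

    @Override
    public void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        paused.removeAll(toResume);
        paused.addAll(toPause);
    }
}

// Hypothetical coordinator playing the role the thread assigns to the SourceOperator.
final class PauseFallbackSketch {
    private final PausableReader reader;
    private final boolean allowUnalignedSourceSplits; // assumed to come from configuration
    private boolean pauseUnsupported; // set after the first tolerated exception

    PauseFallbackSketch(PausableReader reader, boolean allowUnalignedSourceSplits) {
        this.reader = reader;
        this.allowUnalignedSourceSplits = allowUnalignedSourceSplits;
    }

    /** Returns true if the reader handled the request, false if it was skipped or tolerated. */
    boolean tryPauseOrResume(Collection<String> toPause, Collection<String> toResume) {
        if (pauseUnsupported) {
            return false; // do not call the reader again after a known failure
        }
        try {
            reader.pauseOrResumeSplits(toPause, toResume);
            return true;
        } catch (UnsupportedOperationException e) {
            if (!allowUnalignedSourceSplits) {
                throw e; // default behavior: the exception bubbles up
            }
            pauseUnsupported = true;
            return false;
        }
    }

    public static void main(String[] args) {
        PauseFallbackSketch tolerant = new PauseFallbackSketch(new LegacyReader(), true);
        System.out.println(tolerant.tryPauseOrResume(Arrays.asList("split-1"), Collections.emptyList()));
    }
}
```

With the flag set to false the same call on a `LegacyReader` would rethrow,
which corresponds to the "Case 2" users receiving an exception by default.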

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Sebastian Mattheis <se...@ververica.com>.
Hi everybody,

I have updated FLIP-217 [1] and have implemented the respective changes in
[2]. Please review. If there are no concerns, I would initiate the voting
on Thursday.

Best regards,
Sebastian

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
[2] https://github.com/smattheis/flink/tree/flip-217-split-wm-alignment

On Mon, Jul 25, 2022 at 9:19 AM Piotr Nowojski <pn...@apache.org> wrote:

> Thanks for the update Sebastian :)
>
> Best,
> Piotrek
>
> pon., 25 lip 2022 o 08:12 Sebastian Mattheis <se...@ververica.com>
> napisał(a):
>
>> Hi everybody,
>>
>> Last week I discussed the semantics and an implementation strategy of the
>> configuration parameter with Piotr and did the implementation and some
>> tests this weekend.
>>
>> A short summary of what I discussed and recapped with Piotr:
>>
>>    - The configuration parameter allows (and tolerates) the use of
>>    `SourceReader`s that do not implement `pauseOrResumeSplits` method. (The
>>    exception is ignored in `SourceOperator`.)
>>    - The configuration parameter allows (and tolerates) the use of
>>    `SourceSplitReader`s that do not implement `pauseOrResumeSplits` method.
>>    (The exception is ignored in the `PauseResumeSplitsTask` of the
>>    `SplitFetcher`.)
>>
>> In particular, consider a `SourceReader` with two `SplitReader`s where one
>> does not implement `pauseOrResumeSplits` and the other does. The parameter
>> will allow the use of the one that doesn't and will, nevertheless, still
>> attempt to pause/resume the other. (Consequently, if the one that doesn't
>> support pause is ahead, it simply cannot be paused, but if the other is
>> ahead, it will be paused until watermarks are aligned.)
>>
>> There is one flaw that I don't really like but which I accept as a result of
>> the discussion and which I will add/update in the FLIP:
>> If there is any other mechanism (i.e., one other than watermark alignment)
>> that attempts to pause or resume `SplitReader`s, it will have side effects
>> and potentially unexpected behavior if one or more `SplitReader`s do not
>> implement `pauseOrResumeSplits` and the user set the configuration
>> parameter to allow/tolerate it for split-level watermark alignment. (The
>> reason is simply that we cannot differentiate which mechanism attempts to
>> pause/resume, i.e., whether it is used for watermark alignment or something
>> else.) Given that this configuration parameter is supposed to be an
>> intermediate fallback, this is acceptable to me, but it should be changed at
>> the latest when some other mechanism uses pauseOrResumeSplits.
>>
>> As for the parameter naming, I have implemented it the following way
>> (reason: There exists a parameter `pipeline.auto-watermark-interval`.):
>>
>> pipeline.watermark-alignment.allow-unaligned-source-splits (default:
>> false)
>>
>> Status: I have implemented the configuration parameter (and an IT case).
>> I still need to update the FLIP and will ping you (tomorrow or so) when I'm
>> done with that. Please check/review my description above and let me know if
>> you see any problems with it.
>>
>> Thanks a lot and regards,
>> Sebastian
>>
>>
>> On Wed, Jul 20, 2022 at 11:24 PM Thomas Weise <th...@apache.org> wrote:
>>
>>> Hi Sebastian,
>>>
>>> Thank you for updating the FLIP and sorry for my delayed response. As
>>> Piotr pointed out, we would need to incorporate the fallback flag into
>>> the design to reflect the outcome of the previous discussion.
>>>
>>> Based on the current FLIP and as detailed by Becket, the
>>> SourceOperator coordinates the alignment. It is responsible for the
>>> pause/resume decision and knows how many splits are assigned.
>>> Therefore shouldn't it have all the information needed to efficiently
>>> handle the case of UnsupportedOperationException thrown by a reader?
>>>
>>> Although the fallback requires some extra implementation effort, I
>>> think that is more than offset by not surprising users and offering a
>>> smoother migration path. Yes, the flag is a temporary feature that
>>> will become obsolete in perhaps 2-3 releases (can we please also
>>> include that in the FLIP?). But since it would be just a
>>> configuration property that can be ignored at that point (for which
>>> there is precedent), no code change will be forced on users.
>>>
>>> As for the property name, perhaps the following would be even more
>>> descriptive?
>>>
>>> coarse.grained.wm.alignment.fallback.enabled
>>>
>>> Thanks!
>>> Thomas
>>>
>>>
>>> On Wed, Jul 13, 2022 at 10:59 AM Becket Qin <be...@gmail.com>
>>> wrote:
>>> >
>>> > Thanks for the explanation, Sebastian. I understand your concern now.
>>> >
>>> > 1. About the major concern. Personally I'd consider the coarse grained
>>> watermark alignment as a special case for backward compatibility. In the
>>> future, if for whatever reason we want to pause a split and that is not
>>> supported, it seems the only thing that makes sense is throwing an
>>> exception, instead of pausing the entire source reader. Regarding this
>>> FLIP, if the logic that determines which split should be paused is in the
>>> SourceOperator, the SourceOperator actually knows the reason why it pauses
>>> a split. It also knows whether more than one split is assigned to
>>> the source reader. So it can just fall back to the coarse grained watermark
>>> alignment, without affecting other reasons for pausing a split, right? And
>>> in the future, if there are more purposes for pausing / resuming a split,
>>> the SourceOperator still needs to understand each of the reasons in order
>>> to resume the splits after all the pausing conditions are no longer met.
>>> >
>>> > 2. Naming wise, would "coarse.grained.watermark.alignment.enabled"
>>> address your concern?
>>> >
>>> > The only concern I have for Option A is that people may not be able to
>>> benefit from split level WM alignment until all the sources they need have
>>> that implemented. This seems to unnecessarily delay the adoption of a new
>>> feature, which looks like a more substantive downside compared with the
>>> "coarse.grained.wm.alignment.enabled" option.
>>> >
>>> > BTW, the SourceOperator doesn't need to invoke the
>>> pauseOrResumeSplits() method and catch the UnsupportedOperationException
>>> every time. A flag can be set so it doesn't attempt to pause the split
>>> after the first time it sees the exception.
>>> >
>>> >
>>> > Thanks,
>>> >
>>> > Jiangjie (Becket) Qin
>>> >
>>> >
>>> >
>>> > On Wed, Jul 13, 2022 at 5:11 PM Sebastian Mattheis <
>>> sebastian@ververica.com> wrote:
>>> >>
>>> >> Hi Becket, Hi Thomas, Hi Piotrek,
>>> >>
>>> >> Thanks for the feedback. I would like to highlight some concerns:
>>> >>
>>> >> Major: A configuration parameter like "allow coarse grained
>>> alignment" defines a semantic that mixes two contexts conditionally as
>>> follows: "ignore incapability to pause splits in SourceReader/SplitReader"
>>> IF (conditional) we "allow coarse grained watermark alignment". At the same
>>> time we said that there is no way to check the capability of
>>> SourceReader/SplitReader to pause/resume other than observing a
>>> UnsupportedOperationException during runtime such that we cannot disable
>>> the trigger for watermark split alignment in the SourceOperator. Instead,
>>> we can only ignore the incapability of SourceReader/SplitReader during
>>> execution of a pause/resume attempt which, consequently, requires checking
>>> the "allow coarse grained alignment" parameter value (to implement the
>>> conditional semantic). However, during this execution we actually don't
>>> know whether the attempt was executed for the purpose of watermark
>>> alignment or for some other purpose such that the check actually depends on
>>> who triggered the pause/resume attempt and hides the exception potentially
>>> unexpectedly for some other use case. Of course, currently there is no
>>> other purpose and, hence, no other trigger than watermark alignment.
>>> However, this breaks, in my perspective, the idea of having
>>> pauseOrResumeSplits (re)usable for other use cases.
>>> >> Minor: I'm not aware of any configuration parameter in the format
>>> like `allow.*` as you suggested with
>>> `allow.coarse.grained.watermark.alignment`. Would that still be okay to do?
>>> >>
>>> >> As we have agreed to not have a "supportsPausableSplits" method
>>> because of potential inconsistencies between return value of this method
>>> and the actual implementation (and also the difficulty to have a meaningful
>>> return value where the support actually depends on SourceReader AND the
>>> assigned SplitReaders), I don't want to bring up the discussion about the
>>> "supportsPausableSplits" method again. Instead, I see the following
>>> options:
>>> >>
>>> >> Option A: I would drop the idea of "allow coarse grained alignment"
>>> semantic of the parameter but implement a parameter to "enable/disable
>>> split watermark alignment" which we can easily use in the SourceOperator to
>>> disable the trigger of split alignment. This is indeed more static and less
>>> flexible, because it disables split alignment unconditionally, but it is
>>> "context-decoupled" and more straight-forward to use. This would also
>>> address the use case of disabling split alignment for the purpose of
>>> runtime behavior evaluation, as mentioned by Thomas (if I remember
>>> correctly.) I would implement the parameter with a default where watermark
>>> split alignment is enabled which requires users to check their application
>>> when upgrading to 1.16 if a) there is a source that reads from multiple
>>> splits and b), if yes, all splits of that source support pause/resume. If
>>> a) yes and b) no, the user must take action to disable watermark split
>>> alignment (which disables the trigger of split alignment only for the
>>> purpose).
>>> >>
>>> >> Option B: If we ignore my concern, I would simply check the "allow
>>> coarse grained watermark alignment" parameter value on every attempt to
>>> execute pause/resume in the SourceReader and in the SplitReader and will
>>> not throw UnsupportedOperationException if the parameter value is set to
>>> true.
>>> >>
>>> >> Please note that the parameter is also used only for some kind of
>>> migration phase. Therefore, I wonder if we need to overcomplicate things.
>>> >>
>>> >> @Piotrek, @Becket, @Thomas: I would recommend/favour option A. Please
>>> let me know your feedback and/or concerns as soon as possible, if possible.
>>> :)
>>> >>
>>> >> Regards,
>>> >> Sebastian
>>> >>
>>> >>
>>> >> On Wed, Jul 13, 2022 at 9:37 AM Becket Qin <be...@gmail.com>
>>> wrote:
>>> >>>
>>> >>> Hi Sebastian,
>>> >>>
>>> >>> Thanks for updating the FLIP wiki.
>>> >>>
>>> >>> Just to double confirm, I was thinking of a configuration like
>>> "allow.coarse.grained.watermark.alignment". This will allow the coarse
>>> grained watermark alignment as a fallback instead of bubbling up an
>>> exception if split pausing is not supported in some Sources in a Flink job.
>>> And this will only affect the Sources that do not support split pausing,
>>> but not the Sources that have split pausing supported.
>>> >>>
>>> >>> This seems slightly different from a <knob> enables / disables split
>>> alignment. This sounds like a global thing, and it seems not necessary to
>>> disable the split alignment, as long as the coarse grained alignment can be
>>> a fallback.
>>> >>>
>>> >>> Thanks,
>>> >>>
>>> >>> Jiangjie (Becket) Qin
>>> >>>
>>> >>> On Wed, Jul 13, 2022 at 2:46 PM Sebastian Mattheis <
>>> sebastian@ververica.com> wrote:
>>> >>>>
>>> >>>> Hi Piotrek,
>>> >>>>
>>> >>>> Sorry I've read it and forgot it when I was ripping out the
>>> supportsPauseOrResume method again. Thanks for pointing that out. I will
>>> add it as follows: The <knob> enables/disables split alignment in the
>>> SourceOperator where the default is that split alignment is enabled. (And I
>>> will add the note: "In future releases, the <knob> may be ignored such that
>>> split alignment is always enabled.")
>>> >>>>
>>> >>>> Cheers,
>>> >>>> Sebastian
>>> >>>>
>>> >>>> On Tue, Jul 12, 2022 at 11:14 PM Piotr Nowojski <
>>> pnowojski@apache.org> wrote:
>>> >>>>>
>>> >>>>> Hi Sebastian,
>>> >>>>>
>>> >>>>> Thanks for picking this up.
>>> >>>>>
>>> >>>>> > 5. There is NO configuration option to enable watermark
>>> alignment of
>>> >>>>> splits or disable pause/resume capabilities.
>>> >>>>>
>>> >>>>> Isn't this contradicting what we actually agreed on?
>>> >>>>>
>>> >>>>> > we are planning to have a configuration based way to revert to
>>> the
>>> >>>>> previous behavior
>>> >>>>>
>>> >>>>> I think what we agreed in the last couple of emails was to add a
>>> >>>>> configuration toggle, that would allow Flink 1.15 users, that are
>>> using
>>> >>>>> watermark alignment with multiple splits per source operator, to
>>> continue
>>> >>>>> using it with the old 1.15 semantic, even if their source doesn't
>>> support
>>> >>>>> pausing/resuming splits. It seems to me like the current FLIP and
>>> >>>>> implementation proposal would always throw an exception in that
>>> case?
>>> >>>>>
>>> >>>>> Best,
>>> >>>>> Piotrek
>>> >>>>>
>>> >>>>> On Tue, Jul 12, 2022 at 10:18 AM Sebastian Mattheis <
>>> sebastian@ververica.com>
>>> >>>>> wrote:
>>> >>>>>
>>> >>>>> > Hi all,
>>> >>>>> >
>>> >>>>> > I have updated FLIP-217 [1] to the proposed specification and
>>> adapted the
>>> >>>>> > current implementation [2] respectively.
>>> >>>>> >
>>> >>>>> > This means both, FLIP and implementation, are ready for review
>>> from my
>>> >>>>> > side. (I would revise commit history and messages for the final
>>> PR but left
>>> >>>>> > it as is for now and the records of this discussion.)
>>> >>>>> >
>>> >>>>> > 1. Please review the updated version of FLIP-217 [1]. If there
>>> are no
>>> >>>>> > further concerns, I would initiate the voting.
>>> >>>>> > (2. If you want to speed up things, please also have a look into
>>> the
>>> >>>>> > updated implementation [2].)
>>> >>>>> >
>>> >>>>> > Please consider the following updated specification in the
>>> current status
>>> >>>>> > of FLIP-217 where the essence is as follows:
>>> >>>>> >
>>> >>>>> > 1. A method pauseOrResumeSplits is added to SourceReader with
>>> default
>>> >>>>> > implementation that throws UnsupportedOperationException.
>>> >>>>> > 2. A method pauseOrResumeSplits is added to SplitReader with
>>> default
>>> >>>>> > implementation that throws UnsupportedOperationException.
>>> >>>>> > 3. SourceOperator initiates split alignment only if more than
>>> one split is
>>> >>>>> > assigned to the source (and, of course, only if
>>> withSplitAlignment is used).
>>> >>>>> > 4. There is NO "supportsPauseOrResumeSplits" method at any place
>>> (to
>>> >>>>> > indicate if the implementation supports pause/resume
>>> capabilities).
>>> >>>>> > 5. There is NO configuration option to enable watermark
>>> alignment of
>>> >>>>> > splits or disable pause/resume capabilities.
>>> >>>>> >
>>> >>>>> > *Note:* If the SourceReader or some SplitReader does not override
>>> >>>>> > pauseOrResumeSplits but it is required in the application, an
>>> exception is
>>> >>>>> > thrown at runtime when a split alignment attempt is executed
>>> (not during
>>> >>>>> > startup or any time earlier).
>>> >>>>> >
>>> >>>>> > Also, I have revised the compatibility/migration section to
>>> describe a bit
>>> >>>>> > of a rationale for the default implementation with exception
>>> throwing
>>> >>>>> > behavior.
>>> >>>>> >
>>> >>>>> > Regards,
>>> >>>>> > Sebastian
>>> >>>>> >
>>> >>>>> > [1]
>>> >>>>> >
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
>>> >>>>> > [2]
>>> https://github.com/smattheis/flink/tree/flip-217-split-wm-alignment
>>> >>>>> >
>>> >>>>> > On Mon, Jul 4, 2022 at 3:59 AM Thomas Weise <th...@apache.org>
>>> wrote:
>>> >>>>> >
>>> >>>>> >> Hi,
>>> >>>>> >>
>>> >>>>> >> Thank you Becket and Piotr for ironing out the "case 2"
>>> behavior.
>>> >>>>> >> Strictly speaking we are introducing a regression by allowing an
>>> >>>>> >> exception to bubble up that did not exist in the previous
>>> release,
>>> >>>>> >> regardless how suboptimal the behavior may be. However, given
>>> that the
>>> >>>>> >> feature is still experimental and we are planning to have a
>>> >>>>> >> configuration based way to revert to the previous behavior, I
>>> think
>>> >>>>> >> this is a good solution.
>>> >>>>> >>
>>> >>>>> >> +1 from my side
>>> >>>>> >>
>>> >>>>> >> Thomas
>>> >>>>> >>
>>> >>>>> >> On Wed, Jun 29, 2022 at 2:43 PM Piotr Nowojski <
>>> pnowojski@apache.org>
>>> >>>>> >> wrote:
>>> >>>>> >> >
>>> >>>>> >> > +1 :)
>>> >>>>> >> >
>>> >>>>> >> > On Wed, Jun 29, 2022 at 5:23 PM Becket Qin <be...@gmail.com>
>>> wrote:
>>> >>>>> >> >
>>> >>>>> >> > >  Thanks for the explanation, Piotr.
>>> >>>>> >> > >
>>> >>>>> >> > > So it looks like we have a conclusion here.
>>> >>>>> >> > >
>>> >>>>> >> > > 1. Regarding the supportsPausingSplits() method, I feel it
>>> brings more
>>> >>>>> >> > > confusion while the benefit is marginal, so I prefer not
>>> having that
>>> >>>>> >> if
>>> >>>>> >> > > possible. It would be good to also hear @Thomas Weise <
>>> thw@apache.org
>>> >>>>> >> >'s
>>> >>>>> >> > > opinion as he mentioned some concern earlier.
>>> >>>>> >> > > 2. Let's add the feature knob then. In the future we can
>>> simply
>>> >>>>> >> ignore the
>>> >>>>> >> > > configuration when deprecating it.
>>> >>>>> >> > >
>>> >>>>> >> > > Thanks,
>>> >>>>> >> > >
>>> >>>>> >> > > Jiangjie (Becket) Qin
>>> >>>>> >> > >
>>> >>>>> >> > > On Wed, Jun 29, 2022 at 10:19 PM Piotr Nowojski <
>>> pnowojski@apache.org
>>> >>>>> >> >
>>> >>>>> >> > > wrote:
>>> >>>>> >> > >
>>> >>>>> >> > > > Hi,
>>> >>>>> >> > > >
>>> >>>>> >> > > > I mean I'm fine with throwing an exception by default in
>>> Flink 1.16
>>> >>>>> >> in
>>> >>>>> >> > > the
>>> >>>>> >> > > > "Case 2", but I think we need to provide a way to
>>> work around it for
>>> >>>>> >> > > example
>>> >>>>> >> > > > via a feature toggle, if it's an easy thing to do. And it
>>> seems to
>>> >>>>> >> be a
>>> >>>>> >> > > > simple thing.
>>> >>>>> >> > > >
>>> >>>>> >> > > > However this is orthogonal to the
>>> `supportsPausingSplits()` issue. I
>>> >>>>> >> > > don't
>>> >>>>> >> > > > have a big preference whether
>>> >>>>> >> > > >   a) the exception should originate on JM, using `default
>>> boolean
>>> >>>>> >> > > > supportsPausingSplits() { return false; }` (as currently
>>> proposed
>>> >>>>> >> in the
>>> >>>>> >> > > > FLIP),
>>> >>>>> >> > > >   b) or on the TM from `pauseOrResumeSplits()` throwing
>>> >>>>> >> > > > `UnsupportedOperationException` as you are proposing.
>>> >>>>> >> > > >
>>> >>>>> >> > > > a) fails earlier, so it's more user friendly from this
>>> perspective,
>>> >>>>> >> but
>>> >>>>> >> > > it
>>> >>>>> >> > > > provides more possibilities for bugs/inconsistencies for
>>> connector
>>> >>>>> >> > > > developers, since `supportsPausingSplits()` would have to
>>> be kept
>>> >>>>> >> in sync
>>> >>>>> >> > > > with `pauseOrResumeSplits()`.
>>> >>>>> >> > > >
>>> >>>>> >> > > > Best,
>>> >>>>> >> > > > Piotrek
>>> >>>>> >> > > >
>>> >>>>> >> > > > On Wed, Jun 29, 2022 at 3:27 PM Becket Qin <becket.qin@gmail.com
>>> >
>>> >>>>> >> wrote:
>>> >>>>> >> > > >
>>> >>>>> >> > > > > Hi Piotr,
>>> >>>>> >> > > > >
>>> >>>>> >> > > > > Just to make sure we are on the same page. There are
>>> two cases
>>> >>>>> >> for the
>>> >>>>> >> > > > > existing FLIP-182 users:
>>> >>>>> >> > > > >
>>> >>>>> >> > > > > Case 1: Each source reader only has one split assigned.
>>> This is
>>> >>>>> >> the
>>> >>>>> >> > > > > targeted case for FLIP-182.
>>> >>>>> >> > > > > Case 2: Each source reader has multiple splits
>>> assigned. This is
>>> >>>>> >> the
>>> >>>>> >> > > > flaky
>>> >>>>> >> > > > > case that may or may not work.
>>> >>>>> >> > > > >
>>> >>>>> >> > > > > With solution 1, the users of case 1 won't be impacted.
>>> The users
>>> >>>>> >> in
>>> >>>>> >> > > > case 2
>>> >>>>> >> > > > > will receive an exception which they won't get at the
>>> moment.
>>> >>>>> >> > > > >
>>> >>>>> >> > > > > Do you mean we should not throw an exception in case 2?
>>> >>>>> >> Personally I
>>> >>>>> >> > > feel
>>> >>>>> >> > > > > that is OK and could have been done in FLIP-182 itself
>>> because
>>> >>>>> >> it's
>>> >>>>> >> > > not a
>>> >>>>> >> > > > > designed use case. As a user I may see a big variation
>>> of the job
>>> >>>>> >> state
>>> >>>>> >> > > > > sizes from time to time and I am not able to rely on
>>> this feature
>>> >>>>> >> to
>>> >>>>> >> > > plan
>>> >>>>> >> > > > > my resources and uphold the SLA.
>>> >>>>> >> > > > >
>>> >>>>> >> > > > > That said, if you have a strong opinion on this, I am
>>> fine with
>>> >>>>> >> having
>>> >>>>> >> > > > the
>>> >>>>> >> > > > > configuration like
>>> "allow.coarse-grained.watermark.alignment"
>>> >>>>> >> with the
>>> >>>>> >> > > > > default value set to false, given that a configuration
>>> is much
>>> >>>>> >> easier
>>> >>>>> >> > > to
>>> >>>>> >> > > > > deprecate than a method.
>>> >>>>> >> > > > >
>>> >>>>> >> > > > > Thanks,
>>> >>>>> >> > > > >
>>> >>>>> >> > > > > Jiangjie (Becket) Qin
>>> >>>>> >> > > > >
>>> >>>>> >> > > > >
>>> >>>>> >>
>>> >>>>> >
>>>
>>
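
The tolerance semantics summarized in the July 25 message above can be illustrated with a minimal Java sketch. This is not the FLIP-217 implementation: `SimpleSplitReader`, `PausableReader`, `LegacyReader` and `tryPauseOrResume` are hypothetical names, splits are represented by plain strings, and the `allowUnalignedSourceSplits` boolean merely stands in for the proposed `pipeline.watermark-alignment.allow-unaligned-source-splits` parameter. Only the idea of a default `pauseOrResumeSplits` that throws `UnsupportedOperationException` is taken from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class UnalignedSplitsSketch {

    /** Hypothetical, simplified stand-in for a split reader; not Flink's actual interface. */
    public interface SimpleSplitReader {
        String name();

        // Mirrors the thread: readers that do not override this method throw.
        default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
            throw new UnsupportedOperationException(name() + " cannot pause or resume splits");
        }
    }

    /** Reader that supports pausing and records which splits are currently paused. */
    public static final class PausableReader implements SimpleSplitReader {
        public final List<String> paused = new ArrayList<>();

        @Override
        public String name() {
            return "pausable";
        }

        @Override
        public void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
            paused.removeAll(toResume);
            paused.addAll(toPause);
        }
    }

    /** Reader that keeps the throwing default, i.e. it was never migrated. */
    public static final class LegacyReader implements SimpleSplitReader {
        @Override
        public String name() {
            return "legacy";
        }
    }

    /**
     * Forwards a pause/resume request and returns true if the reader applied it.
     * If the reader does not support pausing, the exception is swallowed only
     * when allowUnalignedSourceSplits is true; otherwise it propagates.
     */
    public static boolean tryPauseOrResume(
            SimpleSplitReader reader,
            Collection<String> toPause,
            Collection<String> toResume,
            boolean allowUnalignedSourceSplits) {
        try {
            reader.pauseOrResumeSplits(toPause, toResume);
            return true;
        } catch (UnsupportedOperationException e) {
            if (!allowUnalignedSourceSplits) {
                throw e;
            }
            return false; // tolerated: this reader simply keeps reading
        }
    }

    public static void main(String[] args) {
        PausableReader pausable = new PausableReader();
        LegacyReader legacy = new LegacyReader();
        List<String> none = Collections.emptyList();

        // The pausable reader is paused; the legacy reader is tolerated and keeps running.
        System.out.println(tryPauseOrResume(pausable, Arrays.asList("split-1"), none, true));
        System.out.println(tryPauseOrResume(legacy, Arrays.asList("split-2"), none, true));
        System.out.println("paused: " + pausable.paused);
    }
}
```

With the flag set to false, the same call on `LegacyReader` would let the `UnsupportedOperationException` propagate, which corresponds to the default behavior described in the thread.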

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Piotr Nowojski <pn...@apache.org>.
Thanks for the update Sebastian :)

Best,
Piotrek

On Mon, Jul 25, 2022 at 8:12 AM Sebastian Mattheis <se...@ververica.com>
wrote:

> Hi everybody,
>
> Last week I discussed the semantics and an implementation strategy of the
> configuration parameter with Piotr, and I did the implementation and some
> tests this weekend.
>
> A short summary of what I discussed and recapped with Piotr:
>
>    - The configuration parameter allows (and tolerates) the use of
>    `SourceReader`s that do not implement the `pauseOrResumeSplits` method.
>    (The exception is ignored in `SourceOperator`.)
>    - The configuration parameter allows (and tolerates) the use of
>    `SplitReader`s that do not implement the `pauseOrResumeSplits` method.
>    (The exception is ignored in the `PauseResumeSplitsTask` of the
>    `SplitFetcher`.)
>
> In particular, this means that for a `SourceReader` with two `SplitReader`s
> where one does not implement `pauseOrResumeSplits` and the other does, the
> parameter will allow the use of the one that doesn't and will, nevertheless,
> still attempt to pause/resume the other. (Consequently, if the one that
> doesn't support pausing is ahead, it simply cannot be paused, but if the
> other is ahead, it will be paused until watermarks are aligned.)
>
> There is one flaw that I don't really like but which I accept as a result
> of the discussion and which I will add/update in the FLIP:
> If there is any other mechanism (i.e. other than watermark alignment) that
> attempts to pause or resume `SplitReader`s, it will have side effects and
> potentially unexpected behavior if one or more `SplitReader`s do not
> implement `pauseOrResumeSplits` and the user has set the configuration
> parameter to allow/tolerate it for split-level watermark alignment. (The
> reason is simply that we cannot differentiate which mechanism attempts to
> pause/resume, i.e., whether it is used for watermark alignment or something
> else.) Given that this configuration parameter is supposed to be an
> intermediate fallback, it is acceptable to me, but it must be changed at
> the latest when some other mechanism uses pauseOrResumeSplits.
>
> As for the parameter naming, I have implemented it the following way
> (reason: There exists a parameter `pipeline.auto-watermark-interval`.):
>
> pipeline.watermark-alignment.allow-unaligned-source-splits (default: false)
>
> Status: I have implemented the configuration parameter (and an IT case). I
> still need to update the FLIP and will ping you (tomorrow or so) when I'm
> done with that. Please check/review my description above and let me know if
> you see any problems with it.
>
> Thanks a lot and regards,
> Sebastian
>
>
> On Wed, Jul 20, 2022 at 11:24 PM Thomas Weise <th...@apache.org> wrote:
>
>> Hi Sebastian,
>>
>> Thank you for updating the FLIP and sorry for my delayed response. As
>> Piotr pointed out, we would need to incorporate the fallback flag into
>> the design to reflect the outcome of the previous discussion.
>>
>> Based on the current FLIP and as detailed by Becket, the
>> SourceOperator coordinates the alignment. It is responsible for the
>> pause/resume decision and knows how many splits are assigned.
>> Therefore shouldn't it have all the information needed to efficiently
>> handle the case of UnsupportedOperationException thrown by a reader?
>>
>> Although the fallback requires some extra implementation effort, I
>> think that is more than offset by not surprising users and offering a
>> smoother migration path. Yes, the flag is a temporary feature that
>> will become obsolete in perhaps 2-3 releases (can we please also
>> include that in the FLIP?). But since it would be just a
>> configuration property that can be ignored at that point (for which
>> there is precedent), no code change will be forced on users.
>>
>> As for the property name, perhaps the following would be even more
>> descriptive?
>>
>> coarse.grained.wm.alignment.fallback.enabled
>>
>> Thanks!
>> Thomas
>>
>>
>> On Wed, Jul 13, 2022 at 10:59 AM Becket Qin <be...@gmail.com> wrote:
>> >
>> > Thanks for the explanation, Sebastian. I understand your concern now.
>> >
>> > 1. About the major concern. Personally I'd consider the coarse grained
>> watermark alignment as a special case for backward compatibility. In the
>> future, if for whatever reason we want to pause a split and that is not
>> supported, it seems the only thing that makes sense is throwing an
>> exception, instead of pausing the entire source reader. Regarding this
>> FLIP, if the logic that determines which split should be paused is in the
>> SourceOperator, the SourceOperator actually knows the reason why it pauses
>> a split. It also knows whether more than one split is assigned to
>> the source reader. So it can just fall back to the coarse grained watermark
>> alignment, without affecting other reasons for pausing a split, right? And
>> in the future, if there are more purposes for pausing / resuming a split,
>> the SourceOperator still needs to understand each of the reasons in order
>> to resume the splits after all the pausing conditions are no longer met.
>> >
>> > 2. Naming wise, would "coarse.grained.watermark.alignment.enabled"
>> address your concern?
>> >
>> > The only concern I have for Option A is that people may not be able to
>> benefit from split level WM alignment until all the sources they need have
>> that implemented. This seems to unnecessarily delay the adoption of a new
>> feature, which looks like a more substantive downside compared with the
>> "coarse.grained.wm.alignment.enabled" option.
>> >
>> > BTW, the SourceOperator doesn't need to invoke the pauseOrResumeSplits()
>> method and catch the UnsupportedOperationException every time. A flag can be
>> set so it doesn't attempt to pause the split after the first time it sees
>> the exception.
>> >
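
The "set a flag after the first exception" idea from the paragraph above could look roughly like the following Java sketch. It is an illustration under assumptions, not code from the FLIP-217 branch: `SimpleSourceReader` and `SplitAligner` are hypothetical names, splits are plain strings, and the boolean return value only signals that the caller would have to fall back to coarse-grained alignment.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

public class PauseSupportCacheSketch {

    /** Hypothetical stand-in for a source reader whose default method throws. */
    public interface SimpleSourceReader {
        default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
            throw new UnsupportedOperationException("pausing splits is not supported");
        }
    }

    /** Hypothetical operator-side helper that remembers the first failure. */
    public static final class SplitAligner {
        private final SimpleSourceReader reader;
        private boolean pauseUnsupported; // set once, never reset
        private int attempts;             // how often the reader was actually called

        public SplitAligner(SimpleSourceReader reader) {
            this.reader = reader;
        }

        /**
         * Returns true if split-level pausing was applied, false if the caller
         * should fall back to coarse-grained alignment.
         */
        public boolean alignSplits(Collection<String> toPause, Collection<String> toResume) {
            if (pauseUnsupported) {
                return false; // skip the call and the exception entirely
            }
            attempts++;
            try {
                reader.pauseOrResumeSplits(toPause, toResume);
                return true;
            } catch (UnsupportedOperationException e) {
                pauseUnsupported = true;
                return false;
            }
        }

        public int attempts() {
            return attempts;
        }
    }

    public static void main(String[] args) {
        // A reader that keeps the throwing default.
        SplitAligner aligner = new SplitAligner(new SimpleSourceReader() {});
        for (int i = 0; i < 3; i++) {
            aligner.alignSplits(Arrays.asList("split-1"), Collections.emptyList());
        }
        System.out.println("reader invoked " + aligner.attempts() + " time(s)");
    }
}
```

The trade-off of such a flag is that it is sticky: if support could change at runtime (for example after new splits with a different reader are assigned), the sketch would never retry.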
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> >
>> >
>> > On Wed, Jul 13, 2022 at 5:11 PM Sebastian Mattheis <
>> sebastian@ververica.com> wrote:
>> >>
>> >> Hi Becket, Hi Thomas, Hi Piotrek,
>> >>
>> >> Thanks for the feedback. I would like to highlight some concerns:
>> >>
>> >> Major: A configuration parameter like "allow coarse grained alignment"
>> defines a semantic that mixes two contexts conditionally as follows:
>> "ignore incapability to pause splits in SourceReader/SplitReader" IF
>> (conditional) we "allow coarse grained watermark alignment". At the same
>> time we said that there is no way to check the capability of
>> SourceReader/SplitReader to pause/resume other than observing a
>> UnsupportedOperationException during runtime such that we cannot disable
>> the trigger for watermark split alignment in the SourceOperator. Instead,
>> we can only ignore the incapability of SourceReader/SplitReader during
>> execution of a pause/resume attempt which, consequently, requires checking
>> the "allow coarse grained alignment" parameter value (to implement the
>> conditional semantic). However, during this execution we actually don't
>> know whether the attempt was executed for the purpose of watermark
>> alignment or for some other purpose such that the check actually depends on
>> who triggered the pause/resume attempt and hides the exception potentially
>> unexpectedly for some other use case. Of course, currently there is no
>> other purpose and, hence, no other trigger than watermark alignment.
>> However, this breaks, in my perspective, the idea of having
>> pauseOrResumeSplits (re)usable for other use cases.
>> >> Minor: I'm not aware of any configuration parameter in the format like
>> `allow.*` as you suggested with `allow.coarse.grained.watermark.alignment`.
>> Would that still be okay to do?
>> >>
>> >> As we have agreed to not have a "supportsPausableSplits" method
>> because of potential inconsistencies between return value of this method
>> and the actual implementation (and also the difficulty to have a meaningful
>> return value where the support actually depends on SourceReader AND the
>> assigned SplitReaders), I don't want to bring up the discussion about the
>> "supportsPausableSplits" method again. Instead, I see the following
>> options:
>> >>
>> >> Option A: I would drop the idea of "allow coarse grained alignment"
>> semantic of the parameter but implement a parameter to "enable/disable
>> split watermark alignment" which we can easily use in the SourceOperator to
>> disable the trigger of split alignment. This is indeed more static and less
>> flexible, because it disables split alignment unconditionally, but it is
>> "context-decoupled" and more straight-forward to use. This would also
>> address the use case of disabling split alignment for the purpose of
>> runtime behavior evaluation, as mentioned by Thomas (if I remember
>> correctly.) I would implement the parameter with a default where watermark
>> split alignment is enabled which requires users to check their application
>> when upgrading to 1.16 if a) there is a source that reads from multiple
>> splits and b), if yes, all splits of that source support pause/resume. If
>> a) yes and b) no, the user must take action to disable watermark split
>> alignment (which disables the trigger of split alignment only for the
>> purpose).
>> >>
>> >> Option B: If we ignore my concern, I would simply check the "allow
>> coarse grained watermark alignment" parameter value on every attempt to
>> execute pause/resume in the SourceReader and in the SplitReader and will
>> not throw UnsupportedOperationException if the parameter value is set to
>> true.
>> >>
>> >> Please note that the parameter is also used only for some kind of
>> migration phase. Therefore, I wonder if we need to overcomplicate things.
>> >>
>> >> @Piotrek, @Becket, @Thomas: I would recommend/favour option A. Please
>> let me know your feedback and/or concerns as soon as possible, if possible.
>> :)
>> >>
>> >> Regards,
>> >> Sebastian
>> >>
>> >>
>> >> On Wed, Jul 13, 2022 at 9:37 AM Becket Qin <be...@gmail.com>
>> wrote:
>> >>>
>> >>> Hi Sebastian,
>> >>>
>> >>> Thanks for updating the FLIP wiki.
>> >>>
>> >>> Just to double confirm, I was thinking of a configuration like
>> "allow.coarse.grained.watermark.alignment". This will allow the coarse
>> grained watermark alignment as a fallback instead of bubbling up an
>> exception if split pausing is not supported in some Sources in a Flink job.
>> And this will only affect the Sources that do not support split pausing,
>> but not the Sources that have split pausing supported.
>> >>>
>> >>> This seems slightly different from a <knob> enables / disables split
>> alignment. This sounds like a global thing, and it seems not necessary to
>> disable the split alignment, as long as the coarse grained alignment can be
>> a fallback.
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Jiangjie (Becket) Qin
>> >>>
>> >>> On Wed, Jul 13, 2022 at 2:46 PM Sebastian Mattheis <
>> sebastian@ververica.com> wrote:
>> >>>>
>> >>>> Hi Piotrek,
>> >>>>
>> >>>> Sorry I've read it and forgot it when I was ripping out the
>> supportsPauseOrResume method again. Thanks for pointing that out. I will
>> add it as follows: The <knob> enables/disables split alignment in the
>> SourceOperator where the default is that split alignment is enabled. (And I
>> will add the note: "In future releases, the <knob> may be ignored such that
>> split alignment is always enabled.")
>> >>>>
>> >>>> Cheers,
>> >>>> Sebastian
>> >>>>
>> >>>> On Tue, Jul 12, 2022 at 11:14 PM Piotr Nowojski <
>> pnowojski@apache.org> wrote:
>> >>>>>
>> >>>>> Hi Sebastian,
>> >>>>>
>> >>>>> Thanks for picking this up.
>> >>>>>
>> >>>>> > 5. There is NO configuration option to enable watermark alignment
>> of
>> >>>>> splits or disable pause/resume capabilities.
>> >>>>>
>> >>>>> Isn't this contradicting what we actually agreed on?
>> >>>>>
>> >>>>> > we are planning to have a configuration based way to revert to the
>> >>>>> previous behavior
>> >>>>>
>> >>>>> I think what we agreed in the last couple of emails was to add a
>> >>>>> configuration toggle, that would allow Flink 1.15 users, that are
>> using
>> >>>>> watermark alignment with multiple splits per source operator, to
>> continue
>> >>>>> using it with the old 1.15 semantic, even if their source doesn't
>> support
>> >>>>> pausing/resuming splits. It seems to me like the current FLIP and
>> >>>>> implementation proposal would always throw an exception in that
>> case?
>> >>>>>
>> >>>>> Best,
>> >>>>> Piotrek
>> >>>>>
>> >>>>> On Tue, Jul 12, 2022 at 10:18 AM Sebastian Mattheis <
>> sebastian@ververica.com>
>> >>>>> wrote:
>> >>>>>
>> >>>>> > Hi all,
>> >>>>> >
>> >>>>> > I have updated FLIP-217 [1] to the proposed specification and
>> adapted the
>> >>>>> > current implementation [2] respectively.
>> >>>>> >
>> >>>>> > This means both, FLIP and implementation, are ready for review
>> from my
>> >>>>> > side. (I would revise commit history and messages for the final
>> PR but left
>> >>>>> > it as is for now and the records of this discussion.)
>> >>>>> >
>> >>>>> > 1. Please review the updated version of FLIP-217 [1]. If there
>> are no
>> >>>>> > further concerns, I would initiate the voting.
>> >>>>> > (2. If you want to speed up things, please also have a look into
>> the
>> >>>>> > updated implementation [2].)
>> >>>>> >
>> >>>>> > Please consider the following updated specification in the
>> current status
>> >>>>> > of FLIP-217 where the essence is as follows:
>> >>>>> >
>> >>>>> > 1. A method pauseOrResumeSplits is added to SourceReader with
>> default
>> >>>>> > implementation that throws UnsupportedOperationException.
>> >>>>> > 2. A method pauseOrResumeSplits is added to SplitReader with
>> default
>> >>>>> > implementation that throws UnsupportedOperationException.
>> >>>>> > 3. SourceOperator initiates split alignment only if more than one
>> split is
>> >>>>> > assigned to the source (and, of course, only if
>> withSplitAlignment is used).
>> >>>>> > 4. There is NO "supportsPauseOrResumeSplits" method at any place
>> (to
>> >>>>> > indicate if the implementation supports pause/resume
>> capabilities).
>> >>>>> > 5. There is NO configuration option to enable watermark alignment
>> of
>> >>>>> > splits or disable pause/resume capabilities.
>> >>>>> >
>> >>>>> > *Note:* If the SourceReader or some SplitReader do not override
>> >>>>> > pauseOrResumeSplits but it is required in the application, an
>> exception is
>> >>>>> > thrown at runtime when a split alignment attempt is executed
>> (not during
>> >>>>> > startup or any time earlier).
>> >>>>> >
>> >>>>> > Also, I have revised the compatibility/migration section to
>> describe a bit
>> >>>>> > of a rationale for the default implementation with exception
>> throwing
>> >>>>> > behavior.
>> >>>>> >
>> >>>>> > Regards,
>> >>>>> > Sebastian
>> >>>>> >
>> >>>>> > [1]
>> >>>>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
>> >>>>> > [2]
>> https://github.com/smattheis/flink/tree/flip-217-split-wm-alignment
>> >>>>> >
>> >>>>> > On Mon, Jul 4, 2022 at 3:59 AM Thomas Weise <th...@apache.org>
>> wrote:
>> >>>>> >
>> >>>>> >> Hi,
>> >>>>> >>
>> >>>>> >> Thank you Becket and Piotr for ironing out the "case 2" behavior.
>> >>>>> >> Strictly speaking we are introducing a regression by allowing an
>> >>>>> >> exception to bubble up that did not exist in the previous
>> release,
>> >>>>> >> regardless how suboptimal the behavior may be. However, given
>> that the
>> >>>>> >> feature is still experimental and we are planning to have a
>> >>>>> >> configuration based way to revert to the previous behavior, I
>> think
>> >>>>> >> this is a good solution.
>> >>>>> >>
>> >>>>> >> +1 from my side
>> >>>>> >>
>> >>>>> >> Thomas
>> >>>>> >>
>> >>>>> >> On Wed, Jun 29, 2022 at 2:43 PM Piotr Nowojski <
>> pnowojski@apache.org>
>> >>>>> >> wrote:
>> >>>>> >> >
>> >>>>> >> > +1 :)
>> >>>>> >> >
>> >>>>> >> > śr., 29 cze 2022 o 17:23 Becket Qin <be...@gmail.com>
>> napisał(a):
>> >>>>> >> >
>> >>>>> >> > >  Thanks for the explanation, Piotr.
>> >>>>> >> > >
>> >>>>> >> > > So it looks like we have a conclusion here.
>> >>>>> >> > >
>> >>>>> >> > > 1. Regarding the supportsPausingSplits() method, I feel it
>> brings more
>> >>>>> >> > > confusion while the benefit is marginal, so I prefer not
>> having that
>> >>>>> >> if
>> >>>>> >> > > possible. It would be good to also hear @Thomas Weise <
>> thw@apache.org
>> >>>>> >> >'s
>> >>>>> >> > > opinion as he mentioned some concern earlier.
>> >>>>> >> > > 2. Let's add the feature knob then. In the future we can
>> simply
>> >>>>> >> ignore the
>> >>>>> >> > > configuration when deprecating it.
>> >>>>> >> > >
>> >>>>> >> > > Thanks,
>> >>>>> >> > >
>> >>>>> >> > > Jiangjie (Becket) Qin
>> >>>>> >> > >
>> >>>>> >> > > On Wed, Jun 29, 2022 at 10:19 PM Piotr Nowojski <
>> pnowojski@apache.org
>> >>>>> >> >
>> >>>>> >> > > wrote:
>> >>>>> >> > >
>> >>>>> >> > > > Hi,
>> >>>>> >> > > >
>> >>>>> >> > > > I mean I'm fine with throwing an exception by default in
>> Flink 1.16
>> >>>>> >> in
>> >>>>> >> > > the
>> >>>>> >> > > > "Case 2", but I think we need to provide a way to
>> workaround it for
>> >>>>> >> > > example
>> >>>>> >> > > > via a feature toggle, if it's an easy thing to do. And it
>> seems to
>> >>>>> >> be a
>> >>>>> >> > > > simple thing.
>> >>>>> >> > > >
>> >>>>> >> > > > However this is orthogonal to the
>> `supportsPausingSplits()` issue. I
>> >>>>> >> > > don't
>> >>>>> >> > > > have a big preference whether
>> >>>>> >> > > >   a) the exception should originate on JM, using `default
>> boolean
>> >>>>> >> > > > supportsPausingSplits() { return false; }` (as currently
>> proposed
>> >>>>> >> in the
>> >>>>> >> > > > FLIP),
>> >>>>> >> > > >   b) or on the TM from `pauseOrResumeSplits()` throwing
>> >>>>> >> > > > `UnsupportedOperationException` as you are proposing.
>> >>>>> >> > > >
>> >>>>> >> > > > a) fails earlier, so it's more user friendly from this
>> perspective,
>> >>>>> >> but
>> >>>>> >> > > it
>> >>>>> >> > > > provides more possibilities for bugs/inconsistencies for
>> connector
>> >>>>> >> > > > developers, since `supportsPausingSplits()` would have to
>> be kept
>> >>>>> >> in sync
>> >>>>> >> > > > with `pauseOrResumeSplits()`.
>> >>>>> >> > > >
>> >>>>> >> > > > Best,
>> >>>>> >> > > > Piotrek
>> >>>>> >> > > >
>> >>>>> >> > > > śr., 29 cze 2022 o 15:27 Becket Qin <be...@gmail.com>
>> >>>>> >> napisał(a):
>> >>>>> >> > > >
>> >>>>> >> > > > > Hi Piotr,
>> >>>>> >> > > > >
>> >>>>> >> > > > > Just to make sure we are on the same page. There are two
>> cases
>> >>>>> >> for the
>> >>>>> >> > > > > existing FLIP-182 users:
>> >>>>> >> > > > >
>> >>>>> >> > > > > Case 1: Each source reader only has one split assigned.
>> This is
>> >>>>> >> the
>> >>>>> >> > > > > targeted case for FLIP-182.
>> >>>>> >> > > > > Case 2: Each source reader has multiple splits assigned.
>> This is
>> >>>>> >> the
>> >>>>> >> > > > flaky
>> >>>>> >> > > > > case that may or may not work.
>> >>>>> >> > > > >
>> >>>>> >> > > > > With solution 1, the users of case 1 won't be impacted.
>> The users
>> >>>>> >> in
>> >>>>> >> > > > case 2
>> >>>>> >> > > > > will receive an exception which they won't get at the
>> moment.
>> >>>>> >> > > > >
>> >>>>> >> > > > > Do you mean we should not throw an exception in case 2?
>> >>>>> >> Personally I
>> >>>>> >> > > feel
>> >>>>> >> > > > > that is OK and could have been done in FLIP-182 itself
>> because
>> >>>>> >> it's
>> >>>>> >> > > not a
>> >>>>> >> > > > > designed use case. As a user I may see a big variation
>> of the job
>> >>>>> >> state
>> >>>>> >> > > > > sizes from time to time and I am not able to rely on
>> this feature
>> >>>>> >> to
>> >>>>> >> > > plan
>> >>>>> >> > > > > my resources and uphold the SLA.
>> >>>>> >> > > > >
>> >>>>> >> > > > > That said, if you have a strong opinion on this, I am
>> fine with
>> >>>>> >> having
>> >>>>> >> > > > the
>> >>>>> >> > > > > configuration like
>> "allow.coarse-grained.watermark.alignment"
>> >>>>> >> with the
>> >>>>> >> > > > > default value set to false, given that a configuration
>> is much
>> >>>>> >> easier
>> >>>>> >> > > to
>> >>>>> >> > > > > deprecate than a method.
>> >>>>> >> > > > >
>> >>>>> >> > > > > Thanks,
>> >>>>> >> > > > >
>> >>>>> >> > > > > Jiangjie (Becket) Qin
>> >>>>> >> > > > >
>> >>>>> >> > > > >
>> >>>>> >>
>> >>>>> >
>>
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Sebastian Mattheis <se...@ververica.com>.
Hi everybody,

Last week I discussed the semantics and an implementation strategy for the
configuration parameter with Piotr, and I did the implementation and some
tests this weekend.

A short summary of what I discussed and recapped with Piotr:

   - The configuration parameter allows (and tolerates) the use of
   `SourceReader`s that do not implement the `pauseOrResumeSplits` method.
   (The exception is ignored in `SourceOperator`.)
   - The configuration parameter allows (and tolerates) the use of
   `SplitReader`s that do not implement the `pauseOrResumeSplits` method.
   (The exception is ignored in the `PauseResumeSplitsTask` of the
   `SplitFetcher`.)

In particular, this means that a `SourceReader` may have two `SplitReader`s
where one does not implement `pauseOrResumeSplits` and the other does. The
parameter will allow the use of the one that doesn't and will, nevertheless,
still attempt to pause/resume the other. (Consequently, if the one that
doesn't support pausing is ahead, that `SplitReader` simply cannot be paused,
but if the other is ahead, it will be paused until watermarks are aligned.)
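
To make the semantics above concrete, here is a minimal, self-contained
sketch. It is not the actual Flink code: `SketchSplitReader`,
`PausableReader`, `LegacyReader` and `TolerantPauseSketch` are hypothetical
names, and the boolean argument only stands in for the configuration
parameter. It shows a throwing default method, one reader that overrides it,
one that does not, and a caller that either tolerates or rethrows the
`UnsupportedOperationException`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a split reader; NOT Flink's actual SplitReader interface.
interface SketchSplitReader {
    String name();

    // Mirrors the proposed default: readers that do not override this method throw.
    default void pauseOrResumeSplits(boolean pause) {
        throw new UnsupportedOperationException(name() + " cannot pause or resume splits");
    }
}

// A reader that supports pausing and simply records its state.
final class PausableReader implements SketchSplitReader {
    boolean paused;

    @Override
    public String name() {
        return "pausable";
    }

    @Override
    public void pauseOrResumeSplits(boolean pause) {
        paused = pause;
    }
}

// A reader that keeps the throwing default implementation.
final class LegacyReader implements SketchSplitReader {
    @Override
    public String name() {
        return "legacy";
    }
}

final class TolerantPauseSketch {
    // Applies the request to every reader and returns the names of the readers
    // that could not honor it. If tolerateUnsupported is false (assumed to
    // correspond to the parameter's default), the exception propagates.
    static List<String> pauseOrResumeAll(
            List<SketchSplitReader> readers, boolean pause, boolean tolerateUnsupported) {
        List<String> unsupported = new ArrayList<>();
        for (SketchSplitReader reader : readers) {
            try {
                reader.pauseOrResumeSplits(pause);
            } catch (UnsupportedOperationException e) {
                if (!tolerateUnsupported) {
                    throw e;
                }
                // Tolerated: this reader stays unaligned, the others are still handled.
                unsupported.add(reader.name());
            }
        }
        return unsupported;
    }

    public static void main(String[] args) {
        PausableReader pausable = new PausableReader();
        List<SketchSplitReader> readers = List.of(new LegacyReader(), pausable);
        List<String> skipped = pauseOrResumeAll(readers, true, true);
        System.out.println("skipped=" + skipped + ", pausable.paused=" + pausable.paused);
    }
}
```

Note that the catch block in this sketch has no way of knowing why the
pause was requested, so it would swallow the exception for any caller.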

There is one flaw that I don't really like but which I accept as a result of
the discussion and which I will add/update in the FLIP:
If there is any other mechanism (i.e. other than watermark alignment) that
attempts to pause or resume `SplitReader`s, it will have side effects and
potentially unexpected behavior if one or more `SplitReader`s do not
implement `pauseOrResumeSplits` and the user has set the configuration
parameter to allow/tolerate it for split-level watermark alignment. (The
reason is simply that we cannot differentiate which mechanism attempts to
pause/resume, i.e., whether it is used for watermark alignment or something
else.) Given that this configuration parameter is supposed to be an
intermediate fallback, it is acceptable for me, but it should be changed at
the latest when some other mechanism uses pauseOrResumeSplits.

As for the parameter naming, I have implemented it the following way
(reason: There exists a parameter `pipeline.auto-watermark-interval`.):

pipeline.watermark-alignment.allow-unaligned-source-splits (default: false)

Status: I have implemented the configuration parameter (and an IT case). I
still need to update the FLIP and will ping you (tomorrow or so) when I'm
done with that. Please review my description above and let me know if you
see any problems with it.

Thanks a lot and regards,
Sebastian


On Wed, Jul 20, 2022 at 11:24 PM Thomas Weise <th...@apache.org> wrote:

> Hi Sebastian,
>
> Thank you for updating the FLIP and sorry for my delayed response. As
> Piotr pointed out, we would need to incorporate the fallback flag into
> the design to reflect the outcome of the previous discussion.
>
> Based on the current FLIP and as detailed by Becket, the
> SourceOperator coordinates the alignment. It is responsible for the
> pause/resume decision and knows how many splits are assigned.
> Therefore shouldn't it have all the information needed to efficiently
> handle the case of UnsupportedOperationException thrown by a reader?
>
> Although the fallback requires some extra implementation effort, I
> think that is more than offset by not surprising users and offering a
> smoother migration path. Yes, the flag is a temporary feature that
> will become obsolete in perhaps 2-3 releases (can we please also
> include that in the FLIP?). But since it would be just a
> configuration property that can be ignored at that point (for which
> there is precedent), no code change will be forced on users.
>
> As for the property name, perhaps the following would be even more
> descriptive?
>
> coarse.grained.wm.alignment.fallback.enabled
>
> Thanks!
> Thomas
>
>
> On Wed, Jul 13, 2022 at 10:59 AM Becket Qin <be...@gmail.com> wrote:
> >
> > Thanks for the explanation, Sebastian. I understand your concern now.
> >
> > 1. About the major concern. Personally I'd consider the coarse grained
> watermark alignment as a special case for backward compatibility. In the
> future, if for whatever reason we want to pause a split and that is not
> supported, it seems the only thing that makes sense is throwing an
> exception, instead of pausing the entire source reader. Regarding this
> FLIP, if the logic that determines which split should be paused is in the
> SourceOperator, the SourceOperator actually knows the reason why it pauses
> a split. It also knows whether there are more than one split assigned to
> the source reader. So it can just fallback to the coarse grained watermark
> alignment, without affecting other reasons of pausing a split, right? And
> in the future, if there are more purposes for pausing / resuming a split,
> the SourceOperator still needs to understand each of the reasons in order
> to resume the splits after all the pausing conditions are no longer met.
> >
> > 2. Naming wise, would "coarse.grained.watermark.alignment.enabled"
> address your concern?
> >
> > The only concern I have for Option A is that people may not be able to
> benefit from split level WM alignment until all the sources they need have
> that implemented. This seems unnecessarily delaying the adoption of a new
> feature, which looks like a more substantive downside compared with the
> "coarse.grained.wm.alignment.enabled" option.
> >
> > BTW, the SourceOperator doesn't need to invoke the pauseOrResumeSplits()
> method and catch the UnsupportedOperationException every time. A flag can
> be set so it doesn't attempt to pause the split after the first time it
> sees the exception.
> >
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> > On Wed, Jul 13, 2022 at 5:11 PM Sebastian Mattheis <
> sebastian@ververica.com> wrote:
> >>
> >> Hi Becket, Hi Thomas, Hi Piotrek,
> >>
> >> Thanks for the feedback. I would like to highlight some concerns:
> >>
> >> Major: A configuration parameter like "allow coarse grained alignment"
> defines a semantic that mixes two contexts conditionally as follows:
> "ignore incapability to pause splits in SourceReader/SplitReader" IF
> (conditional) we "allow coarse grained watermark alignment". At the same
> time we said that there is no way to check the capability of
> SourceReader/SplitReader to pause/resume other than observing a
> UnsupportedOperationException during runtime such that we cannot disable
> the trigger for watermark split alignment in the SourceOperator. Instead,
> we can only ignore the incapability of SourceReader/SplitReader during
> execution of a pause/resume attempt which, consequently, requires to check
> the "allow coarse grained alignment " parameter value (to implement the
> conditional semantic). However, during this execution we actually don't
> know whether the attempt was executed for the purpose of watermark
> alignment or for some other purpose such that the check actually depends on
> who triggered the pause/resume attempt and hides the exception potentially
> unexpectedly for some other use case. Of course, currently there is no
> other purpose and, hence, no other trigger than watermark alignment.
> However, this breaks, in my perspective, the idea of having
> pauseOrResumeSplits (re)usable for other use cases.
> >> Minor: I'm not aware of any configuration parameter in the format like
> `allow.*` as you suggested with `allow.coarse.grained.watermark.alignment`.
> Would that still be okay to do?
> >>
> >> As we have agreed to not have a "supportsPausableSplits" method because
> of potential inconsistencies between return value of this method and the
> actual implementation (and also the difficulty to have a meaningful return
> value where the support actually depends on SourceReader AND the assigned
> SplitReaders), I don't want to bring up the discussion about the
> "supportsPauseableSplits" method again. Instead, I see the following
> options:
> >>
> >> Option A: I would drop the idea of "allow coarse grained alignment"
> semantic of the parameter but implement a parameter to "enable/disable
> split watermark alignment" which we can easily use in the SourceOperator to
> disable the trigger of split alignment. This is indeed more static and less
> flexible, because it disables split alignment unconditionally, but it is
> "context-decoupled" and more straight-forward to use. This would also
> address the use case of disabling split alignment for the purpose of
> runtime behavior evaluation, as mentioned by Thomas (if I remember
> correctly.) I would implement the parameter with a default where watermark
> split alignment is enabled which requires users to check their application
> when upgrading to 1.16 if a) there is a source that reads from multiple
> splits and b), if yes, all splits of that source support pause/resume. If
> a) yes and b) no, the user must take action to disable watermark split
> alignment (which disables the trigger of split alignment only for the
> purpose).
> >>
> >> Option B: If we ignore my concern, I would simply check the "allow
> coarse grained watermark alignment" parameter value on every attempt to
> execute pause/resume in the SourceReader and in the SplitReader and will
> not throw UnsupportedOperationException if the parameter value is set to
> true.
> >>
> >> Please note that the parameter is also used only for some kind of
> migration phase. Therefore, I wonder if we need to overcomplicate things.
> >>
> >> @Piotrek, @Becket, @Thomas: I would recommend/favour option A. Please
> let me know your feedback and/or concerns as soon as possible, if possible.
> :)
> >>
> >> Regards,
> >> Sebastian
> >>
> >>
> >> On Wed, Jul 13, 2022 at 9:37 AM Becket Qin <be...@gmail.com>
> wrote:
> >>>
> >>> Hi Sebastian,
> >>>
> >>> Thanks for updating the FLIP wiki.
> >>>
> >>> Just to double confirm, I was thinking of a configuration like
> "allow.coarse.grained.watermark.alignment". This will allow the coarse
> grained watermark alignment as a fallback instead of bubbling up an
> exception if split pausing is not supported in some Sources in a Flink job.
> And this will only affect the Sources that do not support split pausing,
> but not the Sources that have split pausing supported.
> >>>
> >>> This seems slightly different from a <knob> enables / disables split
> alignment. This sounds like a global thing, and it seems not necessary to
> disable the split alignment, as long as the coarse grained alignment can be
> a fallback.
> >>>
> >>> Thanks,
> >>>
> >>> Jiangjie (Becket) Qin
> >>>
> >>> On Wed, Jul 13, 2022 at 2:46 PM Sebastian Mattheis <
> sebastian@ververica.com> wrote:
> >>>>
> >>>> Hi Piotrek,
> >>>>
> >>>> Sorry I've read it and forgot it when I was ripping out the
> supportsPauseOrResume method again. Thanks for pointing that out. I will
> add it as follows: The <knob> enables/disables split alignment in the
> SourceOperator where the default is that split alignment is enabled. (And I
> will add the note: "In future releases, the <knob> may be ignored such that
> split alignment is always enabled.")
> >>>>
> >>>> Cheers,
> >>>> Sebastian
> >>>>
> >>>> On Tue, Jul 12, 2022 at 11:14 PM Piotr Nowojski <pn...@apache.org>
> wrote:
> >>>>>
> >>>>> Hi Sebastian,
> >>>>>
> >>>>> Thanks for picking this up.
> >>>>>
> >>>>> > 5. There is NO configuration option to enable watermark alignment
> of
> >>>>> splits or disable pause/resume capabilities.
> >>>>>
> >>>>> Isn't this contradicting what we actually agreed on?
> >>>>>
> >>>>> > we are planning to have a configuration based way to revert to the
> >>>>> previous behavior
> >>>>>
> >>>>> I think what we agreed in the last couple of emails was to add a
> >>>>> configuration toggle, that would allow Flink 1.15 users, that are
> using
> >>>>> watermark alignment with multiple splits per source operator, to
> continue
> >>>>> using it with the old 1.15 semantic, even if their source doesn't
> support
> >>>>> pausing/resuming splits. It seems to me like the current FLIP and
> >>>>> implementation proposal would always throw an exception in that case?
> >>>>>
> >>>>> Best,
> >>>>> Piotrek
> >>>>>
> >>>>> wt., 12 lip 2022 o 10:18 Sebastian Mattheis <sebastian@ververica.com
> >
> >>>>> napisał(a):
> >>>>>
> >>>>> > Hi all,
> >>>>> >
> >>>>> > I have updated FLIP-217 [1] to the proposed specification and
> adapted the
> >>>>> > current implementation [2] respectively.
> >>>>> >
> >>>>> > This means both, FLIP and implementation, are ready for review
> from my
> >>>>> > side. (I would revise commit history and messages for the final PR
> but left
> >>>>> > it as is for now and the records of this discussion.)
> >>>>> >
> >>>>> > 1. Please review the updated version of FLIP-217 [1]. If there are
> no
> >>>>> > further concerns, I would initiate the voting.
> >>>>> > (2. If you want to speed up things, please also have a look into
> the
> >>>>> > updated implementation [2].)
> >>>>> >
> >>>>> > Please consider the following updated specification in the current
> status
> >>>>> > of FLIP-217 where the essence is as follows:
> >>>>> >
> >>>>> > 1. A method pauseOrResumeSplits is added to SourceReader with
> default
> >>>>> > implementation that throws UnsupportedOperationException.
> >>>>> > 2. A method pauseOrResumeSplits is added to SplitReader with default
> >>>>> > implementation that throws UnsupportedOperationException.
> >>>>> > 3. SourceOperator initiates split alignment only if more than one
> split is
> >>>>> > assigned to the source (and, of course, only if withSplitAlignment
> is used).
> >>>>> > 4. There is NO "supportsPauseOrResumeSplits" method at any place
> (to
> >>>>> > indicate if the implementation supports pause/resume capabilities).
> >>>>> > 5. There is NO configuration option to enable watermark alignment
> of
> >>>>> > splits or disable pause/resume capabilities.
> >>>>> >
> >>>>> > *Note:* If the SourceReader or some SplitReader do not override
> >>>>> > pauseOrResumeSplits but it is required in the application, an
> exception is
> >>>>> > thrown at runtime when a split alignment attempt is executed (not
> during
> >>>>> > startup or any time earlier).
> >>>>> >
> >>>>> > Also, I have revised the compatibility/migration section to
> describe a bit
> >>>>> > of a rationale for the default implementation with exception
> throwing
> >>>>> > behavior.
> >>>>> >
> >>>>> > Regards,
> >>>>> > Sebastian
> >>>>> >
> >>>>> > [1]
> >>>>> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> >>>>> > [2]
> https://github.com/smattheis/flink/tree/flip-217-split-wm-alignment
> >>>>> >
> >>>>> > On Mon, Jul 4, 2022 at 3:59 AM Thomas Weise <th...@apache.org>
> wrote:
> >>>>> >
> >>>>> >> Hi,
> >>>>> >>
> >>>>> >> Thank you Becket and Piotr for ironing out the "case 2" behavior.
> >>>>> >> Strictly speaking we are introducing a regression by allowing an
> >>>>> >> exception to bubble up that did not exist in the previous release,
> >>>>> >> regardless how suboptimal the behavior may be. However, given
> that the
> >>>>> >> feature is still experimental and we are planning to have a
> >>>>> >> configuration based way to revert to the previous behavior, I
> think
> >>>>> >> this is a good solution.
> >>>>> >>
> >>>>> >> +1 from my side
> >>>>> >>
> >>>>> >> Thomas
> >>>>> >>
> >>>>> >> On Wed, Jun 29, 2022 at 2:43 PM Piotr Nowojski <
> pnowojski@apache.org>
> >>>>> >> wrote:
> >>>>> >> >
> >>>>> >> > +1 :)
> >>>>> >> >
> >>>>> >> > śr., 29 cze 2022 o 17:23 Becket Qin <be...@gmail.com>
> napisał(a):
> >>>>> >> >
> >>>>> >> > >  Thanks for the explanation, Piotr.
> >>>>> >> > >
> >>>>> >> > > So it looks like we have a conclusion here.
> >>>>> >> > >
> >>>>> >> > > 1. Regarding the supportsPausingSplits() method, I feel it
> brings more
> >>>>> >> > > confusion while the benefit is marginal, so I prefer not
> having that
> >>>>> >> if
> >>>>> >> > > possible. It would be good to also hear @Thomas Weise <
> thw@apache.org
> >>>>> >> >'s
> >>>>> >> > > opinion as he mentioned some concern earlier.
> >>>>> >> > > 2. Let's add the feature knob then. In the future we can
> simply
> >>>>> >> ignore the
> >>>>> >> > > configuration when deprecating it.
> >>>>> >> > >
> >>>>> >> > > Thanks,
> >>>>> >> > >
> >>>>> >> > > Jiangjie (Becket) Qin
> >>>>> >> > >
> >>>>> >> > > On Wed, Jun 29, 2022 at 10:19 PM Piotr Nowojski <
> pnowojski@apache.org
> >>>>> >> >
> >>>>> >> > > wrote:
> >>>>> >> > >
> >>>>> >> > > > Hi,
> >>>>> >> > > >
> >>>>> >> > > > I mean I'm fine with throwing an exception by default in
> Flink 1.16
> >>>>> >> in
> >>>>> >> > > the
> >>>>> >> > > > "Case 2", but I think we need to provide a way to
> workaround it for
> >>>>> >> > > example
> >>>>> >> > > > via a feature toggle, if it's an easy thing to do. And it
> seems to
> >>>>> >> be a
> >>>>> >> > > > simple thing.
> >>>>> >> > > >
> >>>>> >> > > > However this is orthogonal to the `supportsPausingSplits()`
> issue. I
> >>>>> >> > > don't
> >>>>> >> > > > have a big preference whether
> >>>>> >> > > >   a) the exception should originate on JM, using `default
> boolean
> >>>>> >> > > > supportsPausingSplits() { return false; }` (as currently
> proposed
> >>>>> >> in the
> >>>>> >> > > > FLIP),
> >>>>> >> > > >   b) or on the TM from `pauseOrResumeSplits()` throwing
> >>>>> >> > > > `UnsupportedOperationException` as you are proposing.
> >>>>> >> > > >
> >>>>> >> > > > a) fails earlier, so it's more user friendly from this
> perspective,
> >>>>> >> but
> >>>>> >> > > it
> >>>>> >> > > > provides more possibilities for bugs/inconsistencies for
> connector
> >>>>> >> > > > developers, since `supportsPausingSplits()` would have to
> be kept
> >>>>> >> in sync
> >>>>> >> > > > with `pauseOrResumeSplits()`.
> >>>>> >> > > >
> >>>>> >> > > > Best,
> >>>>> >> > > > Piotrek
> >>>>> >> > > >
> >>>>> >> > > > śr., 29 cze 2022 o 15:27 Becket Qin <be...@gmail.com>
> >>>>> >> napisał(a):
> >>>>> >> > > >
> >>>>> >> > > > > Hi Piotr,
> >>>>> >> > > > >
> >>>>> >> > > > > Just to make sure we are on the same page. There are two
> cases
> >>>>> >> for the
> >>>>> >> > > > > existing FLIP-182 users:
> >>>>> >> > > > >
> >>>>> >> > > > > Case 1: Each source reader only has one split assigned.
> This is
> >>>>> >> the
> >>>>> >> > > > > targeted case for FLIP-182.
> >>>>> >> > > > > Case 2: Each source reader has multiple splits assigned.
> This is
> >>>>> >> the
> >>>>> >> > > > flaky
> >>>>> >> > > > > case that may or may not work.
> >>>>> >> > > > >
> >>>>> >> > > > > With solution 1, the users of case 1 won't be impacted.
> The users
> >>>>> >> in
> >>>>> >> > > > case 2
> >>>>> >> > > > > will receive an exception which they won't get at the
> moment.
> >>>>> >> > > > >
> >>>>> >> > > > > Do you mean we should not throw an exception in case 2?
> >>>>> >> Personally I
> >>>>> >> > > feel
> >>>>> >> > > > > that is OK and could have been done in FLIP-182 itself
> because
> >>>>> >> it's
> >>>>> >> > > not a
> >>>>> >> > > > > designed use case. As a user I may see a big variation of
> the job
> >>>>> >> state
> >>>>> >> > > > > sizes from time to time and I am not able to rely on this
> feature
> >>>>> >> to
> >>>>> >> > > plan
> >>>>> >> > > > > my resources and uphold the SLA.
> >>>>> >> > > > >
> >>>>> >> > > > > That said, if you have a strong opinion on this, I am
> fine with
> >>>>> >> having
> >>>>> >> > > > the
> >>>>> >> > > > > configuration like
> "allow.coarse-grained.watermark.alignment"
> >>>>> >> with the
> >>>>> >> > > > > default value set to false, given that a configuration is
> much
> >>>>> >> easier
> >>>>> >> > > to
> >>>>> >> > > > > deprecate than a method.
> >>>>> >> > > > >
> >>>>> >> > > > > Thanks,
> >>>>> >> > > > >
> >>>>> >> > > > > Jiangjie (Becket) Qin
> >>>>> >> > > > >
> >>>>> >> > > > >
> >>>>> >>
> >>>>> >
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Thomas Weise <th...@apache.org>.
Hi Sebastian,

Thank you for updating the FLIP and sorry for my delayed response. As
Piotr pointed out, we would need to incorporate the fallback flag into
the design to reflect the outcome of the previous discussion.

Based on the current FLIP and as detailed by Becket, the
SourceOperator coordinates the alignment. It is responsible for the
pause/resume decision and knows how many splits are assigned.
Therefore shouldn't it have all the information needed to efficiently
handle the case of UnsupportedOperationException thrown by a reader?
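
A rough sketch of what such handling could look like, combining the fallback
with the "remember the first exception" idea raised earlier in this thread.
Everything here is hypothetical and simplified: `FallbackAlignmentSketch` and
its `Reader` interface are illustrative names only, not the SourceOperator or
SourceReader API, and the `readerAhead` argument is assumed to stand in for
the existing coarse-grained (whole-reader) alignment decision.

```java
import java.util.List;

// Hypothetical model of an operator that falls back to coarse-grained
// alignment; NOT the actual SourceOperator implementation.
final class FallbackAlignmentSketch {

    // Hypothetical reader abstraction with a single pause/resume entry point.
    interface Reader {
        void pauseOrResumeSplits(List<String> splitsToPause, List<String> splitsToResume);
    }

    private final Reader reader;
    private final boolean fallbackEnabled;
    private boolean splitPausingUnsupported; // set after the first UnsupportedOperationException
    private boolean wholeReaderPaused;
    private int splitLevelAttempts;

    FallbackAlignmentSketch(Reader reader, boolean fallbackEnabled) {
        this.reader = reader;
        this.fallbackEnabled = fallbackEnabled;
    }

    // splitsAhead/splitsBehind: outcome of the per-split watermark comparison.
    // readerAhead: whether the reader as a whole is ahead (coarse-grained check).
    void align(List<String> splitsAhead, List<String> splitsBehind, boolean readerAhead) {
        if (!splitPausingUnsupported) {
            try {
                splitLevelAttempts++;
                reader.pauseOrResumeSplits(splitsAhead, splitsBehind);
                return; // split-level alignment worked, nothing else to do
            } catch (UnsupportedOperationException e) {
                if (!fallbackEnabled) {
                    throw e; // without the fallback flag the exception bubbles up
                }
                splitPausingUnsupported = true; // do not try split-level pausing again
            }
        }
        // Coarse-grained fallback: pause or resume the reader as a whole.
        wholeReaderPaused = readerAhead;
    }

    boolean isWholeReaderPaused() {
        return wholeReaderPaused;
    }

    int getSplitLevelAttempts() {
        return splitLevelAttempts;
    }

    public static void main(String[] args) {
        Reader legacy = (toPause, toResume) -> {
            throw new UnsupportedOperationException("split pausing not supported");
        };
        FallbackAlignmentSketch operator = new FallbackAlignmentSketch(legacy, true);
        operator.align(List.of("split-1"), List.of("split-2"), true);
        operator.align(List.of("split-1"), List.of("split-2"), false);
        System.out.println("attempts=" + operator.getSplitLevelAttempts()
                + ", wholeReaderPaused=" + operator.isWholeReaderPaused());
    }
}
```

Because the decision is made in the caller that issued the request for
alignment purposes, other potential users of pause/resume would not have
their exceptions silently swallowed in this variant.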

Although the fallback requires some extra implementation effort, I
think that is more than offset by not surprising users and offering a
smoother migration path. Yes, the flag is a temporary feature that
will become obsolete in perhaps 2-3 releases (can we please also
include that in the FLIP?). But since it would be just a
configuration property that can be ignored at that point (for which
there is precedent), no code change will be forced on users.

As for the property name, perhaps the following would be even more descriptive?

coarse.grained.wm.alignment.fallback.enabled

Thanks!
Thomas


On Wed, Jul 13, 2022 at 10:59 AM Becket Qin <be...@gmail.com> wrote:
>
> Thanks for the explanation, Sebastian. I understand your concern now.
>
> 1. About the major concern. Personally I'd consider the coarse grained watermark alignment as a special case for backward compatibility. In the future, if for whatever reason we want to pause a split and that is not supported, it seems the only thing that makes sense is throwing an exception, instead of pausing the entire source reader. Regarding this FLIP, if the logic that determines which split should be paused is in the SourceOperator, the SourceOperator actually knows the reason why it pauses a split. It also knows whether there are more than one split assigned to the source reader. So it can just fallback to the coarse grained watermark alignment, without affecting other reasons of pausing a split, right? And in the future, if there are more purposes for pausing / resuming a split, the SourceOperator still needs to understand each of the reasons in order to resume the splits after all the pausing conditions are no longer met.
>
> 2. Naming wise, would "coarse.grained.watermark.alignment.enabled" address your concern?
>
> The only concern I have for Option A is that people may not be able to benefit from split level WM alignment until all the sources they need have that implemented. This seems unnecessarily delaying the adoption of a new feature, which looks like a more substantive downside compared with the "coarse.grained.wm.alignment.enabled" option.
>
> BTW, the SourceOperator doesn't need to invoke the pauseOrResumeSplits() method and catch the UnsupportedOperationException every time. A flag can be set so it doesn't attempt to pause the split after the first time it sees the exception.
>
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On Wed, Jul 13, 2022 at 5:11 PM Sebastian Mattheis <se...@ververica.com> wrote:
>>
>> Hi Becket, Hi Thomas, Hi Piotrek,
>>
>> Thanks for the feedback. I would like to highlight some concerns:
>>
>> Major: A configuration parameter like "allow coarse grained alignment" defines a semantic that mixes two contexts conditionally as follows: "ignore incapability to pause splits in SourceReader/SplitReader" IF (conditional) we "allow coarse grained watermark alignment". At the same time we said that there is no way to check the capability of SourceReader/SplitReader to pause/resume other than observing a UnsupportedOperationException during runtime such that we cannot disable the trigger for watermark split alignment in the SourceOperator. Instead, we can only ignore the incapability of SourceReader/SplitReader during execution of a pause/resume attempt which, consequently, requires to check the "allow coarse grained alignment " parameter value (to implement the conditional semantic). However, during this execution we actually don't know whether the attempt was executed for the purpose of watermark alignment or for some other purpose such that the check actually depends on who triggered the pause/resume attempt and hides the exception potentially unexpectedly for some other use case. Of course, currently there is no other purpose and, hence, no other trigger than watermark alignment. However, this breaks, in my perspective, the idea of having pauseOrResumeSplits (re)usable for other use cases.
>> Minor: I'm not aware of any configuration parameter in the format like `allow.*` as you suggested with `allow.coarse.grained.watermark.alignment`. Would that still be okay to do?
>>
>> As we have agreed to not have a "supportsPausableSplits" method because of potential inconsistencies between return value of this method and the actual implementation (and also the difficulty to have a meaningful return value where the support actually depends on SourceReader AND the assigned SplitReaders), I don't want to bring up the discussion about the "supportsPauseableSplits" method again. Instead, I see the following options:
>>
>> Option A: I would drop the idea of "allow coarse grained alignment" semantic of the parameter but implement a parameter to "enable/disable split watermark alignment" which we can easily use in the SourceOperator to disable the trigger of split alignment. This is indeed more static and less flexible, because it disables split alignment unconditionally, but it is "context-decoupled" and more straight-forward to use. This would also address the use case of disabling split alignment for the purpose of runtime behavior evaluation, as mentioned by Thomas (if I remember correctly.) I would implement the parameter with a default where watermark split alignment is enabled which requires users to check their application when upgrading to 1.16 if a) there is a source that reads from multiple splits and b), if yes, all splits of that source support pause/resume. If a) yes and b) no, the user must take action to disable watermark split alignment (which disables the trigger of split alignment only for the purpose).
>>
>> Option B: If we ignore my concern, I would simply check the "allow coarse grained watermark alignment" parameter value on every attempt to execute pause/resume in the SourceReader and in the SplitReader and will not throw UnsupportedOperationException if the parameter value is set to true.
>>
>> Please note that the parameter is also used only for some kind of migration phase. Therefore, I wonder if we need to overcomplicate things.
>>
>> @Piotrek, @Becket, @Thomas: I would recommend/favour option A. Please let me know your feedback and/or concerns as soon as possible, if possible. :)
>>
>> Regards,
>> Sebastian
>>
>>
>> On Wed, Jul 13, 2022 at 9:37 AM Becket Qin <be...@gmail.com> wrote:
>>>
>>> Hi Sebastian,
>>>
>>> Thanks for updating the FLIP wiki.
>>>
>>> Just to double confirm, I was thinking of a configuration like "allow.coarse.grained.watermark.alignment". This will allow the coarse grained watermark alignment as a fallback instead of bubbling up an exception if split pausing is not supported in some Sources in a Flink job. And this will only affect the Sources that do not support split pausing, but not the Sources that have split pausing supported.
>>>
>>> This seems slightly different from a <knob> enables / disables split alignment. This sounds like a global thing, and it seems not necessary to disable the split alignment, as long as the coarse grained alignment can be a fallback.
>>>
>>> Thanks,
>>>
>>> Jiangjie (Becket) Qin
>>>
>>> On Wed, Jul 13, 2022 at 2:46 PM Sebastian Mattheis <se...@ververica.com> wrote:
>>>>
>>>> Hi Piotrek,
>>>>
>>>> Sorry I've read it and forgot it when I was ripping out the supportsPauseOrResume method again. Thanks for pointing that out. I will add it as follows: The <knob> enables/disables split alignment in the SourceOperator where the default is that split alignment is enabled. (And I will add the note: "In future releases, the <knob> may be ignored such that split alignment is always enabled.")
>>>>
>>>> Cheers,
>>>> Sebastian
>>>>
>>>> On Tue, Jul 12, 2022 at 11:14 PM Piotr Nowojski <pn...@apache.org> wrote:
>>>>>
>>>>> Hi Sebastian,
>>>>>
>>>>> Thanks for picking this up.
>>>>>
>>>>> > 5. There is NO configuration option to enable watermark alignment of
>>>>> splits or disable pause/resume capabilities.
>>>>>
>>>>> Isn't this contradicting what we actually agreed on?
>>>>>
>>>>> > we are planning to have a configuration based way to revert to the
>>>>> previous behavior
>>>>>
>>>>> I think what we agreed in the last couple of emails was to add a
>>>>> configuration toggle, that would allow Flink 1.15 users, that are using
>>>>> watermark alignment with multiple splits per source operator, to continue
>>>>> using it with the old 1.15 semantic, even if their source doesn't support
>>>>> pausing/resuming splits. It seems to me like the current FLIP and
>>>>> implementation proposal would always throw an exception in that case?
>>>>>
>>>>> Best,
>>>>> Piotrek
>>>>>
>>>>> On Tue, Jul 12, 2022 at 10:18 Sebastian Mattheis <se...@ververica.com>
>>>>> wrote:
>>>>>
>>>>> > Hi all,
>>>>> >
>>>>> > I have updated FLIP-217 [1] to the proposed specification and adapted the
>>>>> > current implementation [2] respectively.
>>>>> >
>>>>> > This means both, FLIP and implementation, are ready for review from my
>>>>> > side. (I would revise commit history and messages for the final PR but left
>>>>> > it as is for now and the records of this discussion.)
>>>>> >
>>>>> > 1. Please review the updated version of FLIP-217 [1]. If there are no
>>>>> > further concerns, I would initiate the voting.
>>>>> > (2. If you want to speed up things, please also have a look into the
>>>>> > updated implementation [2].)
>>>>> >
>>>>> > Please consider the following updated specification in the current status
>>>>> > of FLIP-217 where the essence is as follows:
>>>>> >
>>>>> > 1. A method pauseOrResumeSplits is added to SourceReader with default
>>>>> > implementation that throws UnsupportedOperationException.
>>>>> > 2. A method pauseOrResumeSplits is added to SplitReader with default
>>>>> > implementation that throws UnsupportedOperationException.
>>>>> > 3. SourceOperator initiates split alignment only if more than one split is
>>>>> > assigned to the source (and, of course, only if withSplitAlignment is used).
>>>>> > 4. There is NO "supportsPauseOrResumeSplits" method at any place (to
>>>>> > indicate if the implementation supports pause/resume capabilities).
>>>>> > 5. There is NO configuration option to enable watermark alignment of
>>>>> > splits or disable pause/resume capabilities.
>>>>> >
>>>>> > *Note:* If the SourceReader or some SplitReader do not override
>>>>> > pauseOrResumeSplits but it is required in the application, an exception is
>>>>> > thrown at runtime when a split alignment attempt is executed (not during
>>>>> > startup or any time earlier).
>>>>> >
>>>>> > Also, I have revised the compatibility/migration section to describe a bit
>>>>> > of a rationale for the default implementation with exception throwing
>>>>> > behavior.
>>>>> >
>>>>> > Regards,
>>>>> > Sebastian
>>>>> >
>>>>> > [1]
>>>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
>>>>> > [2] https://github.com/smattheis/flink/tree/flip-217-split-wm-alignment
>>>>> >
>>>>> > On Mon, Jul 4, 2022 at 3:59 AM Thomas Weise <th...@apache.org> wrote:
>>>>> >
>>>>> >> Hi,
>>>>> >>
>>>>> >> Thank you Becket and Piotr for ironing out the "case 2" behavior.
>>>>> >> Strictly speaking we are introducing a regression by allowing an
>>>>> >> exception to bubble up that did not exist in the previous release,
>>>>> >> regardless how suboptimal the behavior may be. However, given that the
>>>>> >> feature is still experimental and we are planning to have a
>>>>> >> configuration based way to revert to the previous behavior, I think
>>>>> >> this is a good solution.
>>>>> >>
>>>>> >> +1 from my side
>>>>> >>
>>>>> >> Thomas
>>>>> >>
>>>>> >> On Wed, Jun 29, 2022 at 2:43 PM Piotr Nowojski <pn...@apache.org>
>>>>> >> wrote:
>>>>> >> >
>>>>> >> > +1 :)
>>>>> >> >
>>>>> >> > On Wed, Jun 29, 2022 at 17:23 Becket Qin <be...@gmail.com> wrote:
>>>>> >> >
>>>>> >> > >  Thanks for the explanation, Piotr.
>>>>> >> > >
>>>>> >> > > So it looks like we have a conclusion here.
>>>>> >> > >
>>>>> >> > > 1. Regarding the supportsPausingSplits() method, I feel it brings more
>>>>> >> > > confusion while the benefit is marginal, so I prefer not having that if
>>>>> >> > > possible. It would be good to also hear @Thomas Weise <thw@apache.org>'s
>>>>> >> > > opinion as he mentioned some concern earlier.
>>>>> >> > > 2. Let's add the feature knob then. In the future we can simply ignore the
>>>>> >> > > configuration when deprecating it.
>>>>> >> > >
>>>>> >> > > Thanks,
>>>>> >> > >
>>>>> >> > > Jiangjie (Becket) Qin
>>>>> >> > >
>>>>> >> > > On Wed, Jun 29, 2022 at 10:19 PM Piotr Nowojski <pnowojski@apache.org>
>>>>> >> > > wrote:
>>>>> >> > >
>>>>> >> > > > Hi,
>>>>> >> > > >
>>>>> >> > > > I mean I'm fine with throwing an exception by default in Flink 1.16 in the
>>>>> >> > > > "Case 2", but I think we need to provide a way to workaround it for example
>>>>> >> > > > via a feature toggle, if it's an easy thing to do. And it seems to be a
>>>>> >> > > > simple thing.
>>>>> >> > > >
>>>>> >> > > > However this is orthogonal to the `supportsPausingSplits()` issue. I don't
>>>>> >> > > > have a big preference whether
>>>>> >> > > >   a) the exception should originate on JM, using `default boolean
>>>>> >> > > > supportsPausingSplits() { return false; }` (as currently proposed in the
>>>>> >> > > > FLIP),
>>>>> >> > > >   b) or on the TM from `pauseOrResumeSplits()` throwing
>>>>> >> > > > `UnsupportedOperationException` as you are proposing.
>>>>> >> > > >
>>>>> >> > > > a) fails earlier, so it's more user friendly from this perspective, but it
>>>>> >> > > > provides more possibilities for bugs/inconsistencies for connector
>>>>> >> > > > developers, since `supportsPausingSplits()` would have to be kept in sync
>>>>> >> > > > with `pauseOrResumeSplits()`.
>>>>> >> > > >
>>>>> >> > > > Best,
>>>>> >> > > > Piotrek
>>>>> >> > > >
>>>>> >> > > > On Wed, Jun 29, 2022 at 15:27 Becket Qin <be...@gmail.com> wrote:
>>>>> >> > > >
>>>>> >> > > > > Hi Piotr,
>>>>> >> > > > >
>>>>> >> > > > > Just to make sure we are on the same page. There are two cases for the
>>>>> >> > > > > existing FLIP-182 users:
>>>>> >> > > > >
>>>>> >> > > > > Case 1: Each source reader only has one split assigned. This is the
>>>>> >> > > > > targeted case for FLIP-182.
>>>>> >> > > > > Case 2: Each source reader has multiple splits assigned. This is the flaky
>>>>> >> > > > > case that may or may not work.
>>>>> >> > > > >
>>>>> >> > > > > With solution 1, the users of case 1 won't be impacted. The users in case 2
>>>>> >> > > > > will receive an exception which they won't get at the moment.
>>>>> >> > > > >
>>>>> >> > > > > Do you mean we should not throw an exception in case 2? Personally I feel
>>>>> >> > > > > that is OK and could have been done in FLIP-182 itself because it's not a
>>>>> >> > > > > designed use case. As a user I may see a big variation of the job state
>>>>> >> > > > > sizes from time to time and I am not able to rely on this feature to plan
>>>>> >> > > > > my resources and uphold the SLA.
>>>>> >> > > > >
>>>>> >> > > > > That said, if you have a strong opinion on this, I am fine with having the
>>>>> >> > > > > configuration like "allow.coarse-grained.watermark.alignment" with the
>>>>> >> > > > > default value set to false, given that a configuration is much easier to
>>>>> >> > > > > deprecate than a method.
>>>>> >> > > > >
>>>>> >> > > > > Thanks,
>>>>> >> > > > >
>>>>> >> > > > > Jiangjie (Becket) Qin
>>>>> >> > > > >
>>>>> >> > > > >
>>>>> >>
>>>>> >

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Becket Qin <be...@gmail.com>.
Thanks for the explanation, Sebastian. I understand your concern now.

1. About the major concern. Personally I'd consider the coarse grained
watermark alignment as a special case for backward compatibility. In the
future, if for whatever reason we want to pause a split and that is not
supported, it seems the only thing that makes sense is throwing an
exception, instead of pausing the entire source reader. Regarding this
FLIP, if the logic that determines which split should be paused is in the
SourceOperator, the SourceOperator actually knows the reason why it pauses
a split. It also knows whether there is more than one split assigned to
the source reader. So it can just fall back to the coarse grained watermark
alignment, without affecting other reasons for pausing a split, right? And
in the future, if there are more purposes for pausing / resuming a split,
the SourceOperator still needs to understand each of the reasons in order
to resume the splits after all the pausing conditions are no longer met.
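
To make point 1 more concrete, here is a minimal sketch of how an operator
could track why each split is paused and resume it only once every reason is
gone. The names PauseReason and SplitPauseTracker are hypothetical and are
not part of the FLIP or of Flink; the second reason (OTHER) stands in for a
future use case that does not exist today.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reasons for pausing a split; only the first one exists today. */
enum PauseReason { WATERMARK_ALIGNMENT, OTHER }

/** Hypothetical bookkeeping helper, not actual SourceOperator code. */
class SplitPauseTracker {
    // Only splits with at least one active pause reason are present in this map.
    private final Map<String, EnumSet<PauseReason>> reasons = new HashMap<>();

    /** Records a reason; returns true if the split was running before this call. */
    boolean pause(String splitId, PauseReason reason) {
        EnumSet<PauseReason> active =
                reasons.computeIfAbsent(splitId, id -> EnumSet.noneOf(PauseReason.class));
        boolean wasRunning = active.isEmpty();
        active.add(reason);
        return wasRunning;
    }

    /** Clears a reason; returns true only if no reason is left and the split may resume. */
    boolean clear(String splitId, PauseReason reason) {
        EnumSet<PauseReason> active = reasons.get(splitId);
        if (active == null || !active.remove(reason)) {
            return false; // the split was not paused for this reason
        }
        if (active.isEmpty()) {
            reasons.remove(splitId);
            return true;
        }
        return false;
    }

    boolean isPaused(String splitId) {
        return reasons.containsKey(splitId);
    }
}
```

With such bookkeeping the operator always knows whether a pause request came
from watermark alignment, so a coarse grained fallback could be limited to
that one reason.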

2. Naming wise, would "coarse.grained.watermark.alignment.enabled" address
your concern?

The only concern I have for Option A is that people may not be able to
benefit from split level WM alignment until all the sources they need have
that implemented. This seems to unnecessarily delay the adoption of a new
feature, which looks like a more substantive downside compared with the
"coarse.grained.watermark.alignment.enabled" option.

BTW, the SourceOperator doesn't need to invoke the pauseOrResumeSplits()
method and catch the UnsupportedOperationException every time. A flag can
be set so it doesn't attempt to pause the split after the first time it
sees the exception.
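
A rough sketch of the flag idea, assuming a simplified reader interface.
PausableReader and AlignmentFallback are hypothetical names used only for
illustration, and the coarse grained branch is reduced to a boolean; the real
SourceOperator logic would look different.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

/** Hypothetical stand-in for the reader contract discussed in this thread. */
interface PausableReader {
    // Mirrors the proposed default: readers that do not override this cannot pause splits.
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException("pausing splits is not supported");
    }
}

/** Hypothetical operator-side helper, not the actual SourceOperator. */
class AlignmentFallback {
    private final PausableReader reader;
    private final boolean coarseGrainedAllowed;
    private boolean splitPausingUnsupported = false;

    int pauseAttempts = 0;             // how often the reader was actually called
    boolean wholeReaderPaused = false; // simplified placeholder for coarse grained alignment

    AlignmentFallback(PausableReader reader, boolean coarseGrainedAllowed) {
        this.reader = reader;
        this.coarseGrainedAllowed = coarseGrainedAllowed;
    }

    void alignSplits(Collection<String> toPause, Collection<String> toResume) {
        if (!splitPausingUnsupported) {
            pauseAttempts++;
            try {
                reader.pauseOrResumeSplits(toPause, toResume);
                return;
            } catch (UnsupportedOperationException e) {
                if (!coarseGrainedAllowed) {
                    throw e; // no fallback configured, let the exception bubble up
                }
                splitPausingUnsupported = true; // remember, so the reader is not asked again
            }
        }
        // Fallback: treat the reader as a whole instead of individual splits.
        wholeReaderPaused = !toPause.isEmpty();
    }
}
```

After the first UnsupportedOperationException the helper stops calling the
reader, so the cost of throwing and catching is paid only once per reader.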


Thanks,

Jiangjie (Becket) Qin



On Wed, Jul 13, 2022 at 5:11 PM Sebastian Mattheis <se...@ververica.com>
wrote:

> Hi Becket, Hi Thomas, Hi Piotrek,
>
> Thanks for the feedback. I would like to highlight some concerns:
>
>    1. Major: A configuration parameter like "allow coarse grained
>    alignment" defines a semantic that mixes two contexts conditionally as
>    follows: "ignore incapability to pause splits in SourceReader/SplitReader"
>    IF (conditional) we "allow coarse grained watermark alignment". At the same
>    time we said that there is no way to check the capability of
>    SourceReader/SplitReader to pause/resume other than observing a
>    UnsupportedOperationException during runtime such that we cannot disable
>    the trigger for watermark split alignment in the SourceOperator. Instead,
>    we can only ignore the incapability of SourceReader/SplitReader during
>    execution of a pause/resume attempt which, consequently, requires to check
>    the "allow coarse grained alignment " parameter value (to implement the
>    conditional semantic). However, during this execution we actually don't
>    know whether the attempt was executed for the purpose of watermark
>    alignment or for some other purpose such that the check actually depends on
>    who triggered the pause/resume attempt and hides the exception potentially
>    unexpectedly for some other use case. Of course, currently there is no
>    other purpose and, hence, no other trigger than watermark alignment.
>    However, this breaks, in my perspective, the idea of having
>    pauseOrResumeSplits (re)usable for other use cases.
>    2. Minor: I'm not aware of any configuration parameter in the format
>    like `allow.*` as you suggested with
>    `allow.coarse.grained.watermark.alignment`. Would that still be okay to do?
>
> As we have agreed to not have a "supportsPausableSplits" method because of
> potential inconsistencies between return value of this method and the
> actual implementation (and also the difficulty to have a meaningful return
> value where the support actually depends on SourceReader AND the assigned
> SplitReaders), I don't want to bring up the discussion about the
> "supportsPauseableSplits" method again. Instead, I see the following
> options:
>
> Option A: I would drop the idea of "allow coarse grained alignment"
> semantic of the parameter but implement a parameter to "enable/disable
> split watermark alignment" which we can easily use in the SourceOperator to
> disable the trigger of split alignment. This is indeed more static and less
> flexible, because it disables split alignment unconditionally, but it is
> "context-decoupled" and more straight-forward to use. This would also
> address the use case of disabling split alignment for the purpose of
> runtime behavior evaluation, as mentioned by Thomas (if I remember
> correctly.) I would implement the parameter with a default where watermark
> split alignment is enabled which requires users to check their application
> when upgrading to 1.16 if a) there is a source that reads from multiple
> splits and b), if yes, all splits of that source support pause/resume. If
> a) yes and b) no, the user must take action to disable watermark split
> alignment (which disables the trigger of split alignment only for the
> purpose).
>
> Option B: If we ignore my concern, I would simply check the "allow coarse
> grained watermark alignment" parameter value on every attempt to execute
> pause/resume in the SourceReader and in the SplitReader and will not throw
> UnsupportedOperationException if the parameter value is set to true.
>
> Please note that the parameter is also used only for some kind of
> migration phase. Therefore, I wonder if we need to overcomplicate things.
>
> @Piotrek, @Becket, @Thomas: I would recommend/favour option A. Please let
> me know your feedback and/or concerns as soon as possible, if possible. :)
>
> Regards,
> Sebastian
>
>
> On Wed, Jul 13, 2022 at 9:37 AM Becket Qin <be...@gmail.com> wrote:
>
>> Hi Sebastian,
>>
>> Thanks for updating the FLIP wiki.
>>
>> Just to double confirm, I was thinking of a configuration like
>> "allow.coarse.grained.watermark.alignment". This will allow the coarse
>> grained watermark alignment as a fallback instead of bubbling up an
>> exception if split pausing is not supported in some Sources in a Flink job.
>> And this will only affect the Sources that do not support split pausing,
>> but not the Sources that have split pausing supported.
>>
>> This seems slightly different from a <knob> enables / disables split
>> alignment. This sounds like a global thing, and it seems not necessary to
>> disable the split alignment, as long as the coarse grained alignment can be
>> a fallback.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Wed, Jul 13, 2022 at 2:46 PM Sebastian Mattheis <
>> sebastian@ververica.com> wrote:
>>
>>> Hi Piotrek,
>>>
>>> Sorry I've read it and forgot it when I was ripping out the
>>> supportsPauseOrResume method again. Thanks for pointing that out. I will
>>> add it as follows: The <knob> enables/disables split alignment in the
>>> SourceOperator where the default is that split alignment is enabled. (And I
>>> will add the note: "In future releases, the <knob> may be ignored such that
>>> split alignment is always enabled.")
>>>
>>> Cheers,
>>> Sebastian
>>>
>>> On Tue, Jul 12, 2022 at 11:14 PM Piotr Nowojski <pn...@apache.org>
>>> wrote:
>>>
>>>> Hi Sebastian,
>>>>
>>>> Thanks for picking this up.
>>>>
>>>> > 5. There is NO configuration option to enable watermark alignment of
>>>> splits or disable pause/resume capabilities.
>>>>
>>>> Isn't this contradicting what we actually agreed on?
>>>>
>>>> > we are planning to have a configuration based way to revert to the
>>>> previous behavior
>>>>
>>>> I think what we agreed in the last couple of emails was to add a
>>>> configuration toggle, that would allow Flink 1.15 users, that are using
>>>> watermark alignment with multiple splits per source operator, to
>>>> continue
>>>> using it with the old 1.15 semantic, even if their source doesn't
>>>> support
>>>> pausing/resuming splits. It seems to me like the current FLIP and
>>>> implementation proposal would always throw an exception in that case?
>>>>
>>>> Best,
>>>> Piotrek
>>>>
>>>> On Tue, Jul 12, 2022 at 10:18 Sebastian Mattheis <se...@ververica.com>
>>>> wrote:
>>>>
>>>> > Hi all,
>>>> >
>>>> > I have updated FLIP-217 [1] to the proposed specification and adapted
>>>> the
>>>> > current implementation [2] respectively.
>>>> >
>>>> > This means both, FLIP and implementation, are ready for review from my
>>>> > side. (I would revise commit history and messages for the final PR
>>>> but left
>>>> > it as is for now and the records of this discussion.)
>>>> >
>>>> > 1. Please review the updated version of FLIP-217 [1]. If there are no
>>>> > further concerns, I would initiate the voting.
>>>> > (2. If you want to speed up things, please also have a look into the
>>>> > updated implementation [2].)
>>>> >
>>>> > Please consider the following updated specification in the current
>>>> status
>>>> > of FLIP-217 where the essence is as follows:
>>>> >
>>>> > 1. A method pauseOrResumeSplits is added to SourceReader with default
>>>> > implementation that throws UnsupportedOperationException.
>>>> > 2. A method pauseOrResumeSplits is added to SplitReader with default
>>>> > implementation that throws UnsupportedOperationException.
>>>> > 3. SourceOperator initiates split alignment only if more than one
>>>> split is
>>>> > assigned to the source (and, of course, only if withSplitAlignment is
>>>> used).
>>>> > 4. There is NO "supportsPauseOrResumeSplits" method at any place (to
>>>> > indicate if the implementation supports pause/resume capabilities).
>>>> > 5. There is NO configuration option to enable watermark alignment of
>>>> > splits or disable pause/resume capabilities.
>>>> >
>>>> > *Note:* If the SourceReader or some SplitReader do not override
>>>> > pauseOrResumeSplits but it is required in the application, an
>>>> exception is
>>>> > thrown at runtime when a split alignment attempt is executed (not
>>>> during
>>>> > startup or any time earlier).
>>>> >
>>>> > Also, I have revised the compatibility/migration section to describe
>>>> a bit
>>>> > of a rationale for the default implementation with exception throwing
>>>> > behavior.
>>>> >
>>>> > Regards,
>>>> > Sebastian
>>>> >
>>>> > [1]
>>>> >
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
>>>> > [2]
>>>> https://github.com/smattheis/flink/tree/flip-217-split-wm-alignment
>>>> >
>>>> > On Mon, Jul 4, 2022 at 3:59 AM Thomas Weise <th...@apache.org> wrote:
>>>> >
>>>> >> Hi,
>>>> >>
>>>> >> Thank you Becket and Piotr for ironing out the "case 2" behavior.
>>>> >> Strictly speaking we are introducing a regression by allowing an
>>>> >> exception to bubble up that did not exist in the previous release,
>>>> >> regardless how suboptimal the behavior may be. However, given that
>>>> the
>>>> >> feature is still experimental and we are planning to have a
>>>> >> configuration based way to revert to the previous behavior, I think
>>>> >> this is a good solution.
>>>> >>
>>>> >> +1 from my side
>>>> >>
>>>> >> Thomas
>>>> >>
>>>> >> On Wed, Jun 29, 2022 at 2:43 PM Piotr Nowojski <pnowojski@apache.org
>>>> >
>>>> >> wrote:
>>>> >> >
>>>> >> > +1 :)
>>>> >> >
>>>> >> > On Wed, Jun 29, 2022 at 17:23 Becket Qin <be...@gmail.com> wrote:
>>>> >> >
>>>> >> > >  Thanks for the explanation, Piotr.
>>>> >> > >
>>>> >> > > So it looks like we have a conclusion here.
>>>> >> > >
>>>> >> > > 1. Regarding the supportsPausingSplits() method, I feel it brings more
>>>> >> > > confusion while the benefit is marginal, so I prefer not having that if
>>>> >> > > possible. It would be good to also hear @Thomas Weise <thw@apache.org>'s
>>>> >> > > opinion as he mentioned some concern earlier.
>>>> >> > > 2. Let's add the feature knob then. In the future we can simply ignore the
>>>> >> > > configuration when deprecating it.
>>>> >> > >
>>>> >> > > Thanks,
>>>> >> > >
>>>> >> > > Jiangjie (Becket) Qin
>>>> >> > >
>>>> >> > > On Wed, Jun 29, 2022 at 10:19 PM Piotr Nowojski <pnowojski@apache.org>
>>>> >> > > wrote:
>>>> >> > >
>>>> >> > > > Hi,
>>>> >> > > >
>>>> >> > > > I mean I'm fine with throwing an exception by default in Flink 1.16 in the
>>>> >> > > > "Case 2", but I think we need to provide a way to workaround it for example
>>>> >> > > > via a feature toggle, if it's an easy thing to do. And it seems to be a
>>>> >> > > > simple thing.
>>>> >> > > >
>>>> >> > > > However this is orthogonal to the `supportsPausingSplits()` issue. I don't
>>>> >> > > > have a big preference whether
>>>> >> > > >   a) the exception should originate on JM, using `default boolean
>>>> >> > > > supportsPausingSplits() { return false; }` (as currently proposed in the
>>>> >> > > > FLIP),
>>>> >> > > >   b) or on the TM from `pauseOrResumeSplits()` throwing
>>>> >> > > > `UnsupportedOperationException` as you are proposing.
>>>> >> > > >
>>>> >> > > > a) fails earlier, so it's more user friendly from this perspective, but it
>>>> >> > > > provides more possibilities for bugs/inconsistencies for connector
>>>> >> > > > developers, since `supportsPausingSplits()` would have to be kept in sync
>>>> >> > > > with `pauseOrResumeSplits()`.
>>>> >> > > >
>>>> >> > > > Best,
>>>> >> > > > Piotrek
>>>> >> > > >
>>>> >> > > > On Wed, Jun 29, 2022 at 15:27 Becket Qin <be...@gmail.com> wrote:
>>>> >> > > >
>>>> >> > > > > Hi Piotr,
>>>> >> > > > >
>>>> >> > > > > Just to make sure we are on the same page. There are two cases for the
>>>> >> > > > > existing FLIP-182 users:
>>>> >> > > > >
>>>> >> > > > > Case 1: Each source reader only has one split assigned. This is the
>>>> >> > > > > targeted case for FLIP-182.
>>>> >> > > > > Case 2: Each source reader has multiple splits assigned. This is the flaky
>>>> >> > > > > case that may or may not work.
>>>> >> > > > >
>>>> >> > > > > With solution 1, the users of case 1 won't be impacted. The users in case 2
>>>> >> > > > > will receive an exception which they won't get at the moment.
>>>> >> > > > >
>>>> >> > > > > Do you mean we should not throw an exception in case 2? Personally I feel
>>>> >> > > > > that is OK and could have been done in FLIP-182 itself because it's not a
>>>> >> > > > > designed use case. As a user I may see a big variation of the job state
>>>> >> > > > > sizes from time to time and I am not able to rely on this feature to plan
>>>> >> > > > > my resources and uphold the SLA.
>>>> >> > > > >
>>>> >> > > > > That said, if you have a strong opinion on this, I am fine with having the
>>>> >> > > > > configuration like "allow.coarse-grained.watermark.alignment" with the
>>>> >> > > > > default value set to false, given that a configuration is much easier to
>>>> >> > > > > deprecate than a method.
>>>> >> > > > >
>>>> >> > > > > Thanks,
>>>> >> > > > >
>>>> >> > > > > Jiangjie (Becket) Qin
>>>> >> > > > >
>>>> >> > > > >
>>>> >>
>>>> >
>>>>
>>>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Sebastian Mattheis <se...@ververica.com>.
Hi Becket, Hi Thomas, Hi Piotrek,

Thanks for the feedback. I would like to highlight some concerns:

   1. Major: A configuration parameter like "allow coarse grained
   alignment" defines a semantic that mixes two contexts conditionally as
   follows: "ignore incapability to pause splits in SourceReader/SplitReader"
   IF (conditional) we "allow coarse grained watermark alignment". At the same
   time we said that there is no way to check the capability of
   SourceReader/SplitReader to pause/resume other than observing an
   UnsupportedOperationException during runtime such that we cannot disable
   the trigger for watermark split alignment in the SourceOperator. Instead,
   we can only ignore the incapability of SourceReader/SplitReader during
   execution of a pause/resume attempt which, consequently, requires checking
   the "allow coarse grained alignment" parameter value (to implement the
   conditional semantic). However, during this execution we actually don't
   know whether the attempt was executed for the purpose of watermark
   alignment or for some other purpose such that the check actually depends on
   who triggered the pause/resume attempt and hides the exception potentially
   unexpectedly for some other use case. Of course, currently there is no
   other purpose and, hence, no other trigger than watermark alignment.
   However, this breaks, in my perspective, the idea of having
   pauseOrResumeSplits (re)usable for other use cases.
   2. Minor: I'm not aware of any configuration parameter in the format
   like `allow.*` as you suggested with
   `allow.coarse.grained.watermark.alignment`. Would that still be okay to do?

As we have agreed to not have a "supportsPausableSplits" method because of
potential inconsistencies between the return value of this method and the
actual implementation (and also the difficulty of having a meaningful return
value where the support actually depends on SourceReader AND the assigned
SplitReaders), I don't want to bring up the discussion about the
"supportsPausableSplits" method again. Instead, I see the following
options:

Option A: I would drop the idea of "allow coarse grained alignment"
semantic of the parameter but implement a parameter to "enable/disable
split watermark alignment" which we can easily use in the SourceOperator to
disable the trigger of split alignment. This is indeed more static and less
flexible, because it disables split alignment unconditionally, but it is
"context-decoupled" and more straightforward to use. This would also
address the use case of disabling split alignment for the purpose of
runtime behavior evaluation, as mentioned by Thomas (if I remember
correctly). I would implement the parameter with a default where watermark
split alignment is enabled which requires users to check their application
when upgrading to 1.16 if a) there is a source that reads from multiple
splits and b), if yes, all splits of that source support pause/resume. If
a) yes and b) no, the user must take action to disable watermark split
alignment (which disables the trigger of split alignment only for the
purpose).
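
As a minimal sketch of Option A, assuming the parameter is read once and
handed to the operator: the check lives only where split alignment is
triggered and never touches the reader. SplitAlignmentGate and its method
name are hypothetical and only illustrate the idea.

```java
/** Hypothetical operator-side gate for Option A; names are illustrative only. */
class SplitAlignmentGate {
    // Value of the proposed enable/disable parameter (enabled by default in the proposal).
    private final boolean splitAlignmentEnabled;

    SplitAlignmentGate(boolean splitAlignmentEnabled) {
        this.splitAlignmentEnabled = splitAlignmentEnabled;
    }

    /** Split alignment is triggered only if enabled and more than one split is assigned. */
    boolean shouldTriggerSplitAlignment(int assignedSplits) {
        return splitAlignmentEnabled && assignedSplits > 1;
    }
}
```

Because the gate sits in front of the trigger, pauseOrResumeSplits itself keeps
throwing UnsupportedOperationException for any caller, which is what keeps
the parameter decoupled from other potential uses of the method.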

Option B: If we ignore my concern, I would simply check the "allow coarse
grained watermark alignment" parameter value on every attempt to execute
pause/resume in the SourceReader and in the SplitReader and will not throw
UnsupportedOperationException if the parameter value is set to true.
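
For contrast, a sketch of what Option B could look like on the reader side,
again with hypothetical names (ConfigCheckingReader is not a Flink class).
It also shows the concern from above: the reader cannot tell who asked for
the pause, so every caller gets the same silent no-op.

```java
import java.util.Collection;
import java.util.Collections;

/** Hypothetical reader for Option B that consults the parameter on every attempt. */
class ConfigCheckingReader {
    private final boolean allowCoarseGrainedAlignment;

    ConfigCheckingReader(boolean allowCoarseGrainedAlignment) {
        this.allowCoarseGrainedAlignment = allowCoarseGrainedAlignment;
    }

    void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        if (allowCoarseGrainedAlignment) {
            // The request is ignored regardless of why it was issued.
            return;
        }
        throw new UnsupportedOperationException("pausing splits is not supported");
    }
}
```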

Please note that the parameter is also used only for some kind of migration
phase. Therefore, I wonder if we need to overcomplicate things.

@Piotrek, @Becket, @Thomas: I would recommend/favour option A. Please let
me know your feedback and/or concerns as soon as possible, if possible. :)

Regards,
Sebastian


On Wed, Jul 13, 2022 at 9:37 AM Becket Qin <be...@gmail.com> wrote:

> Hi Sebastian,
>
> Thanks for updating the FLIP wiki.
>
> Just to double confirm, I was thinking of a configuration like
> "allow.coarse.grained.watermark.alignment". This will allow the coarse
> grained watermark alignment as a fallback instead of bubbling up an
> exception if split pausing is not supported in some Sources in a Flink job.
> And this will only affect the Sources that do not support split pausing,
> but not the Sources that have split pausing supported.
>
> This seems slightly different from a <knob> enables / disables split
> alignment. This sounds like a global thing, and it seems not necessary to
> disable the split alignment, as long as the coarse grained alignment can be
> a fallback.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, Jul 13, 2022 at 2:46 PM Sebastian Mattheis <
> sebastian@ververica.com> wrote:
>
>> Hi Piotrek,
>>
>> Sorry I've read it and forgot it when I was ripping out the
>> supportsPauseOrResume method again. Thanks for pointing that out. I will
>> add it as follows: The <knob> enables/disables split alignment in the
>> SourceOperator where the default is that split alignment is enabled. (And I
>> will add the note: "In future releases, the <knob> may be ignored such that
>> split alignment is always enabled.")
>>
>> Cheers,
>> Sebastian
>>
>> On Tue, Jul 12, 2022 at 11:14 PM Piotr Nowojski <pn...@apache.org>
>> wrote:
>>
>>> Hi Sebastian,
>>>
>>> Thanks for picking this up.
>>>
>>> > 5. There is NO configuration option to enable watermark alignment of
>>> splits or disable pause/resume capabilities.
>>>
>>> Isn't this contradicting what we actually agreed on?
>>>
>>> > we are planning to have a configuration based way to revert to the
>>> previous behavior
>>>
>>> I think what we agreed in the last couple of emails was to add a
>>> configuration toggle, that would allow Flink 1.15 users, that are using
>>> watermark alignment with multiple splits per source operator, to continue
>>> using it with the old 1.15 semantic, even if their source doesn't support
>>> pausing/resuming splits. It seems to me like the current FLIP and
>>> implementation proposal would always throw an exception in that case?
>>>
>>> Best,
>>> Piotrek
>>>
>>> wt., 12 lip 2022 o 10:18 Sebastian Mattheis <se...@ververica.com>
>>> napisał(a):
>>>
>>> > Hi all,
>>> >
>>> > I have updated FLIP-217 [1] to the proposed specification and adapted
>>> the
>>> > current implementation [2] respectively.
>>> >
>>> > This means both, FLIP and implementation, are ready for review from my
>>> > side. (I would revise commit history and messages for the final PR but
>>> left
>>> > it as is for now and the records of this discussion.)
>>> >
>>> > 1. Please review the updated version of FLIP-217 [1]. If there are no
>>> > further concerns, I would initiate the voting.
>>> > (2. If you want to speed up things, please also have a look into the
>>> > updated implementation [2].)
>>> >
>>> > Please consider the following updated specification in the current
>>> status
>>> > of FLIP-217 where the essence is as follows:
>>> >
>>> > 1. A method pauseOrResumeSplits is added to SourceReader with default
>>> > implementation that throws UnsupportedOperationException.
>>> > 2. A method pauseOrResumeSplits is added to SplitReader with default
>>> > implementation that throws UnsupportedOperationException.
>>> > 3. SourceOperator initiates split alignment only if more than one
>>> split is
>>> > assigned to the source (and, of course, only if withSplitAlignment is
>>> used).
>>> > 4. There is NO "supportsPauseOrResumeSplits" method at any place (to
>>> > indicate if the implementation supports pause/resume capabilities).
>>> > 5. There is NO configuration option to enable watermark alignment of
>>> > splits or disable pause/resume capabilities.
>>> >
>>> > *Note:* If the SourceReader or some SplitReader do not override
>>> > pauseOrResumeSplits but it is required in the application, an
>>> exception is
>>> > thrown at runtime when a split alignment attempt is executed (not
>>> during
>>> > startup or any time earlier).
>>> >
>>> > Also, I have revised the compatibility/migration section to describe a
>>> bit
>>> > of a rationale for the default implementation with exception throwing
>>> > behavior.
>>> >
>>> > Regards,
>>> > Sebastian
>>> >
>>> > [1]
>>> >
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
>>> > [2]
>>> https://github.com/smattheis/flink/tree/flip-217-split-wm-alignment
>>> >
>>> > On Mon, Jul 4, 2022 at 3:59 AM Thomas Weise <th...@apache.org> wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> Thank you Becket and Piotr for ironing out the "case 2" behavior.
>>> >> Strictly speaking we are introducing a regression by allowing an
>>> >> exception to bubble up that did not exist in the previous release,
>>> >> regardless how suboptimal the behavior may be. However, given that the
>>> >> feature is still experimental and we are planning to have a
>>> >> configuration based way to revert to the previous behavior, I think
>>> >> this is a good solution.
>>> >>
>>> >> +1 from my side
>>> >>
>>> >> Thomas
>>> >>
>>> >> On Wed, Jun 29, 2022 at 2:43 PM Piotr Nowojski <pn...@apache.org>
>>> >> wrote:
>>> >> >
>>> >> > +1 :)
>>> >> >
>>> >> > śr., 29 cze 2022 o 17:23 Becket Qin <be...@gmail.com>
>>> napisał(a):
>>> >> >
>>> >> > >  Thanks for the explanation, Piotr.
>>> >> > >
>>> >> > > So it looks like we have a conclusion here.
>>> >> > >
>>> >> > > 1. Regarding the supportsPausingSplits() method, I feel it brings
>>> more
>>> >> > > confusion while the benefit is marginal, so I prefer not having
>>> that
>>> >> if
>>> >> > > possible. It would be good to also hear @Thomas Weise <
>>> thw@apache.org
>>> >> >'s
>>> >> > > opinion as he mentioned some concern earlier.
>>> >> > > 2. Let's add the feature knob then. In the future we can simply
>>> >> ignore the
>>> >> > > configuration when deprecating it.
>>> >> > >
>>> >> > > Thanks,
>>> >> > >
>>> >> > > Jiangjie (Becket) Qin
>>> >> > >
>>> >> > > On Wed, Jun 29, 2022 at 10:19 PM Piotr Nowojski <
>>> pnowojski@apache.org
>>> >> >
>>> >> > > wrote:
>>> >> > >
>>> >> > > > Hi,
>>> >> > > >
>>> >> > > > I mean I'm fine with throwing an exception by default in Flink
>>> 1.16
>>> >> in
>>> >> > > the
>>> >> > > > "Case 2", but I think we need to provide a way to workaround it
>>> for
>>> >> > > example
>>> >> > > > via a feature toggle, if it's an easy thing to do. And it seems
>>> to
>>> >> be a
>>> >> > > > simple thing.
>>> >> > > >
>>> >> > > > However this is orthogonal to the `supportsPausingSplits()`
>>> issue. I
>>> >> > > don't
>>> >> > > > have a big preference whether
>>> >> > > >   a) the exception should originate on JM, using `default
>>> boolean
>>> >> > > > supportsPausingSplits() { return false; }` (as currently
>>> proposed
>>> >> in the
>>> >> > > > FLIP),
>>> >> > > >   b) or on the TM from `pauseOrResumeSplits()` throwing
>>> >> > > > `UnsupportedOperationException` as you are proposing.
>>> >> > > >
>>> >> > > > a) fails earlier, so it's more user friendly from this
>>> perspective,
>>> >> but
>>> >> > > it
>>> >> > > > provides more possibilities for bugs/inconsistencies for
>>> connector
>>> >> > > > developers, since `supportsPausingSplits()` would have to be
>>> kept
>>> >> in sync
>>> >> > > > with `pauseOrResumeSplits()`.
>>> >> > > >
>>> >> > > > Best,
>>> >> > > > Piotrek
>>> >> > > >
>>> >> > > > śr., 29 cze 2022 o 15:27 Becket Qin <be...@gmail.com>
>>> >> napisał(a):
>>> >> > > >
>>> >> > > > > Hi Piotr,
>>> >> > > > >
>>> >> > > > > Just to make sure we are on the same page. There are two cases
>>> >> for the
>>> >> > > > > existing FLIP-182 users:
>>> >> > > > >
>>> >> > > > > Case 1: Each source reader only has one split assigned. This
>>> is
>>> >> the
>>> >> > > > > targeted case for FLIP-182.
>>> >> > > > > Case 2: Each source reader has multiple splits assigned. This
>>> is
>>> >> the
>>> >> > > > flaky
>>> >> > > > > case that may or may not work.
>>> >> > > > >
>>> >> > > > > With solution 1, the users of case 1 won't be impacted. The
>>> users
>>> >> in
>>> >> > > > case 2
>>> >> > > > > will receive an exception which they won't get at the moment.
>>> >> > > > >
>>> >> > > > > Do you mean we should not throw an exception in case 2?
>>> >> Personally I
>>> >> > > feel
>>> >> > > > > that is OK and could have been done in FLIP-182 itself because
>>> >> it's
>>> >> > > not a
>>> >> > > > > designed use case. As a user I may see a big variation of the
>>> job
>>> >> state
>>> >> > > > > sizes from time to time and I am not able to rely on this
>>> feature
>>> >> to
>>> >> > > plan
>>> >> > > > > my resources and uphold the SLA.
>>> >> > > > >
>>> >> > > > > That said, if you have a strong opinion on this, I am fine
>>> with
>>> >> having
>>> >> > > > the
>>> >> > > > > configuration like "allow.coarse-grained.watermark.alignment"
>>> >> with the
>>> >> > > > > default value set to false, given that a configuration is much
>>> >> easier
>>> >> > > to
>>> >> > > > > deprecate than a method.
>>> >> > > > >
>>> >> > > > > Thanks,
>>> >> > > > >
>>> >> > > > > Jiangjie (Becket) Qin
>>> >> > > > >
>>> >> > > > >
>>> >>
>>> >
>>>
>>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Becket Qin <be...@gmail.com>.
Hi Sebastian,

Thanks for updating the FLIP wiki.

Just to double confirm, I was thinking of a configuration like
"allow.coarse.grained.watermark.alignment". This will allow the coarse
grained watermark alignment as a fallback instead of bubbling up an
exception if split pausing is not supported in some Sources in a Flink job.
And this will only affect the Sources that do not support split pausing,
but not the Sources that have split pausing supported.

This seems slightly different from a <knob> that enables/disables split
alignment. That sounds like a global thing, and it does not seem necessary to
disable the split alignment, as long as the coarse-grained alignment can be
a fallback.
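
The intended semantics can be sketched per source as follows. This is only an
illustration under assumptions: FallbackSemanticsSketch, AlignmentMode and
resolve are hypothetical names, and the supportsSplitPausing boolean is used
here purely as an input for the example (in the proposal there is no such
method, and missing support only shows up as an exception at runtime).

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FallbackSemanticsSketch {
    /** Hypothetical outcome of the decision for a single source. */
    enum AlignmentMode { SPLIT_LEVEL, COARSE_GRAINED }

    /**
     * Decides how one source is aligned.
     * The flag only matters for sources that cannot pause splits.
     */
    static AlignmentMode resolve(boolean supportsSplitPausing, boolean allowCoarseGrained) {
        if (supportsSplitPausing) {
            // Sources with split pausing are unaffected by the configuration.
            return AlignmentMode.SPLIT_LEVEL;
        }
        if (allowCoarseGrained) {
            // Fallback instead of bubbling up an exception.
            return AlignmentMode.COARSE_GRAINED;
        }
        throw new UnsupportedOperationException(
                "source does not support split pausing and the fallback is not allowed");
    }

    public static void main(String[] args) {
        // Two sources in the same job, with the fallback allowed.
        Map<String, Boolean> sources = new LinkedHashMap<>();
        sources.put("source-with-pausing", true);
        sources.put("source-without-pausing", false);
        for (Map.Entry<String, Boolean> e : sources.entrySet()) {
            System.out.println(e.getKey() + " -> " + resolve(e.getValue(), true));
        }
    }
}
```

The point of the sketch is that the configuration acts as a fallback switch
rather than as a global on/off switch for split alignment.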

Thanks,

Jiangjie (Becket) Qin

On Wed, Jul 13, 2022 at 2:46 PM Sebastian Mattheis <se...@ververica.com>
wrote:

> Hi Piotrek,
>
> Sorry I've read it and forgot it when I was ripping out the
> supportsPauseOrResume method again. Thanks for pointing that out. I will
> add it as follows: The <knob> enables/disables split alignment in the
> SourceOperator where the default is that split alignment is enabled. (And I
> will add the note: "In future releases, the <knob> may be ignored such that
> split alignment is always enabled.")
>
> Cheers,
> Sebastian
>
> On Tue, Jul 12, 2022 at 11:14 PM Piotr Nowojski <pn...@apache.org>
> wrote:
>
>> Hi Sebastian,
>>
>> Thanks for picking this up.
>>
>> > 5. There is NO configuration option to enable watermark alignment of
>> splits or disable pause/resume capabilities.
>>
>> Isn't this contradicting what we actually agreed on?
>>
>> > we are planning to have a configuration based way to revert to the
>> previous behavior
>>
>> I think what we agreed in the last couple of emails was to add a
>> configuration toggle, that would allow Flink 1.15 users, that are using
>> watermark alignment with multiple splits per source operator, to continue
>> using it with the old 1.15 semantic, even if their source doesn't support
>> pausing/resuming splits. It seems to me like the current FLIP and
>> implementation proposal would always throw an exception in that case?
>>
>> Best,
>> Piotrek
>>
>> wt., 12 lip 2022 o 10:18 Sebastian Mattheis <se...@ververica.com>
>> napisał(a):
>>
>> > Hi all,
>> >
>> > I have updated FLIP-217 [1] to the proposed specification and adapted
>> the
>> > current implementation [2] respectively.
>> >
>> > This means both, FLIP and implementation, are ready for review from my
>> > side. (I would revise commit history and messages for the final PR but
>> left
>> > it as is for now and the records of this discussion.)
>> >
>> > 1. Please review the updated version of FLIP-217 [1]. If there are no
>> > further concerns, I would initiate the voting.
>> > (2. If you want to speed up things, please also have a look into the
>> > updated implementation [2].)
>> >
>> > Please consider the following updated specification in the current
>> status
>> > of FLIP-217 where the essence is as follows:
>> >
>> > 1. A method pauseOrResumeSplits is added to SourceReader with default
>> > implementation that throws UnsupportedOperationException.
>> > 2. A method pauseOrResumeSplits is added to SplitReader with default
>> > implementation that throws UnsupportedOperationException.
>> > 3. SourceOperator initiates split alignment only if more than one split
>> is
>> > assigned to the source (and, of course, only if withSplitAlignment is
>> used).
>> > 4. There is NO "supportsPauseOrResumeSplits" method at any place (to
>> > indicate if the implementation supports pause/resume capabilities).
>> > 5. There is NO configuration option to enable watermark alignment of
>> > splits or disable pause/resume capabilities.
>> >
>> > *Note:* If the SourceReader or some SplitReader do not override
>> > pauseOrResumeSplits but it is required in the application, an exception
>> is
>> > thrown at runtime when a split alignment attempt is executed (not
>> during
>> > startup or any time earlier).
>> >
>> > Also, I have revised the compatibility/migration section to describe a
>> bit
>> > of a rationale for the default implementation with exception throwing
>> > behavior.
>> >
>> > Regards,
>> > Sebastian
>> >
>> > [1]
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
>> > [2] https://github.com/smattheis/flink/tree/flip-217-split-wm-alignment
>> >
>> > On Mon, Jul 4, 2022 at 3:59 AM Thomas Weise <th...@apache.org> wrote:
>> >
>> >> Hi,
>> >>
>> >> Thank you Becket and Piotr for ironing out the "case 2" behavior.
>> >> Strictly speaking we are introducing a regression by allowing an
>> >> exception to bubble up that did not exist in the previous release,
>> >> regardless how suboptimal the behavior may be. However, given that the
>> >> feature is still experimental and we are planning to have a
>> >> configuration based way to revert to the previous behavior, I think
>> >> this is a good solution.
>> >>
>> >> +1 from my side
>> >>
>> >> Thomas
>> >>
>> >> On Wed, Jun 29, 2022 at 2:43 PM Piotr Nowojski <pn...@apache.org>
>> >> wrote:
>> >> >
>> >> > +1 :)
>> >> >
>> >> > śr., 29 cze 2022 o 17:23 Becket Qin <be...@gmail.com>
>> napisał(a):
>> >> >
>> >> > >  Thanks for the explanation, Piotr.
>> >> > >
>> >> > > So it looks like we have a conclusion here.
>> >> > >
>> >> > > 1. Regarding the supportsPausingSplits() method, I feel it brings
>> more
>> >> > > confusion while the benefit is marginal, so I prefer not having
>> that
>> >> if
>> >> > > possible. It would be good to also hear @Thomas Weise <
>> thw@apache.org
>> >> >'s
>> >> > > opinion as he mentioned some concern earlier.
>> >> > > 2. Let's add the feature knob then. In the future we can simply
>> >> ignore the
>> >> > > configuration when deprecating it.
>> >> > >
>> >> > > Thanks,
>> >> > >
>> >> > > Jiangjie (Becket) Qin
>> >> > >
>> >> > > On Wed, Jun 29, 2022 at 10:19 PM Piotr Nowojski <
>> pnowojski@apache.org
>> >> >
>> >> > > wrote:
>> >> > >
>> >> > > > Hi,
>> >> > > >
>> >> > > > I mean I'm fine with throwing an exception by default in Flink
>> 1.16
>> >> in
>> >> > > the
>> >> > > > "Case 2", but I think we need to provide a way to workaround it
>> for
>> >> > > example
>> >> > > > via a feature toggle, if it's an easy thing to do. And it seems
>> to
>> >> be a
>> >> > > > simple thing.
>> >> > > >
>> >> > > > However this is orthogonal to the `supportsPausingSplits()`
>> issue. I
>> >> > > don't
>> >> > > > have a big preference whether
>> >> > > >   a) the exception should originate on JM, using `default boolean
>> >> > > > supportsPausingSplits() { return false; }` (as currently proposed
>> >> in the
>> >> > > > FLIP),
>> >> > > >   b) or on the TM from `pauseOrResumeSplits()` throwing
>> >> > > > `UnsupportedOperationException` as you are proposing.
>> >> > > >
>> >> > > > a) fails earlier, so it's more user friendly from this
>> perspective,
>> >> but
>> >> > > it
>> >> > > > provides more possibilities for bugs/inconsistencies for
>> connector
>> >> > > > developers, since `supportsPausingSplits()` would have to be kept
>> >> in sync
>> >> > > > with `pauseOrResumeSplits()`.
>> >> > > >
>> >> > > > Best,
>> >> > > > Piotrek
>> >> > > >
>> >> > > > śr., 29 cze 2022 o 15:27 Becket Qin <be...@gmail.com>
>> >> napisał(a):
>> >> > > >
>> >> > > > > Hi Piotr,
>> >> > > > >
>> >> > > > > Just to make sure we are on the same page. There are two cases
>> >> for the
>> >> > > > > existing FLIP-182 users:
>> >> > > > >
>> >> > > > > Case 1: Each source reader only has one split assigned. This is
>> >> the
>> >> > > > > targeted case for FLIP-182.
>> >> > > > > Case 2: Each source reader has multiple splits assigned. This
>> is
>> >> the
>> >> > > > flaky
>> >> > > > > case that may or may not work.
>> >> > > > >
>> >> > > > > With solution 1, the users of case 1 won't be impacted. The
>> users
>> >> in
>> >> > > > case 2
>> >> > > > > will receive an exception which they won't get at the moment.
>> >> > > > >
>> >> > > > > Do you mean we should not throw an exception in case 2?
>> >> Personally I
>> >> > > feel
>> >> > > > > that is OK and could have been done in FLIP-182 itself because
>> >> it's
>> >> > > not a
>> >> > > > > designed use case. As a user I may see a big variation of the
>> job
>> >> state
>> >> > > > > sizes from time to time and I am not able to rely on this
>> feature
>> >> to
>> >> > > plan
>> >> > > > > my resources and uphold the SLA.
>> >> > > > >
>> >> > > > > That said, if you have a strong opinion on this, I am fine with
>> >> having
>> >> > > > the
>> >> > > > > configuration like "allow.coarse-grained.watermark.alignment"
>> >> with the
>> >> > > > > default value set to false, given that a configuration is much
>> >> easier
>> >> > > to
>> >> > > > > deprecate than a method.
>> >> > > > >
>> >> > > > > Thanks,
>> >> > > > >
>> >> > > > > Jiangjie (Becket) Qin
>> >> > > > >
>> >> > > > >
>> >>
>> >
>>
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Sebastian Mattheis <se...@ververica.com>.
Hi Piotrek,

Sorry, I had read it but forgot about it when I was ripping out the
supportsPauseOrResume method again. Thanks for pointing that out. I will
add it as follows: The <knob> enables/disables split alignment in the
SourceOperator where the default is that split alignment is enabled. (And I
will add the note: "In future releases, the <knob> may be ignored such that
split alignment is always enabled.")
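
As a rough sketch of what this gating in the SourceOperator could look like
(hypothetical names, not the code in the branch): SplitAlignmentGateSketch and
shouldAlignSplits are made up for illustration, watermarkAlignmentConfigured
stands for "withSplitAlignment is used", and knobEnabled stands for the <knob>
with its default of true.

```java
final class SplitAlignmentGateSketch {
    /**
     * Split alignment is initiated only if watermark alignment is configured,
     * the knob is enabled, and more than one split is assigned.
     */
    static boolean shouldAlignSplits(
            boolean watermarkAlignmentConfigured, boolean knobEnabled, int assignedSplits) {
        return watermarkAlignmentConfigured && knobEnabled && assignedSplits > 1;
    }

    public static void main(String[] args) {
        // Default knob (enabled) with several splits: alignment is initiated.
        System.out.println(shouldAlignSplits(true, true, 3));
        // Single split: nothing to align between splits.
        System.out.println(shouldAlignSplits(true, true, 1));
        // Knob disabled: split alignment is never initiated.
        System.out.println(shouldAlignSplits(true, false, 3));
    }
}
```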

Cheers,
Sebastian

On Tue, Jul 12, 2022 at 11:14 PM Piotr Nowojski <pn...@apache.org>
wrote:

> Hi Sebastian,
>
> Thanks for picking this up.
>
> > 5. There is NO configuration option to enable watermark alignment of
> splits or disable pause/resume capabilities.
>
> Isn't this contradicting what we actually agreed on?
>
> > we are planning to have a configuration based way to revert to the
> previous behavior
>
> I think what we agreed in the last couple of emails was to add a
> configuration toggle, that would allow Flink 1.15 users, that are using
> watermark alignment with multiple splits per source operator, to continue
> using it with the old 1.15 semantic, even if their source doesn't support
> pausing/resuming splits. It seems to me like the current FLIP and
> implementation proposal would always throw an exception in that case?
>
> Best,
> Piotrek
>
> wt., 12 lip 2022 o 10:18 Sebastian Mattheis <se...@ververica.com>
> napisał(a):
>
> > Hi all,
> >
> > I have updated FLIP-217 [1] to the proposed specification and adapted the
> > current implementation [2] respectively.
> >
> > This means both, FLIP and implementation, are ready for review from my
> > side. (I would revise commit history and messages for the final PR but
> left
> > it as is for now and the records of this discussion.)
> >
> > 1. Please review the updated version of FLIP-217 [1]. If there are no
> > further concerns, I would initiate the voting.
> > (2. If you want to speed up things, please also have a look into the
> > updated implementation [2].)
> >
> > Please consider the following updated specification in the current status
> > of FLIP-217 where the essence is as follows:
> >
> > 1. A method pauseOrResumeSplits is added to SourceReader with default
> > implementation that throws UnsupportedOperationException.
> > 2. A method pauseOrResumeSplits is added to SplitReader with default
> > implementation that throws UnsupportedOperationException.
> > 3. SourceOperator initiates split alignment only if more than one split
> is
> > assigned to the source (and, of course, only if withSplitAlignment is
> used).
> > 4. There is NO "supportsPauseOrResumeSplits" method at any place (to
> > indicate if the implementation supports pause/resume capabilities).
> > 5. There is NO configuration option to enable watermark alignment of
> > splits or disable pause/resume capabilities.
> >
> > *Note:* If the SourceReader or some SplitReader do not override
> > pauseOrResumeSplits but it is required in the application, an exception
> is
> > thrown at runtime when a split alignment attempt is executed (not during
> > startup or any time earlier).
> >
> > Also, I have revised the compatibility/migration section to describe a
> bit
> > of a rationale for the default implementation with exception throwing
> > behavior.
> >
> > Regards,
> > Sebastian
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > [2] https://github.com/smattheis/flink/tree/flip-217-split-wm-alignment
> >
> > On Mon, Jul 4, 2022 at 3:59 AM Thomas Weise <th...@apache.org> wrote:
> >
> >> Hi,
> >>
> >> Thank you Becket and Piotr for ironing out the "case 2" behavior.
> >> Strictly speaking we are introducing a regression by allowing an
> >> exception to bubble up that did not exist in the previous release,
> >> regardless how suboptimal the behavior may be. However, given that the
> >> feature is still experimental and we are planning to have a
> >> configuration based way to revert to the previous behavior, I think
> >> this is a good solution.
> >>
> >> +1 from my side
> >>
> >> Thomas
> >>
> >> On Wed, Jun 29, 2022 at 2:43 PM Piotr Nowojski <pn...@apache.org>
> >> wrote:
> >> >
> >> > +1 :)
> >> >
> >> > śr., 29 cze 2022 o 17:23 Becket Qin <be...@gmail.com>
> napisał(a):
> >> >
> >> > >  Thanks for the explanation, Piotr.
> >> > >
> >> > > So it looks like we have a conclusion here.
> >> > >
> >> > > 1. Regarding the supportsPausingSplits() method, I feel it brings
> more
> >> > > confusion while the benefit is marginal, so I prefer not having that
> >> if
> >> > > possible. It would be good to also hear @Thomas Weise <
> thw@apache.org
> >> >'s
> >> > > opinion as he mentioned some concern earlier.
> >> > > 2. Let's add the feature knob then. In the future we can simply
> >> ignore the
> >> > > configuration when deprecating it.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jiangjie (Becket) Qin
> >> > >
> >> > > On Wed, Jun 29, 2022 at 10:19 PM Piotr Nowojski <
> pnowojski@apache.org
> >> >
> >> > > wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > I mean I'm fine with throwing an exception by default in Flink
> 1.16
> >> in
> >> > > the
> >> > > > "Case 2", but I think we need to provide a way to workaround it
> for
> >> > > example
> >> > > > via a feature toggle, if it's an easy thing to do. And it seems to
> >> be a
> >> > > > simple thing.
> >> > > >
> >> > > > However this is orthogonal to the `supportsPausingSplits()`
> issue. I
> >> > > don't
> >> > > > have a big preference whether
> >> > > >   a) the exception should originate on JM, using `default boolean
> >> > > > supportsPausingSplits() { return false; }` (as currently proposed
> >> in the
> >> > > > FLIP),
> >> > > >   b) or on the TM from `pauseOrResumeSplits()` throwing
> >> > > > `UnsupportedOperationException` as you are proposing.
> >> > > >
> >> > > > a) fails earlier, so it's more user friendly from this
> perspective,
> >> but
> >> > > it
> >> > > > provides more possibilities for bugs/inconsistencies for connector
> >> > > > developers, since `supportsPausingSplits()` would have to be kept
> >> in sync
> >> > > > with `pauseOrResumeSplits()`.
> >> > > >
> >> > > > Best,
> >> > > > Piotrek
> >> > > >
> >> > > > śr., 29 cze 2022 o 15:27 Becket Qin <be...@gmail.com>
> >> napisał(a):
> >> > > >
> >> > > > > Hi Piotr,
> >> > > > >
> >> > > > > Just to make sure we are on the same page. There are two cases
> >> for the
> >> > > > > existing FLIP-182 users:
> >> > > > >
> >> > > > > Case 1: Each source reader only has one split assigned. This is
> >> the
> >> > > > > targeted case for FLIP-182.
> >> > > > > Case 2: Each source reader has multiple splits assigned. This is
> >> the
> >> > > > flaky
> >> > > > > case that may or may not work.
> >> > > > >
> >> > > > > With solution 1, the users of case 1 won't be impacted. The
> users
> >> in
> >> > > > case 2
> >> > > > > will receive an exception which they won't get at the moment.
> >> > > > >
> >> > > > > Do you mean we should not throw an exception in case 2?
> >> Personally I
> >> > > feel
> >> > > > > that is OK and could have been done in FLIP-182 itself because
> >> it's
> >> > > not a
> >> > > > > designed use case. As a user I may see a big variation of the
> job
> >> state
> >> > > > > sizes from time to time and I am not able to rely on this
> feature
> >> to
> >> > > plan
> >> > > > > my resources and uphold the SLA.
> >> > > > >
> >> > > > > That said, if you have a strong opinion on this, I am fine with
> >> having
> >> > > > the
> >> > > > > configuration like "allow.coarse-grained.watermark.alignment"
> >> with the
> >> > > > > default value set to false, given that a configuration is much
> >> easier
> >> > > to
> >> > > > > deprecate than a method.
> >> > > > >
> >> > > > > Thanks,
> >> > > > >
> >> > > > > Jiangjie (Becket) Qin
> >> > > > >
> >> > > > >
> >>
> >
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Piotr Nowojski <pn...@apache.org>.
Hi Sebastian,

Thanks for picking this up.

> 5. There is NO configuration option to enable watermark alignment of
> splits or disable pause/resume capabilities.

Isn't this contradicting what we actually agreed on?

> we are planning to have a configuration based way to revert to the
> previous behavior

I think what we agreed on in the last couple of emails was to add a
configuration toggle that would allow Flink 1.15 users who are using
watermark alignment with multiple splits per source operator to continue
using it with the old 1.15 semantics, even if their source doesn't support
pausing/resuming splits. It seems to me like the current FLIP and
implementation proposal would always throw an exception in that case?

Best,
Piotrek

wt., 12 lip 2022 o 10:18 Sebastian Mattheis <se...@ververica.com>
napisał(a):

> Hi all,
>
> I have updated FLIP-217 [1] to the proposed specification and adapted the
> current implementation [2] respectively.
>
> This means both, FLIP and implementation, are ready for review from my
> side. (I would revise commit history and messages for the final PR but left
> it as is for now and the records of this discussion.)
>
> 1. Please review the updated version of FLIP-217 [1]. If there are no
> further concerns, I would initiate the voting.
> (2. If you want to speed up things, please also have a look into the
> updated implementation [2].)
>
> Please consider the following updated specification in the current status
> of FLIP-217 where the essence is as follows:
>
> 1. A method pauseOrResumeSplits is added to SourceReader with default
> implementation that throws UnsupportedOperationException.
> 2. A method pauseOrResumeSplits is added to SplitReader with default
> implementation that throws UnsupportedOperationException.
> 3. SourceOperator initiates split alignment only if more than one split is
> assigned to the source (and, of course, only if withSplitAlignment is used).
> 4. There is NO "supportsPauseOrResumeSplits" method at any place (to
> indicate if the implementation supports pause/resume capabilities).
> 5. There is NO configuration option to enable watermark alignment of
> splits or disable pause/resume capabilities.
>
> *Note:* If the SourceReader or some SplitReader do not override
> pauseOrResumeSplits but it is required in the application, an exception is
> thrown at runtime when a split alignment attempt is executed (not during
> startup or any time earlier).
>
> Also, I have revised the compatibility/migration section to describe a bit
> of a rationale for the default implementation with exception throwing
> behavior.
>
> Regards,
> Sebastian
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> [2] https://github.com/smattheis/flink/tree/flip-217-split-wm-alignment
>
> On Mon, Jul 4, 2022 at 3:59 AM Thomas Weise <th...@apache.org> wrote:
>
>> Hi,
>>
>> Thank you Becket and Piotr for ironing out the "case 2" behavior.
>> Strictly speaking we are introducing a regression by allowing an
>> exception to bubble up that did not exist in the previous release,
>> regardless how suboptimal the behavior may be. However, given that the
>> feature is still experimental and we are planning to have a
>> configuration based way to revert to the previous behavior, I think
>> this is a good solution.
>>
>> +1 from my side
>>
>> Thomas
>>
>> On Wed, Jun 29, 2022 at 2:43 PM Piotr Nowojski <pn...@apache.org>
>> wrote:
>> >
>> > +1 :)
>> >
>> > śr., 29 cze 2022 o 17:23 Becket Qin <be...@gmail.com> napisał(a):
>> >
>> > >  Thanks for the explanation, Piotr.
>> > >
>> > > So it looks like we have a conclusion here.
>> > >
>> > > 1. Regarding the supportsPausingSplits() method, I feel it brings more
>> > > confusion while the benefit is marginal, so I prefer not having that
>> if
>> > > possible. It would be good to also hear @Thomas Weise <thw@apache.org
>> >'s
>> > > opinion as he mentioned some concern earlier.
>> > > 2. Let's add the feature knob then. In the future we can simply
>> ignore the
>> > > configuration when deprecating it.
>> > >
>> > > Thanks,
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > > On Wed, Jun 29, 2022 at 10:19 PM Piotr Nowojski <pnowojski@apache.org
>> >
>> > > wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I mean I'm fine with throwing an exception by default in Flink 1.16
>> in
>> > > the
>> > > > "Case 2", but I think we need to provide a way to workaround it for
>> > > example
>> > > > via a feature toggle, if it's an easy thing to do. And it seems to
>> be a
>> > > > simple thing.
>> > > >
>> > > > However this is orthogonal to the `supportsPausingSplits()` issue. I
>> > > don't
>> > > > have a big preference whether
>> > > >   a) the exception should originate on JM, using `default boolean
>> > > > supportsPausingSplits() { return false; }` (as currently proposed
>> in the
>> > > > FLIP),
>> > > >   b) or on the TM from `pauseOrResumeSplits()` throwing
>> > > > `UnsupportedOperationException` as you are proposing.
>> > > >
>> > > > a) fails earlier, so it's more user friendly from this perspective,
>> but
>> > > it
>> > > > provides more possibilities for bugs/inconsistencies for connector
>> > > > developers, since `supportsPausingSplits()` would have to be kept
>> in sync
>> > > > with `pauseOrResumeSplits()`.
>> > > >
>> > > > Best,
>> > > > Piotrek
>> > > >
>> > > > śr., 29 cze 2022 o 15:27 Becket Qin <be...@gmail.com>
>> napisał(a):
>> > > >
>> > > > > Hi Piotr,
>> > > > >
>> > > > > Just to make sure we are on the same page. There are two cases
>> for the
>> > > > > existing FLIP-182 users:
>> > > > >
>> > > > > Case 1: Each source reader only has one split assigned. This is
>> the
>> > > > > targeted case for FLIP-182.
>> > > > > Case 2: Each source reader has multiple splits assigned. This is
>> the
>> > > > flaky
>> > > > > case that may or may not work.
>> > > > >
>> > > > > With solution 1, the users of case 1 won't be impacted. The users
>> in
>> > > > case 2
>> > > > > will receive an exception which they won't get at the moment.
>> > > > >
>> > > > > Do you mean we should not throw an exception in case 2?
>> Personally I
>> > > feel
>> > > > > that is OK and could have been done in FLIP-182 itself because
>> it's
>> > > not a
>> > > > > designed use case. As a user I may see a big variation of the job
>> state
>> > > > > sizes from time to time and I am not able to rely on this feature
>> to
>> > > plan
>> > > > > my resources and uphold the SLA.
>> > > > >
>> > > > > That said, if you have a strong opinion on this, I am fine with
>> having
>> > > > the
>> > > > > configuration like "allow.coarse-grained.watermark.alignment"
>> with the
>> > > > > default value set to false, given that a configuration is much
>> easier
>> > > to
>> > > > > deprecate than a method.
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Jiangjie (Becket) Qin
>> > > > >
>> > > > >
>>
>
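
The "case 2" flakiness discussed in this thread can be illustrated with the numbers used in the discussion (split watermarks of 10:00 and 11:00, maxDesiredWatermark of 10:30). The following is only a minimal sketch: the class and method names are hypothetical and not part of Flink, and it assumes that a reader's watermark is the minimum of its split watermarks and that a reader is paused only once that watermark passes maxDesiredWatermark.

```java
import java.time.LocalTime;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Flink.
class CoarseAlignmentExample {

    /** Reader-level watermark, assumed to be the minimum over its splits. */
    static LocalTime readerWatermark(List<LocalTime> splitWatermarks) {
        return Collections.min(splitWatermarks);
    }

    /** Coarse-grained check: the whole reader pauses only if its watermark is past the maximum. */
    static boolean readerPaused(List<LocalTime> splitWatermarks, LocalTime maxDesiredWatermark) {
        return readerWatermark(splitWatermarks).isAfter(maxDesiredWatermark);
    }

    /** Per-split check: is this single split already past the maximum? */
    static boolean isAhead(LocalTime splitWatermark, LocalTime maxDesiredWatermark) {
        return splitWatermark.isAfter(maxDesiredWatermark);
    }

    public static void main(String[] args) {
        LocalTime split1 = LocalTime.of(10, 0);
        LocalTime split2 = LocalTime.of(11, 0);
        LocalTime maxDesired = LocalTime.of(10, 30);
        List<LocalTime> splits = List.of(split1, split2);

        System.out.println("reader watermark: " + readerWatermark(splits));
        System.out.println("reader paused:    " + readerPaused(splits, maxDesired));
        System.out.println("split 2 ahead:    " + isAhead(split2, maxDesired));
    }
}
```

Under these assumptions the reader keeps reading even though split 2 is already ahead of maxDesiredWatermark, which is why the gap between the two splits can keep growing when individual splits cannot be paused.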

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Thomas Weise <th...@apache.org>.
Hi,

Thank you Becket and Piotr for ironing out the "case 2" behavior.
Strictly speaking we are introducing a regression by allowing an
exception to bubble up that did not exist in the previous release,
regardless of how suboptimal the behavior may be. However, given that the
feature is still experimental and we are planning to have a
configuration based way to revert to the previous behavior, I think
this is a good solution.

+1 from my side

Thomas
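
The behavior agreed on in this thread (exception by default in "case 2", with a configuration based way back to the previous behavior) could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Flink implementation: PausableReader, PauseHandlingSketch, Outcome and the allowCoarseGrained parameter are hypothetical names, and the parameter merely stands in for a setting like "allow.coarse-grained.watermark.alignment".

```java
import java.util.List;

/** Hypothetical stand-in for a reader interface; not Flink's actual API. */
interface PausableReader {
    /** Default mirrors option b) in the thread: unsupported unless overridden. */
    default void pauseOrResumeSplits(List<String> splitsToPause, List<String> splitsToResume) {
        throw new UnsupportedOperationException("split pausing is not implemented");
    }
}

class PauseHandlingSketch {
    enum Outcome { SPLITS_PAUSED, READER_STOPPED_POLLING }

    /**
     * Tries split-level pausing first. If the reader does not support it,
     * falls back to pausing the whole reader when only one split is assigned
     * (case 1) or when the hypothetical toggle is enabled; otherwise the
     * exception bubbles up (case 2 with the default setting).
     */
    static Outcome handlePause(PausableReader reader,
                               List<String> assignedSplits,
                               List<String> splitsToPause,
                               boolean allowCoarseGrained) {
        try {
            reader.pauseOrResumeSplits(splitsToPause, List.of());
            return Outcome.SPLITS_PAUSED;
        } catch (UnsupportedOperationException e) {
            if (assignedSplits.size() <= 1 || allowCoarseGrained) {
                return Outcome.READER_STOPPED_POLLING;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        PausableReader unpausable = new PausableReader() {};
        // Case 1: a single split, so the coarse-grained fallback is used.
        System.out.println(handlePause(unpausable, List.of("s1"), List.of("s1"), false));
        // Case 2 with the toggle enabled: previous behavior is kept.
        System.out.println(handlePause(unpausable, List.of("s1", "s2"), List.of("s2"), true));
    }
}
```

With the toggle left at false, calling handlePause for an unpausable reader with two assigned splits rethrows the UnsupportedOperationException, which corresponds to the new default discussed above.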

On Wed, Jun 29, 2022 at 2:43 PM Piotr Nowojski <pn...@apache.org> wrote:
>
> +1 :)
>
> śr., 29 cze 2022 o 17:23 Becket Qin <be...@gmail.com> napisał(a):
>
> >  Thanks for the explanation, Piotr.
> >
> > So it looks like we have a conclusion here.
> >
> > 1. Regarding the supportsPausingSplits() method, I feel it brings more
> > confusion while the benefit is marginal, so I prefer not having that if
> > possible. It would be good to also hear @Thomas Weise <th...@apache.org>'s
> > opinion as he mentioned some concern earlier.
> > 2. Let's add the feature knob then. In the future we can simply ignore the
> > configuration when deprecating it.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Wed, Jun 29, 2022 at 10:19 PM Piotr Nowojski <pn...@apache.org>
> > wrote:
> >
> > > Hi,
> > >
> > > I mean I'm fine with throwing an exception by default in Flink 1.16 in
> > the
> > > "Case 2", but I think we need to provide a way to workaround it for
> > example
> > > via a feature toggle, if it's an easy thing to do. And it seems to be a
> > > simple thing.
> > >
> > > However this is orthogonal to the `supportsPausingSplits()` issue. I
> > don't
> > > have a big preference whether
> > >   a) the exception should originate on JM, using `default boolean
> > > supportsPausingSplits() { return false; }` (as currently proposed in the
> > > FLIP),
> > >   b) or on the TM from `pauseOrResumeSplits()` throwing
> > > `UnsupportedOperationException` as you are proposing.
> > >
> > > a) fails earlier, so it's more user friendly from this perspective, but
> > it
> > > provides more possibilities for bugs/inconsistencies for connector
> > > developers, since `supportsPausingSplits()` would have to be kept in sync
> > > with `pauseOrResumeSplits()`.
> > >
> > > Best,
> > > Piotrek
> > >
> > > śr., 29 cze 2022 o 15:27 Becket Qin <be...@gmail.com> napisał(a):
> > >
> > > > Hi Piotr,
> > > >
> > > > Just to make sure we are on the same page. There are two cases for the
> > > > existing FLIP-182 users:
> > > >
> > > > Case 1: Each source reader only has one split assigned. This is the
> > > > targeted case for FLIP-182.
> > > > Case 2: Each source reader has multiple splits assigned. This is the
> > > flaky
> > > > case that may or may not work.
> > > >
> > > > With solution 1, the users of case 1 won't be impacted. The users in
> > > case 2
> > > > will receive an exception which they won't get at the moment.
> > > >
> > > > Do you mean we should not throw an exception in case 2? Personally I
> > feel
> > > > that is OK and could have been done in FLIP-182 itself because it's
> > not a
> > > > designed use case. As a user I may see a big variation of the job state
> > > > sizes from time to time and I am not able to rely on this feature to
> > plan
> > > > my resources and uphold the SLA.
> > > >
> > > > That said, if you have a strong opinion on this, I am fine with having
> > > the
> > > > configuration like "allow.coarse-grained.watermark.alignment" with the
> > > > default value set to false, given that a configuration is much easier
> > to
> > > > deprecate than a method.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > >
> > > > On Wed, Jun 29, 2022 at 8:02 PM Piotr Nowojski <pn...@apache.org>
> > > > wrote:
> > > >
> > > > > Thanks for the explanation.
> > > > >
> > > > > > 2. It is fully compatible with FLIP-182, if we consider it as the
> > > right
> > > > > > thing to throw an exception for readers reading from multiple
> > splits
> > > > > > without supporting split pausing.
> > > > >
> > > > > I think that's fine. But the question is should we provide a
> > workaround
> > > > for
> > > > > existing users? IMO if it's easy to do, we should.
> > > > >
> > > > > > > I actually think neither solution 1 nor 2 breaks FLIP-182 users,
> > > > >
> > > > > They do. User has currently a working Flink 1.15 deployment, where
> > > > > watermark alignment maybe is not behaving ideally, but it's working
> > to
> > > > some
> > > > > extent and you are proposing to throw them an exception after
> > upgrading
> > > > > Flink, without any workaround (short of implementing a feature, which
> > > is
> > > > a
> > > > > very problematic requirement). Given that costly upgrades are one of
> > > the
> > > > > major complaints, I would be definitely in favor of option 2. Given
> > the
> > > > > most likely small actually affected user base, I would be +1 for
> > > > solution 2
> > > > > with throwing an exception by default.
> > > > >
> > > > > Best,
> > > > > Piotrek
> > > > >
> > > > >
> > > > > śr., 29 cze 2022 o 12:55 Becket Qin <be...@gmail.com>
> > napisał(a):
> > > > >
> > > > > > Hi Piotr,
> > > > > >
> > > > > > Please see the reply inline below:
> > > > > >
> > > > > > On Wed, Jun 29, 2022 at 5:11 PM Piotr Nowojski <
> > pnowojski@apache.org
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Becket,
> > > > > > >
> > > > > > > > My main concern of having a supportsPausingSplits() knob
> > > > > > >
> > > > > > > What is the problem with `supportsPausingSplits()` that you see?
> > > Do
> > > > > you
> > > > > > > want to remove it?
> > > > > > >
> > > > > > Just to make sure we are on the same page, I assume we are talking
> > > > about
> > > > > > > this supportsPausingSplits() method in the Source interface. If
> > we
> > > go
> > > > > > with the obligatory features addition path, having this method
> > seems
> > > > > > misleading. And also, later on at some point when we see all the
> > > > sources
> > > > > > have implemented this feature, we will have to worry about
> > > deprecating
> > > > > this
> > > > > > method, which is backwards incompatible.
> > > > > >
> > > > > >
> > > > > > > Also I don't understand your proposal for Solution 1. How do you
> > > want
> > > > > to
> > > > > > > decide whether to throw an exception? For that we would need to
> > > have
> > > > > > > `supportsPausingSplits()`, right?
> > > > > > >
> > > > > >
> > > > > > What I am thinking is the following:
> > > > > >
> > > > > > 1. The Flink framework always assumes split pausing is supported
> > and
> > > > just
> > > > > > naively invokes SourceReader#pauseOrResumeSplits().
> > > > > > 2. The SourceReaderBase will basically again try to ask the
> > > SplitReader
> > > > > to
> > > > > > pause the splits.
> > > > > > 3. Because the default implementation throws an
> > > > > > UnsupportedOperationException, if the source developer did not
> > > override
> > > > > it,
> > > > > > this exception will be thrown and bubbled up.
> > > > > > 4. After catching this exception, the SourceReaderBase will just
> > > check
> > > > if
> > > > > > there is only one split that is currently assigned to the split
> > > reader.
> > > > > If
> > > > > > so, it swallows the exception, stops polling the split reader and
> > > > returns
> > > > > > NOTHING_AVAILABLE. This is the same as the current logic in the
> > > > > > SourceOperator. If we are not comfortable with moving this logic to
> > > the
> > > > > > SourceReaderBase, we can also just keep the logic there and simply
> > > let
> > > > > > SourceOperator remember if there are more than one split assigned
> > to
> > > > the
> > > > > > source reader, when SourceOperator.handleAddSplitsEvent() is
> > invoked.
> > > > > >
> > > > > > This way the existing FLIP-182 users won't be impacted by this
> > FLIP.
> > > > For
> > > > > > those source readers that only have one split assigned, it works
> > fine
> > > > > > without any change. For those source readers with multiple splits
> > > > > assigned,
> > > > > > they are already in a limp state with unpredictable side effects.
> > We
> > > > > might
> > > > > > as well let them know this instead of pretending the
> > > > > > coarse-grained watermark alignment works fine for them.
> > > > > >
> > > > > > The advantage of this solution is that we don't have to do anything
> > > > after
> > > > > > this. That would work fine as the final state, as in:
> > > > > > 1. We have already done the best we can do for the Sources that do
> > > not
> > > > > > support split pausing.
> > > > > > 2. It is fully compatible with FLIP-182, if we consider it as the
> > > right
> > > > > > thing to throw an exception for readers reading from multiple
> > splits
> > > > > > without supporting split pausing.
> > > > > > 3. There is nothing to deprecate in the future.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > If so, I would prefer solution 2, to provide a graceful migration
> > > > path
> > > > > > for
> > > > > > > any users that are already using FLIP-182 with multiple splits
> > per
> > > > > > > operator. I don't think there are many of those, but such a flag
> > > > seems
> > > > > > easy
> > > > > > > to implement while making migration easier. Having said that,
> > > > > technically
> > > > > > > we could change the behaviour and start throwing an exception
> > > always
> > > > in
> > > > > > > such a case, as this feature is marked as Experimental.
> > > > > > >
> > > > > >
> > > > > > > I actually think neither solution 1 nor 2 breaks FLIP-182 users, but
> > > > > > solution 2 needs a deprecation process for the option in the
> > future.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > > Piotrek
> > > > > > >
> > > > > > > śr., 29 cze 2022 o 02:54 Becket Qin <be...@gmail.com>
> > > > napisał(a):
> > > > > > >
> > > > > > > > Hi Sebastian,
> > > > > > > >
> > > > > > > > Regarding the question,
> > > > > > > >
> > > > > > > > >
> > > > > > > > > @Becket: I'm not sure about the intention of solution 1. Can
> > > you
> > > > > > > explain
> > > > > > > > > that a bit more? In particular, I don't understand: "The
> > > > reasoning
> > > > > > > behind
> > > > > > > > > this solution is that existing users should only use the
> > > > > > > > > coarse watermark alignment when a source reader only reads
> > > from a
> > > > > > > single
> > > > > > > > > split." Why should a user not use coarse watermark alignment
> > > when
> > > > > > > source
> > > > > > > > > reader reads from multiple splits? The split alignment uses
> > the
> > > > > > "coarse
> > > > > > > > > watermark", i.e., maxDesiredWatermark, as described in the
> > FLIP
> > > > for
> > > > > > > > > alignment.
> > > > > > > >
> > > > > > > >
> > > > > > > > Imagine you have a source reader reading from two splits, and
> > the
> > > > > > > > watermarks look like the following:
> > > > > > > > > 1. Watermark of Split 1: 10:00 AM Jun 29
> > > > > > > > > 2. Watermark of Split 2: 11:00 AM Jun 29
> > > > > > > > > 3. maxDesiredWatermark: 10:30 AM Jun 29
> > > > > > > >
> > > > > > > > At this point, the source reader's watermark is 10:00 AM which
> > is
> > > > > lower
> > > > > > > > than the maxDesiredWatermark, so the source reader won't be
> > > paused
> > > > > from
> > > > > > > > reading. However, because the source reader cannot specify
> > which
> > > > > split
> > > > > > to
> > > > > > > > read from, if it continues to read, the watermark gap between
> > the
> > > > two
> > > > > > > > splits may become even bigger. This essentially fails the main
> > > > > purpose
> > > > > > of
> > > > > > > > watermark alignment - to reduce the number of records buffered
> > in
> > > > the
> > > > > > > > state. This does not necessarily happen, but this is not what
> > > > > FLIP-182
> > > > > > > was
> > > > > > > > designed for to begin with. So I'd rather avoid extending the
> > > > feature
> > > > > > > > to that case.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
> > > > > > > >
> > > > > > > > On Tue, Jun 28, 2022 at 6:53 PM Sebastian Mattheis <
> > > > > > > > sebastian@ververica.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > (Sorry I didn't send to the list but only to Becket. My bad
> > and
> > > > > > thanks
> > > > > > > > > Piotr. Next attempt:)
> > > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > Thanks for pushing the FLIP. I would drive it and would be
> > > happy
> > > > to
> > > > > > get
> > > > > > > > > back to you, @Thomas, for reviews. (Sorry for low
> > > responsiveness,
> > > > > > there
> > > > > > > > > were several efforts with high priority on my side ...) As
> > next
> > > > > > step, I
> > > > > > > > > would revise the FLIP to get the discussion concluded.
> > > > > > > > >
> > > > > > > > > However, as Becket mentioned I feel that some things are
> > still
> > > > not
> > > > > > > clear
> > > > > > > > > yet:
> > > > > > > > >
> > > > > > > > > Re: Thomas
> > > > > > > > >>
> > > > > > > > >> However, from a user perspective, should the split level
> > > > alignment
> > > > > > be
> > > > > > > > >>> an opt-in feature, at least for a few releases? If yes,
> > then
> > > we
> > > > > > would
> > > > > > > > >>> require a knob similar to supportsPausingSplits(), which I
> > > > > > understand
> > > > > > > > >>> won't be part of the revised FLIP. Such control may be
> > > > > beneficial:
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>> * Compare runtime behavior with split level alignment
> > on/off
> > > > > > > > >>> * Allow use of sources that don't implement pausing splits
> > > yet
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>> The second point would, from the user's perspective, be
> > > > necessary
> > > > > > for
> > > > > > > > >>> backward compatibility? While the interface aspect and
> > source
> > > > > > > > >>> compatibility has been discussed in great detail, I don't
> > > think
> > > > > it
> > > > > > > > >>> would be desirable if an application that already uses
> > > > alignment
> > > > > > > fails
> > > > > > > > >>> after upgrading to the new Flink version, forcing users to
> > > lock
> > > > > > step
> > > > > > > > >>> modify sources for the new non-optional split level
> > > alignment.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>> So I think clarification of the compatibility aspect on the
> > > > FLIP
> > > > > > page
> > > > > > > > >>> would be necessary.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> That is a good point. Currently FLIP-182 is marked as
> > > > > experimental.
> > > > > > So
> > > > > > > > >> technically speaking it could change. That said, I agree
> > that
> > > it
> > > > > > would
> > > > > > > > be
> > > > > > > > >> good to not break the existing sources.
> > > > > > > > >>
> > > > > > > > >> My main concern of having a supportsPausingSplits() knob is
> > > that
> > > > > > this
> > > > > > > > >> might be something requiring code change on future
> > > > deprecation.
> > > > I
> > > > > am
> > > > > > > > >> thinking of two potential solutions:
> > > > > > > > >>
> > > > > > > > >> Solution 1:
> > > > > > > > >> In the SourceReaderBase, when pauseOrResumeSplits() is
> > > invoked,
> > > > if
> > > > > > the
> > > > > > > > >> source reader only has one split assigned, the source reader
> > > > > simply
> > > > > > > > stops
> > > > > > > > >> polling but just returns NOTHING_AVAILABLE. If there are
> > more
> > > > than
> > > > > > one
> > > > > > > > >> splits assigned, it throws an exception with a message such
> > as
> > > > > "The
> > > > > > > > >> unpausable SplitReader CLASS_NAME only works with watermark
> > > > > > alignment
> > > > > > > > >> when assigned a single split. There are more than one split
> > > > > assigned
> > > > > > > to
> > > > > > > > the
> > > > > > > > >> SplitReader".
> > > > > > > > >> The reasoning behind this solution is that existing users
> > > should
> > > > > > only
> > > > > > > > use
> > > > > > > > >> the coarse watermark alignment when a source reader only
> > reads
> > > > > from
> > > > > > a
> > > > > > > > >> single split. Reading from more than one split might have
> > > > unwanted
> > > > > > > side
> > > > > > > > >> effects, so we might as well throw an exception in this
> > case.
> > > > > > > > >>
> > > > > > > > >> Solution 2:
> > > > > > > > >> Having a configuration
> > > > > "enable.coarse-grained.watermark.alignment",
> > > > > > > the
> > > > > > > > >> default value is false. Once it is set to true, we will
> > allow
> > > > > > > > >> coarse-grained watermark alignment if a SplitReader is
> > > pausable.
> > > > > > > > >> This solution allows users to keep the current FLIP-182
> > > > behavior,
> > > > > > with
> > > > > > > > >> the risk of side effects.
> > > > > > > > >>
> > > > > > > > >> Personally speaking, I feel solution 1 seems better because
> > > > > > > > >> coarse-grained watermark alignment could be frustrating to
> > the
> > > > > users
> > > > > > > > >> when more than one split is assigned. So we might as well
> > not
> > > > > > support
> > > > > > > > it at
> > > > > > > > >> all. And also there is nothing to deprecate in the future
> > with
> > > > > this
> > > > > > > > >> solution.
> > > > > > > > >>
> > > > > > > > >> What do you think?
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > > @Thomas: My understanding is that you intend a simple
> > > switch/knob
> > > > > to
> > > > > > > test
> > > > > > > > > w/ and w/o (split) watermark alignment, right? Isn't the
> > > > > > coarse-grained
> > > > > > > > w/
> > > > > > > > > vs w/o watermark alignment sufficient for that? Or do you
> > think
> > > > > that
> > > > > > > > > > switching watermark alignment explicitly on split level is
> > > > required?
> > > > > > > > >
> > > > > > > > > @Becket: I'm not sure about the intention of solution 1. Can
> > > you
> > > > > > > explain
> > > > > > > > > that a bit more? In particular, I don't understand: "The
> > > > reasoning
> > > > > > > behind
> > > > > > > > > this solution is that existing users should only use the
> > coarse
> > > > > > > watermark
> > > > > > > > > alignment when a source reader only reads from a single
> > split."
> > > > Why
> > > > > > > > > should a user not use coarse watermark alignment when source
> > > > reader
> > > > > > > reads
> > > > > > > > > from multiple splits? The split alignment uses the "coarse
> > > > > > watermark",
> > > > > > > > > i.e., maxDesiredWatermark, as described in the FLIP for
> > > > alignment.
> > > > > > > > >
> > > > > > > > > Could you please clarify?
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Sebastian
> > > > > > > > >
> > > > > > > > > On Wed, Jun 22, 2022 at 4:23 AM Becket Qin <
> > > becket.qin@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Thanks for the feedback, Thomas and Steve. And thanks Piotr
> > > for
> > > > > the
> > > > > > > > >> patient and detailed discussion.
> > > > > > > > >>
> > > > > > > > >> Let's move forward with option 1 then.
> > > > > > > > >>
> > > > > > > > >> Re: Thomas
> > > > > > > > >>
> > > > > > > > >> However, from a user perspective, should the split level
> > > > alignment
> > > > > > be
> > > > > > > > >>> an opt-in feature, at least for a few releases? If yes,
> > then
> > > we
> > > > > > would
> > > > > > > > >>> require a knob similar to supportsPausingSplits(), which I
> > > > > > understand
> > > > > > > > >>> won't be part of the revised FLIP. Such control may be
> > > > > beneficial:
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>> * Compare runtime behavior with split level alignment
> > on/off
> > > > > > > > >>> * Allow use of sources that don't implement pausing splits
> > > yet
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>> The second point would, from the user's perspective, be
> > > > necessary
> > > > > > for
> > > > > > > > >>> backward compatibility? While the interface aspect and
> > source
> > > > > > > > >>> compatibility has been discussed in great detail, I don't
> > > think
> > > > > it
> > > > > > > > >>> would be desirable if an application that already uses
> > > > alignment
> > > > > > > fails
> > > > > > > > >>> after upgrading to the new Flink version, forcing users to
> > > lock
> > > > > > step
> > > > > > > > >>> modify sources for the new non-optional split level
> > > alignment.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>> So I think clarification of the compatibility aspect on the
> > > > FLIP
> > > > > > page
> > > > > > > > >>> would be necessary.
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> That is a good point. Currently FLIP-182 is marked as
> > > > > experimental.
> > > > > > So
> > > > > > > > >> technically speaking it could change. That said, I agree
> > that
> > > it
> > > > > > would
> > > > > > > > be
> > > > > > > > >> good to not break the existing sources.
> > > > > > > > >>
> > > > > > > > >> My main concern of having a supportsPausingSplits() knob is
> > > that
> > > > > > this
> > > > > > > > >> might be something requiring code change on future
> > > > deprecation.
> > > > I
> > > > > am
> > > > > > > > >> thinking of two potential solutions:
> > > > > > > > >>
> > > > > > > > >> Solution 1:
> > > > > > > > >> In the SourceReaderBase, when pauseOrResumeSplits() is
> > > invoked,
> > > > if
> > > > > > the
> > > > > > > > >> source reader only has one split assigned, the source reader
> > > > > simply
> > > > > > > > stops
> > > > > > > > >> polling but just returns NOTHING_AVAILABLE. If there are
> > more
> > > > than
> > > > > > one
> > > > > > > > >> splits assigned, it throws an exception with a message such
> > as
> > > > > "The
> > > > > > > > >> unpausable SplitReader CLASS_NAME only works with watermark
> > > > > > alignment
> > > > > > > > when
> > > > > > > > >> assigned a single split. There are more than one split
> > > assigned
> > > > to
> > > > > > the
> > > > > > > > >> SplitReader".
> > > > > > > > >> The reasoning behind this solution is that existing users
> > > should
> > > > > > only
> > > > > > > > use
> > > > > > > > >> the coarse watermark alignment when a source reader only
> > reads
> > > > > from
> > > > > > a
> > > > > > > > >> single split. Reading from more than one split might have
> > > > unwanted
> > > > > > > side
> > > > > > > > >> effects, so we might as well throw an exception in this
> > case.
> > > > > > > > >>
> > > > > > > > >> Solution 2:
> > > > > > > > >> Having a configuration
> > > > > "enable.coarse-grained.watermark.alignment",
> > > > > > > the
> > > > > > > > >> default value is false. Once it is set to true, we will
> > allow
> > > > > > > > >> coarse-grained watermark alignment if a SplitReader is
> > > pausable.
> > > > > > > > >> This solution allows users to keep the current FLIP-182
> > > > behavior,
> > > > > > with
> > > > > > > > >> the risk of side effects.
> > > > > > > > >>
> > > > > > > > >> Personally speaking, I feel solution 1 seems better because
> > > > > > > > >> coarse-grained watermark alignment could be frustrating to
> > the
> > > > > users
> > > > > > > > when
> > > > > > > > >> more than one split is assigned. So we might as well not
> > > support
> > > > > it
> > > > > > at
> > > > > > > > all.
> > > > > > > > >> And also there is nothing to deprecate in the future with
> > this
> > > > > > > solution.
> > > > > > > > >>
> > > > > > > > >> What do you think?
> > > > > > > > >>
> > > > > > > > > >> Thanks,
> > > > > > > > >>
> > > > > > > > >> Jiangjie (Becket) Qin
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> On Tue, Jun 21, 2022 at 8:48 PM Piotr Nowojski <
> > > > > > pnowojski@apache.org>
> > > > > > > > >> wrote:
> > > > > > > > >>
> > > > > > > > >>> Hi,
> > > > > > > > >>>
> > > > > > > > >>> It looks like option 1 wins overall? So let's go with that.
> > > > > > > > >>>
> > > > > > > > >>> Best,
> > > > > > > > >>> Piotrek
> > > > > > > > >>>
> > > > > > > > >>> śr., 15 cze 2022 o 04:13 Steven Wu <st...@gmail.com>
> > > > > > > napisał(a):
> > > > > > > > >>>
> > > > > > > > >>>> Both option 1 (default impl in base interface) and option
> > 2
> > > > > > > > (decorative
> > > > > > > > >>>> interface) are pretty common patterns. I would also be
> > fine
> > > > with
> > > > > > > > either.
> > > > > > > > >>>> The important thing is that an exception is thrown if a
> > > source
> > > > > > > doesn't
> > > > > > > > >>>> support the alignment capability.
> > > > > > > > >>>>
> > > > > > > > >>>> The other point is that we can validate the source
> > > capability
> > > > if
> > > > > > > > >>>> alignment
> > > > > > > > >>>> is enabled in WatermarkStrategy. I believe either option
> > can
> > > > > > achieve
> > > > > > > > >>>> this
> > > > > > > > >>>> goal too.
> > > > > > > > >>>> public interface WatermarkStrategy<T> {
> > > > > > > > >>>>     WatermarkStrategy<T> withWatermarkAlignment(String
> > > > > > > watermarkGroup,
> > > > > > > > >>>> Duration maxAllowedWatermarkDrift);
> > > > > > > > >>>> }
> > > > > > > > >>>>
> > > > > > > > >>>> If I have to pick one, I am slightly favoring option 1
> > (base
> > > > > > > > >>>> interface). As
> > > > > > > > >>>> watermark is already an essential concept of source, maybe
> > > > > > watermark
> > > > > > > > >>>> alignment capability can also be a property of the base
> > > > > > > source/reader
> > > > > > > > >>>> interface.
> > > > > > > > >>>>
> > > > > > > > >>>> On Tue, Jun 14, 2022 at 3:44 PM Thomas Weise <
> > > thw@apache.org>
> > > > > > > wrote:
> > > > > > > > >>>>
> > > > > > > > >>>> > Hi everyone,
> > > > > > > > >>>> >
> > > > > > > > >>>> > Thank you for all the effort that went into this
> > > discussion.
> > > > > The
> > > > > > > > split
> > > > > > > > >>>> > level watermark alignment will be an important feature
> > for
> > > > > Flink
> > > > > > > > that
> > > > > > > > >>>> > will address operational problems for various use cases.
> > > > From
> > > > > > > > reading
> > > > > > > > >>>> > through this thread it appears that not too much remains
> > > to
> > > > > > bring
> > > > > > > > this
> > > > > > > > >>>> > FLIP to acceptance and allow development to move
> > forward.
> > > I
> > > > > > would
> > > > > > > > like
> > > > > > > > >>>> > to contribute if possible.
> > > > > > > > >>>> >
> > > > > > > > >>>> > Regarding option 1 vs. option 2: I don't have a strong
> > > > > > preference,
> > > > > > > > >>>> > perhaps slightly leaning towards option 1.
> > > > > > > > >>>> >
> > > > > > > > >>>> > However, from a user perspective, should the split level
> > > > > > alignment
> > > > > > > > be
> > > > > > > > >>>> > an opt-in feature, at least for a few releases? If yes,
> > > then
> > > > > we
> > > > > > > > would
> > > > > > > > >>>> > require a knob similar to supportsPausingSplits(),
> > which I
> > > > > > > > understand
> > > > > > > > >>>> > won't be part of the revised FLIP. Such control may be
> > > > > > beneficial:
> > > > > > > > >>>> >
> > > > > > > > >>>> > * Compare runtime behavior with split level alignment
> > > on/off
> > > > > > > > >>>> > * Allow use of sources that don't implement pausing
> > splits
> > > > yet
> > > > > > > > >>>> >
> > > > > > > > >>>> > The second point would, from the user's perspective, be
> > > > > > necessary
> > > > > > > > for
> > > > > > > > >>>> > backward compatibility? While the interface aspect and
> > > > source
> > > > > > > > >>>> > compatibility has been discussed in great detail, I
> > don't
> > > > > think
> > > > > > it
> > > > > > > > >>>> > would be desirable if an application that already uses
> > > > > alignment
> > > > > > > > fails
> > > > > > > > >>>> > after upgrading to the new Flink version, forcing users
> > to
> > > > > lock
> > > > > > > step
> > > > > > > > >>>> > modify sources for the new non-optional split level
> > > > alignment.
> > > > > > > > >>>> >
> > > > > > > > >>>> > So I think clarification of the compatibility aspect on
> > > the
> > > > > FLIP
> > > > > > > > page
> > > > > > > > >>>> > would be necessary.
> > > > > > > > >>>> >
> > > > > > > > >>>> > Thanks,
> > > > > > > > >>>> > Thomas
> > > > > > > > >>>> >
> > > > > > > > >>>> > On Mon, May 30, 2022 at 3:29 AM Piotr Nowojski <
> > > > > > > > >>>> piotr.nowojski@gmail.com>
> > > > > > > > >>>> > wrote:
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > Hi Becket,
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > Thanks for summing this up. Just one correction:
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > > Piotr prefers option 2, his opinions are:
> > > > > > > > >>>> > > >   e) It is OK that the code itself in option 2
> > > indicates
> > > > > the
> > > > > > > > >>>> developers
> > > > > > > > >>>> > > that a feature is optional. We will rely on the
> > > > > documentation
> > > > > > to
> > > > > > > > >>>> correct
> > > > > > > > >>>> > > that and clarify that the feature is actually
> > > obligatory.
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > I would say based on a) and b) that feature would be
> > > still
> > > > > > > > >>>> optional. So
> > > > > > > > >>>> > > both the implementation and the documentation would be
> > > > > saying
> > > > > > > > that.
> > > > > > > > >>>> We
> > > > > > > > >>>> > > could add a mention to the docs and release notes,
> > that
> > > > this
> > > > > > > > >>>> feature will
> > > > > > > > >>>> > > be obligatory in the next major release and plan such
> > a
> > > > > > release
> > > > > > > > >>>> > accordingly.
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > Re the option 1., as you mentioned:
> > > > > > > > >>>> > > > As for option 1: For developers, the feature is
> > still
> > > > > > optional
> > > > > > > > >>>> due to
> > > > > > > > >>>> > the
> > > > > > > > >>>> > > default implementation in the interface, regardless of
> > > > what
> > > > > > the
> > > > > > > > >>>> default
> > > > > > > > >>>> > > implementation does, because the code compiles without
> > > > > > > overriding
> > > > > > > > >>>> these
> > > > > > > > >>>> > > methods
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > Also importantly, the code will work in most cases.
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > > Obligatory: Jobs may fail if these methods are not
> > > > > > implemented
> > > > > > > > >>>> > properly.
> > > > > > > > > >>>> > > e.g. SourceReader#pauseOrResumeSplits(). This is a
> > > common
> > > > > > > pattern
> > > > > > > > in
> > > > > > > > >>>> > Java,
> > > > > > > > >>>> > > e.g. Iterator.remove() by default throws
> > > > > > > > >>>> "UnsupportedOperationException",
> > > > > > > > >>>> > > informing the implementation that things may go wrong
> > if
> > > > > this
> > > > > > > > >>>> method is
> > > > > > > > >>>> > not
> > > > > > > > >>>> > > implemented.
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > For me `Iterator#remove()` is an optional feature.
> > > > > > Personally, I
> > > > > > > > >>>> don't
> > > > > > > > >>>> > > remember if I have ever implemented it.
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > Best,
> > > > > > > > >>>> > > Piotrek
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > pt., 27 maj 2022 o 10:14 Becket Qin <
> > > becket.qin@gmail.com
> > > > >
> > > > > > > > >>>> napisał(a):
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > > I had an offline discussion with Piotr and here is
> > the
> > > > > > > summary.
> > > > > > > > >>>> Please
> > > > > > > > >>>> > > > correct me if I miss something, Piotr.
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > There are two things we would like to seek more
> > > opinions
> > > > > > from
> > > > > > > > the
> > > > > > > > >>>> > > > community, so we can make progress on this FLIP.
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > 1. The General pattern to add obligatory features to
> > > > > > existing
> > > > > > > > >>>> > interfaces.
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > >
> > > > > > > > >>>> >
> > > > > > > > >>>>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > ***********************************************************************************
> > > > > > > > >>>> > > > For interfaces exposed to the developers for
> > > > > implementation,
> > > > > > > > they
> > > > > > > > >>>> are
> > > > > > > > > >>>> > > > either intended to be *optional* or *obligatory*.
> > > While
> > > > it
> > > > > > is
> > > > > > > > >>>> quite
> > > > > > > > >>>> > clear
> > > > > > > > >>>> > > > about how to convey that intention when creating the
> > > > > > > interfaces,
> > > > > > > > >>>> it is
> > > > > > > > >>>> > not
> > > > > > > > >>>> > > > as commonly agreed when we are adding new features
> > to
> > > an
> > > > > > > > existing
> > > > > > > > >>>> > > > interface. In general, Flink uses decorative
> > > interfaces
> > > > > when
> > > > > > > > >>>> adding
> > > > > > > > >>>> > > > optional features to existing interfaces. Both Piotr
> > > > and I
> > > > > > > agree
> > > > > > > > >>>> that
> > > > > > > > >>>> > looks
> > > > > > > > >>>> > > > good.
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > Different opinions are mainly about how to add
> > > > obligatory
> > > > > > > > >>>> features to
> > > > > > > > >>>> > the
> > > > > > > > >>>> > > > existing interfaces, probably due to different
> > > > > > understandings
> > > > > > > of
> > > > > > > > >>>> > > > "obligatory".
> > > > > > > > >>>> > > >
> > > > > > > > > >>>> > > > We have discussed four options:
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > *Option 1:*
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > >    - Just add a new method to the existing
> > interface.
> > > > > > > > >>>> > > >    - For backwards compatibility, the method would
> > > have
> > > > a
> > > > > > > > default
> > > > > > > > >>>> > > >    implementation throwing
> > > > > "UnsupportedOperationException".
> > > > > > > > >>>> > > >    - In the next major version, remove the default
> > > > > > > > implementation.
> > > > > > > > >>>> > > >    - For the developers, any method with a default
> > > > > > > > implementation
> > > > > > > > >>>> > > >    throwing an "UnsupportedOperationException"
> > should
> > > be
> > > > > > taken
> > > > > > > > as
> > > > > > > > >>>> > obligatory.
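
A minimal sketch of what Option 1 could look like, assuming a simplified stand-in interface. `SketchReader`, `LegacyReader` and `Option1Sketch` are hypothetical names used only for illustration and are not Flink's actual API; only the method name `pauseOrResumeSplits` is taken from the discussion.

```java
import java.util.Collection;
import java.util.Collections;

// Hypothetical, simplified stand-in for a reader interface; not Flink's real SourceReader.
interface SketchReader {
    String pollNext();

    // Method added after the interface was first released. The default keeps
    // existing implementations compiling, but calling it without an override fails.
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException(
                "pauseOrResumeSplits is not implemented by " + getClass().getSimpleName());
    }
}

// A connector written before the new method existed: it still compiles unchanged.
final class LegacyReader implements SketchReader {
    @Override
    public String pollNext() {
        return "record";
    }
}

final class Option1Sketch {
    // Returns true if asking the reader to pause a split fails at runtime.
    static boolean pauseFails(SketchReader reader) {
        try {
            reader.pauseOrResumeSplits(
                    Collections.singletonList("split-0"), Collections.<String>emptyList());
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("legacy reader fails on pause: " + pauseFails(new LegacyReader()));
    }
}
```

The point of contention in the thread is visible here: `LegacyReader` compiles without touching the new method, and the failure only shows up when the framework actually calls it.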
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > *Option 2:*
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > >    - Always make the features optional by adding a
> > > > > > decorative
> > > > > > > > >>>> > interface,
> > > > > > > > >>>> > > >    just like ordinary optional features.
> > > > > > > > >>>> > > >    - Inform the developers via documentation that
> > this
> > > > > > feature
> > > > > > > > is
> > > > > > > > > >>>> > > >    obligatory, although it looks optional from
> > > the
> > > > > > code.
> > > > > > > > >>>> > > >    - In case the developers did not implement the
> > > > > decorative
> > > > > > > > >>>> interface,
> > > > > > > > >>>> > > >    throw an exception
> > > > > > > > >>>> > > >    - In the next major version, move the methods in
> > > the
> > > > > > > > decorative
> > > > > > > > >>>> > > >    interface to the base interface, and deprecate
> > the
> > > > > > > decorative
> > > > > > > > >>>> > interface.
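
A minimal sketch of what Option 2 could look like, again under assumptions. `BaseReader`, `PausableSplits`, `PlainReader`, `PausableReader` and `Option2Sketch` are hypothetical names, and the choice of `IllegalStateException` is an assumption; the thread only says that an exception is thrown when the decorative interface is missing.

```java
import java.util.Collection;

// Hypothetical base interface; it stays unchanged when the new feature is introduced.
interface BaseReader {
    String pollNext();
}

// Hypothetical decorative interface carrying the new capability.
interface PausableSplits {
    void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

// Implements only the base interface.
final class PlainReader implements BaseReader {
    @Override
    public String pollNext() {
        return "record";
    }
}

// Implements the base interface plus the decorative one.
final class PausableReader implements BaseReader, PausableSplits {
    @Override
    public String pollNext() {
        return "record";
    }

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        // A real reader would stop or restart fetching for the given splits here.
    }
}

final class Option2Sketch {
    // Framework-side check: fail fast if the feature is enabled but not supported.
    static void validate(BaseReader reader, boolean alignmentEnabled) {
        if (alignmentEnabled && !(reader instanceof PausableSplits)) {
            throw new IllegalStateException(
                    reader.getClass().getSimpleName() + " does not support pausing splits");
        }
    }

    // Returns true if validate() rejects the given combination.
    static boolean isRejected(BaseReader reader, boolean alignmentEnabled) {
        try {
            validate(reader, alignmentEnabled);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("plain reader, alignment on: " + isRejected(new PlainReader(), true));
        System.out.println("pausable reader, alignment on: " + isRejected(new PausableReader(), true));
    }
}
```

In this variant the framework needs the explicit `instanceof` check, which is the special handling logic that Option 1 avoids.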
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > *Option 3:*
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > >    - Always bump the major version when a new
> > > obligatory
> > > > > > > feature
> > > > > > > > >>>> is
> > > > > > > > >>>> > > >    added, even if we may have to do it frequently.
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > *Option 4:*
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > >    - Add a V2, V3... of the interface affected by
> > the
> > > > new
> > > > > > > > >>>> obligatory
> > > > > > > > >>>> > > >    feature.
> > > > > > > > >>>> > > >    - In the next major versions, deprecate old
> > > versions
> > > > of
> > > > > > the
> > > > > > > > >>>> > interfaces.
> > > > > > > > >>>> > > >
> > > > > > > > > >>>> > > > Both Piotr and I agreed that option 3 and option 4
> > > > have a
> > > > > > big
> > > > > > > > >>>> side
> > > > > > > > >>>> > effect
> > > > > > > > > >>>> > > > and should be avoided. We have different preferences
> > > > > between
> > > > > > > > >>>> option 1
> > > > > > > > >>>> > and
> > > > > > > > >>>> > > > option 2.
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > Personally I prefer option 1, the reasons are:
> > > > > > > > >>>> > > >   a) simple and intuitive. Java 8 introduced the
> > > default
> > > > > > impl
> > > > > > > in
> > > > > > > > >>>> > > > interfaces exactly for interface evolving, and this
> > > is a
> > > > > > > common
> > > > > > > > >>>> > pattern in
> > > > > > > > >>>> > > > many projects.
> > > > > > > > >>>> > > >   b) prominent to the developers that the feature is
> > > > > > expected
> > > > > > > to
> > > > > > > > >>>> be
> > > > > > > > >>>> > > > implemented, because it explicitly throws an
> > exception
> > > > in
> > > > > > the
> > > > > > > > >>>> default
> > > > > > > > >>>> > impl.
> > > > > > > > >>>> > > >   c) low maintenance overhead - the Flink framework
> > > can
> > > > > > always
> > > > > > > > >>>> assume
> > > > > > > > >>>> > the
> > > > > > > > >>>> > > > method exists, so no special handling logic is
> > needed.
> > > > > > > > >>>> > > >   d) communicate a clear semantic boundary between
> > > > > optional
> > > > > > > and
> > > > > > > > >>>> > obligatory
> > > > > > > > > >>>> > > > features in Flink to the developers.
> > > > > > > > >>>> > > >       - Optional: Jobs still run without exception
> > if
> > > > > these
> > > > > > > > >>>> methods are
> > > > > > > > >>>> > > > not implemented. e.g. all the SupportsXXXPushDown
> > > > > > interfaces.
> > > > > > > > >>>> > > >       - Obligatory: Jobs may fail if these methods
> > are
> > > > not
> > > > > > > > >>>> implemented
> > > > > > > > > >>>> > > > properly. e.g. SourceReader#pauseOrResumeSplits().
> > > This
> > > > > is a
> > > > > > > > >>>> common
> > > > > > > > >>>> > pattern
> > > > > > > > >>>> > > > in Java, e.g. Iterator.remove() by default throws
> > > > > > > > >>>> > > > "UnsupportedOperationException", informing the
> > > > > > implementation
> > > > > > > > that
> > > > > > > > >>>> > things
> > > > > > > > >>>> > > > may go wrong if this method is not implemented.
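
The `Iterator.remove()` behaviour referred to above can be checked with a small sketch. `CountingIterator` and `IteratorRemoveSketch` are hypothetical names; the only library behaviour relied on is that the default `java.util.Iterator#remove()` throws `UnsupportedOperationException`.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Iterator that implements only hasNext() and next(); remove() is inherited
// from the default method in java.util.Iterator.
final class CountingIterator implements Iterator<Integer> {
    private final int limit;
    private int next = 0;

    CountingIterator(int limit) {
        this.limit = limit;
    }

    @Override
    public boolean hasNext() {
        return next < limit;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return next++;
    }
}

final class IteratorRemoveSketch {
    // Advances the iterator once, then reports whether remove() is unsupported.
    static boolean removeIsUnsupported(Iterator<?> it) {
        it.next();
        try {
            it.remove();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("remove() unsupported: " + removeIsUnsupported(new CountingIterator(3)));
    }
}
```

Both readings from the thread fit this code: the class compiles and iterates correctly without `remove()`, and a caller that does rely on `remove()` only finds out at runtime.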
> > > > > > > > >>>> > > >
> > > > > > > > > >>>> > > > As for option 2, although the API itself sounds
> > clean,
> > > > it
> > > > > > > > misleads
> > > > > > > > >>>> > people
> > > > > > > > > >>>> > > > to think of an obligatory feature as optional -
> > > from
> > > > > the
> > > > > > > code
> > > > > > > > >>>> the
> > > > > > > > >>>> > > > feature is optional, but the documents say it is
> > > > > obligatory.
> > > > > > > We
> > > > > > > > >>>> > probably
> > > > > > > > >>>> > > > should avoid such code-doc inconsistency, as people
> > > will
> > > > > be
> > > > > > > > >>>> confused.
> > > > > > > > >>>> > And I
> > > > > > > > >>>> > > > would actually be bewildered that sometimes not
> > > > > implementing
> > > > > > > an
> > > > > > > > >>>> > "optional"
> > > > > > > > >>>> > > > feature is fine, but sometimes it causes the jobs to
> > > > fail.
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > In response to the argument that the method with a
> > > > default
> > > > > > > > >>>> > implementation
> > > > > > > > >>>> > > > is always optional, if that is true, it actually
> > means
> > > > all
> > > > > > the
> > > > > > > > >>>> > interfaces
> > > > > > > > >>>> > > > should be immutable once they are created. If we
> > want
> > > to
> > > > > > add a
> > > > > > > > >>>> method
> > > > > > > > >>>> > to an
> > > > > > > > >>>> > > > existing interface, for backwards compatibility, we
> > > will
> > > > > > have
> > > > > > > to
> > > > > > > > >>>> > provide a
> > > > > > > > >>>> > > > default implementation. And the fact it has a
> > default
> > > > > > > > >>>> implementation
> > > > > > > > >>>> > > > indicates the method is optional. If that method is
> > > > > > optional,
> > > > > > > it
> > > > > > > > >>>> should
> > > > > > > > >>>> > > > reside in a separate decorative interface, otherwise
> > > it
> > > > > > clogs
> > > > > > > > that
> > > > > > > > >>>> > existing
> > > > > > > > >>>> > > > interface. Therefore, people should never add a
> > method
> > > > to
> > > > > an
> > > > > > > > >>>> existing
> > > > > > > > >>>> > > > interface. I find this conclusion a bit extreme.
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > Piotr prefers option 2, his opinions are:
> > > > > > > > >>>> > > >     a) Obligatory methods are the methods that fail
> > > the
> > > > > code
> > > > > > > > >>>> > compilation
> > > > > > > > >>>> > > > if not implemented.
> > > > > > > > >>>> > > >     b) All obligatory methods should reside in the
> > > base
> > > > > > > > interface,
> > > > > > > > >>>> > without
> > > > > > > > >>>> > > > a default implementation. And all the optional
> > methods
> > > > > > should
> > > > > > > be
> > > > > > > > >>>> in
> > > > > > > > >>>> > > > decorative interfaces. This is a clean API.
> > > > > > > > >>>> > > >     c) due to b), there isn't a viable solution to
> > add
> > > > an
> > > > > > > > >>>> obligatory
> > > > > > > > >>>> > > > method to an existing interface in a backwards
> > > > compatible
> > > > > > way.
> > > > > > > > >>>> Unless
> > > > > > > > >>>> > we
> > > > > > > > >>>> > > > are OK with breaking backwards compatibility, all
> > the
> > > > > > > interfaces
> > > > > > > > >>>> > should be
> > > > > > > > >>>> > > > treated as immutable. As a compromise, we might as
> > > well
> > > > > just
> > > > > > > > >>>> treat all
> > > > > > > > >>>> > the
> > > > > > > > >>>> > > > features added later as optional features. This way
> > we
> > > > > keep
> > > > > > > the
> > > > > > > > >>>> API
> > > > > > > > >>>> > clean.
> > > > > > > > >>>> > > >     d) based on b) and c), option 2 has a clean API,
> > > > while
> > > > > > > > option
> > > > > > > > >>>> 1
> > > > > > > > >>>> > does
> > > > > > > > >>>> > > > not.
> > > > > > > > >>>> > > >     e) It is OK that the code itself in option 2
> > > > indicates
> > > > > > > to the
> > > > > > > > >>>> > developers
> > > > > > > > >>>> > > > that a feature is optional. We will rely on the
> > > > > > documentation
> > > > > > > to
> > > > > > > > >>>> > correct
> > > > > > > > >>>> > > > that and clarify that the feature is actually
> > > > obligatory.
> > > > > > > > >>>> > > >     f) Regarding the effectiveness of making people
> > > > aware
> > > > > > that
> > > > > > > > the
> > > > > > > > >>>> > feature
> > > > > > > > >>>> > > > is obligatory, Option 1 and Option 2 are similar.
> > For
> > > > > people
> > > > > > > > that
> > > > > > > > >>>> do
> > > > > > > > >>>> > not
> > > > > > > > >>>> > > > read the release note / documentation, they will
> > > mistake
> > > > > the
> > > > > > > > >>>> feature
> > > > > > > > >>>> > to be
> > > > > > > > >>>> > > > optional anyways.
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > As for option 1: For developers, the feature is
> > still
> > > > > > optional
> > > > > > > > >>>> due to
> > > > > > > > >>>> > the
> > > > > > > > >>>> > > > default implementation in the interface, regardless
> > of
> > > > > what
> > > > > > > the
> > > > > > > > >>>> default
> > > > > > > > >>>> > > > implementation does, because the code compiles
> > without
> > > > > > > > overriding
> > > > > > > > >>>> these
> > > > > > > > >>>> > > > methods. Also, another problem of this option is
> > that
> > > > for
> > > > > > > users
> > > > > > > > >>>> that
> > > > > > > > >>>> > do not
> > > > > > > > >>>> > > > know about the history of the interface, they may be
> > > > > > confused
> > > > > > > by
> > > > > > > > >>>> the
> > > > > > > > >>>> > > > default implementation throwing an exception.
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > 2. For this particular FLIP, should it be optional
> > or
> > > > not?
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > >
> > > > > > > > >>>> >
> > > > > > > > >>>>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > ***********************************************************************************
> > > > > > > > >>>> > > > As mentioned in the previous email, I feel this FLIP
> > > > > should
> > > > > > be
> > > > > > > > >>>> > obligatory,
> > > > > > > > >>>> > > > for the following reasons:
> > > > > > > > >>>> > > > 1. The Flink framework exposes the watermark
> > alignment
> > > > API
> > > > > > to
> > > > > > > > the
> > > > > > > > >>>> end
> > > > > > > > >>>> > > > users. From the end users' perspective, the feature
> > > > should
> > > > > > be
> > > > > > > > >>>> available
> > > > > > > > >>>> > > > regardless of the implementation details in the
> > > > > pluggables.
> > > > > > > This
> > > > > > > > >>>> is
> > > > > > > > >>>> > true
> > > > > > > > >>>> > > > for any other methods exposed as the Flink API.
> > > > > > > > >>>> > > > 2. If a Source is not pausable, the end user should
> > > > > receive
> > > > > > an
> > > > > > > > >>>> > exception
> > > > > > > > > >>>> > > > when enabling watermark alignment (both Piotr and
> > > > I
> > > > > > agree
> > > > > > > on
> > > > > > > > >>>> > this). In
> > > > > > > > > >>>> > > > that case, it meets my criteria for an obligatory
> > feature
> > > > > > because
> > > > > > > > not
> > > > > > > > >>>> > > > implementing the feature causes a framework API to
> > > throw
> > > > > > > > > >>>> an exception and
> > > > > > > > > >>>> > > > fail the job.
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > On the other hand, Piotr does not have a strong
> > > opinion
> > > > > > > > regarding
> > > > > > > > >>>> > whether
> > > > > > > > >>>> > > > this feature should be optional or not.
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > Thanks for reading through this long email. So
> > > basically
> > > > > in
> > > > > > > > order
> > > > > > > > >>>> to
> > > > > > > > >>>> > make
> > > > > > > > > >>>> > > > progress on this FLIP, we want to see how people
> > > > feel
> > > > > > > about
> > > > > > > > >>>> the
> > > > > > > > >>>> > above
> > > > > > > > >>>> > > > two topics.
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > Thanks,
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > Jiangjie (Becket) Qin
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > On Thu, May 26, 2022 at 3:06 PM Piotr Nowojski <
> > > > > > > > >>>> pnowojski@apache.org>
> > > > > > > > >>>> > > > wrote:
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > >> Hi Becket,
> > > > > > > > >>>> > > >>
> > > > > > > > > >>>> > > >> I still stand by what I wrote before:
> > > > > > > > >>>> > > >> > I think I would still vote soft -1 on this
> > option,
> > > > but
> > > > > I
> > > > > > > > >>>> wouldn't
> > > > > > > > >>>> > block
> > > > > > > > >>>> > > >> it in case I am out-voted.
> > > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >> > I think it might be helpful to agree on the
> > > > definition
> > > > > of
> > > > > > > > >>>> optional
> > > > > > > > >>>> > in
> > > > > > > > >>>> > > >> our
> > > > > > > > >>>> > > >> case.
> > > > > > > > >>>> > > >>
> > > > > > > > > >>>> > > >> For me it doesn't matter whether we call a default method
> > > > > throwing
> > > > > > an
> > > > > > > > > >>>> > exception
> > > > > > > > > >>>> > > >> optional or non-optional. As long as we keep
> > it
> > > > this
> > > > > > > way,
> > > > > > > > >>>> the
> > > > > > > > >>>> > effect
> > > > > > > > >>>> > > >> is the same. It's effectively a method that a user
> > > > > doesn't
> > > > > > > have
> > > > > > > > >>>> to
> > > > > > > > > >>>> > > >> implement. If an interface/system allows some methods
> > to
> > > > be
> > > > > > not
> > > > > > > > >>>> > implemented,
> > > > > > > > > >>>> > > >> some users will do just that, regardless of whether we call
> > > it
> > > > > and
> > > > > > > > > >>>> document it as
> > > > > > > > >>>> > > >> non-optional. And at the same time it's clogging
> > the
> > > > base
> > > > > > > > >>>> interface.
> > > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >> By the way, just the need for a
> > > java-doc/documentation
> > > > > > > > >>>> explaining the
> > > > > > > > >>>> > > >> existence of some construct is a bad smell (code
> > > should
> > > > > be
> > > > > > > > >>>> > > >> self-documenting
> > > > > > > > > >>>> > > >> and a default method throwing an
> > > > > > UnsupportedOperationException
> > > > > > > is
> > > > > > > > >>>> not).
> > > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >> > Please note that so far we do not assume whether
> > > the
> > > > > > > feature
> > > > > > > > >>>> is in
> > > > > > > > >>>> > > >> > the original API or it is added later. A newly
> > > added
> > > > > > > feature
> > > > > > > > >>>> can
> > > > > > > > >>>> > also be
> > > > > > > > >>>> > > >> > non-optional, although it might take some time
> > for
> > > > all
> > > > > > the
> > > > > > > > >>>> pluggable
> > > > > > > > >>>> > > >> > developers to catch up, and they should still
> > work
> > > if
> > > > > the
> > > > > > > new
> > > > > > > > >>>> > feature is
> > > > > > > > >>>> > > >> > not used until they catch up. In contrast, we may
> > > > never
> > > > > > > > expect
> > > > > > > > >>>> an
> > > > > > > > >>>> > > >> optional
> > > > > > > > >>>> > > >> > feature to catch up, because leaving it
> > > unimplemented
> > > > > is
> > > > > > > also
> > > > > > > > >>>> > blessed.
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >> > Let's take the checkpointing as an example.
> > Imagine
> > > > > Flink
> > > > > > > did
> > > > > > > > >>>> not
> > > > > > > > >>>> > > >> support
> > > > > > > > >>>> > > >> > checkpointing before release 1.16. And now we are
> > > > > trying
> > > > > > to
> > > > > > > > add
> > > > > > > > >>>> > > >> > checkpointing to Flink. So we exposed the
> > > checkpoint
> > > > > > > > >>>> configuration
> > > > > > > > >>>> > to
> > > > > > > > >>>> > > >> the
> > > > > > > > >>>> > > >> > end users. In the meantime, will we tell the
> > > > pluggable
> > > > > > > (e.g.
> > > > > > > > >>>> > operators,
> > > > > > > > >>>> > > >> > connectors) developers that methods like
> > > > > > "snapshotState()"
> > > > > > > are
> > > > > > > > >>>> > optional?
> > > > > > > > >>>> > > >> If
> > > > > > > > >>>> > > >> > we do that, the availability of checkpointing in
> > > > Flink
> > > > > > > would
> > > > > > > > be
> > > > > > > > >>>> > severely
> > > > > > > > >>>> > > >> > weakened. But apparently we should still allow
> > the
> > > > > > existing
> > > > > > > > >>>> > > >> implementations
> > > > > > > > >>>> > > >> > to work without checkpointing. It looks to me
> > that
> > > > > adding
> > > > > > > the
> > > > > > > > >>>> > method to
> > > > > > > > >>>> > > >> the
> > > > > > > > >>>> > > >> > pluggable interfaces with a default
> > implementation
> > > > > > throwing
> > > > > > > > >>>> > > >> > "UnsupportedOperationException" would be the
> > > solution
> > > > > > here.
> > > > > > > > >>>> Please
> > > > > > > > >>>> > note
> > > > > > > > >>>> > > >> > that in this case, having the default
> > > implementation
> > > > > does
> > > > > > > not
> > > > > > > > >>>> mean
> > > > > > > > >>>> > this
> > > > > > > > >>>> > > >> is
> > > > > > > > >>>> > > >> > optional. It is just the technique to support
> > > > backwards
> > > > > > > > >>>> > compatibility in
> > > > > > > > >>>> > > >> > the feature evolution. The fact that this method
> > is
> > > > in
> > > > > > the
> > > > > > > > base
> > > > > > > > >>>> > > >> interface
> > > > > > > > >>>> > > >> > suggests it is not optional, so the developers
> > > SHOULD
> > > > > > > > >>>> implement it.
> > > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >> I would soft vote -1 for having the default method
> > > > > throwing
> > > > > > > > > >>>> > > >> UnsupportedOperationException as a one-off for
> > > > this
> > > > > > > > (FLIP-217)
> > > > > > > > > >>>> > special
> > > > > > > > > >>>> > > >> circumstance.
> > > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >> At the moment, without thinking this over too
> > much, I
> > > > > would
> > > > > > > > vote
> > > > > > > > >>>> > harder -1
> > > > > > > > >>>> > > >> for having this as a general rule when adding new
> > > > > features.
> > > > > > > If
> > > > > > > > we
> > > > > > > > >>>> > ever end
> > > > > > > > >>>> > > >> up with an API that is littered with default
> > methods
> > > > > > throwing
> > > > > > > > >>>> > > >> UnsupportedOperationException that are documented
> > as
> > > > "non
> > > > > > > > >>>> optional" it
> > > > > > > > >>>> > > >> would be IMO a big design failure. I would be
> > ok-ish
> > > > with
> > > > > > > that,
> > > > > > > > >>>> only
> > > > > > > > >>>> > if
> > > > > > > > >>>> > > >> that was a temporary thing and we had an aggressive
> > > > plan
> > > > > to
> > > > > > > > >>>> release
> > > > > > > > >>>> > more
> > > > > > > > >>>> > > >> often new major Flink versions (2.x.y, 3.x.y, ...)
> > > > > breaking
> > > > > > > API
> > > > > > > > >>>> > > >> compatibility, that would get rid of those default
> > > > > methods.
> > > > > > > > >>>> Adding
> > > > > > > > >>>> > > >> checkpointing and methods like "snapshotState()"
> > > would
> > > > > IMO
> > > > > > > > easily
> > > > > > > > >>>> > justify
> > > > > > > > >>>> > > >> a
> > > > > > > > >>>> > > >> new major Flink release. In that case we could add
> > > > those
> > > > > > > > methods
> > > > > > > > >>>> with
> > > > > > > > >>>> > > >> default implementation for some transition period,
> > > > one
> > > > > or
> > > > > > > two
> > > > > > > > >>>> minor
> > > > > > > > >>>> > > >> releases, followed by a clean up in a major
> > release.
> > > > > > However
> > > > > > > I
> > > > > > > > >>>> would
> > > > > > > > >>>> > still
> > > > > > > > >>>> > > >> argue that it would be cleaner/better to add a
> > > > decorative
> > > > > > > > >>>> interface
> > > > > > > > >>>> > like
> > > > > > > > >>>> > > >> `CheckpointedOperator` interface instead of adding
> > > > those
> > > > > > > > default
> > > > > > > > >>>> > methods
> > > > > > > > >>>> > > >> to
> > > > > > > > >>>> > > >> the base `Operator` interface.
> > > > > > > > >>>> > > >>
> > > > > > > > > >>>> > > >> I think I can sum up our disagreement here as follows:
> > I
> > > > > would
> > > > > > > like
> > > > > > > > >>>> to
> > > > > > > > >>>> > keep
> > > > > > > > >>>> > > >> the
> > > > > > > > >>>> > > >> interfaces simpler, with only obligatory
> > > > methods/features
> > > > > > on
> > > > > > > > one
> > > > > > > > >>>> side
> > > > > > > > >>>> > and
> > > > > > > > >>>> > > >> clearly optional features on the other. While you
> > > would
> > > > > > like
> > > > > > > to
> > > > > > > > >>>> add an
> > > > > > > > >>>> > > >> extra third state in between those two?
> > > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >> Best,
> > > > > > > > >>>> > > >> Piotrek
> > > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >> czw., 12 maj 2022 o 04:25 Becket Qin <
> > > > > becket.qin@gmail.com
> > > > > > >
> > > > > > > > >>>> > napisał(a):
> > > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >> > Thanks for the clarification, Piotr and
> > Sebastian.
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >> > It looks like the key problem is still whether
> > the
> > > > > > > > >>>> implementation of
> > > > > > > > >>>> > > >> > pausable splits in the Sources should be optional
> > > or
> > > > > not.
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >> > I think it might be helpful to agree on the
> > > > definition
> > > > > of
> > > > > > > > >>>> optional
> > > > > > > > >>>> > in
> > > > > > > > >>>> > > >> our
> > > > > > > > >>>> > > >> > case. To me:
> > > > > > > > >>>> > > >> > Optional = "You CAN leave the method
> > unimplemented,
> > > > and
> > > > > > > that
> > > > > > > > is
> > > > > > > > >>>> > fine."
> > > > > > > > >>>> > > >> > Non-Optional = "You CAN leave the method
> > > > unimplemented,
> > > > > > but
> > > > > > > > you
> > > > > > > > >>>> > SHOULD
> > > > > > > > >>>> > > >> NOT,
> > > > > > > > >>>> > > >> > because people assume this works."
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >> > I think one sufficient condition of a
> > Non-Optional
> > > > > > feature
> > > > > > > is
> > > > > > > > >>>> that
> > > > > > > > >>>> > if
> > > > > > > > >>>> > > >> the
> > > > > > > > >>>> > > >> > feature is exposed through the framework API,
> > Flink
> > > > > > should
> > > > > > > > >>>> expect
> > > > > > > > >>>> > the
> > > > > > > > >>>> > > >> > pluggables to support this feature by default.
> > > > > Otherwise
> > > > > > > the
> > > > > > > > >>>> > > >> availability
> > > > > > > > >>>> > > >> > of that feature becomes undefined.
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >> > Please note that so far we do not assume whether
> > > the
> > > > > > > feature
> > > > > > > > >>>> is in
> > > > > > > > >>>> > > >> > the original API or it is added later. A newly
> > > added
> > > > > > > feature
> > > > > > > > >>>> can
> > > > > > > > >>>> > also be
> > > > > > > > >>>> > > >> > non-optional, although it might take some time
> > for
> > > > all
> > > > > > the
> > > > > > > > >>>> pluggable
> > > > > > > > >>>> > > >> > developers to catch up, and they should still
> > work
> > > if
> > > > > the
> > > > > > > new
> > > > > > > > >>>> > feature is
> > > > > > > > >>>> > > >> > not used until they catch up. In contrast, we may
> > > > never
> > > > > > > > expect
> > > > > > > > >>>> an
> > > > > > > > >>>> > > >> optional
> > > > > > > > >>>> > > >> > feature to catch up, because leaving it
> > > unimplemented
> > > > > is
> > > > > > > also
> > > > > > > > >>>> > blessed.
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >> > Let's take the checkpointing as an example.
> > Imagine
> > > > > Flink
> > > > > > > did
> > > > > > > > >>>> not
> > > > > > > > >>>> > > >> support
> > > > > > > > >>>> > > >> > checkpointing before release 1.16. And now we are
> > > > > trying
> > > > > > to
> > > > > > > > add
> > > > > > > > >>>> > > >> > checkpointing to Flink. So we exposed the
> > > checkpoint
> > > > > > > > >>>> configuration
> > > > > > > > >>>> > to
> > > > > > > > >>>> > > >> the
> > > > > > > > >>>> > > >> > end users. In the meantime, will we tell the
> > > > pluggable
> > > > > > > (e.g.
> > > > > > > > >>>> > operators,
> > > > > > > > >>>> > > >> > connectors) developers that methods like
> > > > > > "snapshotState()"
> > > > > > > are
> > > > > > > > >>>> > optional?
> > > > > > > > >>>> > > >> If
> > > > > > > > >>>> > > >> > we do that, the availability of checkpointing in
> > > > Flink
> > > > > > > would
> > > > > > > > be
> > > > > > > > >>>> > severely
> > > > > > > > >>>> > > >> > weakened. But apparently we should still allow
> > the
> > > > > > existing
> > > > > > > > >>>> > > >> implementations
> > > > > > > > >>>> > > >> > to work without checkpointing. It looks to me
> > that
> > > > > adding
> > > > > > > the
> > > > > > > > >>>> > method to
> > > > > > > > >>>> > > >> the
> > > > > > > > >>>> > > >> > pluggable interfaces with a default
> > implementation
> > > > > > throwing
> > > > > > > > >>>> > > >> > "UnsupportedOperationException" would be the
> > > solution
> > > > > > here.
> > > > > > > > >>>> Please
> > > > > > > > >>>> > note
> > > > > > > > >>>> > > >> > that in this case, having the default
> > > implementation
> > > > > does
> > > > > > > not
> > > > > > > > >>>> mean
> > > > > > > > >>>> > this
> > > > > > > > >>>> > > >> is
> > > > > > > > >>>> > > >> > optional. It is just the technique to support
> > > > backwards
> > > > > > > > >>>> > compatibility in
> > > > > > > > >>>> > > >> > the feature evolution. The fact that this method
> > is
> > > > in
> > > > > > the
> > > > > > > > base
> > > > > > > > >>>> > > >> interface
> > > > > > > > >>>> > > >> > suggests it is not optional, so the developers
> > > SHOULD
> > > > > > > > >>>> implement it.
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >> > When it comes to this FLIP, I think it meets the
> > > > > criteria
> > > > > > > of
> > > > > > > > >>>> > > >> non-optional
> > > > > > > > >>>> > > >> > features, so we should just use the evolution
> > path
> > > of
> > > > > > > > >>>> non-optional
> > > > > > > > >>>> > > >> > features.
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >> > Thanks,
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >> > Jiangjie (Becket) Qin
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >> > On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski <
> > > > > > > > >>>> > pnowojski@apache.org>
> > > > > > > > >>>> > > >> > wrote:
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >> > > Hi,
> > > > > > > > >>>> > > >> > >
> > > > > > > > >>>> > > >> > > Actually previously I thought about having a
> > > > > decorative
> > > > > > > > >>>> interface
> > > > > > > > >>>> > and
> > > > > > > > >>>> > > >> > > whenever watermark alignment is enabled,
> > checking
> > > > > that
> > > > > > > the
> > > > > > > > >>>> source
> > > > > > > > >>>> > > >> > > implements the decorative interface. If not,
> > > > throwing
> > > > > > an
> > > > > > > > >>>> > exception.
> > > > > > > > >>>> > > >> > >
> > > > > > > > >>>> > > >> > > The option with default methods in the source
> > > > > > interfaces
> > > > > > > > >>>> throwing
> > > > > > > > >>>> > > >> > > `UnsupportedOperationException` I think still
> > > > suffers
> > > > > > > from
> > > > > > > > >>>> the
> > > > > > > > >>>> > same
> > > > > > > > >>>> > > >> > > problems I mentioned before. It's still an
> > > optional
> > > > > > > > >>>> implementation
> > > > > > > > >>>> > > >> and at
> > > > > > > > >>>> > > >> > > the same time it's clogging the base
> > interface. I
> > > > > > think I
> > > > > > > > >>>> would
> > > > > > > > >>>> > still
> > > > > > > > >>>> > > >> > vote
> > > > > > > > >>>> > > >> > > soft -1 on this option, but I wouldn't block it
> > > in
> > > > > > case I
> > > > > > > > am
> > > > > > > > >>>> > > >> out-voted.
> > > > > > > > >>>> > > >> > >
> > > > > > > > >>>> > > >> > > Best,
> > > > > > > > >>>> > > >> > > Piotrek
> > > > > > > > >>>> > > >> > >
> > > > > > > > >>>> > > >> > > śr., 11 maj 2022 o 14:22 Sebastian Mattheis <
> > > > > > > > >>>> > sebastian@ververica.com>
> > > > > > > > >>>> > > >> > > napisał(a):
> > > > > > > > >>>> > > >> > >
> > > > > > > > >>>> > > >> > > > Hi Becket,
> > > > > > > > >>>> > > >> > > >
> > > > > > > > >>>> > > >> > > > Thanks a lot for your fast and detailed
> > > response.
> > > > > For
> > > > > > > me,
> > > > > > > > >>>> it
> > > > > > > > >>>> > > >> converges
> > > > > > > > >>>> > > >> > > and
> > > > > > > > >>>> > > >> > > > dropping the supportsX method sounds very
> > > > > reasonable
> > > > > > to
> > > > > > > > me.
> > > > > > > > >>>> > (Side
> > > > > > > > >>>> > > >> note:
> > > > > > > > >>>> > > >> > > > With "pausable splits" enabled as "default" I
> > > > think
> > > > > > we
> > > > > > > > >>>> > > >> misunderstood.
> > > > > > > > >>>> > > >> > As
> > > > > > > > > >>>> > > >> > > > you describe it now, I understand "default" to mean
> > > that
> > > > it
> > > > > > > > should
> > > > > > > > >>>> be
> > > > > > > > >>>> > the
> > > > > > > > >>>> > > >> new
> > > > > > > > >>>> > > >> > > > recommended way of implementation, and I
> > think
> > > > that
> > > > > > is
> > > > > > > > >>>> fully
> > > > > > > > >>>> > valid.
> > > > > > > > >>>> > > >> > > Before,
> > > > > > > > >>>> > > >> > > > I understood "default" here as the default
> > > > > > > > implementation,
> > > > > > > > >>>> i.e.,
> > > > > > > > >>>> > > >> > throwing
> > > > > > > > >>>> > > >> > > > UnsupportedOperationException, which is the
> > > exact
> > > > > > > > >>>> opposite. :) )
> > > > > > > > >>>> > > >> > > >
> > > > > > > > >>>> > > >> > > > Nevertheless: As mentioned, an open question
> > > for
> > > > me
> > > > > > is
> > > > > > > if
> > > > > > > > >>>> > watermark
> > > > > > > > >>>> > > >> > > > alignment should enforce pausable splits. For
> > > > > > > > >>>> clarification, the
> > > > > > > > >>>> > > >> > current
> > > > > > > > >>>> > > >> > > > documentation [1] says:
> > > > > > > > >>>> > > >> > > >
> > > > > > > > > >>>> > > >> > > >> *Note:* As of 1.15, Flink supports aligning
> > > > across
> > > > > > > tasks
> > > > > > > > >>>> of the
> > > > > > > > >>>> > same
> > > > > > > > >>>> > > >> > > >> source and/or different sources. It does not
> > > > > support
> > > > > > > > >>>> aligning
> > > > > > > > >>>> > > >> > > >> splits/partitions/shards in the same task.
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >> In a case where there are e.g. two Kafka
> > > > > partitions
> > > > > > > that
> > > > > > > > >>>> > produce
> > > > > > > > >>>> > > >> > > >> watermarks at different pace, that get
> > > assigned
> > > > to
> > > > > > the
> > > > > > > > >>>> same
> > > > > > > > >>>> > task
> > > > > > > > >>>> > > >> > > watermark
> > > > > > > > >>>> > > >> > > >> might not behave as expected. Fortunately,
> > > worst
> > > > > > case
> > > > > > > it
> > > > > > > > >>>> > should not
> > > > > > > > >>>> > > >> > > perform
> > > > > > > > >>>> > > >> > > >> worse than without alignment.
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >> Given the limitation above, we suggest
> > > applying
> > > > > > > > watermark
> > > > > > > > >>>> > > >> alignment in
> > > > > > > > >>>> > > >> > > >> two situations:
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >>    1. You have two different sources (e.g.
> > > Kafka
> > > > > and
> > > > > > > > >>>> File) that
> > > > > > > > >>>> > > >> > produce
> > > > > > > > >>>> > > >> > > >>    watermarks at different speeds
> > > > > > > > >>>> > > >> > > >>    2. You run your source with parallelism
> > > equal
> > > > > to
> > > > > > > the
> > > > > > > > >>>> number
> > > > > > > > >>>> > of
> > > > > > > > >>>> > > >> > > >>    splits/shards/partitions, which results
> > in
> > > > > every
> > > > > > > > >>>> subtask
> > > > > > > > >>>> > being
> > > > > > > > >>>> > > >> > > assigned a
> > > > > > > > >>>> > > >> > > >>    single unit of work.
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > > >>>> > > >> > > > I personally see no issue in implementing
> > and
> > > I
> > > > > see
> > > > > > no
> > > > > > > > >>>> reason
> > > > > > > > >>>> > > >> against
> > > > > > > > >>>> > > >> > > > implementing this dependency of watermark
> > > > alignment
> > > > > > > on
> > > > > > > > >>>> pausable
> > > > > > > > >>>> > > >> > splits.
> > > > > > > > >>>> > > >> > > (I
> > > > > > > > >>>> > > >> > > > think this would even be a good path towards
> > > > > shaping
> > > > > > > > >>>> watermark
> > > > > > > > >>>> > > >> > alignment
> > > > > > > > >>>> > > >> > > in
> > > > > > > > >>>> > > >> > > > 1.16.) However, "I don't see" means that I
> > > would
> > > > be
> > > > > > > happy
> > > > > > > > >>>> to
> > > > > > > > >>>> > hear
> > > > > > > > >>>> > > >> > Dawid's
> > > > > > > > >>>> > > >> > > > and Piotrek's opinions as they implemented
> > > > > watermark
> > > > > > > > >>>> alignment
> > > > > > > > >>>> > > >> based on
> > > > > > > > >>>> > > >> > > > FLIP-182 [2] and I don't want to miss
> > relevant
> > > > > > > > >>>> > rationale/background
> > > > > > > > >>>> > > >> > info
> > > > > > > > >>>> > > >> > > > from their side.
> > > > > > > > >>>> > > >> > > >
> > > > > > > > >>>> > > >> > > > *@Piotrek* *@Dawid *What do you think?
> > > > > > > > >>>> > > >> > > >
> > > > > > > > >>>> > > >> > > > Regards,
> > > > > > > > >>>> > > >> > > > Sebastian
> > > > > > > > >>>> > > >> > > >
> > > > > > > > >>>> > > >> > > > [1]
> > > > > > > > >>>> > > >> > > >
> > > > > > > > >>>> > > >> > >
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >>
> > > > > > > > >>>> >
> > > > > > > > >>>>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
> > > > > > > > >>>> > > >> > > > [2]
> > > > > > > > >>>> > > >> > > >
> > > > > > > > >>>> > > >> > >
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >>
> > > > > > > > >>>> >
> > > > > > > > >>>>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
> > > > > > > > >>>> > > >> > > >
> > > > > > > > >>>> > > >> > > > On Wed, May 11, 2022 at 1:30 PM Becket Qin <
> > > > > > > > >>>> > becket.qin@gmail.com>
> > > > > > > > >>>> > > >> > wrote:
> > > > > > > > >>>> > > >> > > >
> > > > > > > > >>>> > > >> > > >> +dev
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >> Hi Sebastian,
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >> Thank you for the summary. Please see the
> > > > detailed
> > > > > > > > replies
> > > > > > > > >>>> > inline.
> > > > > > > > >>>> > > >> As
> > > > > > > > >>>> > > >> > a
> > > > > > > > >>>> > > >> > > >> recap of my suggestions.
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >> 1. Pausable splits API.
> > > > > > > > >>>> > > >> > > >>   a) Add default implementations to methods
> > > > > > > > >>>> > "pauseOrResumeSplits"
> > > > > > > > >>>> > > >> in
> > > > > > > > >>>> > > >> > > both
> > > > > > > > >>>> > > >> > > >> SourceReader and SplitReader where both
> > > default
> > > > > > > > >>>> implementations
> > > > > > > > >>>> > > >> throw
> > > > > > > > >>>> > > >> > > >>  UnsupportedOperationException.
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >> 2. User story.
> > > > > > > > >>>> > > >> > > >>     a) We tell users to enable the watermark
> > > > > > alignment
> > > > > > > > as
> > > > > > > > >>>> they
> > > > > > > > >>>> > > >> like.
> > > > > > > > >>>> > > >> > > This
> > > > > > > > >>>> > > >> > > >> is exactly what the current Flink API is.
> > > > > > > > >>>> > > >> > > >>     b) We tell the source developers, please
> > > > > > implement
> > > > > > > > >>>> pausable
> > > > > > > > >>>> > > >> > splits,
> > > > > > > > >>>> > > >> > > >> otherwise bad things may happen. Think of it
> > > > like
> > > > > > you
> > > > > > > > are
> > > > > > > > >>>> > expected
> > > > > > > > >>>> > > >> to
> > > > > > > > >>>> > > >> > > >> implement SourceReader#snapshotState()
> > > properly,
> > > > > > > > otherwise
> > > > > > > > >>>> > > >> exceptions
> > > > > > > > >>>> > > >> > > will
> > > > > > > > >>>> > > >> > > >> be thrown when users enable checkpointing.
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >> Thanks,
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >> Jiangjie (Becket) Qin
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >> On Wed, May 11, 2022 at 4:45 PM Sebastian
> > > > > Mattheis <
> > > > > > > > >>>> > > >> > > >> sebastian@ververica.com> wrote:
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >>> Hi Becket, Hi everybody,
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > >>>> > > >> > > >>> I'm sorry if I misread the messages but I
> > > could
> > > > > not
> > > > > > > > >>>> derive an
> > > > > > > > >>>> > > >> > agreement
> > > > > > > > >>>> > > >> > > >>> from the mailing list. Nevertheless, if I
> > > > > > understand
> > > > > > > > you
> > > > > > > > >>>> > right the
> > > > > > > > >>>> > > >> > > >>> suggestion is:
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > >>>> > > >> > > >>> * Add default implementations to methods
> > > > > > > > >>>> > "pauseOrResumeSplits" in
> > > > > > > > >>>> > > >> > both
> > > > > > > > >>>> > > >> > > >>> SourceReader and SplitReader where both
> > > default
> > > > > > > > >>>> > implementations
> > > > > > > > >>>> > > >> throw
> > > > > > > > >>>> > > >> > > >>> UnsupportedOperationException.
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > >>>> > > >> > > >> Yes.
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > > >>>> > > >> > > >>> * Add "supportsPauseOrResumeSplits" to the
> > > > Source
> > > > > > > > >>>> interface.
> > > > > > > > >>>> > (In
> > > > > > > > >>>> > > >> the
> > > > > > > > >>>> > > >> > > >>> following, I refer to supporting this as
> > > > > "pausable
> > > > > > > > >>>> splits".)
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > >>>> > > >> > > >> We may no longer need this if pausable
> > splits
> > > > are
> > > > > > > > >>>> expected to
> > > > > > > > >>>> > be
> > > > > > > > >>>> > > >> > > >> implemented by the source developers, i.e.
> > > > > > > non-optional.
> > > > > > > > >>>> Having
> > > > > > > > >>>> > > >> this
> > > > > > > > >>>> > > >> > > method
> > > > > > > > >>>> > > >> > > >> would then be somewhat misleading as it
> > looks
> > > > like
> > > > > > the
> > > > > > > > >>>> sources
> > > > > > > > >>>> > > >> that do
> > > > > > > > >>>> > > >> > > not
> > > > > > > > >>>> > > >> > > >> support pausable splits are also acceptable
> > in
> > > > the
> > > > > > > long
> > > > > > > > >>>> term.
> > > > > > > > >>>> > So
> > > > > > > > >>>> > > >> API
> > > > > > > > >>>> > > >> > > wise,
> > > > > > > > >>>> > > >> > > >> I'd say maybe we should remove this for this
> > > > FLIP,
> > > > > > > > >>>> although I
> > > > > > > > >>>> > > >> believe
> > > > > > > > >>>> > > >> > > this
> > > > > > > > >>>> > > >> > > >> supportXXX pattern itself is still
> > attractive
> > > > for
> > > > > > > > optional
> > > > > > > > >>>> > > >> features.
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > >>>> > > >> > > >>> To make the conclusions explicit:
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > >>>> > > >> > > >>> 1. The implementation of
> > pauseOrResumeSplits
> > > in
> > > > > > both
> > > > > > > > >>>> > interfaces
> > > > > > > > > >>>> > > >> > > >>> SourceReader and SplitReader is optional
> > > where
> > > > > the
> > > > > > > > >>>> default is
> > > > > > > > >>>> > > >> that
> > > > > > > > >>>> > > >> > it
> > > > > > > > >>>> > > >> > > >>> doesn't support it. (--> This means that
> > the
> > > > > > > > >>>> implementation is
> > > > > > > > >>>> > > >> still
> > > > > > > > >>>> > > >> > > >>> optional for the source developer.)
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > >>>> > > >> > > >> It is optional for backwards compatibility
> > > with
> > > > > > > existing
> > > > > > > > >>>> > sources,
> > > > > > > > >>>> > > >> as
> > > > > > > > >>>> > > >> > > they
> > > > > > > > >>>> > > >> > > >> may still compile without code change. But
> > > > > starting
> > > > > > > from
> > > > > > > > >>>> this
> > > > > > > > >>>> > FLIP,
> > > > > > > > >>>> > > >> > > Flink
> > > > > > > > >>>> > > >> > > >> will always optimistically assume that all
> > the
> > > > > > sources
> > > > > > > > >>>> support
> > > > > > > > >>>> > > >> > pausable
> > > > > > > > >>>> > > >> > > >> splits. If a source does not support
> > pausable
> > > > > > splits,
> > > > > > > it
> > > > > > > > >>>> goes
> > > > > > > > >>>> > to an
> > > > > > > > >>>> > > >> > > error
> > > > > > > > >>>> > > >> > > >> handling path when watermark alignment is
> > > > enabled
> > > > > on
> > > > > > > it.
> > > > > > > > >>>> This
> > > > > > > > >>>> > is
> > > > > > > > >>>> > > >> > > different
> > > > > > > > >>>> > > >> > > >> from a usual optional feature, where no
> > error
> > > is
> > > > > > > > expected.
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >>> 2. If watermark alignment is enabled in the
> > > > > > > application
> > > > > > > > >>>> code
> > > > > > > > >>>> > by
> > > > > > > > >>>> > > >> > adding
> > > > > > > > >>>> > > >> > > >>> withWatermarkAlignment to the
> > > WatermarkStrategy
> > > > > > while
> > > > > > > > >>>> > > >> SourceReader or
> > > > > > > > >>>> > > >> > > >>> SplitReader do not support pausableSplits,
> > we
> > > > > throw
> > > > > > > an
> > > > > > > > >>>> > > >> > > >>> UnsupportedOperationException.
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > >>>> > > >> > > >> Yes.
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >>> 3. With regard to your statement:
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > >>>> > > >> > > >>>> [...] basically means watermark alignment
> > is
> > > > a
> > > > > > > > >>>> non-optional
> > > > > > > > >>>> > > >> feature
> > > > > > > > >>>> > > >> > > to
> > > > > > > > >>>> > > >> > > >>>> the end users.
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > >>>> > > >> > > >>> You actually mean that "pausable splits"
> > are
> > > > > > > > >>>> non-optional for
> > > > > > > > >>>> > the
> > > > > > > > >>>> > > >> app
> > > > > > > > >>>> > > >> > > >>> developer if watermark alignment is
> > enabled.
> > > > > > However,
> > > > > > > > >>>> > watermark
> > > > > > > > >>>> > > >> > > alignment
> > > > > > > > >>>> > > >> > > >>> is optional and can be enabled/disabled.
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > >>>> > > >> > > >> Yes, watermark alignment can be
> > > enabled/disabled
> > > > > in
> > > > > > > > >>>> individual
> > > > > > > > >>>> > > >> sources
> > > > > > > > >>>> > > >> > > in
> > > > > > > > >>>> > > >> > > >> Flink jobs, which basically means the code
> > > > > > supporting
> > > > > > > > >>>> watermark
> > > > > > > > >>>> > > >> > > alignment
> > > > > > > > >>>> > > >> > > >> has to already be there. That again means
> > the
> > > > > Source
> > > > > > > > >>>> > developers are
> > > > > > > > >>>> > > >> > also
> > > > > > > > >>>> > > >> > > >> expected to support pausable splits by
> > > default.
> > > > So
> > > > > > > this
> > > > > > > > >>>> way we
> > > > > > > > >>>> > > >> > > essentially
> > > > > > > > >>>> > > >> > > >> tell the end users that you may enable /
> > > disable
> > > > > > this
> > > > > > > > >>>> feature
> > > > > > > > >>>> > as
> > > > > > > > >>>> > > >> you
> > > > > > > > >>>> > > >> > > wish,
> > > > > > > > >>>> > > >> > > >> and tell the source developers that you
> > SHOULD
> > > > > > > implement
> > > > > > > > >>>> this
> > > > > > > > >>>> > > >> because
> > > > > > > > >>>> > > >> > > the
> > > > > > > > >>>> > > >> > > >> end users may turn it on/off at will. And if
> > > the
> > > > > > > source
> > > > > > > > >>>> does
> > > > > > > > >>>> > not
> > > > > > > > >>>> > > >> > support
> > > > > > > > >>>> > > >> > > >> pausable splits, that goes to an error
> > > handling
> > > > > path
> > > > > > > > when
> > > > > > > > >>>> > watermark
> > > > > > > > >>>> > > >> > > >> alignment is enabled on it. So users know
> > they
> > > > > have
> > > > > > to
> > > > > > > > >>>> > explicitly
> > > > > > > > >>>> > > >> > > exclude
> > > > > > > > >>>> > > >> > > >> this source.
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > >>>> > > >> > > >>> So far it's totally clear to me and I hope
> > > this
> > > > > is
> > > > > > > what
> > > > > > > > >>>> you
> > > > > > > > >>>> > mean.
> > > > > > > > >>>> > > >> I
> > > > > > > > >>>> > > >> > > also
> > > > > > > > >>>> > > >> > > >>> agree with both statements:
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > > >>>> > > >> > > >>>> So making that expectation aligned with the
> > > > > source
> > > > > > > > >>>> developers
> > > > > > > > >>>> > > >> seems
> > > > > > > > >>>> > > >> > > >>>> reasonable.
> > > > > > > > >>>> > > >> > > >>>>
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > > >>>> > > >> > > >>>> I think this is a simple and clean solution
> > > > from
> > > > > > both
> > > > > > > > >>>> the end
> > > > > > > > >>>> > user
> > > > > > > > >>>> > > >> > and
> > > > > > > > >>>> > > >> > > >>>> source developers' standpoint.
> > > > > > > > >>>> > > >> > > >>>>
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > >>>> > > >> > > >>> However, a last conclusion derives from 3.
> > > and
> > > > is
> > > > > > an
> > > > > > > > open
> > > > > > > > >>>> > question
> > > > > > > > >>>> > > >> > for
> > > > > > > > >>>> > > >> > > >>> me:
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > >>>> > > >> > > >>> 4. The feature of "pausable splits" is now
> > > > > tightly
> > > > > > > > bound
> > > > > > > > >>>> to
> > > > > > > > >>>> > > >> watermark
> > > > > > > > >>>> > > >> > > >>> alignment, i.e., if sources do not support
> > > > > > "pausable
> > > > > > > > >>>> splits"
> > > > > > > > >>>> > one
> > > > > > > > >>>> > > >> can
> > > > > > > > >>>> > > >> > > not
> > > > > > > > >>>> > > >> > > >>> enable watermark alignment for these
> > sources.
> > > > > This
> > > > > > > > >>>> dependency
> > > > > > > > >>>> > is
> > > > > > > > >>>> > > >> not
> > > > > > > > >>>> > > >> > > the
> > > > > > > > >>>> > > >> > > >>> current status of watermark alignment
> > > > > > implementation
> > > > > > > > >>>> because
> > > > > > > > >>>> > it
> > > > > > > > >>>> > > >> > is/was
> > > > > > > > >>>> > > >> > > >>> implemented without pausable splits. Do we
> > > want
> > > > > to
> > > > > > > > >>>> introduce
> > > > > > > > >>>> > this
> > > > > > > > >>>> > > >> > > >>> dependency? (This is an open question. I
> > > cannot
> > > > > > judge
> > > > > > > > >>>> that.)
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > >>>> > > >> > > >> The watermark alignment basically relies on
> > > the
> > > > > > > pausable
> > > > > > > > >>>> > splits,
> > > > > > > > >>>> > > >> > right?
> > > > > > > > >>>> > > >> > > >> So personally I found it quite reasonable
> > that
> > > > if
> > > > > > the
> > > > > > > > >>>> source
> > > > > > > > >>>> > does
> > > > > > > > >>>> > > >> not
> > > > > > > > >>>> > > >> > > >> support pausable splits, end users cannot
> > > enable
> > > > > > > > watermark
> > > > > > > > >>>> > > >> alignment
> > > > > > > > >>>> > > >> > on
> > > > > > > > >>>> > > >> > > it.
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >>
> > > > > > > > >>>> > > >> > > >>> If something is wrong, please correct me.
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > >>>> > > >> > > >>> Regards,
> > > > > > > > >>>> > > >> > > >>> Sebastian
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > >>>> > > >> > > >>> On Wed, May 11, 2022 at 9:05 AM Becket Qin
> > <
> > > > > > > > >>>> > becket.qin@gmail.com>
> > > > > > > > >>>> > > >> > > wrote:
> > > > > > > > >>>> > > >> > > >>>
> > > > > > > > >>>> > > >> > > >>>> Hi Sebastian,
> > > > > > > > >>>> > > >> > > >>>>
> > > > > > > > >>>> > > >> > > >>>> Thanks for the reply and patient
> > > discussion. I
> > > > > > agree
> > > > > > > > >>>> this is
> > > > > > > > >>>> > a
> > > > > > > > >>>> > > >> > tricky
> > > > > > > > >>>> > > >> > > >>>> decision.
> > > > > > > > >>>> > > >> > > >>>>
> > > > > > > > >>>> > > >> > > >>>>
> > > > > > > > >>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns
> > > about
> > > > > > Option
> > > > > > > > c)
> > > > > > > > >>>> > which I
> > > > > > > > >>>> > > >> see
> > > > > > > > >>>> > > >> > as
> > > > > > > > >>>> > > >> > > >>>>> follows:
> > > > > > > > >>>> > > >> > > >>>>> (1) An interface with default NOOP
> > > > > implementation
> > > > > > > > >>>> makes the
> > > > > > > > >>>> > > >> > > >>>>> implementation optional. And in my
> > > opinion, a
> > > > > > > default
> > > > > > > > >>>> > > >> > implementation
> > > > > > > > >>>> > > >> > > is and
> > > > > > > > >>>> > > >> > > >>>>> will remain a way of making
> > implementation
> > > > > > optional
> > > > > > > > >>>> because
> > > > > > > > >>>> > > >> even in
> > > > > > > > >>>> > > >> > > future
> > > > > > > > >>>> > > >> > > >>>>> a developer can decide to implement the
> > > "old
> > > > > > > flavor"
> > > > > > > > >>>> without
> > > > > > > > >>>> > > >> > support
> > > > > > > > >>>> > > >> > > for
> > > > > > > > >>>> > > >> > > >>>>> pausable splits.
> > > > > > > > >>>> > > >> > > >>>>> (2) It may not be too critical but I also
> > > > find
> > > > > it
> > > > > > > > >>>> suboptimal
> > > > > > > > >>>> > > >> that
> > > > > > > > >>>> > > >> > > with
> > > > > > > > >>>> > > >> > > >>>>> a NOOP default implementation there is no
> > > way
> > > > > to
> > > > > > > > check
> > > > > > > > >>>> at
> > > > > > > > >>>> > > >> runtime
> > > > > > > > >>>> > > >> > if
> > > > > > > > >>>> > > >> > > >>>>> SourceReader or SplitReader actually
> > > support
> > > > > > > pausing.
> > > > > > > > >>>> (To
> > > > > > > > >>>> > do so,
> > > > > > > > >>>> > > >> > one
> > > > > > > > >>>> > > >> > > would
> > > > > > > > >>>> > > >> > > >>>>> need a supportsX method which makes it
> > > again
> > > > > more
> > > > > > > > > >>>> > complicated.)
> > > > > > > > >>>> > > >> > > >>>>
> > > > > > > > >>>> > > >> > > >>>>
> > > > > > > > >>>> > > >> > > >>>> Based on the last few messages in the
> > > mailing
> > > > > > > list,
> > > > > > > > >>>> Piotr
> > > > > > > > >>>> > and I
> > > > > > > > >>>> > > >> > > agreed
> > > > > > > > >>>> > > >> > > >>>> that the default implementation should
> > just
> > > > > throw
> > > > > > an
> > > > > > > > >>>> > > >> > > >>>> UnsupportedOperationException if the
> > source
> > > is
> > > > > > > > >>>> unpausable. So
> > > > > > > > >>>> > > >> this
> > > > > > > > >>>> > > >> > > >>>> basically tells the Source developers that
> > > > this
> > > > > > > > feature
> > > > > > > > >>>> is
> > > > > > > > >>>> > > >> expected
> > > > > > > > >>>> > > >> > > to be
> > > > > > > > >>>> > > >> > > >>>> supported. Because we cannot prevent end
> > > users
> > > > > > from
> > > > > > > > >>>> putting
> > > > > > > > >>>> > an
> > > > > > > > >>>> > > >> > > unpausable
> > > > > > > > >>>> > > >> > > >>>> source into the watermark alignment group,
> > > > that
> > > > > > > > >>>> basically
> > > > > > > > >>>> > means
> > > > > > > > >>>> > > >> > > watermark
> > > > > > > > > >>>> > > >> > > >>>> alignment is a non-optional feature to
> > the
> > > > end
> > > > > > > users.
> > > > > > > > >>>> So
> > > > > > > > >>>> > making
> > > > > > > > >>>> > > >> > that
> > > > > > > > >>>> > > >> > > >>>> expectation aligned with the source
> > > developers
> > > > > > seems
> > > > > > > > >>>> > reasonable.
> > > > > > > > >>>> > > >> > And
> > > > > > > > >>>> > > >> > > if a
> > > > > > > > >>>> > > >> > > >>>> source does not support this feature, the
> > > end
> > > > > > users
> > > > > > > > >>>> should
> > > > > > > > >>>> > > >> > explicitly
> > > > > > > > >>>> > > >> > > >>>> remove that source from the watermark
> > > > alignment
> > > > > > > group.
> > > > > > > > >>>> > > >> > > >>>>
> > > > > > > > >>>> > > >> > > >>>> Personally speaking I think this is a
> > simple
> > > > and
> > > > > > > clean
> > > > > > > > >>>> > solution
> > > > > > > > >>>> > > >> from
> > > > > > > > >>>> > > >> > > >>>> both the end user and source developers'
> > > > > > standpoint.
> > > > > > > > >>>> > > >> > > >>>>
> > > > > > > > >>>> > > >> > > >>>> Does this address your concerns?
> > > > > > > > >>>> > > >> > > >>>>
> > > > > > > > >>>> > > >> > > >>>> Thanks,
> > > > > > > > >>>> > > >> > > >>>>
> > > > > > > > >>>> > > >> > > >>>> Jiangjie (Becket) Qin
> > > > > > > > >>>> > > >> > > >>>>
> > > > > > > > >>>> > > >> > > >>>> On Wed, May 11, 2022 at 2:52 PM Sebastian
> > > > > > Mattheis <
> > > > > > > > >>>> > > >> > > >>>> sebastian@ververica.com> wrote:
> > > > > > > > >>>> > > >> > > >>>>
> > > > > > > > >>>> > > >> > > >>>>> Hi Piotr, Hi Becket, Hi everybody,
> > > > > > > > >>>> > > >> > > >>>>>
> > > > > > > > >>>> > > >> > > >>>>> we, Dawid and I, discussed the various
> > > > > > > > >>>> suggestions/options
> > > > > > > > >>>> > and
> > > > > > > > >>>> > > >> we
> > > > > > > > >>>> > > >> > > >>>>> would be okay either way because we find
> > > > > neither
> > > > > > > > >>>> solution is
> > > > > > > > >>>> > > >> > perfect
> > > > > > > > >>>> > > >> > > just
> > > > > > > > >>>> > > >> > > >>>>> because of the already present
> > complexity.
> > > > > > > > >>>> > > >> > > >>>>>
> > > > > > > > >>>> > > >> > > >>>>> Option c) Adding methods to the
> > interfaces
> > > of
> > > > > > > > >>>> SourceReader
> > > > > > > > >>>> > and
> > > > > > > > >>>> > > >> > > >>>>> SplitReader
> > > > > > > > >>>> > > >> > > >>>>> Option a) Adding decorative interfaces to
> > > be
> > > > > used
> > > > > > > by
> > > > > > > > >>>> > > >> SourceReader
> > > > > > > > >>>> > > >> > and
> > > > > > > > >>>> > > >> > > >>>>> SplitReader
> > > > > > > > >>>> > > >> > > >>>>>
> > > > > > > > >>>> > > >> > > >>>>> As of the current status (v. 12) of the
> > > FLIP
> > > > > [1],
> > > > > > > it
> > > > > > > > is
> > > > > > > > >>>> > based on
> > > > > > > > >>>> > > >> > > >>>>> Option c) which we find acceptable
> > because
> > > > the
> > > > > > > > >>>> complexity
> > > > > > > > >>>> > added
> > > > > > > > >>>> > > >> is
> > > > > > > > >>>> > > >> > > only a
> > > > > > > > >>>> > > >> > > >>>>> single method.
> > > > > > > > >>>> > > >> > > >>>>>
> > > > > > > > >>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns
> > > about
> > > > > > Option
> > > > > > > > c)
> > > > > > > > >>>> > which I
> > > > > > > > >>>> > > >> see
> > > > > > > > >>>> > > >> > as
> > > > > > > > >>>> > > >> > > >>>>> follows:
> > > > > > > > >>>> > > >> > > >>>>> (1) An interface with default NOOP
> > > > > implementation
> > > > > > > > >>>> makes the
> > > > > > > > >>>> > > >> > > >>>>> implementation optional. And in my
> > > opinion, a
> > > > > > > default
> > > > > > > > >>>> > > >> > implementation
> > > > > > > > >>>> > > >> > > is and
> > > > > > > > >>>> > > >> > > >>>>> will remain a way of making
> > implementation
> > > > > > optional
> > > > > > > > >>>> because
> > > > > > > > >>>> > > >> even in
> > > > > > > > >>>> > > >> > > future
> > > > > > > > >>>> > > >> > > >>>>> a developer can decide to implement the
> > > "old
> > > > > > > flavor"
> > > > > > > > >>>> without
> > > > > > > > >>>> > > >> > support
> > > > > > > > >>>> > > >> > > for
> > > > > > > > >>>> > > >> > > >>>>> pausable splits.
> > > > > > > > >>>> > > >> > > >>>>> (2) It may not be too critical but I also
> > > > find
> > > > > it
> > > > > > > > >>>> suboptimal
> > > > > > > > >>>> > > >> that
> > > > > > > > >>>> > > >> > > with
> > > > > > > > >>>> > > >> > > >>>>> a NOOP default implementation there is no
> > > way
> > > > > to
> > > > > > > > check
> > > > > > > > >>>> at
> > > > > > > > >>>> > > >> runtime
> > > > > > > > >>>> > > >> > if
> > > > > > > > >>>> > > >> > > >>>>> SourceReader or SplitReader actually
> > > support
> > > > > > > pausing.
> > > > > > > > >>>> (To
> > > > > > > > >>>> > do so,
> > > > > > > > >>>> > > >> > one
> > > > > > > > >>>> > > >> > > would
> > > > > > > > >>>> > > >> > > >>>>> need a supportsX method which makes it
> > > again
> > > > > more
> > > > > > > > >>>> > complicated.)
> > > > > > > > >>>> > > >> > > >>>>>
> > > > > > > > >>>> > > >> > > >>>>> However, we haven't changed it because
> > > Option
> > > > > a)
> > > > > > is
> > > > > > > > >>>> also not
> > > > > > > > >>>> > > >> > optimal
> > > > > > > > > >>>> > > >> > > >>>>> or straightforward:
> > > > > > > > >>>> > > >> > > >>>>> (1) We need to add two distinct yet
> > similar
> > > > > > > > decorative
> > > > > > > > >>>> > > >> interfaces
> > > > > > > > >>>> > > >> > > >>>>> since, as mentioned, the signatures of
> > the
> > > > > > methods
> > > > > > > > are
> > > > > > > > >>>> > > >> different.
> > > > > > > > >>>> > > >> > For
> > > > > > > > >>>> > > >> > > >>>>> example, we would need decorative
> > > interfaces
> > > > > like
> > > > > > > > >>>> > > >> > > >>>>> `SplitReaderWithPausableSplits` and
> > > > > > > > >>>> > > >> > `SourceReaderWithPausableSplits`.
> > > > > > > > >>>> > > >> > > >>>>> (2) As a consequence, we would need to
> > > > somehow
> > > > > > > > document
> > > > > > > > >>>> > > >> how/where
> > > > > > > > >>>> > > >> > to
> > > > > > > > >>>> > > >> > > >>>>> implement both interfaces and how this
> > > > relates
> > > > > to
> > > > > > > > each
> > > > > > > > >>>> > other.
> > > > > > > > >>>> > > >> This
> > > > > > > > >>>> > > >> > > we could
> > > > > > > > >>>> > > >> > > >>>>> solve by adding a note in the interface
> > of
> > > > > > > > >>>> SourceReader and
> > > > > > > > >>>> > > >> > > SplitReader and
> > > > > > > > >>>> > > >> > > >>>>> reference to the decorative interfaces
> > but
> > > it
> > > > > > still
> > > > > > > > >>>> > increases
> > > > > > > > >>>> > > >> > > complexity
> > > > > > > > >>>> > > >> > > >>>>> too.
> > > > > > > > >>>> > > >> > > >>>>>
> > > > > > > > >>>> > > >> > > >>>>> In summary, we see both as acceptable and
> > > > > > preferred
> > > > > > > > >>>> over
> > > > > > > > >>>> > other
> > > > > > > > >>>> > > >> > > >>>>> options. The question is if we can find a
> > > > > > solution
> > > > > > > or
> > > > > > > > >>>> > compromise
> > > > > > > > >>>> > > >> > > that is
> > > > > > > > >>>> > > >> > > >>>>> acceptable for everybody to reach
> > > consensus.
> > > > > > > > >>>> > > >> > > >>>>>
> > > > > > > > >>>> > > >> > > >>>>> Please let us know what you think because
> > > we
> > > > > > would
> > > > > > > be
> > > > > > > > >>>> happy
> > > > > > > > >>>> > if
> > > > > > > > >>>> > > >> we
> > > > > > > > >>>> > > >> > can
> > > > > > > > >>>> > > >> > > >>>>> conclude the discussion to avoid dropping
> > > the
> > > > > > > > >>>> initiative on
> > > > > > > > >>>> > this
> > > > > > > > >>>> > > >> > > FLIP.
> > > > > > > > >>>> > > >> > > >>>>>
> > > > > > > > >>>> > > >> > > >>>>> Regards,
> > > > > > > > >>>> > > >> > > >>>>> Sebastian
> > > > > > > > >>>> > > >> > > >>>>>
> > > > > > > > >>>> > > >> > > >>>>> [1]
> > > > > > > > >>>> > > >> > > >>>>>
> > > > > > > > >>>> > > >> > >
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >>
> > > > > > > > >>>> >
> > > > > > > > >>>>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
> > > > > > > > >>>> > > >> > > >>>>> (v. 12)
> > > > > > > > >>>> > > >> > > >>>>>
> > > > > > > > >>>> > > >> > > >>>>> On Thu, May 5, 2022 at 10:13 AM Piotr
> > > > Nowojski
> > > > > <
> > > > > > > > >>>> > > >> > pnowojski@apache.org
> > > > > > > > >>>> > > >> > > >
> > > > > > > > >>>> > > >> > > >>>>> wrote:
> > > > > > > > >>>> > > >> > > >>>>>
> > > > > > > > >>>> > > >> > > >>>>>> Hi Guowei,
> > > > > > > > >>>> > > >> > > >>>>>>
> > > > > > > > >>>> > > >> > > >>>>>> as Dawid wrote a couple of messages
> > back:
> > > > > > > > >>>> > > >> > > >>>>>>
> > > > > > > > >>>> > > >> > > >>>>>> > This is covered in the previous
> > FLIP[1]
> > > > > which
> > > > > > > has
> > > > > > > > >>>> been
> > > > > > > > >>>> > > >> already
> > > > > > > > >>>> > > >> > > >>>>>> implemented in 1.15. In short, it must
> > be
> > > > > > enabled
> > > > > > > > >>>> with the
> > > > > > > > >>>> > > >> > watermark
> > > > > > > > >>>> > > >> > > >>>>>> strategy which also configures drift and
> > > > > update
> > > > > > > > >>>> interval
> > > > > > > > >>>> > > >> > > >>>>>>
> > > > > > > > >>>> > > >> > > >>>>>> So by default watermark alignment is
> > > > disabled,
> > > > > > > > >>>> regardless
> > > > > > > > >>>> > if a
> > > > > > > > >>>> > > >> > > source
> > > > > > > > >>>> > > >> > > >>>>>> supports it or not.
> > > > > > > > >>>> > > >> > > >>>>>>
> > > > > > > > >>>> > > >> > > >>>>>> Best,
> > > > > > > > >>>> > > >> > > >>>>>> Piotrek
> > > > > > > > >>>> > > >> > > >>>>>>
> > > > > > > > >>>> > > >> > > >>>>>> On Thu, May 5, 2022 at 09:56 Guowei Ma <
> > > > > > > > >>>> guowei.mgw@gmail.com>
> > > > > > > > >>>> > > >> > > wrote:
> > > > > > > > >>>> > > >> > > >>>>>>
> > > > > > > > >>>> > > >> > > >>>>>>> Hi,
> > > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > > >>>> > > >> > > >>>>>>> We know that in the case of Bounded
> > input
> > > > > Flink
> > > > > > > > >>>> supports
> > > > > > > > >>>> > the
> > > > > > > > >>>> > > >> > Batch
> > > > > > > > >>>> > > >> > > >>>>>>> execution mode. Currently in Batch
> > > > execution
> > > > > > > mode,
> > > > > > > > >>>> Flink
> > > > > > > > >>>> > is
> > > > > > > > >>>> > > >> > > executed
> > > > > > > > >>>> > > >> > > >>>>>>> on a
> > > > > > > > >>>> > > >> > > >>>>>>> stage-by-stage basis. In this way,
> > > perhaps
> > > > > > > > watermark
> > > > > > > > >>>> > alignment
> > > > > > > > >>>> > > >> > > might
> > > > > > > > >>>> > > >> > > >>>>>>> not
> > > > > > > > >>>> > > >> > > >>>>>>> gain much.
> > > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > > >>>> > > >> > > >>>>>>> So my question is: Is watermark
> > alignment
> > > > the
> > > > > > > > default
> > > > > > > > >>>> > > >> > behavior(for
> > > > > > > > >>>> > > >> > > >>>>>>> implemented source only)? If so, have
> > you
> > > > > > > > considered
> > > > > > > > >>>> > > >> evaluating
> > > > > > > > >>>> > > >> > the
> > > > > > > > >>>> > > >> > > >>>>>>> impact
> > > > > > > > >>>> > > >> > > >>>>>>> of this behavior on the Batch execution
> > > > mode?
> > > > > > Or
> > > > > > > > >>>> do you think
> > > > > > > > >>>> > it is
> > > > > > > > >>>> > > >> not
> > > > > > > > >>>> > > >> > > >>>>>>> necessary?
> > > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > > >>>> > > >> > > >>>>>>> Correct me if I miss something.
> > > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > > >>>> > > >> > > >>>>>>> Best,
> > > > > > > > >>>> > > >> > > >>>>>>> Guowei
> > > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > > >>>> > > >> > > >>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr
> > > > > Nowojski <
> > > > > > > > >>>> > > >> > > >>>>>>> piotr.nowojski@gmail.com>
> > > > > > > > >>>> > > >> > > >>>>>>> wrote:
> > > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > > >>>> > > >> > > >>>>>>> > Hi Becket and Dawid,
> > > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > > >>>> > > >> > > >>>>>>> > > I feel that no matter which option
> > we
> > > > > > choose
> > > > > > > > >>>> this can
> > > > > > > > >>>> > not
> > > > > > > > >>>> > > >> be
> > > > > > > > >>>> > > >> > > >>>>>>> solved
> > > > > > > > >>>> > > >> > > >>>>>>> > entirely in either of the options,
> > > > because
> > > > > of
> > > > > > > the
> > > > > > > > >>>> point
> > > > > > > > >>>> > > >> above
> > > > > > > > >>>> > > >> > and
> > > > > > > > >>>> > > >> > > >>>>>>> because
> > > > > > > > >>>> > > >> > > >>>>>>> > the signature of
> > > > > > > SplitReader#pauseOrResumeSplits
> > > > > > > > >>>> and
> > > > > > > > >>>> > > >> > > >>>>>>> > SourceReader#pauseOrResumeSplits are
> > > > > slightly
> > > > > > > > >>>> different
> > > > > > > > >>>> > (one
> > > > > > > > >>>> > > >> > > >>>>>>> identifies
> > > > > > > > >>>> > > >> > > >>>>>>> > splits with splitId the other one
> > > passes
> > > > > the
> > > > > > > > splits
> > > > > > > > >>>> > > >> directly).
> > > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > > >>>> > > >> > > >>>>>>> > Yes, that's a good point in this case
> > > and
> > > > > for
> > > > > > > > >>>> features
> > > > > > > > >>>> > that
> > > > > > > > >>>> > > >> > need
> > > > > > > > >>>> > > >> > > >>>>>>> to be
> > > > > > > > >>>> > > >> > > >>>>>>> > implemented in more than one place.
> > > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > > >>>> > > >> > > >>>>>>> > > Is there any reason for pausing
> > > reading
> > > > > > from
> > > > > > > a
> > > > > > > > >>>> split
> > > > > > > > >>>> > an
> > > > > > > > >>>> > > >> > > optional
> > > > > > > > >>>> > > >> > > >>>>>>> feature,
> > > > > > > > >>>> > > >> > > >>>>>>> > > other than that this was not
> > included
> > > > in
> > > > > > the
> > > > > > > > >>>> original
> > > > > > > > >>>> > > >> > > interface?
> > > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > > >>>> > > >> > > >>>>>>> > An additional argument in favor of
> > > making
> > > > > it
> > > > > > > > >>>> optional
> > > > > > > > >>>> > is to
> > > > > > > > >>>> > > >> > > >>>>>>> simplify source
> > > > > > > > >>>> > > >> > > >>>>>>> > implementation. But on its own I'm
> > not
> > > > sure
> > > > > > if
> > > > > > > > that
> > > > > > > > >>>> > would be
> > > > > > > > >>>> > > >> > > >>>>>>> enough to
> > > > > > > > >>>> > > >> > > >>>>>>> > justify making this feature optional.
> > > > > Maybe.
> > > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > > >>>> > > >> > > >>>>>>> > > I think it would be way simpler and
> > > > > clearer
> > > > > > > to
> > > > > > > > >>>> just
> > > > > > > > >>>> > let
> > > > > > > > >>>> > > >> end
> > > > > > > > >>>> > > >> > > >>>>>>> users and
> > > > > > > > >>>> > > >> > > >>>>>>> > Flink
> > > > > > > > >>>> > > >> > > >>>>>>> > > assume all the connectors will
> > > > implement
> > > > > > this
> > > > > > > > >>>> feature.
> > > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > > >>>> > > >> > > >>>>>>> > As I wrote above that would be an
> > > > > interesting
> > > > > > > > >>>> choice to
> > > > > > > > >>>> > make
> > > > > > > > >>>> > > >> > > (ease
> > > > > > > > >>>> > > >> > > >>>>>>> of
> > > > > > > > >>>> > > >> > > >>>>>>> > implementation for new users, vs
> > system
> > > > > > > > >>>> consistency).
> > > > > > > > >>>> > > >> > Regardless
> > > > > > > > >>>> > > >> > > >>>>>>> of that,
> > > > > > > > >>>> > > >> > > >>>>>>> > yes, for me the main argument is the
> > > API
> > > > > > > backward
> > > > > > > > >>>> > > >> > compatibility.
> > > > > > > > >>>> > > >> > > >>>>>>> But let's
> > > > > > > > >>>> > > >> > > >>>>>>> > clear a couple of points:
> > > > > > > > >>>> > > >> > > >>>>>>> > - The current proposal adding methods
> > > to
> > > > > the
> > > > > > > base
> > > > > > > > >>>> > interface
> > > > > > > > >>>> > > >> > with
> > > > > > > > >>>> > > >> > > >>>>>>> default
> > > > > > > > >>>> > > >> > > >>>>>>> > implementations is an OPTIONAL
> > feature.
> > > > > Same
> > > > > > as
> > > > > > > > the
> > > > > > > > >>>> > > >> decorative
> > > > > > > > >>>> > > >> > > >>>>>>> version
> > > > > > > > >>>> > > >> > > >>>>>>> > would be.
> > > > > > > > >>>> > > >> > > >>>>>>> > - Decorative version could implement
> > > > "throw
> > > > > > > > >>>> > > >> > > >>>>>>> UnsupportedOperationException"
> > > > > > > > >>>> > > >> > > >>>>>>> > if user enabled watermark alignment
> > > just
> > > > as
> > > > > > > well
> > > > > > > > >>>> and I
> > > > > > > > >>>> > agree
> > > > > > > > >>>> > > >> > > >>>>>>> that's a
> > > > > > > > >>>> > > >> > > >>>>>>> > better option compared to logging a
> > > > > warning.
> > > > > > > > >>>> > > >> > > >>>>>>> >
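To make the two points above concrete, here is a minimal sketch of both variants failing with an UnsupportedOperationException when alignment is requested from a reader that cannot pause splits. All type names (DemoReader, DemoSplitsAlignment, LegacyReader, AligningReader, AlignmentCaller) are hypothetical stand-ins invented for illustration, not the actual Flink interfaces or the signatures proposed in the FLIP.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for a reader interface; NOT Flink's real SourceReader.
interface DemoReader {
    // Option A: the method lives in the base interface and fails loudly by default.
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException(
                "This reader does not support pausing or resuming splits.");
    }
}

// Option B: a hypothetical decorative interface marking readers that support the feature.
interface DemoSplitsAlignment {
    void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

// An "old" reader that still compiles unchanged under both options.
class LegacyReader implements DemoReader {}

// A reader that opts in under both options.
class AligningReader implements DemoReader, DemoSplitsAlignment {
    final List<String> paused = new ArrayList<>();

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.addAll(splitsToPause);
        paused.removeAll(splitsToResume);
    }

    @Override
    public void alignSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        pauseOrResumeSplits(splitsToPause, splitsToResume);
    }
}

final class AlignmentCaller {
    // Option B on the caller side: check for the decorator and fail just as loudly.
    static void alignViaDecorator(
            DemoReader reader,
            Collection<String> splitsToPause,
            Collection<String> splitsToResume) {
        if (!(reader instanceof DemoSplitsAlignment)) {
            throw new UnsupportedOperationException(
                    "Watermark alignment is enabled, but this reader cannot pause splits.");
        }
        ((DemoSplitsAlignment) reader).alignSplits(splitsToPause, splitsToResume);
    }
}
```

In this sketch the user-visible failure is the same in both variants; what differs is only where the contract is declared (a default method on the base interface versus a separate type the caller has to check for).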
> > > > > > > > >>>> > > >> > > >>>>>>> > Best,
> > > > > > > > >>>> > > >> > > >>>>>>> > Piotrek
> > > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > > >>>> > > >> > > >>>>>>> > On Wed, May 4, 2022 at 15:40 Becket Qin <
> > > > > > > > >>>> > becket.qin@gmail.com>
> > > > > > > > >>>> > > >> > > >>>>>>> wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > > >>>> > > >> > > >>>>>>> > > Thanks for the reply and patient
> > > > > > discussion,
> > > > > > > > >>>> Piotr and
> > > > > > > > >>>> > > >> Dawid.
> > > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > Is there any reason for pausing
> > > reading
> > > > > > from
> > > > > > > a
> > > > > > > > >>>> split
> > > > > > > > >>>> > an
> > > > > > > > >>>> > > >> > > optional
> > > > > > > > >>>> > > >> > > >>>>>>> feature,
> > > > > > > > >>>> > > >> > > >>>>>>> > > other than that this was not
> > included
> > > > in
> > > > > > the
> > > > > > > > >>>> original
> > > > > > > > >>>> > > >> > > interface?
> > > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > To be honest I am really worried
> > > about
> > > > > the
> > > > > > > > >>>> complexity
> > > > > > > > >>>> > of
> > > > > > > > >>>> > > >> the
> > > > > > > > >>>> > > >> > > >>>>>>> user story
> > > > > > > > >>>> > > >> > > >>>>>>> > > here. Optional features like this
> > > have
> > > > a
> > > > > > high
> > > > > > > > >>>> > overhead.
> > > > > > > > >>>> > > >> > Imagine
> > > > > > > > >>>> > > >> > > >>>>>>> this
> > > > > > > > >>>> > > >> > > >>>>>>> > > feature is optional, now a user
> > > enabled
> > > > > > > > watermark
> > > > > > > > >>>> > > >> alignment
> > > > > > > > >>>> > > >> > and
> > > > > > > > >>>> > > >> > > >>>>>>> defined a
> > > > > > > > >>>> > > >> > > >>>>>>> > > few watermark groups. Would it
> > work?
> > > > Hmm,
> > > > > > > that
> > > > > > > > >>>> > depends on
> > > > > > > > >>>> > > >> > > >>>>>>> whether the
> > > > > > > > >>>> > > >> > > >>>>>>> > > involved Source has implemented
> > this
> > > > > > > feature.
> > > > > > > > >>>> If the
> > > > > > > > >>>> > > >> Sources
> > > > > > > > >>>> > > >> > > >>>>>>> are well
> > > > > > > > >>>> > > >> > > >>>>>>> > > documented, good luck. Otherwise
> > end
> > > > > users
> > > > > > > may
> > > > > > > > >>>> have to
> > > > > > > > >>>> > > >> look
> > > > > > > > >>>> > > >> > > into
> > > > > > > > >>>> > > >> > > >>>>>>> the code
> > > > > > > > >>>> > > >> > > >>>>>>> > > of the Source to see whether the
> > > > feature
> > > > > is
> > > > > > > > >>>> supported.
> > > > > > > > >>>> > > >> Which
> > > > > > > > >>>> > > >> > is
> > > > > > > > >>>> > > >> > > >>>>>>> something
> > > > > > > > >>>> > > >> > > >>>>>>> > > they shouldn't have to do.
> > > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > I think it would be way simpler and
> > > > > clearer
> > > > > > > to
> > > > > > > > >>>> just
> > > > > > > > >>>> > let
> > > > > > > > >>>> > > >> end
> > > > > > > > >>>> > > >> > > >>>>>>> users and
> > > > > > > > >>>> > > >> > > >>>>>>> > Flink
> > > > > > > > >>>> > > >> > > >>>>>>> > > assume all the connectors will
> > > > implement
> > > > > > this
> > > > > > > > >>>> feature.
> > > > > > > > >>>> > > >> After
> > > > > > > > >>>> > > >> > > all
> > > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > > >>>> > > >> > > >>>>>>> > > watermark group is not optional to
> > > the
> > > > > end
> > > > > > > > >>>> users. If
> > > > > > > > >>>> > in
> > > > > > > > >>>> > > >> some
> > > > > > > > >>>> > > >> > > >>>>>>> rare cases,
> > > > > > > > >>>> > > >> > > >>>>>>> > > the feature cannot be supported, a
> > > > clear
> > > > > > > > >>>> > > >> > > >>>>>>> UnsupportedOperationException
> > > > > > > > >>>> > > >> > > >>>>>>> > will
> > > > > > > > >>>> > > >> > > >>>>>>> > > be thrown to tell users to
> > explicitly
> > > > > > remove
> > > > > > > > this
> > > > > > > > >>>> > Source
> > > > > > > > >>>> > > >> from
> > > > > > > > >>>> > > >> > > the
> > > > > > > > >>>> > > >> > > >>>>>>> > watermark
> > > > > > > > >>>> > > >> > > >>>>>>> > > group. I don't think we should
> > have a
> > > > > > warning
> > > > > > > > >>>> message
> > > > > > > > >>>> > > >> here,
> > > > > > > > >>>> > > >> > as
> > > > > > > > >>>> > > >> > > >>>>>>> they tend
> > > > > > > > >>>> > > >> > > >>>>>>> > to
> > > > > > > > >>>> > > >> > > >>>>>>> > > be ignored in many cases. If we do
> > > > this,
> > > > > we
> > > > > > > > >>>> don't even
> > > > > > > > >>>> > > >> need
> > > > > > > > >>>> > > >> > the
> > > > > > > > >>>> > > >> > > >>>>>>> > supportXXX
> > > > > > > > >>>> > > >> > > >>>>>>> > > method in the Source for this
> > > feature.
> > > > In
> > > > > > > fact
> > > > > > > > >>>> this is
> > > > > > > > >>>> > > >> > exactly
> > > > > > > > >>>> > > >> > > >>>>>>> how many
> > > > > > > > >>>> > > >> > > >>>>>>> > > interfaces work today. For
> > example,
> > > > > > > > >>>> > > >> > > >>>>>>> SplitEnumerator#addSplitsBack() is
> > > > > > > > >>>> > > >> > > >>>>>>> > not
> > > > > > > > >>>> > > >> > > >>>>>>> > > supported by Pravega source because
> > > it
> > > > > does
> > > > > > > not
> > > > > > > > >>>> > support
> > > > > > > > >>>> > > >> > partial
> > > > > > > > >>>> > > >> > > >>>>>>> failover.
> > > > > > > > >>>> > > >> > > >>>>>>> > > In that case, it simply throws an
> > > > > exception
> > > > > > > to
> > > > > > > > >>>> > trigger a
> > > > > > > > >>>> > > >> > global
> > > > > > > > >>>> > > >> > > >>>>>>> recovery.
> > > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > The reason we add a default
> > > > > implementation
> > > > > > in
> > > > > > > > >>>> this
> > > > > > > > >>>> > case
> > > > > > > > >>>> > > >> would
> > > > > > > > >>>> > > >> > > >>>>>>> just be for
> > > > > > > > >>>> > > >> > > >>>>>>> > the
> > > > > > > > >>>> > > >> > > >>>>>>> > > sake of backwards compatibility so
> > > the
> > > > > old
> > > > > > > > >>>> source can
> > > > > > > > >>>> > > >> still
> > > > > > > > >>>> > > >> > > >>>>>>> compile.
> > > > > > > > >>>> > > >> > > >>>>>>> > Sure,
> > > > > > > > >>>> > > >> > > >>>>>>> > > in short term, this feature might
> > not
> > > > be
> > > > > > > > >>>> supported by
> > > > > > > > >>>> > many
> > > > > > > > >>>> > > >> > > >>>>>>> existing
> > > > > > > > >>>> > > >> > > >>>>>>> > > sources. That is OK, and it is
> > quite
> > > > > > visible
> > > > > > > to
> > > > > > > > >>>> the
> > > > > > > > >>>> > source
> > > > > > > > >>>> > > >> > > >>>>>>> developers
> > > > > > > > >>>> > > >> > > >>>>>>> > that
> > > > > > > > >>>> > > >> > > >>>>>>> > > they did not override the default
> > > impl
> > > > > > which
> > > > > > > > >>>> throws an
> > > > > > > > >>>> > > >> > > >>>>>>> > > UnsupportedOperationException.
> > > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > @Dawid,
> > > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > the Java doc of the SupportXXX()
> > > method
> > > > > in
> > > > > > > the
> > > > > > > > >>>> Source
> > > > > > > > >>>> > > >> would
> > > > > > > > >>>> > > >> > be
> > > > > > > > >>>> > > >> > > >>>>>>> the single
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> source of truth regarding how to
> > > > > > implement
> > > > > > > > >>>> this
> > > > > > > > >>>> > > >> feature.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > I also don't find it entirely true.
> > > > Half
> > > > > of
> > > > > > > the
> > > > > > > > >>>> > classes
> > > > > > > > >>>> > > >> are
> > > > > > > > >>>> > > >> > > >>>>>>> theoretically
> > > > > > > > >>>> > > >> > > >>>>>>> > > > optional and are utility classes
> > > from
> > > > > the
> > > > > > > > >>>> point of
> > > > > > > > >>>> > view
> > > > > > > > >>>> > > >> how
> > > > > > > > >>>> > > >> > > the
> > > > > > > > >>>> > > >> > > >>>>>>> > > interfaces
> > > > > > > > >>>> > > >> > > >>>>>>> > > > are organized. Theoretically
> > users
> > > do
> > > > > not
> > > > > > > > need
> > > > > > > > >>>> to
> > > > > > > > >>>> > use
> > > > > > > > >>>> > > >> any
> > > > > > > > >>>> > > >> > of
> > > > > > > > >>>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader.
> > > Would
> > > > > be
> > > > > > > > weird
> > > > > > > > >>>> to
> > > > > > > > >>>> > list
> > > > > > > > >>>> > > >> > their
> > > > > > > > >>>> > > >> > > >>>>>>> methods in
> > > > > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > > > > >>>> > > >> > > >>>>>>> > > > Source interface.
> > > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > I think the ultimate goal of java
> > > docs
> > > > is
> > > > > > to
> > > > > > > > >>>> guide
> > > > > > > > >>>> > users
> > > > > > > > >>>> > > >> to
> > > > > > > > >>>> > > >> > > >>>>>>> implement the
> > > > > > > > >>>> > > >> > > >>>>>>> > > Source. If SourceReaderBase is the
> > > > > > preferred
> > > > > > > > way
> > > > > > > > >>>> to
> > > > > > > > >>>> > > >> > implement a
> > > > > > > > >>>> > > >> > > >>>>>>> > > SourceReader, it seems worth
> > > mentioning
> > > > > > that.
> > > > > > > > >>>> Even the
> > > > > > > > >>>> > > >> Java
> > > > > > > > >>>> > > >> > > >>>>>>> language
> > > > > > > > >>>> > > >> > > >>>>>>> > > documentation for interfaces lists the
> > > > > known
> > > > > > > > >>>> > implementations
> > > > > > > > >>>> > > >> [1]
> > > > > > > > >>>> > > >> > so
> > > > > > > > >>>> > > >> > > >>>>>>> people
> > > > > > > > >>>> > > >> > > >>>>>>> > can
> > > > > > > > >>>> > > >> > > >>>>>>> > > leverage them. But for this
> > > particular
> > > > > > case,
> > > > > > > if
> > > > > > > > >>>> we
> > > > > > > > >>>> > make
> > > > > > > > >>>> > > >> the
> > > > > > > > >>>> > > >> > > >>>>>>> feature
> > > > > > > > >>>> > > >> > > >>>>>>> > > non-optional, we don't even need
> > the
> > > > > > > > supportXXX()
> > > > > > > > >>>> > method
> > > > > > > > >>>> > > >> for
> > > > > > > > >>>> > > >> > > now.
> > > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > Thanks,
> > > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > Jiangjie (Becket) Qin
> > > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > On Wed, May 4, 2022 at 4:37 PM
> > Dawid
> > > > > > > > Wysakowicz <
> > > > > > > > >>>> > > >> > > >>>>>>> dwysakowicz@apache.org>
> > > > > > > > >>>> > > >> > > >>>>>>> > > wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > Hey Piotr and Becket,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > First of all, let me say I am
> > happy
> > > > > with
> > > > > > > > >>>> whichever
> > > > > > > > >>>> > > >> option
> > > > > > > > >>>> > > >> > is
> > > > > > > > >>>> > > >> > > >>>>>>> agreed in
> > > > > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > > > > >>>> > > >> > > >>>>>>> > > > discussion.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > I wanted to clarify a few points
> > > from
> > > > > the
> > > > > > > > >>>> discussion
> > > > > > > > >>>> > > >> > though:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > @Becket:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > The main argument for adding the
> > > > > methods
> > > > > > to
> > > > > > > > the
> > > > > > > > >>>> > > >> > SourceReader
> > > > > > > > >>>> > > >> > > >>>>>>> is that
> > > > > > > > >>>> > > >> > > >>>>>>> > > these
> > > > > > > > >>>> > > >> > > >>>>>>> > > > methods are effectively
> > > NON-OPTIONAL
> > > > to
> > > > > > > > >>>> SourceReader
> > > > > > > > >>>> > > >> impl,
> > > > > > > > >>>> > > >> > > i.e.
> > > > > > > > >>>> > > >> > > >>>>>>> > starting
> > > > > > > > >>>> > > >> > > >>>>>>> > > > from this FLIP, all the
> > > SourceReaders
> > > > > > impl
> > > > > > > > are
> > > > > > > > >>>> > expected
> > > > > > > > >>>> > > >> to
> > > > > > > > >>>> > > >> > > >>>>>>> support this
> > > > > > > > >>>> > > >> > > >>>>>>> > > > method, although some old
> > > > > implementations
> > > > > > > may
> > > > > > > > >>>> not
> > > > > > > > >>>> > have
> > > > > > > > >>>> > > >> > > >>>>>>> implemented this
> > > > > > > > >>>> > > >> > > >>>>>>> > > > feature. I think we should
> > > > distinguish
> > > > > > the
> > > > > > > > new
> > > > > > > > >>>> > features
> > > > > > > > >>>> > > >> > from
> > > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > > >>>> > > >> > > >>>>>>> > optional
> > > > > > > > >>>> > > >> > > >>>>>>> > > > features. While the public
> > > decorative
> > > > > > > > >>>> interface is a
> > > > > > > > >>>> > > >> > solution
> > > > > > > > >>>> > > >> > > >>>>>>> to the
> > > > > > > > >>>> > > >> > > >>>>>>> > > > optional features, we should not
> > > use
> > > > it
> > > > > > for
> > > > > > > > the
> > > > > > > > >>>> > features
> > > > > > > > >>>> > > >> > that
> > > > > > > > >>>> > > >> > > >>>>>>> are
> > > > > > > > >>>> > > >> > > >>>>>>> > > > non-optional.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > I don't think that this feature
> > is
> > > > > > > > >>>> NON-OPTIONAL.
> > > > > > > > >>>> > Even
> > > > > > > > >>>> > > >> > though
> > > > > > > > >>>> > > >> > > >>>>>>> > preferred, I
> > > > > > > > >>>> > > >> > > >>>>>>> > > > still think it can be simply
> > > > optional.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > the Java doc of the SupportXXX()
> > > > method
> > > > > > in
> > > > > > > > the
> > > > > > > > >>>> > Source
> > > > > > > > >>>> > > >> would
> > > > > > > > >>>> > > >> > > be
> > > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > > >>>> > > >> > > >>>>>>> > single
> > > > > > > > >>>> > > >> > > >>>>>>> > > > source of truth regarding how to
> > > > > > implement
> > > > > > > > this
> > > > > > > > >>>> > feature.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > I also don't find it entirely
> > true.
> > > > > Half
> > > > > > of
> > > > > > > > the
> > > > > > > > >>>> > classes
> > > > > > > > >>>> > > >> are
> > > > > > > > >>>> > > >> > > >>>>>>> > theoretically
> > > > > > > > >>>> > > >> > > >>>>>>> > > > optional and are utility classes
> > > from
> > > > > the
> > > > > > > > >>>> point of
> > > > > > > > >>>> > view
> > > > > > > > >>>> > > >> how
> > > > > > > > >>>> > > >> > > the
> > > > > > > > >>>> > > >> > > >>>>>>> > > interfaces
> > > > > > > > >>>> > > >> > > >>>>>>> > > > are organized. Theoretically
> > users
> > > do
> > > > > not
> > > > > > > > need
> > > > > > > > >>>> to
> > > > > > > > >>>> > use
> > > > > > > > >>>> > > >> any
> > > > > > > > >>>> > > >> > of
> > > > > > > > >>>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader.
> > > Would
> > > > > be
> > > > > > > > weird
> > > > > > > > >>>> to
> > > > > > > > >>>> > list
> > > > > > > > >>>> > > >> > their
> > > > > > > > >>>> > > >> > > >>>>>>> methods in
> > > > > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > > > > >>>> > > >> > > >>>>>>> > > > Source interface.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > @Piotr
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > If we have all of the methods
> > with
> > > > > > default
> > > > > > > > >>>> > > >> implementation
> > > > > > > > >>>> > > >> > in
> > > > > > > > >>>> > > >> > > >>>>>>> the base
> > > > > > > > >>>> > > >> > > >>>>>>> > > > interface, the API doesn't give
> > any
> > > > > clue
> > > > > > to
> > > > > > > > >>>> the user
> > > > > > > > >>>> > > >> which
> > > > > > > > >>>> > > >> > > set
> > > > > > > > >>>> > > >> > > >>>>>>> of
> > > > > > > > >>>> > > >> > > >>>>>>> > methods
> > > > > > > > >>>> > > >> > > >>>>>>> > > > are required to be implemented at
> > > the
> > > > > > same
> > > > > > > > >>>> time.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > I feel that no matter which
> > option
> > > we
> > > > > > > choose
> > > > > > > > >>>> this
> > > > > > > > >>>> > can
> > > > > > > > >>>> > > >> not
> > > > > > > > >>>> > > >> > be
> > > > > > > > >>>> > > >> > > >>>>>>> solved
> > > > > > > > >>>> > > >> > > >>>>>>> > > > entirely in either of the
> > options,
> > > > > > because
> > > > > > > of
> > > > > > > > >>>> the
> > > > > > > > >>>> > point
> > > > > > > > >>>> > > >> > above
> > > > > > > > >>>> > > >> > > >>>>>>> and
> > > > > > > > >>>> > > >> > > >>>>>>> > because
> > > > > > > > >>>> > > >> > > >>>>>>> > > > the signature of
> > > > > > > > >>>> SplitReader#pauseOrResumeSplits and
> > > > > > > > >>>> > > >> > > >>>>>>> > > > SourceReader#pauseOrResumeSplits
> > > are
> > > > > > > slightly
> > > > > > > > >>>> > different
> > > > > > > > >>>> > > >> > (one
> > > > > > > > >>>> > > >> > > >>>>>>> identifies
> > > > > > > > >>>> > > >> > > >>>>>>> > > > splits with splitId the other one
> > > > > passes
> > > > > > > the
> > > > > > > > >>>> splits
> > > > > > > > >>>> > > >> > > directly).
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
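The signature difference mentioned above (one level addresses splits by ID, the other receives the split objects) can be sketched as follows. This is only an illustration under assumptions: DemoSplit, DemoSplitReader and DemoSourceReader are hypothetical names, and the parameter lists are guesses at the shape being discussed, not the signatures from the FLIP.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical split type, identified by a string ID.
final class DemoSplit {
    final String id;

    DemoSplit(String id) {
        this.id = id;
    }
}

// Split-level reader (assumed shape): receives the split objects directly.
interface DemoSplitReader {
    void pauseOrResumeSplits(
            Collection<DemoSplit> splitsToPause, Collection<DemoSplit> splitsToResume);
}

// Reader-level component (assumed shape): receives split IDs, resolves them
// against its assigned splits and delegates to the split-level reader.
final class DemoSourceReader {
    private final Map<String, DemoSplit> assignedSplits = new LinkedHashMap<>();
    private final DemoSplitReader splitReader;

    DemoSourceReader(DemoSplitReader splitReader) {
        this.splitReader = splitReader;
    }

    void addSplit(DemoSplit split) {
        assignedSplits.put(split.id, split);
    }

    void pauseOrResumeSplits(
            Collection<String> splitIdsToPause, Collection<String> splitIdsToResume) {
        splitReader.pauseOrResumeSplits(resolve(splitIdsToPause), resolve(splitIdsToResume));
    }

    // Translate IDs into split objects; an unknown ID is treated as a caller error.
    private List<DemoSplit> resolve(Collection<String> splitIds) {
        List<DemoSplit> resolved = new ArrayList<>();
        for (String splitId : splitIds) {
            DemoSplit split = assignedSplits.get(splitId);
            if (split == null) {
                throw new IllegalArgumentException("Unknown split id: " + splitId);
            }
            resolved.add(split);
        }
        return resolved;
    }
}
```

Because the two methods take different argument types, a single shared decorative interface could not cover both levels without a type parameter or an adapter like the one above, which is one way to read the point that neither option solves the grouping problem entirely.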
> > > > > > > > >>>> > > >> > > >>>>>>> > > > Best,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > Dawid
> > > > > > > > >>>> > > >> > > >>>>>>> > > > On 03/05/2022 14:30, Becket Qin
> > > > wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > Hi Piotr,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > Thanks for the comment.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > Just to clarify, I am not against
> > > the
> > > > > > > > >>>> decorative
> > > > > > > > >>>> > > >> > interfaces,
> > > > > > > > >>>> > > >> > > >>>>>>> but I do
> > > > > > > > >>>> > > >> > > >>>>>>> > > > think we should use them with
> > > > caution.
> > > > > > The
> > > > > > > > main
> > > > > > > > >>>> > argument
> > > > > > > > >>>> > > >> > for
> > > > > > > > >>>> > > >> > > >>>>>>> adding the
> > > > > > > > >>>> > > >> > > >>>>>>> > > > methods to the SourceReader is
> > that
> > > > > these
> > > > > > > > >>>> methods
> > > > > > > > >>>> > are
> > > > > > > > >>>> > > >> > > >>>>>>> > > > effectively NON-OPTIONAL to
> > > > > SourceReader
> > > > > > > > impl,
> > > > > > > > >>>> i.e.
> > > > > > > > >>>> > > >> > starting
> > > > > > > > >>>> > > >> > > >>>>>>> from this
> > > > > > > > >>>> > > >> > > >>>>>>> > > > FLIP, all the SourceReaders impl
> > > are
> > > > > > > expected
> > > > > > > > >>>> to
> > > > > > > > >>>> > support
> > > > > > > > >>>> > > >> > this
> > > > > > > > >>>> > > >> > > >>>>>>> > > > method, although some old
> > > > > implementations
> > > > > > > may
> > > > > > > > >>>> not
> > > > > > > > >>>> > have
> > > > > > > > >>>> > > >> > > >>>>>>> implemented this
> > > > > > > > >>>> > > >> > > >>>>>>> > > > feature. I think we should
> > > > distinguish
> > > > > > the
> > > > > > > > new
> > > > > > > > >>>> > features
> > > > > > > > >>>> > > >> > from
> > > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > > >>>> > > >> > > >>>>>>> > optional
> > > > > > > > >>>> > > >> > > >>>>>>> > > > features. While the public
> > > decorative
> > > > > > > > >>>> interface is a
> > > > > > > > >>>> > > >> > solution
> > > > > > > > >>>> > > >> > > >>>>>>> to the
> > > > > > > > >>>> > > >> > > >>>>>>> > > > optional features, we should not
> > > use
> > > > it
> > > > > > for
> > > > > > > > the
> > > > > > > > >>>> > features
> > > > > > > > >>>> > > >> > that
> > > > > > > > >>>> > > >> > > >>>>>>> are
> > > > > > > > >>>> > > >> > > >>>>>>> > > > non-optional.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > That said, this feature is
> > optional
> > > > for
> > > > > > > > >>>> > SplitReaders.
> > > > > > > > >>>> > > >> > > Arguably
> > > > > > > > >>>> > > >> > > >>>>>>> we can
> > > > > > > > >>>> > > >> > > >>>>>>> > > have
> > > > > > > > >>>> > > >> > > >>>>>>> > > > a decorative interface for that,
> > > but
> > > > > for
> > > > > > > > >>>> simplicity
> > > > > > > > >>>> > and
> > > > > > > > >>>> > > >> > > >>>>>>> symmetry of the
> > > > > > > > >>>> > > >> > > >>>>>>> > > > interface, personally I prefer
> > just
> > > > > > adding
> > > > > > > a
> > > > > > > > >>>> new
> > > > > > > > >>>> > method.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > Regarding the advantages you
> > > > mentioned
> > > > > > > about
> > > > > > > > >>>> the
> > > > > > > > >>>> > > >> decorative
> > > > > > > > >>>> > > >> > > >>>>>>> interfaces,
> > > > > > > > >>>> > > >> > > >>>>>>> > > > they would make sense if:
> > > > > > > > >>>> > > >> > > >>>>>>> > > > 1. The feature is optional.
> > > > > > > > >>>> > > >> > > >>>>>>> > > > 2. There is only one decorative
> > > > > interface
> > > > > > > > >>>> involved
> > > > > > > > >>>> > for a
> > > > > > > > >>>> > > >> > > >>>>>>> feature.
> > > > > > > > >>>> > > >> > > >>>>>>> > > > Otherwise the argument that all
> > the
> > > > > > methods
> > > > > > > > are
> > > > > > > > >>>> > grouped
> > > > > > > > >>>> > > >> > > >>>>>>> together will
> > > > > > > > >>>> > > >> > > >>>>>>> > not
> > > > > > > > >>>> > > >> > > >>>>>>> > > > stand.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > Compared with that, I think the
> > > > current
> > > > > > > > >>>> solution
> > > > > > > > >>>> > works
> > > > > > > > >>>> > > >> fine
> > > > > > > > >>>> > > >> > > in
> > > > > > > > >>>> > > >> > > >>>>>>> all
> > > > > > > > >>>> > > >> > > >>>>>>> > cases,
> > > > > > > > >>>> > > >> > > >>>>>>> > > > i.e. "having supportXXX() method
> > in
> > > > > > Source,
> > > > > > > > and
> > > > > > > > >>>> > default
> > > > > > > > >>>> > > >> > > >>>>>>> methods /
> > > > > > > > >>>> > > >> > > >>>>>>> > > > decorative interfaces in base
> > > > > > interfaces.".
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > The advantages are:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> - clean and easy to implement
> > base
> > > > > > > interface
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > In the current approach, the Java
> > > doc
> > > > > of
> > > > > > > the
> > > > > > > > >>>> > > >> SupportXXX()
> > > > > > > > >>>> > > >> > > >>>>>>> method in the
> > > > > > > > >>>> > > >> > > >>>>>>> > > > Source would be the single source
> > > of
> > > > > > truth
> > > > > > > > >>>> regarding
> > > > > > > > >>>> > > >> how to
> > > > > > > > >>>> > > >> > > >>>>>>> implement
> > > > > > > > >>>> > > >> > > >>>>>>> > > this
> > > > > > > > >>>> > > >> > > >>>>>>> > > > feature. It lists the method that
> > > has
> > > > > to
> > > > > > be
> > > > > > > > >>>> > implemented
> > > > > > > > >>>> > > >> to
> > > > > > > > >>>> > > >> > > >>>>>>> support this
> > > > > > > > >>>> > > >> > > >>>>>>> > > > feature, regardless of how many
> > > > > classes /
> > > > > > > > >>>> > interfaces are
> > > > > > > > >>>> > > >> > > >>>>>>> involved.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > When implementing the base interface, users do not need to
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > implement a method with default implementation. If they are
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > curious what the method is for, the java doc of that method
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > simply points users to the SupportXXX() method in the Source.
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > This does not seem to add work for users compared with
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > decorative interfaces, but gives much better discoverability.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> - all of the methods from a single feature are grouped in a
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> single decorator interface, together with their dedicated java
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> doc. It's also easier to google search for help using the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> decorator name
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> - if an optional feature requires two methods to be implemented
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> at once, decorator can guarantee that
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > These two points are not true when multiple components and
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > classes are involved collaboratively to provide a feature. In
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > our case, we have both SourceReader and SplitReader involved.
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > And there might be other interfaces on the JM side involved for
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > some future features. So the relevant methods can actually be
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > scattered all over the place. That said, we may still use
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > decorative interfaces for each component, if the feature is
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > optional, given there is a single source of truth for the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > feature.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> Here I would strongly lean towards making life easier for new
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> users, lowering the entry barrier, at the (imo) slight expense
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> for the power users.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > I actually think the current approach is simpler, more
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > extensible and more general for all the users. Can you
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > articulate a bit more on which part you think is harder for
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > users to understand?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > There is another benefit of the decorative interfaces which is
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > not mentioned, but might be worth considering here. Usually the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > decorative interfaces give slightly better backwards
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > compatibility than the new default method in the interfaces.
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > That is when users are using a jar that was compiled with an
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > older version of Flink which does not have the default method in
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > the interfaces in question. A decorative interface may still
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > provide backwards compatibility in that case, while default
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > method impl cannot.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > I think in Flink we in general do not guarantee custom
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > components compiled with an older version can run with a newer
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > version of Flink. A recompile with a newer version would be
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > required. That said, if we do care about this, we can just
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > change the "supportXXX()" method in the Source interface to use
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > decorative interfaces, and leave the other parts unchanged.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > Thanks,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > > Jiangjie (Becket) Qin
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <pnowojski@apache.org> wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> Hi,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> Sorry for chipping in so late, but I was OoO for the last two
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> weeks. Regarding the interfaces, I would be actually against
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> adding those methods to the base interfaces for the reasons
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> mentioned above. Clogging the base interface for new users with
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> tons of methods that they do not need, do not understand and do
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> not know what to do with them. Moreover, such decorative
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> interfaces are solving a problem if a feature requires two or
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> more methods to be implemented at the same time. If we have all
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> of the methods with default implementation in the base
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> interface, the API doesn't give any clue to the user which set
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> of methods are required to be implemented at the same time.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > a) I feel the biggest drawback of decorative interfaces is
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > which interface they can decorate and which combinations of
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > multiple decorative interfaces are valid. (...)
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > In the future, if there is a new feature added (e.g. sorted or
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > pre-partitioned data aware), are we going to create another
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > interface of SplitReader such as SortedSplitReader or
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > PrePartitionedAware? Can they be combined? So I think the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > additional decorative interface like withSplitsAlignment
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > actually increases the understanding cost of users because
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > they have to know what decorative interfaces are there, which
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > interface they can decorate and which combinations of the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > decorative interfaces are valid and which are not. Ideally we
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > want to avoid that.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> I'm not sure if I understand how embedding default methods in
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> the base interface is solving the problem: what can be combined
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> or not? If there are two conflicting features, having
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> decorative interfaces that can not be mixed together actually
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> makes much more sense to me rather than having them all in one
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> base class. How would you allow users to implement only one of
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> those two features?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> To reiterate on the issue. Yes, there are drawbacks:
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> - how can a user discover what decorators are there?
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> - how can a user know where the decorator can be applied to?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> However, those are drawbacks for more power users, which can be
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> mitigated by the documentation. For example, listing all of the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> decorators with detailed explanation both in the docs and in
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> the java docs. More experienced users will be able to deal with
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> those issues more easily, as they will already have some basic
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> understanding of Flink. Also, if a user has a problem that they
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> want to solve, they will google search a potential solution to
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> their problem anyway, and while doing that they are very likely
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> to discover the decorator that they need in the docs.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> The advantages are:
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> - clean and easy to implement base interface
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> - all of the methods from a single feature are grouped in a
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> single decorator interface, together with their dedicated java
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> doc. It's also easier to google search for help using the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> decorator name
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> - if an optional feature requires two methods to be implemented
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> at once, decorator can guarantee that
> > > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> Here I would strongly lean towards making life easier for new
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> users, lowering the entry barrier, at the (imo) slight expense
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> for the power users.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> Best,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> Piotrek
> > > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> On Tue, 26 Apr 2022 at 15:32, Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > Thanks for the reply Sebastian and Dawid,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > I think Sebastian has a good summary. This is a really helpful
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > discussion.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > Thinking a bit more, I feel that it might still be better to
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > add the supportsXXX() method in the Source rather than
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > SourceReader.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > Generally speaking, what we are trying to do here is to let
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > the Flink framework know what the Source is capable of. In
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > this FLIP, it happens to be the capability that only involves
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > SourceReader. But in the future, it is possible that another
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > functionality involves both the SplitEnumerator and
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > SourceReader. In that case, following the current approach, we
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > should put the "supportsXXX()" method in both SplitEnumerator
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > and SourceReader. Because if we only put this in the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > SourceReader, then the JM would have to create a SourceReader
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > in order to know whether this feature is supported, which is a
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > little ugly. But if we put the "supportsXXX()" method in the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > Source, we will break the "symmetric" design because this FLIP
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > chose a different way.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > This is also why I think supportsXXX() method seems a good
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > thing to have, because when there are a few interfaces /
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > methods that are expected to be implemented at the same time
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > in order to deliver a feature, it is always good to have a
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > single source of truth to tell the framework what to do, so
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > the framework can do consistent things in different parts.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > @Sebastian Mattheis <sebastian@ververica.com>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > Regarding interface flavor b), i.e. AlignedSourceReader +
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > AlignedSplitReader, what I feel awkward about is that we are
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > essentially expecting almost all the SourceReader
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > implementations to extend SourceReaderBase, which effectively
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > makes the SourceReader interface without the pausing support
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > useless. So this indicates that public decorative interfaces
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > (or sub-interfaces for the same purpose) only make sense if
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > the original interface is also expected to be used. Otherwise,
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > it seems to make more sense to add the method to the original
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > interface itself.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > Cheers,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > Jiangjie (Becket) Qin
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Thanks @Sebastian for the nice summary.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > I think most of your points aligned with the suggestions I
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > made to the FLIP, while you were writing your reply (I
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > believe we hit enter nearly at the same time ;) )
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Two points after we synced offline
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > 1. I changed also the supportsWatermarksSplitAlignment to
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > supportsPausingSplits to express the general capability of
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > pausing.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > 2. As for whether we should add
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > PausingSourceReader/PausingSplitReader (option b) or just
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > add the methods (option c), I suggest simply adding the two
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > methods, as I felt this is the approach Becket much prefers
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > and others do not object to. Unless there is opposition,
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > let's go with option c.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Best,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Dawid
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > On 26/04/2022 10:06,
> > Sebastian
> > > > > > > Mattheis
> > > > > > > > >>>> wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Hi folks,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Sorry for being a bit
> > silent.
> > > > Many
> > > > > > > > thanks
> > > > > > > > >>>> for
> > > > > > > > >>>> > all
> > > > > > > > >>>> > > >> the
> > > > > > > > >>>> > > >> > > >>>>>>> input and
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > suggestions. As I'm a bit
> > > new, I
> > > > > > > needed
> > > > > > > > >>>> some
> > > > > > > > >>>> > time
> > > > > > > > >>>> > > >> to
> > > > > > > > >>>> > > >> > > >>>>>>> catch up and
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > structure
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > (for myself) the discussion
> > > and
> > > > I
> > > > > > > wanted
> > > > > > > > >>>> to
> > > > > > > > >>>> > find a
> > > > > > > > >>>> > > >> way
> > > > > > > > >>>> > > >> > > to
> > > > > > > > >>>> > > >> > > >>>>>>> > structure
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > conclusions. (Also because I
> > > had
> > > > > the
> > > > > > > > >>>> feeling
> > > > > > > > >>>> > that
> > > > > > > > >>>> > > >> some
> > > > > > > > >>>> > > >> > > >>>>>>> concerns
> > > > > > > > >>>> > > >> > > >>>>>>> > got
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> lost
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > in
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > the discussion.) This is my
> > > > > attempt
> > > > > > > and
> > > > > > > > >>>> please
> > > > > > > > >>>> > > >> correct
> > > > > > > > >>>> > > >> > > me
> > > > > > > > >>>> > > >> > > >>>>>>> if
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> something is
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > wrong or misunderstood. I
> > > tried
> > > > to
> > > > > > > > >>>> collect and
> > > > > > > > >>>> > > >> > assemble
> > > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > > >>>> > > >> > > >>>>>>> > > opinions,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > suggestions, and conclusions
> > > (to
> > > > > the
> > > > > > > > best
> > > > > > > > >>>> of my
> > > > > > > > >>>> > > >> > > >>>>>>> knowledge):
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Top A: Should split
> > > alignment
> > > > > > > > >>>> (pause/resume
> > > > > > > > >>>> > > >> > behavior)
> > > > > > > > >>>> > > >> > > >>>>>>> be a
> > > > > > > > >>>> > > >> > > >>>>>>> > general
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > capability?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > I personally don't see any
> > > > reason
> > > > > not
> > > > > > > to
> > > > > > > > >>>> have
> > > > > > > > >>>> > it a
> > > > > > > > >>>> > > >> > > general
> > > > > > > > >>>> > > >> > > >>>>>>> > capability
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > because for the alignSplit
> > > > method
> > > > > it
> > > > > > > is
> > > > > > > > >>>> > actually
> > > > > > > > >>>> > > >> > > >>>>>>> independent of
> > > > > > > > >>>> > > >> > > >>>>>>> > the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > watermarks. If we agree here
> > > to
> > > > > have
> > > > > > > it
> > > > > > > > a
> > > > > > > > >>>> > general
> > > > > > > > >>>> > > >> > > >>>>>>> capability, we
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> should
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > also agree on the right
> > > wording.
> > > > > > Does
> > > > > > > > >>>> > > >> > > >>>>>>> "alignSplits(splitsToResume,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > splitsToPause)" refer to
> > what
> > > is
> > > > > > then
> > > > > > > > >>>> actually
> > > > > > > > >>>> > > >> meant?
> > > > > > > > >>>> > > >> > (I
> > > > > > > > >>>> > > >> > > >>>>>>> see it as
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> okay.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > I
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > don't have any better idea
> > > > whilst
> > > > > > > Arvid
> > > > > > > > >>>> > suggested
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> "pauseOrResumeSplits".)
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Top B: Should it be
> > possible
> > > > to
> > > > > > > > >>>> > enable/disable
> > > > > > > > >>>> > > >> split
> > > > > > > > >>>> > > >> > > >>>>>>> alignment?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > I would personally not
> > disable
> > > > the
> > > > > > > split
> > > > > > > > >>>> > alignment
> > > > > > > > >>>> > > >> on
> > > > > > > > >>>> > > >> > > the
> > > > > > > > >>>> > > >> > > >>>>>>> source
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> reader
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > side because if split
> > > alignment
> > > > is
> > > > > > > used
> > > > > > > > >>>> for
> > > > > > > > >>>> > some
> > > > > > > > >>>> > > >> other
> > > > > > > > >>>> > > >> > > >>>>>>> use case
> > > > > > > > >>>> > > >> > > >>>>>>> > (see
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> A)
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > it
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > could have nasty side
> > effects
> > > on
> > > > > > > > >>>> other/future
> > > > > > > > >>>> > use
> > > > > > > > >>>> > > >> > cases.
> > > > > > > > >>>> > > >> > > >>>>>>> Instead,
> > > > > > > > >>>> > > >> > > >>>>>>> > I
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> would
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > disable "watermark split
> > > > > alignment"
> > > > > > > > where
> > > > > > > > >>>> I
> > > > > > > > >>>> > think
> > > > > > > > >>>> > > >> it
> > > > > > > > >>>> > > >> > > >>>>>>> should
> > > > > > > > >>>> > > >> > > >>>>>>> > disable
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > watermark-dependent trigger
> > > for
> > > > > > split
> > > > > > > > >>>> > alignment.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Top C: Should we add a
> > > > supportsX
> > > > > > > > method?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > I find it difficult to
> > define
> > > > the
> > > > > > > scope
> > > > > > > > >>>> of a
> > > > > > > > >>>> > > >> supportsX
> > > > > > > > >>>> > > >> > > >>>>>>> method
> > > > > > > > >>>> > > >> > > >>>>>>> > w.r.t.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > the following questions: a)
> > > > Where
> > > > > is
> > > > > > > it
> > > > > > > > >>>> used?
> > > > > > > > >>>> > and
> > > > > > > > >>>> > > >> b)
> > > > > > > > >>>> > > >> > > What
> > > > > > > > >>>> > > >> > > >>>>>>> is the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> expected
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > output? To b), it's not
> > > > > > > straight-forward
> > > > > > > > >>>> to
> > > > > > > > >>>> > > >> provide a
> > > > > > > > >>>> > > >> > > >>>>>>> meaningful
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> output,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > e.g., if SourceReader
> > supports
> > > > > split
> > > > > > > > >>>> alignment
> > > > > > > > >>>> > but
> > > > > > > > >>>> > > >> > > >>>>>>> SplitReader
> > > > > > > > >>>> > > >> > > >>>>>>> > does not.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> This
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > is
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > because with the current
> > > > > > > implementation,
> > > > > > > > >>>> we can
> > > > > > > > >>>> > > >> > > determine
> > > > > > > > >>>> > > >> > > >>>>>>> whether
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> split
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > alignment is fully supported
> > > > only
> > > > > > > during
> > > > > > > > >>>> > runtime
> > > > > > > > >>>> > > >> and
> > > > > > > > >>>> > > >> > > >>>>>>> specifically
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > actually
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > only when calling
> > alignSplits
> > > > down
> > > > > > the
> > > > > > > > >>>> call
> > > > > > > > >>>> > > >> hierarchy
> > > > > > > > >>>> > > >> > up
> > > > > > > > >>>> > > >> > > >>>>>>> to the
> > > > > > > > >>>> > > >> > > >>>>>>> > > actual
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > SplitReaders.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Therefore, I would suggest
> > to
> > > > > either
> > > > > > > > >>>> raise an
> > > > > > > > >>>> > > >> error or
> > > > > > > > >>>> > > >> > > >>>>>>> warning if
> > > > > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > alignment is called but not
> > > > > > supported
> > > > > > > at
> > > > > > > > >>>> some
> > > > > > > > >>>> > > >> point. I
> > > > > > > > >>>> > > >> > > >>>>>>> know we
> > > > > > > > >>>> > > >> > > >>>>>>> > > should
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > carefully think about when
> > > this
> > > > > > could
> > > > > > > be
> > > > > > > > >>>> the
> > > > > > > > >>>> > case
> > > > > > > > >>>> > > >> > > because
> > > > > > > > >>>> > > >> > > >>>>>>> we don't
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> want
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > to
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > flood anybody with such
> > > > warnings.
> > > > > > > > However,
> > > > > > > > >>>> > warnings
> > > > > > > > >>>> > > >> > > could
> > > > > > > > >>>> > > >> > > >>>>>>> be an
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> indicator
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > for the user that for
> > > watermark
> > > > > > split
> > > > > > > > >>>> > alignment use
> > > > > > > > >>>> > > >> > case
> > > > > > > > >>>> > > >> > > >>>>>>> split
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> reading is
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > imbalanced with the
> > conclusion
> > > > to
> > > > > > > either
> > > > > > > > >>>> > disable
> > > > > > > > >>>> > > >> the
> > > > > > > > >>>> > > >> > > >>>>>>> trigger for
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > watermark
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > split alignment (see Top B)
> > or
> > > > to
> > > > > > > > >>>> > use/implement a
> > > > > > > > >>>> > > >> > source
> > > > > > > > >>>> > > >> > > >>>>>>> and
> > > > > > > > >>>> > > >> > > >>>>>>> > reader
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > fully supports split
> > > alignment.
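
The "warn, but do not flood" idea above can be sketched as a once-only warning. This is a minimal sketch, assuming a hypothetical helper class (UnsupportedAlignmentNotifier is not part of Flink) and plain stderr output in place of a real logger.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: emits the "alignment not supported" warning at most once.
class UnsupportedAlignmentNotifier {
    private final AtomicBoolean warned = new AtomicBoolean(false);
    int emitted = 0; // counts emitted warnings, for demonstration only

    // Returns true only for the call that actually emitted the warning.
    boolean warnOnce(String readerName) {
        if (warned.compareAndSet(false, true)) {
            emitted++;
            System.err.println("WARN: " + readerName
                    + " does not support split alignment; split reading may be imbalanced.");
            return true;
        }
        return false;
    }
}
```

Calling warnOnce on every alignment attempt then produces a single log line per reader instance, however often alignment is triggered.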
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Top D: How to design
> > > > interfaces?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Thanks for structuring the
> > > > > > discussion
> > > > > > > > >>>> with the
> > > > > > > > >>>> > > >> > > various
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> possibilities
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > (a-d). From the discussion
> > and
> > > > > > > emails, I
> > > > > > > > >>>> would
> > > > > > > > >>>> > > >> like to
> > > > > > > > >>>> > > >> > > >>>>>>> summarize
> > > > > > > > >>>> > > >> > > >>>>>>> > the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > following requirements:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > - Interfaces should be
> > > > consistent
> > > > > > > > >>>> > ("symmetric"),
> > > > > > > > >>>> > > >> i.e.,
> > > > > > > > >>>> > > >> > > >>>>>>> similar
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> semantics
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > should have similar
> > interfaces
> > > > > with
> > > > > > > > >>>> similar
> > > > > > > > >>>> > usage.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > - Make explicit which
> > > > > > implementations
> > > > > > > > >>>> implement
> > > > > > > > >>>> > > >> > > >>>>>>> interfaces/support
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > behavior.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > - Make clear what are
> > default
> > > > > > > > >>>> implementations
> > > > > > > > >>>> > and
> > > > > > > > >>>> > > >> how
> > > > > > > > >>>> > > >> > to
> > > > > > > > >>>> > > >> > > >>>>>>> implement
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > interfaces with desired
> > > > behavior.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > This is a simplified view of
> > > the
> > > > > > > > relations
> > > > > > > > >>>> > between
> > > > > > > > >>>> > > >> > > >>>>>>> relevant
> > > > > > > > >>>> > > >> > > >>>>>>> > classes
> > > > > > > > >>>> > > >> > > >>>>>>> > > of
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > PoC implementation:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > SourceReader (Public) <|--
> > > > > > > > >>>> SourceReaderBase
> > > > > > > > >>>> > > >> (Internal)
> > > > > > > > >>>> > > >> > > >>>>>>> <|-- ..
> > > > > > > > >>>> > > >> > > >>>>>>> > <|--
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > MySourceReader
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > MySourceReader <>--
> > > > > > > SplitFetcherManager
> > > > > > > > >>>> > (Internal)
> > > > > > > > >>>> > > >> > <>--
> > > > > > > > >>>> > > >> > > >>>>>>> > SplitFetcher
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > (Internal) <>-- SplitReader
> > > > > (Public)
> > > > > > > > <|--
> > > > > > > > >>>> > > >> > MySplitReader
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > (A <|-- B: B inherits from
> > A;
> > > A
> > > > > <>--
> > > > > > > B:
> > > > > > > > A
> > > > > > > > >>>> "has
> > > > > > > > >>>> > a"
> > > > > > > > >>>> > > >> B)
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Note that SourceReaderBase
> > and
> > > > > > > > >>>> > SplitFetcherManager
> > > > > > > > >>>> > > >> > > >>>>>>> implement most
> > > > > > > > >>>> > > >> > > >>>>>>> > of
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > "logic" for split alignment
> > > just
> > > > > > > because
> > > > > > > > >>>> we
> > > > > > > > >>>> > wanted
> > > > > > > > >>>> > > >> to
> > > > > > > > >>>> > > >> > > >>>>>>> implement
> > > > > > > > >>>> > > >> > > >>>>>>> > > split
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > alignment and wanted it to
> > be
> > > > > > > available
> > > > > > > > as
> > > > > > > > >>>> > kind of
> > > > > > > > >>>> > > >> a
> > > > > > > > >>>> > > >> > > >>>>>>> default. As a
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > consequence, we have a
> > > "default
> > > > > > > > >>>> > implementation" for
> > > > > > > > >>>> > > >> > > >>>>>>> SourceReader
> > > > > > > > >>>> > > >> > > >>>>>>> > > that
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > implements the actual logic
> > > for
> > > > > > split
> > > > > > > > >>>> > alignment.
> > > > > > > > >>>> > > >> For
> > > > > > > > >>>> > > >> > > that
> > > > > > > > >>>> > > >> > > >>>>>>> reason,
> > > > > > > > >>>> > > >> > > >>>>>>> > I
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> find
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > it
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > very confusing to have a
> > NOOP
> > > > > > default
> > > > > > > > >>>> > > >> implementation
> > > > > > > > >>>> > > >> > in
> > > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > > >>>> > > >> > > >>>>>>> > > interface
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> for
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > the SourceReader. As a
> > > > > consequence,
> > > > > > > > >>>> interface
> > > > > > > > >>>> > > >> strategy
> > > > > > > > >>>> > > >> > > c)
> > > > > > > > >>>> > > >> > > >>>>>>> is
> > > > > > > > >>>> > > >> > > >>>>>>> > > difficult
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > because this would require
> > > NOOP
> > > > > > > default
> > > > > > > > >>>> > > >> > implementations
> > > > > > > > >>>> > > >> > > >>>>>>> in the
> > > > > > > > >>>> > > >> > > >>>>>>> > > public
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > interfaces of SourceReader
> > and
> > > > > > > > >>>> SplitReader.
> > > > > > > > >>>> > This is
> > > > > > > > >>>> > > >> > the
> > > > > > > > >>>> > > >> > > >>>>>>> same for
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> strategy
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > d) because it would require
> > > NOOP
> > > > > > > default
> > > > > > > > >>>> > > >> > implementation
> > > > > > > > >>>> > > >> > > >>>>>>> in the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > SourceReader. Further, as
> > > Dawid
> > > > > > > > described
> > > > > > > > >>>> > method
> > > > > > > > >>>> > > >> > > >>>>>>> signatures of
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> alignSplit
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > for SourceReader and
> > > SplitReader
> > > > > > > differ
> > > > > > > > >>>> and it
> > > > > > > > >>>> > > >> would
> > > > > > > > >>>> > > >> > be
> > > > > > > > >>>> > > >> > > >>>>>>> extremely
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > difficult
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > to make the signatures the
> > > same
> > > > > > (with
> > > > > > > > even
> > > > > > > > >>>> > > >> potential
> > > > > > > > >>>> > > >> > > >>>>>>> performance
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> impact
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > because of additional
> > look-ups
> > > > of
> > > > > > > split
> > > > > > > > >>>> ids).
> > > > > > > > >>>> > > >> > Therefore,
> > > > > > > > >>>> > > >> > > >>>>>>> having a
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > symmetric
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > decorative interface as of
> > > > > strategy
> > > > > > a)
> > > > > > > > is
> > > > > > > > >>>> > actually
> > > > > > > > >>>> > > >> not
> > > > > > > > >>>> > > >> > > >>>>>>> possible
> > > > > > > > >>>> > > >> > > >>>>>>> > and
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > having
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > two decorative interfaces
> > with
> > > > > > > different
> > > > > > > > >>>> method
> > > > > > > > >>>> > > >> > > >>>>>>> signatures is
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> confusing.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > My
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > conclusion is that we are
> > best
> > > > > with
> > > > > > > > >>>> strategy b)
> > > > > > > > >>>> > > >> which
> > > > > > > > >>>> > > >> > > >>>>>>> means to
> > > > > > > > >>>> > > >> > > >>>>>>> > have
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > specializing sub-interfaces
> > > that
> > > > > > > inherit
> > > > > > > > >>>> from
> > > > > > > > >>>> > the
> > > > > > > > >>>> > > >> > parent
> > > > > > > > >>>> > > >> > > >>>>>>> > interface:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > SourceReader <|--
> > > > > > AlignedSourceReader,
> > > > > > > > >>>> > SplitReader
> > > > > > > > >>>> > > >> > <|--
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > AlignedSplitReader
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > With this option, I'm not
> > 100%
> > > > > sure
> > > > > > > what
> > > > > > > > >>>> the
> > > > > > > > >>>> > > >> > > implications
> > > > > > > > >>>> > > >> > > >>>>>>> are and
> > > > > > > > >>>> > > >> > > >>>>>>> > if
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> this
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > could get nasty. I would
> > > suggest
> > > > > > that
> > > > > > > > >>>> Dawid
> > > > > > > > >>>> > and I
> > > > > > > > >>>> > > >> just
> > > > > > > > >>>> > > >> > > >>>>>>> try to
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> implement
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > and
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > see if we like it. :)
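
Strategy b) could be sketched roughly as follows. This is a minimal sketch under stated assumptions: BaseSourceReader, AlignedSourceReaderSketch and AlignmentDispatcher are hypothetical names rather than the PoC's classes, and the alignSplits parameter order simply follows the wording used earlier in this mail.

```java
import java.util.Collection;

// Hypothetical stand-in for the public base interface.
interface BaseSourceReader {
    void pollNext();
}

// Strategy b): a specializing sub-interface that inherits from the parent.
interface AlignedSourceReaderSketch extends BaseSourceReader {
    void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause);
}

// A reader without alignment support.
class PlainReader implements BaseSourceReader {
    @Override
    public void pollNext() { }
}

// A reader that makes its support explicit through its type.
class AligningReader implements AlignedSourceReaderSketch {
    int calls = 0;

    @Override
    public void pollNext() { }

    @Override
    public void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause) {
        calls++;
    }
}

// Hypothetical caller: support is discovered with a type check at runtime.
class AlignmentDispatcher {
    // Returns true if the reader accepted the alignment request.
    static boolean tryAlign(BaseSourceReader reader,
                            Collection<String> splitsToResume,
                            Collection<String> splitsToPause) {
        if (reader instanceof AlignedSourceReaderSketch) {
            ((AlignedSourceReaderSketch) reader).alignSplits(splitsToResume, splitsToPause);
            return true;
        }
        System.err.println("WARN: reader does not support split alignment");
        return false;
    }
}
```

With this shape no NOOP default is needed in the base interface, at the cost of a type check in the calling code.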
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Summary
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > In conclusion, please let me
> > > > know
> > > > > > your
> > > > > > > > >>>> > > >> perspectives.
> > > > > > > > >>>> > > >> > > >>>>>>> Please
> > > > > > > > >>>> > > >> > > >>>>>>> > correct
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> me,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > if
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > something is wrong or if I
> > > > > > > misunderstood
> > > > > > > > >>>> > > >> something. My
> > > > > > > > >>>> > > >> > > >>>>>>> perspective
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> would
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > be:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Top A: Yes
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Top B: Yes (but disable
> > > > watermark
> > > > > > > > trigger
> > > > > > > > >>>> for
> > > > > > > > >>>> > split
> > > > > > > > >>>> > > >> > > >>>>>>> alignment)
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Top C: No
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Top D: b)
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Best,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Sebastian
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55
> > > AM
> > > > > > Dawid
> > > > > > > > >>>> > Wysakowicz <
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > > wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Arvid:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's
> > > > > > capability
> > > > > > > > >>>> > approach, I
> > > > > > > > >>>> > > >> > fear
> > > > > > > > >>>> > > >> > > >>>>>>> that it
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> doesn't
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > work
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> for this particular use
> > case:
> > > > > > Sources
> > > > > > > > can
> > > > > > > > >>>> > always
> > > > > > > > >>>> > > >> be
> > > > > > > > >>>> > > >> > > >>>>>>> aligned
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> cross-task
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > and
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> this is just about
> > intra-task
> > > > > > > > alignment.
> > > > > > > > >>>> So
> > > > > > > > >>>> > it's
> > > > > > > > >>>> > > >> > > >>>>>>> plausible to put
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > sources
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> into an alignment group
> > even
> > > > > though
> > > > > > > > they
> > > > > > > > >>>> do
> > > > > > > > >>>> > not
> > > > > > > > >>>> > > >> use
> > > > > > > > >>>> > > >> > any
> > > > > > > > >>>> > > >> > > >>>>>>> of the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> presented
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They
> > should
> > > > just
> > > > > > > > issue a
> > > > > > > > >>>> > > >> warning, if
> > > > > > > > >>>> > > >> > > >>>>>>> they handle
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > multiple
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> splits (see motivation
> > > > section).
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Yes, but the "supportXXX"
> > > > method
> > > > > > > would
> > > > > > > > >>>> be for
> > > > > > > > >>>> > > >> telling
> > > > > > > > >>>> > > >> > > if
> > > > > > > > >>>> > > >> > > >>>>>>> it
> > > > > > > > >>>> > > >> > > >>>>>>> > > supports
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > that
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> intra-task alignment.
> > > > Cross-task
> > > > > > > > >>>> alignment
> > > > > > > > >>>> > would
> > > > > > > > >>>> > > >> > always
> > > > > > > > >>>> > > >> > > >>>>>>> be
> > > > > > > > >>>> > > >> > > >>>>>>> > > supported.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I updated interfaces to
> > what
> > > I
> > > > > > > believe
> > > > > > > > >>>> to be
> > > > > > > > >>>> > > >> closest
> > > > > > > > >>>> > > >> > > to a
> > > > > > > > >>>> > > >> > > >>>>>>> > consensus
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> between all participants.
> > Do
> > > > you
> > > > > > mind
> > > > > > > > >>>> taking a
> > > > > > > > >>>> > > >> look?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Sebastian Do you mind
> > > > addressing
> > > > > > the
> > > > > > > > >>>> nits?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid
> > > > Heise
> > > > > > > wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for pushing this
> > > effort.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I'd actually be in favor of
> > > > 1b).
> > > > > I
> > > > > > > > fully
> > > > > > > > >>>> agree
> > > > > > > > >>>> > > >> that
> > > > > > > > >>>> > > >> > > >>>>>>> decorator
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> interfaces
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> should be avoided but I'm
> > > also
> > > > > not
> > > > > > a
> > > > > > > > big
> > > > > > > > >>>> fan
> > > > > > > > >>>> > of
> > > > > > > > >>>> > > >> > > >>>>>>> overloading the
> > > > > > > > >>>> > > >> > > >>>>>>> > > base
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interfaces (they are hard
> > to
> > > > > > > implement
> > > > > > > > as
> > > > > > > > >>>> > is). The
> > > > > > > > >>>> > > >> > > usual
> > > > > > > > >>>> > > >> > > >>>>>>> feedback
> > > > > > > > >>>> > > >> > > >>>>>>> > > to
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Source-related interfaces
> > is
> > > > > > always
> > > > > > > > that
> > > > > > > > >>>> > they are
> > > > > > > > >>>> > > >> > > >>>>>>> overwhelming
> > > > > > > > >>>> > > >> > > >>>>>>> > and
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> too
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> hard to implement. However,
> > > I'd
> > > > > > also
> > > > > > > > not
> > > > > > > > >>>> > oppose
> > > > > > > > >>>> > > >> 1c)
> > > > > > > > >>>> > > >> > as
> > > > > > > > >>>> > > >> > > >>>>>>> scattered
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > interfaces
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> also have drawbacks. I'd
> > just
> > > > > > dislike
> > > > > > > > >>>> 1a) and
> > > > > > > > >>>> > 1d).
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's
> > > > > > capability
> > > > > > > > >>>> > approach, I
> > > > > > > > >>>> > > >> > fear
> > > > > > > > >>>> > > >> > > >>>>>>> that it
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> doesn't
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > work
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> for this particular use
> > case:
> > > > > > Sources
> > > > > > > > can
> > > > > > > > >>>> > always
> > > > > > > > >>>> > > >> be
> > > > > > > > >>>> > > >> > > >>>>>>> aligned
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> cross-task
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > and
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> this is just about
> > intra-task
> > > > > > > > alignment.
> > > > > > > > >>>> So
> > > > > > > > >>>> > it's
> > > > > > > > >>>> > > >> > > >>>>>>> plausible to put
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > sources
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> into an alignment group
> > even
> > > > > though
> > > > > > > > they
> > > > > > > > >>>> do
> > > > > > > > >>>> > not
> > > > > > > > >>>> > > >> use
> > > > > > > > >>>> > > >> > any
> > > > > > > > >>>> > > >> > > >>>>>>> of the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> presented
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They
> > should
> > > > just
> > > > > > > > issue a
> > > > > > > > >>>> > > >> warning, if
> > > > > > > > >>>> > > >> > > >>>>>>> they handle
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > multiple
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> splits (see motivation
> > > > section).
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think renaming
> > alignSplits
> > > to
> > > > > > > > >>>> facilitate
> > > > > > > > >>>> > future
> > > > > > > > >>>> > > >> use
> > > > > > > > >>>> > > >> > > >>>>>>> cases makes
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> sense
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > but
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> then all interfaces (if 1c)
> > > is
> > > > > > > chosen)
> > > > > > > > >>>> should
> > > > > > > > >>>> > be
> > > > > > > > >>>> > > >> > > adjusted
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> accordingly.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSourceReader could
> > be
> > > > > > > > >>>> > PausingSourceReader
> > > > > > > > >>>> > > >> and
> > > > > > > > >>>> > > >> > > I'd
> > > > > > > > >>>> > > >> > > >>>>>>> go for
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits
> > (Becket's
> > > > > > > proposal
> > > > > > > > >>>> > afaik). We
> > > > > > > > >>>> > > >> > could
> > > > > > > > >>>> > > >> > > >>>>>>> also
> > > > > > > > >>>> > > >> > > >>>>>>> > split
> > > > > > > > >>>> > > >> > > >>>>>>> > > it
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > into
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> pauseSplit and resumeSplit.
> > > > While
> > > > > > > > >>>> > > >> pauseOrResumeSplits
> > > > > > > > >>>> > > >> > > >>>>>>> may allow
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> Sources
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > to
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> just use 1 instead of 2
> > > library
> > > > > > calls
> > > > > > > > (as
> > > > > > > > >>>> > written
> > > > > > > > >>>> > > >> in
> > > > > > > > >>>> > > >> > > the
> > > > > > > > >>>> > > >> > > >>>>>>> > Javadoc),
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> both
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Kafka and Pulsar can't use
> > it
> > > > and
> > > > > > I'm
> > > > > > > > not
> > > > > > > > >>>> > sure if
> > > > > > > > >>>> > > >> > there
> > > > > > > > >>>> > > >> > > >>>>>>> is a
> > > > > > > > >>>> > > >> > > >>>>>>> > system
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> can.
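
To illustrate the point about library calls: for a client that only offers separate pause and resume operations, a combined pauseOrResumeSplits still ends up as up to two calls. This is a minimal sketch with a hypothetical TwoCallClient; it does not use the real Kafka or Pulsar client APIs.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical client exposing separate pause and resume operations.
class TwoCallClient {
    final Set<String> paused = new TreeSet<>();
    int libraryCalls = 0; // counts calls into the "library"

    void pause(Collection<String> partitions) {
        libraryCalls++;
        paused.addAll(partitions);
    }

    void resume(Collection<String> partitions) {
        libraryCalls++;
        paused.removeAll(partitions);
    }
}

// Hypothetical reader that maps the combined method onto the two client calls.
class ReaderOverTwoCallClient {
    final TwoCallClient client = new TwoCallClient();

    void pauseOrResumeSplits(Collection<String> splitsToPause,
                             Collection<String> splitsToResume) {
        // Skip empty collections so that no needless library call is made.
        if (!splitsToPause.isEmpty()) {
            client.pause(splitsToPause);
        }
        if (!splitsToResume.isEmpty()) {
            client.resume(splitsToResume);
        }
    }
}
```

A single combined call only saves a round trip for a system whose client can apply both sets in one request.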
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Some nits for the FLIP:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - Please replace "stop"
> > with
> > > > > > "pause".
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - Not sure if it's worth it
> > > in
> > > > > the
> > > > > > > > >>>> capability
> > > > > > > > >>>> > > >> > section:
> > > > > > > > >>>> > > >> > > >>>>>>> Sources
> > > > > > > > >>>> > > >> > > >>>>>>> > that
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > adopt
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> this interface cannot be
> > used
> > > > in
> > > > > > > > earlier
> > > > > > > > >>>> > > >> versions. So
> > > > > > > > >>>> > > >> > > it
> > > > > > > > >>>> > > >> > > >>>>>>> feels
> > > > > > > > >>>> > > >> > > >>>>>>> > like
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> we
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > are
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> only forward compatible
> > (old
> > > > > > sources
> > > > > > > > can
> > > > > > > > >>>> be
> > > > > > > > >>>> > used
> > > > > > > > >>>> > > >> > after
> > > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > > >>>> > > >> > > >>>>>>> > change);
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> but
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > I
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> guess this holds for any
> > API
> > > > > > > addition.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - You might want to add
> > what
> > > > > > happens
> > > > > > > > >>>> when all
> > > > > > > > >>>> > > >> splits
> > > > > > > > >>>> > > >> > > are
> > > > > > > > >>>> > > >> > > >>>>>>> paused.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - You may want to describe
> > > how
> > > > > the
> > > > > > 3
> > > > > > > > >>>> flavors
> > > > > > > > >>>> > of
> > > > > > > > >>>> > > >> > > >>>>>>> SourceReaderBase
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > interact
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> with the interface.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - I'm not sure if it makes
> > > > sense
> > > > > to
> > > > > > > > >>>> include
> > > > > > > > >>>> > Kafka
> > > > > > > > >>>> > > >> and
> > > > > > > > >>>> > > >> > > >>>>>>> Pulsar in
> > > > > > > > >>>> > > >> > > >>>>>>> > the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > FLIP.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> For me, this is rather
> > > > immediate
> > > > > > > > >>>> follow-up
> > > > > > > > >>>> > work.
> > > > > > > > >>>> > > >> > (could
> > > > > > > > >>>> > > >> > > >>>>>>> be in the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> same
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> umbrella ticket)
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Arvid
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader, WithSplitsAlignment", along
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> with "MySplitReader implements SplitReader, WithSplitsAlignment", or
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and "MySplitReader
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and "MySplitReader
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements SplitReader".
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according to Dawid would be:
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and "MySplitReader
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader".
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although personally speaking I think it is
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> simpler to just add a new method to the split reader with default impl.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think that is a good idea to have it aligned as much as possible. I'd
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> be +1 for your option c). We can merge AlignedSplitReader with
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReader. We will update the FLIP shortly.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the comment, Jark.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align other things in the future? For
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> example, aligning read speed: I have seen users requesting global rate
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> limits. This feature may also need an interface like this.
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to support aligning other
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> things, I suggest explicitly declaring the purpose of the methods, such
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> as `alignWatermarksForSplits` instead of `alignSplits`.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> This is a good point. Naming wise, it would usually be more extensible
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> to just describe what the method actually does, instead of assuming the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> purpose of doing this. For example, in this case, pauseOrResumeSplits()
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> would be more extensible because this can be used for any kind of flow
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> control, be it watermark alignment or simple rate limiting.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think they have their own
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> advantages.
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending abilities
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink, and I prefer Interfaces rather
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> than methods in this case. When you have a bunch of abilities and each
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> ability has more than one method, Interfaces can help to organize them
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> and make it clear to users which methods they need to implement when
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> they want to have an ability.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I am OK with decorative interfaces if this is a general design pattern
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> in the other components in Flink. But it looks like the current API
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> proposal is not symmetric.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> The current proposal is essentially "MySourceReader implements
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader, WithSplitsAlignment", along with "MySplitReader implements
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitsReader".
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Should we make the API symmetric? I'd consider any one of the following
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> as symmetric.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader, WithSplitsAlignment", along
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> with "MySplitReader implements SplitReader, WithSplitsAlignment", or
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and "MySplitReader
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and "MySplitReader
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements SplitReader".
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according to Dawid would be:
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and "MySplitReader
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader".
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although personally speaking I think it is
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> simpler to just add a new method to the split reader with default impl.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Dawid Wysakowicz <dwysakowicz@apache.org>, thanks for the reply.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a preference and I agree most of the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> sources will support the alignment I am fine following your suggestion
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> to have the SourceReader extending from WithWatermarksSplitsAlignment,
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> but would put the "supportsXXX" there, not in the Source to keep the two
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> methods together.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> One benefit of having the "supportsXXX" in Source is that this allows
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> some compile time check. For example, if a user enabled watermark
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> alignment while it is not supported by the Source, an exception can be
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> thrown at compile time. It seems in general useful. That said, I agree
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> that API cleanliness wise it is better to put the two methods together.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <imjark@gmail.com> wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks Dawid for the reminder on FLIP-182. Sorry I did miss it.
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't have other concerns then.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jark
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Jark:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with watermarks when the source
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface?
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment even if Kafka
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface, and this will affect the throughput somehow.
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I agree with Becket we may need a `supportSplitsAlignment()` method for
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> users to configure the source to enable/disable the alignment.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly affect
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> throughput if the reader is constantly switching between pause and
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> resume. Can users configure the alignment offset?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> This is covered in the previous FLIP[1] which has been already
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implemented in 1.15. In short, it must be enabled with the watermark
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> strategy which also configures drift and update interval.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to support aligning other
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> things, I suggest explicitly declaring the purpose of the methods, such
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> as `alignWatermarksForSplits` instead of `alignSplits`.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Sure let's rename it.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Becket:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I understand your point. On the other hand putting all methods, even
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> with "supportsXXX" methods for enabling certain features, makes the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> entry threshold for writing a new source higher. Instead of focusing on
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the basic and required properties of the Source, the person
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implementing a source must bother with and figure out what all of the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> extra features are about and how to deal with them. It also makes it
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> harder to organize methods in coupled groups as Jark said.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a preference and I agree most of the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> sources will support the alignment I am fine following your suggestion
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> to have the SourceReader extending from WithWatermarksSplitsAlignment,
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> but would put the "supportsXXX" there, not in the Source to keep the two
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> methods together.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Lastly, I agree it is really unfortunate the "alignSplits" methods
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> differ slightly for SourceReader and SplitReader. The reason for that is
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase deals only with SplitIds, whereas SplitReader needs the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> actual splits to pause them. I found the discrepancy acceptable for the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> sake of simplifying changes significantly, especially as they would
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> highly likely impact performance as we would have to perform additional
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> lookups. Moreover the SplitReader is a secondary interface.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> [1]
> > > > > > > > >>>> > https://cwiki.apache.org/confluence/x/hQYBCw
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 24/04/2022 17:15, Jark
> > Wu
> > > > > wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the effort,
> > Dawid
> > > > and
> > > > > > > > >>>> Sebastian!
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I just have some minor
> > > > questions
> > > > > > > > (maybe I
> > > > > > > > >>>> > missed
> > > > > > > > >>>> > > >> > > >>>>>>> something).
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework
> > always
> > > > > align
> > > > > > > with
> > > > > > > > >>>> > watermarks
> > > > > > > > >>>> > > >> > when
> > > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > > >>>> > > >> > > >>>>>>> > source
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case
> > > needs
> > > > > > > > watermark
> > > > > > > > >>>> > > >> alignment
> > > > > > > > >>>> > > >> > > even
> > > > > > > > >>>> > > >> > > >>>>>>> if Kafka
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> and this will affect the
> > > > > throughput
> > > > > > > > >>>> somehow. I
> > > > > > > > >>>> > > >> agree
> > > > > > > > >>>> > > >> > > >>>>>>> with Becket
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> we may need a
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()`
> > > > method
> > > > > > for
> > > > > > > > >>>> users to
> > > > > > > > >>>> > > >> > > configure
> > > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > > >>>> > > >> > > >>>>>>> > source
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> to
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> enable/disable the
> > alignment.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework
> > > > > calculate
> > > > > > > > >>>> > > >> > > maxDesiredWatermark?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of
> > > > > > > > >>>> maxDesiredWatermark
> > > > > > > > >>>> > will
> > > > > > > > >>>> > > >> > > >>>>>>> greatly affect
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> throughput
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> if the reader is constantly
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>  switching between pause
> > and
> > > > > > resume.
> > > > > > > > Can
> > > > > > > > >>>> users
> > > > > > > > >>>> > > >> > > configure
> > > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> alignment
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> offset?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used
> > to
> > > > > align
> > > > > > > > other
> > > > > > > > >>>> > things in
> > > > > > > > >>>> > > >> > the
> > > > > > > > >>>> > > >> > > >>>>>>> future?
> > > > > > > > >>>> > > >> > > >>>>>>> > For
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> example,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> align read speed, I have
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> seen users requesting
> > global
> > > > rate
> > > > > > > > >>>> limits. This
> > > > > > > > >>>> > > >> > feature
> > > > > > > > >>>> > > >> > > >>>>>>> may also
> > > > > > > > >>>> > > >> > > >>>>>>> > > need
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> an
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface like this.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend
> > > this
> > > > > > > > >>>> interface to
> > > > > > > > >>>> > > >> support
> > > > > > > > > >>>> > > >> > > >>>>>>> aligning other
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> things,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> suggest explicitly
> > declaring
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the purpose of the methods,
> > > > such
> > > > > as
> > > > > > > > >>>> > > >> > > >>>>>>> `alignWatermarksForSplits`
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> instead
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > of
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong
> > opinion
> > > > on
> > > > > > > this.
> > > > > > > > I
> > > > > > > > >>>> think
> > > > > > > > >>>> > > >> they
> > > > > > > > >>>> > > >> > > have
> > > > > > > > >>>> > > >> > > >>>>>>> their
> > > > > > > > >>>> > > >> > > >>>>>>> > own
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> advantages.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily
> > use
> > > > > > > Interfaces
> > > > > > > > >>>> for
> > > > > > > > >>>> > > >> extending
> > > > > > > > >>>> > > >> > > >>>>>>> abilities
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for
> > > > > > > > TableSource/TableSink,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> and I prefer Interfaces
> > > rather
> > > > > than
> > > > > > > > >>>> methods in
> > > > > > > > >>>> > > >> this
> > > > > > > > >>>> > > >> > > >>>>>>> case. When
> > > > > > > > >>>> > > >> > > >>>>>>> > you
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> have
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > a
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> bunch of abilities and each
> > > > > ability
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> has more than one method,
> > > > > > Interfaces
> > > > > > > > can
> > > > > > > > >>>> help
> > > > > > > > >>>> > to
> > > > > > > > >>>> > > >> > > >>>>>>> organize them
> > > > > > > > >>>> > > >> > > >>>>>>> > and
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> make
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> it clear to users which methods
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> need to be implemented when you
> > > want
> > > > > to
> > > > > > > have
> > > > > > > > >>>> an
> > > > > > > > >>>> > > >> ability.
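
The "ability interface" style described in point 4 can be sketched as follows. This is a minimal illustration only: `SimpleSplitReader`, `SupportsSplitPausing`, `PlainReader`, `PausableReader` and `AbilityCheck` are hypothetical names invented for this sketch and are not part of Flink's API. It shows how a separate interface groups the methods that must be implemented together, and how a framework could detect the ability.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical base interface (an assumption for this sketch, not Flink's SplitReader).
interface SimpleSplitReader {
    String fetch();
}

// Hypothetical ability interface: both methods must be implemented together.
interface SupportsSplitPausing {
    void pauseSplits(List<String> splitIds);

    void resumeSplits(List<String> splitIds);
}

// A reader without the ability only implements the base interface.
final class PlainReader implements SimpleSplitReader {
    @Override
    public String fetch() {
        return "record";
    }
}

// A reader with the ability opts in by implementing the extra interface.
final class PausableReader implements SimpleSplitReader, SupportsSplitPausing {
    final Set<String> paused = new HashSet<>();

    @Override
    public String fetch() {
        return "record";
    }

    @Override
    public void pauseSplits(List<String> splitIds) {
        paused.addAll(splitIds);
    }

    @Override
    public void resumeSplits(List<String> splitIds) {
        paused.removeAll(splitIds);
    }
}

final class AbilityCheck {
    // The framework side discovers the ability with a type check.
    static boolean canPause(SimpleSplitReader reader) {
        return reader instanceof SupportsSplitPausing;
    }
}
```

The compiler forces an implementer of `SupportsSplitPausing` to provide both methods, which is the organizing benefit argued for above; the cost is the `instanceof` branch on the framework side.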
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jark
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Sun, 24 Apr 2022 at
> > 18:13,
> > > > > > Becket
> > > > > > > > Qin
> > > > > > > > >>>> <
> > > > > > > > >>>> > > >> > > >>>>>>> becket.qin@gmail.com>
> > > > > > > > >>>> > > >> > > >>>>>>> > <
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > becket.qin@gmail.com> <
> > > > > > > > becket.qin@gmail.com>
> > > > > > > > >>>> <
> > > > > > > > >>>> > > >> > > >>>>>>> becket.qin@gmail.com>
> > > > > > > > >>>> > > >> > > >>>>>>> > <
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> becket.qin@gmail.com>
> > wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Hi Dawid,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the explanation.
> > > > > > Apologies
> > > > > > > > >>>> that I
> > > > > > > > >>>> > > >> somehow
> > > > > > > > >>>> > > >> > > >>>>>>> misread a
> > > > > > > > >>>> > > >> > > >>>>>>> > > bunch
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> of
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> "align" and thought they
> > were
> > > > > > > "assign".
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Regarding 1, by default
> > > > > > > implementation,
> > > > > > > > >>>> I was
> > > > > > > > >>>> > > >> > thinking
> > > > > > > > >>>> > > >> > > >>>>>>> of the
> > > > > > > > >>>> > > >> > > >>>>>>> > > default
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> no-op
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implementation. I am a
> > little
> > > > > > worried
> > > > > > > > >>>> about
> > > > > > > > >>>> > the
> > > > > > > > >>>> > > >> > > >>>>>>> proliferation of
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> decorative
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interfaces. I think the
> > most
> > > > > > > important
> > > > > > > > >>>> thing
> > > > > > > > >>>> > about
> > > > > > > > >>>> > > >> > > >>>>>>> interfaces is
> > > > > > > > >>>> > > >> > > >>>>>>> > > that
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> they
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> are easy to understand. In
> > > this
> > > > > > > case, I
> > > > > > > > >>>> prefer
> > > > > > > > >>>> > > >> adding
> > > > > > > > > >>>> > > >> > > >>>>>>> a new method
> > > > > > > > >>>> > > >> > > >>>>>>> > to
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> existing interface for the
> > > > > > following
> > > > > > > > >>>> reasons:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> a) I feel the biggest
> > > drawback
> > > > of
> > > > > > > > >>>> decorative
> > > > > > > > >>>> > > >> > interfaces
> > > > > > > > >>>> > > >> > > >>>>>>> is which
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> they can decorate and which
> > > > > > > > combinations
> > > > > > > > >>>> of
> > > > > > > > >>>> > > >> multiple
> > > > > > > > >>>> > > >> > > >>>>>>> decorative
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interfaces
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> are valid. In the current
> > > FLIP,
> > > > > the
> > > > > > > > >>>> > > >> > withSplitsAlignment
> > > > > > > > >>>> > > >> > > >>>>>>> interface
> > > > > > > > >>>> > > >> > > >>>>>>> > > is
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > only
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> applicable to the
> > > SourceReader
> > > > > > which
> > > > > > > > >>>> means it
> > > > > > > > >>>> > > >> can't
> > > > > > > > >>>> > > >> > > >>>>>>> decorate any
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> other
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface. From an
> > interface
> > > > > design
> > > > > > > > >>>> > perspective, a
> > > > > > > > >>>> > > >> > > >>>>>>> natural
> > > > > > > > >>>> > > >> > > >>>>>>> > question
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> is
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> why
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> not let
> > "AlignedSplitReader"
> > > > > extend
> > > > > > > > >>>> > > >> > > >>>>>>> "withSplitsAlignment"? And it
> > > > > > > > >>>> > > >> > > >>>>>>> > > is
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > also
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> natural to assume that a
> > > split
> > > > > > reader
> > > > > > > > >>>> > implementing
> > > > > > > > >>>> > > >> > both
> > > > > > > > >>>> > > >> > > >>>>>>> > SplitReader
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> and
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> WithSplitAlignment would
> > > work,
> > > > > > > because
> > > > > > > > a
> > > > > > > > >>>> > source
> > > > > > > > >>>> > > >> > reader
> > > > > > > > >>>> > > >> > > >>>>>>> > implementing
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader and
> > > > > > withSplitsAlignment
> > > > > > > > >>>> works.
> > > > > > > > >>>> > So why
> > > > > > > > >>>> > > >> > > isn't
> > > > > > > > >>>> > > >> > > >>>>>>> there an
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> of AlignedSourceReader? In
> > > the
> > > > > > > future,
> > > > > > > > if
> > > > > > > > >>>> > there
> > > > > > > > >>>> > > >> is a
> > > > > > > > >>>> > > >> > > new
> > > > > > > > >>>> > > >> > > >>>>>>> feature
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> added
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> (e.g. sorted or
> > > pre-partitioned
> > > > > > data
> > > > > > > > >>>> aware),
> > > > > > > > >>>> > are
> > > > > > > > >>>> > > >> we
> > > > > > > > >>>> > > >> > > >>>>>>> going to
> > > > > > > > >>>> > > >> > > >>>>>>> > create
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> another
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface of SplitReader
> > such
> > > > as
> > > > > > > > >>>> > > >> SortedSplitReader or
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> PrePartitionedAware?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Can they be combined? So I
> > > > think
> > > > > > the
> > > > > > > > >>>> > additional
> > > > > > > > >>>> > > >> > > >>>>>>> decorative
> > > > > > > > >>>> > > >> > > >>>>>>> > > interface
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > like
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> withSplitsAlignment
> > actually
> > > > > > > increases
> > > > > > > > >>>> the
> > > > > > > > >>>> > > >> > > understanding
> > > > > > > > >>>> > > >> > > >>>>>>> cost of
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> users
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> because they have to know
> > > what
> > > > > > > > decorative
> > > > > > > > >>>> > > >> interfaces
> > > > > > > > >>>> > > >> > > are
> > > > > > > > >>>> > > >> > > >>>>>>> there,
> > > > > > > > >>>> > > >> > > >>>>>>> > > which
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface they can decorate
> > > and
> > > > > > which
> > > > > > > > >>>> > > >> combinations of
> > > > > > > > >>>> > > >> > > the
> > > > > > > > >>>> > > >> > > >>>>>>> > > decorative
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interfaces are valid and
> > > which
> > > > > are
> > > > > > > not.
> > > > > > > > >>>> > Ideally we
> > > > > > > > >>>> > > >> > want
> > > > > > > > >>>> > > >> > > >>>>>>> to avoid
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> that.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > To
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> be clear, I am not opposing
> > > > > having
> > > > > > an
> > > > > > > > >>>> > interface of
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> withSplitsAlignment,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> it
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> is completely OK to have it
> > > as
> > > > an
> > > > > > > > >>>> internal
> > > > > > > > >>>> > > >> interface
> > > > > > > > >>>> > > >> > > and
> > > > > > > > >>>> > > >> > > >>>>>>> let
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > SourceReader
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> and SplitReader both extend
> > > it.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> b) Adding a new method to
> > the
> > > > > > > > >>>> SourceReader
> > > > > > > > >>>> > with a
> > > > > > > > >>>> > > >> > > default
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> implementation
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> of
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> no-op would help avoid
> > logic
> > > > > > > branching
> > > > > > > > >>>> in the
> > > > > > > > >>>> > > >> source
> > > > > > > > >>>> > > >> > > >>>>>>> logic,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> especially
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> given that we agree that
> > the
> > > > vast
> > > > > > > > >>>> majority of
> > > > > > > > >>>> > the
> > > > > > > > >>>> > > >> > > >>>>>>> SourceReader
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implementations, if not
> > all,
> > > > > would
> > > > > > > just
> > > > > > > > >>>> extend
> > > > > > > > >>>> > > >> from
> > > > > > > > >>>> > > >> > the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > SourceReaderBase.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> That means adding a new
> > > method
> > > > to
> > > > > > the
> > > > > > > > >>>> > interface
> > > > > > > > >>>> > > >> would
> > > > > > > > >>>> > > >> > > >>>>>>> effectively
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> give
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> same user experience, but
> > > > > simpler.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> c) A related design
> > principle
> > > > > that
> > > > > > > may
> > > > > > > > be
> > > > > > > > >>>> > worth
> > > > > > > > >>>> > > >> > > >>>>>>> discussing is how
> > > > > > > > >>>> > > >> > > >>>>>>> > > do
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> we
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> let
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the Source implementations
> > > tell
> > > > > > Flink
> > > > > > > > >>>> what
> > > > > > > > >>>> > > >> capability
> > > > > > > > >>>> > > >> > > is
> > > > > > > > >>>> > > >> > > >>>>>>> > supported
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> and
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> what
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> is not. Personally
> > speaking I
> > > > > feel
> > > > > > > the
> > > > > > > > >>>> most
> > > > > > > > >>>> > > >> intuitive
> > > > > > > > >>>> > > >> > > >>>>>>> place to me
> > > > > > > > >>>> > > >> > > >>>>>>> > > is
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> in
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Source itself, because that
> > > is
> > > > > the
> > > > > > > > >>>> entrance
> > > > > > > > >>>> > of the
> > > > > > > > >>>> > > >> > > >>>>>>> entire Source
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> connector
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> logic.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Based on the above
> > thoughts,
> > > I
> > > > am
> > > > > > > > >>>> wondering
> > > > > > > > >>>> > if the
> > > > > > > > >>>> > > >> > > >>>>>>> following
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> interface
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> would be easier to
> > understand
> > > > by
> > > > > > the
> > > > > > > > >>>> users.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - Change
> > > "withSplitsAlignment"
> > > > to
> > > > > > > > >>>> internal
> > > > > > > > >>>> > > >> interface,
> > > > > > > > >>>> > > >> > > >>>>>>> let both
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> and SplitReader extend from
> > > it,
> > > > > > with
> > > > > > > a
> > > > > > > > >>>> default
> > > > > > > > >>>> > > >> no-op
> > > > > > > > >>>> > > >> > > >>>>>>> > > implementation.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - Add a new method "boolean
> > > > > > > > >>>> > > >> supportSplitsAlignment()"
> > > > > > > > >>>> > > >> > > to
> > > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > > >>>> > > >> > > >>>>>>> > Source
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface, with a default
> > > > > > > > implementation
> > > > > > > > >>>> > returning
> > > > > > > > >>>> > > >> > > >>>>>>> false. Sources
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> have
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implemented the alignment
> > > logic
> > > > > can
> > > > > > > > >>>> change
> > > > > > > > >>>> > this to
> > > > > > > > >>>> > > >> > > >>>>>>> return true,
> > > > > > > > >>>> > > >> > > >>>>>>> > and
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> override the alignSplits()
> > > > > methods
> > > > > > in
> > > > > > > > the
> > > > > > > > >>>> > > >> > SourceReader
> > > > > > > > >>>> > > >> > > /
> > > > > > > > >>>> > > >> > > >>>>>>> > > SplitReader
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> if
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> needed.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - In the future, if a new
> > > > > optional
> > > > > > > > >>>> feature is
> > > > > > > > >>>> > > >> going
> > > > > > > > >>>> > > >> > to
> > > > > > > > >>>> > > >> > > >>>>>>> be added
> > > > > > > > >>>> > > >> > > >>>>>>> > to
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Source, and that feature
> > > > requires
> > > > > > the
> > > > > > > > >>>> > awareness
> > > > > > > > >>>> > > >> from
> > > > > > > > >>>> > > >> > > >>>>>>> Flink, we
> > > > > > > > >>>> > > >> > > >>>>>>> > can
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> add
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> more
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> such methods to the Source.
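
The proposal in the list above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: `SketchSource`, `SketchSourceReader`, `BasicSource`, `AligningSource` and `AligningReader` are hypothetical stand-ins, splits are reduced to string ids, and the `alignSplits` signature is guessed for illustration; none of this is Flink's actual API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for the reader interface; alignment defaults to a no-op.
interface SketchSourceReader {
    default void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        // Default: do nothing, so existing readers keep working unchanged.
    }
}

// Hypothetical stand-in for the Source interface with the proposed capability flag.
interface SketchSource {
    default boolean supportSplitsAlignment() {
        return false;
    }

    SketchSourceReader createReader();
}

// A source that ignores alignment inherits both defaults.
final class BasicSource implements SketchSource {
    @Override
    public SketchSourceReader createReader() {
        return new SketchSourceReader() {};
    }
}

// A reader that implements alignment by tracking paused split ids.
final class AligningReader implements SketchSourceReader {
    final List<String> paused = new ArrayList<>();

    @Override
    public void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.removeAll(splitsToResume);
        paused.addAll(splitsToPause);
    }
}

// A source that opts in flips the flag and returns an aligning reader.
final class AligningSource implements SketchSource {
    @Override
    public boolean supportSplitsAlignment() {
        return true;
    }

    @Override
    public SketchSourceReader createReader() {
        return new AligningReader();
    }
}
```

In this shape the framework can call `alignSplits` unconditionally and consult `supportSplitsAlignment()` only where it needs to know whether the call has any effect. Note that nothing ties the flag to the overridden method, which is the trade-off debated in this thread.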
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> What do you think?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Fri, Apr 22, 2022 at
> > 4:05
> > > PM
> > > > > > Dawid
> > > > > > > > >>>> > Wysakowicz <
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > dwysakowicz@apache.org>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Konstantin:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the
> > > > > > > > >>>> `AlignedSplitReader`
> > > > > > > > >>>> > > >> > > interface
> > > > > > > > >>>> > > >> > > >>>>>>> (aka the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> stop &
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> resume behavior) will be
> > > > > > implemented
> > > > > > > > for
> > > > > > > > >>>> > Kafka and
> > > > > > > > >>>> > > >> > > >>>>>>> Pulsar only,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> correct?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Correct, as far as I know
> > > > though,
> > > > > > > those
> > > > > > > > >>>> are
> > > > > > > > >>>> > the
> > > > > > > > >>>> > > >> only
> > > > > > > > >>>> > > >> > > >>>>>>> sources
> > > > > > > > >>>> > > >> > > >>>>>>> > which
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> consume
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> concurrently from multiple
> > > > splits
> > > > > > and
> > > > > > > > >>>> thus
> > > > > > > > >>>> > > >> alignment
> > > > > > > > >>>> > > >> > > >>>>>>> applies.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Thomas:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting"
> > > split
> > > > > > > > alignment
> > > > > > > > >>>> in
> > > > > > > > >>>> > > >> > > >>>>>>> SourceReaderBase and
> > > > > > > > >>>> > > >> > > >>>>>>> > > then
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> doing
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> nothing if the split reader
> > > > does
> > > > > > not
> > > > > > > > >>>> implement
> > > > > > > > >>>> > > >> > > >>>>>>> AlignedSplitReader
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> could
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> be
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> misleading? Perhaps
> > > > > > > WithSplitsAlignment
> > > > > > > > >>>> can
> > > > > > > > >>>> > > >> instead
> > > > > > > > >>>> > > >> > be
> > > > > > > > >>>> > > >> > > >>>>>>> added to
> > > > > > > > >>>> > > >> > > >>>>>>> > the
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> specific source reader
> > (i.e.
> > > > > > > > >>>> > KafkaSourceReader) to
> > > > > > > > >>>> > > >> > make
> > > > > > > > >>>> > > >> > > >>>>>>> it
> > > > > > > > >>>> > > >> > > >>>>>>> > explicit
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the source actually
> > supports
> > > > it.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I understand your concern.
> > > > Hmm, I
> > > > > > > think
> > > > > > > > >>>> we
> > > > > > > > >>>> > could
> > > > > > > > >>>> > > >> > > >>>>>>> actually do
> > > > > > > > >>>> > > >> > > >>>>>>> > that.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> Given
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the actual implementation
> > of
> > > > the
> > > > > > > > >>>> > > >> > > >>>>>>> SourceReaderBase#alignSplits is
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> rather
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> short (just a forward to
> > the
> > > > > > > > >>>> corresponding
> > > > > > > > >>>> > method
> > > > > > > > >>>> > > >> of
> > > > > > > > >>>> > > >> > > >>>>>>> > SplitFetcher),
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> we
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> could reimplement it in the
> > > > > actual
> > > > > > > > source
> > > > > > > > >>>> > > >> > > >>>>>>> implementations. This
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> solution
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> has a downside though.
> > > > Authors
> > > > > of
> > > > > > > new
> > > > > > > > >>>> > sources
> > > > > > > > >>>> > > >> would
> > > > > > > > >>>> > > >> > > >>>>>>> have to do
> > > > > > > > >>>> > > >> > > >>>>>>> > > two
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> things: extend from
> > > > > > > AlignedSplitReader
> > > > > > > > >>>> and
> > > > > > > > >>>> > > >> implement
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> WithSplitsAlignment,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> instead of just extending
> > > > > > > > >>>> AlignedSplitReader.
> > > > > > > > >>>> > I
> > > > > > > > >>>> > > >> would
> > > > > > > > >>>> > > >> > > be
> > > > > > > > >>>> > > >> > > >>>>>>> fine
> > > > > > > > >>>> > > >> > > >>>>>>> > with
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> such
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > a
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> tradeoff though. What do
> > others
> > > > > think?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Steven:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> For this part from the
> > > > motivation
> > > > > > > > >>>> section, is
> > > > > > > > >>>> > it
> > > > > > > > >>>> > > >> > > >>>>>>> accurate? Let's
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> assume
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> one
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> source task consumes from 3
> > > > > > > partitions
> > > > > > > > >>>> and
> > > > > > > > >>>> > one of
> > > > > > > > >>>> > > >> the
> > > > > > > > > >>>> > > >> > > >>>>>>> partitions
> > > > > > > > >>>> > > >> > > >>>>>>> > is
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> significantly slower. In
> > this
> > > > > > > > situation,
> > > > > > > > >>>> > watermark
> > > > > > > > >>>> > > >> > for
> > > > > > > > >>>> > > >> > > >>>>>>> this
> > > > > > > > >>>> > > >> > > >>>>>>> > source
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> task
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> won't hold back as it is
> > > > reading
> > > > > > > recent
> > > > > > > > >>>> data
> > > > > > > > >>>> > from
> > > > > > > > >>>> > > >> > other
> > > > > > > > >>>> > > >> > > >>>>>>> two Kafka
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> partitions. As a result, it
> > > > won't
> > > > > > > hold
> > > > > > > > >>>> back
> > > > > > > > >>>> > the
> > > > > > > > >>>> > > >> > overall
> > > > > > > > >>>> > > >> > > >>>>>>> > watermark.
> > > > > > > > >>>> > > >> > > >>>>>>> > > I
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> thought the problem is that
> > > we
> > > > > may
> > > > > > > have
> > > > > > > > >>>> late
> > > > > > > > >>>> > data
> > > > > > > > >>>> > > >> for
> > > > > > > > >>>> > > >> > > >>>>>>> this slow
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> partition.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> It will hold back the watermark. The watermark of an
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> operator is the minimum of the watermarks of all splits [1].
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I have another question about the restart. Say split
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> alignment is triggered, a checkpoint is completed, and the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> job fails and is restored from the last checkpoint. Because
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the alignment decision is not checkpointed, initially
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> alignment won't be enforced until we get a cycle of
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> watermark aggregation and propagation, right? Not saying
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> this corner case is a problem. Just want to understand it
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> more.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Your understanding is correct.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Becket:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> case, so should we just add the related methods to
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader directly instead of introducing the new
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface of WithSplitAssignment? We can provide default
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implementations, so backwards compatibility won't be an
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> issue.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't think we can provide a default implementation. How
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> would we do that? Would it be just a no-op? Is it better
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> than having an opt-in interface? The default implementation
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> would have to be added exclusively in a *Public*
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader interface. By the way, notice that
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase does extend from WithSplitsAlignment, so
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> effectively all implementations do handle the alignment
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> case. To be honest, I think it is impossible for end users
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> to implement the SourceReader interface directly.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> also needs some change to support throttling at the split
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> granularity. Can you add that interface change into the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> public interface section as well?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> It has been added from the beginning. See
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> *AlignedSplitReader.*
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> here, given that it is not actually changing the split
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> assignments? It seems something like pauseOrResumeSplits(),
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> or adjustSplitsThrottling() is more accurate.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> The method's called *alignSplits*, not assign. Do you still
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> prefer a different name for that? Personally, I am open to
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> suggestions here.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> [1]
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > > >>>> > > >> > >
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >>
> > > > > > > > >>>> >
> > > > > > > > >>>>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for driving the effort, Sebastian. I think the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> motivation makes a lot of sense. Just a few suggestions /
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> questions.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> case, so should we just add the related methods to
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader directly instead of introducing the new
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface of WithSplitAssignment? We can provide default
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implementations, so backwards compatibility won't be an
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> issue.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> also needs some change to support throttling at the split
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> granularity. Can you add that interface change into the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> public interface section as well?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> here, given that it is not actually changing the split
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> assignments? It seems something like pauseOrResumeSplits(),
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> or adjustSplitsThrottling() is more accurate.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <stevenz3wu@gmail.com> wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> However, a single source operator may read data from
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> multiple splits/partitions, e.g., multiple Kafka
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> partitions, such that even with watermark alignment the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> source operator may need to buffer an excessive amount of
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> data if one split emits data faster than another.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> For this part from the motivation section, is it accurate?
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Let's assume one source task consumes from 3 partitions and
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> one of the partitions is significantly slower. In this
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> situation, the watermark for this source task won't hold
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> back, as it is reading recent data from the other two Kafka
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> partitions. As a result, it won't hold back the overall
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> watermark. I thought the problem is that we may have late
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> data for this slow partition.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I have another question about the restart. Say split
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> alignment is triggered, a checkpoint is completed, and the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> job fails and is restored from the last checkpoint. Because
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the alignment decision is not checkpointed, initially
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> alignment won't be enforced until we get a cycle of
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> watermark aggregation and propagation, right? Not saying
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> this corner case is a problem. Just want to understand it
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> more.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <thw@apache.org> wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for working on this!
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase and then doing nothing if the split reader
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> does not implement AlignedSplitReader could be misleading?
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Perhaps WithSplitsAlignment can instead be added to the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> specific source reader (e.g. KafkaSourceReader) to make it
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> explicit that the source actually supports it.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thomas
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <knaufk@apache.org> wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> (aka the stop & resume behavior) will be implemented for
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Kafka and Pulsar only, correct?
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> +1 in general. I believe it is valuable to complete the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> watermark alignment story with this FLIP.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Cheers,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Konstantin
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> To be explicit, having worked on it, I support it ;) I
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> think we can start a vote thread soonish, as there are no
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> concerns so far.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dear Flink developers,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I would like to open a discussion on FLIP 217 [1] for an
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> extension of Watermark Alignment to perform alignment also
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> in SplitReaders. To do so, SplitReaders must be able to
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> suspend and resume reading from split sources, where the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceOperator coordinates and controls suspend and resume.
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> To gather information about the current watermarks of the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReaders, we extend the internal
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> WatermarkOutputMultiplexer and report watermarks to the
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceOperator.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> There is a PoC for this FLIP [2], prototyped by Arvid Heise
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> and revised and reworked by Dawid Wysakowicz (he did most
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> of the work) and me. The changes are backwards compatible
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> in the sense that if affected components do not support
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> split alignment, the behavior is as before.
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Sebastian
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> [1]
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > > >>>> > > >> > >
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >>
> > > > > > > > >>>> >
> > > > > > > > >>>>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> [2]
> > > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > > >>>> https://github.com/dawidwys/flink/tree/aligned-splits
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> --
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Konstantin Knauf
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> https://twitter.com/snntrable
> > > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> https://github.com/knaufk
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > > >>>> > > >> > > >>>>>>
> > > > > > > > >>>> > > >> > >
> > > > > > > > >>>> > > >> >
> > > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >
> > > > > > > > >>>> >
> > > > > > > > >>>>
> > > > > > > > >>>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Piotr Nowojski <pn...@apache.org>.
+1 :)

On Wed, Jun 29, 2022 at 17:23, Becket Qin <be...@gmail.com> wrote:

>  Thanks for the explanation, Piotr.
>
> So it looks like we have a conclusion here.
>
> 1. Regarding the supportsPausingSplits() method, I feel it brings more
> confusion while the benefit is marginal, so I prefer not having that if
> possible. It would be good to also hear @Thomas Weise <th...@apache.org>'s
> opinion as he mentioned some concern earlier.
> 2. Let's add the feature knob then. In the future we can simply ignore the
> configuration when deprecating it.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, Jun 29, 2022 at 10:19 PM Piotr Nowojski <pn...@apache.org>
> wrote:
>
> > Hi,
> >
> > I mean I'm fine with throwing an exception by default in Flink 1.16 in
> > "Case 2", but I think we need to provide a way to work around it, for
> > example via a feature toggle, if it's an easy thing to do. And it seems to
> > be a simple thing.
> >
> > However, this is orthogonal to the `supportsPausingSplits()` issue. I don't
> > have a big preference whether
> >   a) the exception should originate on the JM, using `default boolean
> > supportsPausingSplits() { return false; }` (as currently proposed in the
> > FLIP),
> >   b) or on the TM from `pauseOrResumeSplits()` throwing
> > `UnsupportedOperationException` as you are proposing.
> >
> > a) fails earlier, so it's more user friendly from this perspective, but it
> > provides more possibilities for bugs/inconsistencies for connector
> > developers, since `supportsPausingSplits()` would have to be kept in sync
> > with `pauseOrResumeSplits()`.
> >
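The trade-off between a) and b) can be sketched in plain Java. This is only an illustration under stated assumptions: `SketchSourceReader`, `ForgetfulReader` and `UpFrontValidation` are hypothetical stand-ins, not the Flink interfaces, and the method shapes merely mirror the names used in this thread.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Simplified stand-in for a source reader; NOT the real Flink interface.
interface SketchSourceReader {
    // Variant (a): capability flag the framework could check up front.
    default boolean supportsPausingSplits() {
        return false;
    }

    // Variant (b): the call itself fails if the connector did not override it.
    default void pauseOrResumeSplits(Collection<String> splitsToPause,
                                     Collection<String> splitsToResume) {
        throw new UnsupportedOperationException("This reader cannot pause splits");
    }
}

// A connector that implements pausing but forgets to flip the flag:
// the kind of inconsistency that variant (a) makes possible.
class ForgetfulReader implements SketchSourceReader {
    final Set<String> pausedSplits = new HashSet<>();

    @Override
    public void pauseOrResumeSplits(Collection<String> splitsToPause,
                                    Collection<String> splitsToResume) {
        pausedSplits.addAll(splitsToPause);
        pausedSplits.removeAll(splitsToResume);
    }
}

class UpFrontValidation {
    // Variant (a): reject the job early, based on the flag alone.
    static void validate(SketchSourceReader reader) {
        if (!reader.supportsPausingSplits()) {
            throw new IllegalStateException(
                    "Source does not declare support for pausing splits");
        }
    }
}
```

In this sketch `ForgetfulReader` pauses splits correctly, yet `UpFrontValidation.validate` still rejects it because the flag was left at its default. With variant (b) alone there is nothing to keep in sync, at the price of failing later.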
> > Best,
> > Piotrek
> >
> > On Wed, Jun 29, 2022 at 15:27, Becket Qin <be...@gmail.com> wrote:
> >
> > > Hi Piotr,
> > >
> > > Just to make sure we are on the same page. There are two cases for the
> > > existing FLIP-182 users:
> > >
> > > Case 1: Each source reader only has one split assigned. This is the
> > > targeted case for FLIP-182.
> > > Case 2: Each source reader has multiple splits assigned. This is the
> > > flaky case that may or may not work.
> > >
> > > With solution 1, the users of case 1 won't be impacted. The users in
> > > case 2 will receive an exception which they won't get at the moment.
> > >
> > > Do you mean we should not throw an exception in case 2? Personally I feel
> > > that is OK and could have been done in FLIP-182 itself because it's not a
> > > designed use case. As a user I may see a big variation of the job state
> > > sizes from time to time and I am not able to rely on this feature to plan
> > > my resources and uphold the SLA.
> > >
> > > That said, if you have a strong opinion on this, I am fine with having
> > > the configuration like "allow.coarse-grained.watermark.alignment" with the
> > > default value set to false, given that a configuration is much easier to
> > > deprecate than a method.
> > >
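How such a toggle might gate the exception for "Case 2" can be sketched as follows. This is a hypothetical illustration: the class and method names are invented, the configuration is modeled as a plain `Map`, and the option key is the one floated in this message, not necessarily the final name.

```java
import java.util.Map;

class CoarseAlignmentToggleSketch {
    // Key as floated in the thread; treating it as the final option name is an assumption.
    static final String ALLOW_COARSE = "allow.coarse-grained.watermark.alignment";

    // Throws for "Case 2" (several splits, no split pausing) unless the toggle is set.
    static void checkAlignmentAllowed(Map<String, String> conf,
                                      int assignedSplits,
                                      boolean readerCanPauseSplits) {
        // A missing entry behaves like the proposed default value: false.
        boolean allowCoarse =
                Boolean.parseBoolean(conf.getOrDefault(ALLOW_COARSE, "false"));
        if (assignedSplits > 1 && !readerCanPauseSplits && !allowCoarse) {
            throw new UnsupportedOperationException(
                    "Reader has " + assignedSplits + " splits but cannot pause them; set "
                            + ALLOW_COARSE + "=true to keep the coarse-grained behavior");
        }
    }
}
```

With the default, Case 1 (a single split) passes and Case 2 fails; setting the key to `true` lets Case 2 keep its current behavior. Deprecating the option later would only mean ignoring the key.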
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > > On Wed, Jun 29, 2022 at 8:02 PM Piotr Nowojski <pn...@apache.org>
> > > wrote:
> > >
> > > > Thanks for the explanation.
> > > >
> > > > > 2. It is fully compatible with FLIP-182, if we consider it as the
> > > > > right thing to throw an exception for readers reading from multiple
> > > > > splits without supporting split pausing.
> > > >
> > > > I think that's fine. But the question is should we provide a workaround
> > > > for existing users? IMO if it's easy to do, we should.
> > > >
> > > > > I actually think neither solution 1 or 2 breaks FLIP-182 users,
> > > >
> > > > They do. A user currently has a working Flink 1.15 deployment, where
> > > > watermark alignment maybe is not behaving ideally, but it's working to
> > > > some extent, and you are proposing to throw them an exception after
> > > > upgrading Flink, without any workaround (short of implementing a
> > > > feature, which is a very problematic requirement). Given that costly
> > > > upgrades are one of the major complaints, I would be definitely in favor
> > > > of option 2. Given the most likely small actually affected user base, I
> > > > would be +1 for solution 2 with throwing an exception by default.
> > > >
> > > > Best,
> > > > Piotrek
> > > >
> > > >
> > > > On Wed, Jun 29, 2022 at 12:55, Becket Qin <be...@gmail.com> wrote:
> > > >
> > > > > Hi Piotr,
> > > > >
> > > > > Please see the reply inline below:
> > > > >
> > > > > On Wed, Jun 29, 2022 at 5:11 PM Piotr Nowojski <pnowojski@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hi Becket,
> > > > > >
> > > > > > > My main concern of having a supportsPausingSplits() knob
> > > > > >
> > > > > > What is the problem with `supportsPausingSplits()` that you see? Do
> > > > > > you want to remove it?
> > > > > >
> > > > > Just to make sure we are on the same page, I assume we are talking
> > > > > about this supportsPausingSplits() method in the Source interface. If we
> > > > > go with the obligatory features addition path, having this method seems
> > > > > misleading. And also, later on at some point when we see all the
> > > > > sources have implemented this feature, we will have to worry about
> > > > > deprecating this method, which is backwards incompatible.
> > > > >
> > > > >
> > > > > > Also I don't understand your proposal for Solution 1. How do you
> > > > > > want to decide whether to throw an exception? For that we would need
> > > > > > to have `supportsPausingSplits()`, right?
> > > > > >
> > > > >
> > > > > What I am thinking is the following:
> > > > >
> > > > > 1. The Flink framework always assumes split pausing is supported and
> > > > > just naively invokes SourceReader#pauseOrResumeSplits().
> > > > > 2. The SourceReaderBase will basically again try to ask the SplitReader
> > > > > to pause the splits.
> > > > > 3. Because the default implementation throws an
> > > > > UnsupportedOperationException, if the source developer did not override
> > > > > it, this exception will be thrown and bubbled up.
> > > > > 4. After catching this exception, the SourceReaderBase will just check
> > > > > if there is only one split that is currently assigned to the split
> > > > > reader. If so, it swallows the exception, stops polling the split
> > > > > reader and returns NOTHING_AVAILABLE. This is the same as the current
> > > > > logic in the SourceOperator. If we are not comfortable with moving this
> > > > > logic to the SourceReaderBase, we can also just keep the logic there and
> > > > > simply let SourceOperator remember if there is more than one split
> > > > > assigned to the source reader, when
> > > > > SourceOperator.handleAddSplitsEvent() is invoked.
> > > > >
> > > > > This way the existing FLIP-182 users won't be impacted by this FLIP.
> > > > > For those source readers that only have one split assigned, it works
> > > > > fine without any change. For those source readers with multiple splits
> > > > > assigned, they are already in a limp state with unpredictable side
> > > > > effects. We might as well let them know this instead of pretending the
> > > > > coarse-grained watermark alignment works fine for them.
> > > > >
> > > > > The advantage of this solution is that we don't have to do anything
> > > > > after this. That would work fine as the final state, as in:
> > > > > 1. We have already done the best we can do for the Sources that do not
> > > > > support split pausing.
> > > > > 2. It is fully compatible with FLIP-182, if we consider it as the right
> > > > > thing to throw an exception for readers reading from multiple splits
> > > > > without supporting split pausing.
> > > > > 3. There is nothing to deprecate in the future.
> > > > >
> > > > >
> > > > > >
> > > > > > If so, I would prefer solution 2, to provide a graceful migration
> > > > > > path for any users that are already using FLIP-182 with multiple
> > > > > > splits per operator. I don't think there are many of those, but such a
> > > > > > flag seems easy to implement while making migration easier. Having said
> > > > > > that, technically we could change the behaviour and start throwing an
> > > > > > exception always in such a case, as this feature is marked as
> > > > > > Experimental.
> > > > > >
> > > > >
> > > > > I actually think neither solution 1 or 2 breaks FLIP-182 users, but
> > > > > solution 2 needs a deprecation process for the option in the future.
> > > > >
> > > > >
> > > > > >
> > > > > > Best,
> > > > > > Piotrek
> > > > > >
> > > > > > On Wed, Jun 29, 2022 at 02:54, Becket Qin <be...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Sebastian,
> > > > > > >
> > > > > > > Regarding the question,
> > > > > > >
> > > > > > > >
> > > > > > > > @Becket: I'm not sure about the intention of solution 1. Can you
> > > > > > > > explain that a bit more? In particular, I don't understand: "The
> > > > > > > > reasoning behind this solution is that existing users should only
> > > > > > > > use the coarse watermark alignment when a source reader only reads
> > > > > > > > from a single split." Why should a user not use coarse watermark
> > > > > > > > alignment when source reader reads from multiple splits? The split
> > > > > > > > alignment uses the "coarse watermark", i.e., maxDesiredWatermark, as
> > > > > > > > described in the FLIP for alignment.
> > > > > > >
> > > > > > >
> > > > > > > Imagine you have a source reader reading from two splits, and the
> > > > > > > watermarks look like the following:
> > > > > > > 1. Watermark of Split 1: 10:00 AM Jun 29
> > > > > > > 2. Watermark of Split 2: 11:00 AM Jun 29
> > > > > > > 3. maxDesiredWatermark: 10:30 AM Jun 29
> > > > > > >
> > > > > > > At this point, the source reader's watermark is 10:00 AM, which is
> > > > > > > lower than the maxDesiredWatermark, so the source reader won't be
> > > > > > > paused from reading. However, because the source reader cannot specify
> > > > > > > which split to read from, if it continues to read, the watermark gap
> > > > > > > between the two splits may become even bigger. This essentially fails
> > > > > > > the main purpose of watermark alignment - to reduce the number of
> > > > > > > records buffered in the state. This does not necessarily happen, but
> > > > > > > this is not what FLIP-182 was designed for to begin with. So I'd
> > > > > > > rather avoid extending the feature to that case.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jiangjie (Becket) Qin
> > > > > > >
> > > > > > > On Tue, Jun 28, 2022 at 6:53 PM Sebastian Mattheis <
> > > > > > > sebastian@ververica.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > > (Sorry I didn't send to the list but only to Becket. My bad and
> > > > > > > > > thanks Piotr. Next attempt:)
> > > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > Thanks for pushing the FLIP. I would drive it and would be happy
> > > > > > > > > to get back to you, @Thomas, for reviews. (Sorry for low
> > > > > > > > > responsiveness, there were several efforts with high priority on
> > > > > > > > > my side ...) As next step, I would revise the FLIP to get the
> > > > > > > > > discussion concluded.
> > > > > > > > >
> > > > > > > > > However, as Becket mentioned I feel that some things are still
> > > > > > > > > not clear yet:
> > > > > > > >
> > > > > > > > Re: Thomas
> > > > > > > >>
> > > > > > > > >>> However, from a user perspective, should the split level alignment
> > > > > > > > >>> be an opt-in feature, at least for a few releases? If yes, then we
> > > > > > > > >>> would require a knob similar to supportsPausingSplits(), which I
> > > > > > > > >>> understand won't be part of the revised FLIP. Such control may be
> > > > > > > > >>> beneficial:
> > > > > > > > >>>
> > > > > > > > >>> * Compare runtime behavior with split level alignment on/off
> > > > > > > > >>> * Allow use of sources that don't implement pausing splits yet
> > > > > > > > >>>
> > > > > > > > >>> The second point would, from the user's perspective, be necessary
> > > > > > > > >>> for backward compatibility? While the interface aspect and source
> > > > > > > > >>> compatibility has been discussed in great detail, I don't think it
> > > > > > > > >>> would be desirable if an application that already uses alignment
> > > > > > > > >>> fails after upgrading to the new Flink version, forcing users to
> > > > > > > > >>> lock step modify sources for the new non-optional split level
> > > > > > > > >>> alignment.
> > > > > > > > >>>
> > > > > > > > >>> So I think clarification of the compatibility aspect on the FLIP
> > > > > > > > >>> page would be necessary.
> > > > > > > >>
> > > > > > > >>
> > > > > > > > >> That is a good point. Currently FLIP-182 is marked as experimental.
> > > > > > > > >> So technically speaking it could change. That said, I agree that it
> > > > > > > > >> would be good to not break the existing sources.
> > > > > > > > >>
> > > > > > > > >> My main concern of having a supportsPausingSplits() knob is that
> > > > > > > > >> this might be something requiring code change on future deprecation.
> > > > > > > > >> I am thinking of two potential solutions:
> > > > > > > > >>
> > > > > > > > >> Solution 1:
> > > > > > > > >> In the SourceReaderBase, when pauseOrResumeSplits() is invoked, if
> > > > > > > > >> the source reader only has one split assigned, the source reader
> > > > > > > > >> simply stops polling and just returns NOTHING_AVAILABLE. If there is
> > > > > > > > >> more than one split assigned, it throws an exception with a message
> > > > > > > > >> such as "The unpausable SplitReader CLASS_NAME only works with
> > > > > > > > >> watermark alignment when assigned a single split. There are more
> > > > > > > > >> than one split assigned to the SplitReader".
> > > > > > > > >> The reasoning behind this solution is that existing users should
> > > > > > > > >> only use the coarse watermark alignment when a source reader only
> > > > > > > > >> reads from a single split. Reading from more than one split might
> > > > > > > > >> have unwanted side effects, so we might as well throw an exception
> > > > > > > > >> in this case.
> > > > > > > > >>
> > > > > > > > >> Solution 2:
> > > > > > > > >> Having a configuration "enable.coarse-grained.watermark.alignment",
> > > > > > > > >> the default value is false. Once it is set to true, we will allow
> > > > > > > > >> coarse-grained watermark alignment if a SplitReader is pausable.
> > > > > > > > >> This solution allows users to keep the current FLIP-182 behavior,
> > > > > > > > >> with the risk of side effects.
> > > > > > > > >>
> > > > > > > > >> Personally speaking, I feel solution 1 seems better because
> > > > > > > > >> coarse-grained watermark alignment could be frustrating to the users
> > > > > > > > >> when more than one split is assigned. So we might as well not
> > > > > > > > >> support it at all. And also there is nothing to deprecate in the
> > > > > > > > >> future with this solution.
> > > > > > > > >>
> > > > > > > > >> What do you think?
> > > > > > > >>
> > > > > > > >
> > > > > > > > > @Thomas: My understanding is that you intend a simple switch/knob
> > > > > > > > > to test w/ and w/o (split) watermark alignment, right? Isn't the
> > > > > > > > > coarse-grained w/ vs w/o watermark alignment sufficient for that?
> > > > > > > > > Or do you think that switching watermark alignment explicitly on
> > > > > > > > > split level is required?
> > > > > > > > >
> > > > > > > > > @Becket: I'm not sure about the intention of solution 1. Can you
> > > > > > > > > explain that a bit more? In particular, I don't understand: "The
> > > > > > > > > reasoning behind this solution is that existing users should only
> > > > > > > > > use the coarse watermark alignment when a source reader only reads
> > > > > > > > > from a single split." Why should a user not use coarse watermark
> > > > > > > > > alignment when source reader reads from multiple splits? The split
> > > > > > > > > alignment uses the "coarse watermark", i.e., maxDesiredWatermark,
> > > > > > > > > as described in the FLIP for alignment.
> > > > > > > >
> > > > > > > > Could you please clarify?
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Sebastian
> > > > > > > >
> > > > > > > > > On Wed, Jun 22, 2022 at 4:23 AM Becket Qin <becket.qin@gmail.com>
> > > > > > > > > wrote:
> > > > > > > >
> > > > > > > > >> Thanks for the feedback, Thomas and Steve. And thanks Piotr for the
> > > > > > > > >> patient and detailed discussion.
> > > > > > > >>
> > > > > > > >> Let's move forward with option 1 then.
> > > > > > > >>
> > > > > > > >> Re: Thomas
> > > > > > > >>
> > > > > > > > >>> However, from a user perspective, should the split level alignment
> > > > > > > > >>> be an opt-in feature, at least for a few releases? If yes, then we
> > > > > > > > >>> would require a knob similar to supportsPausingSplits(), which I
> > > > > > > > >>> understand won't be part of the revised FLIP. Such control may be
> > > > > > > > >>> beneficial:
> > > > > > > > >>>
> > > > > > > > >>> * Compare runtime behavior with split level alignment on/off
> > > > > > > > >>> * Allow use of sources that don't implement pausing splits yet
> > > > > > > > >>>
> > > > > > > > >>> The second point would, from the user's perspective, be necessary
> > > > > > > > >>> for backward compatibility? While the interface aspect and source
> > > > > > > > >>> compatibility has been discussed in great detail, I don't think it
> > > > > > > > >>> would be desirable if an application that already uses alignment
> > > > > > > > >>> fails after upgrading to the new Flink version, forcing users to
> > > > > > > > >>> lock step modify sources for the new non-optional split level
> > > > > > > > >>> alignment.
> > > > > > > > >>>
> > > > > > > > >>> So I think clarification of the compatibility aspect on the FLIP
> > > > > > > > >>> page would be necessary.
> > > > > > > >>
> > > > > > > >>
> > > > > > > > >> That is a good point. Currently FLIP-182 is marked as experimental.
> > > > > > > > >> So technically speaking it could change. That said, I agree that it
> > > > > > > > >> would be good to not break the existing sources.
> > > > > > > > >>
> > > > > > > > >> My main concern of having a supportsPausingSplits() knob is that
> > > > > > > > >> this might be something requiring code change on future deprecation.
> > > > > > > > >> I am thinking of two potential solutions:
> > > > > > > > >>
> > > > > > > > >> Solution 1:
> > > > > > > > >> In the SourceReaderBase, when pauseOrResumeSplits() is invoked, if
> > > > > > > > >> the source reader only has one split assigned, the source reader
> > > > > > > > >> simply stops polling and just returns NOTHING_AVAILABLE. If there is
> > > > > > > > >> more than one split assigned, it throws an exception with a message
> > > > > > > > >> such as "The unpausable SplitReader CLASS_NAME only works with
> > > > > > > > >> watermark alignment when assigned a single split. There are more
> > > > > > > > >> than one split assigned to the SplitReader".
> > > > > > > > >> The reasoning behind this solution is that existing users should
> > > > > > > > >> only use the coarse watermark alignment when a source reader only
> > > > > > > > >> reads from a single split. Reading from more than one split might
> > > > > > > > >> have unwanted side effects, so we might as well throw an exception
> > > > > > > > >> in this case.
> > > > > > > > >>
> > > > > > > > >> Solution 2:
> > > > > > > > >> Having a configuration "enable.coarse-grained.watermark.alignment",
> > > > > > > > >> the default value is false. Once it is set to true, we will allow
> > > > > > > > >> coarse-grained watermark alignment if a SplitReader is pausable.
> > > > > > > > >> This solution allows users to keep the current FLIP-182 behavior,
> > > > > > > > >> with the risk of side effects.
> > > > > > > > >>
> > > > > > > > >> Personally speaking, I feel solution 1 seems better because
> > > > > > > > >> coarse-grained watermark alignment could be frustrating to the users
> > > > > > > > >> when more than one split is assigned. So we might as well not
> > > > > > > > >> support it at all. And also there is nothing to deprecate in the
> > > > > > > > >> future with this solution.
> > > > > > > > >>
> > > > > > > > >> What do you think?
> > > > > > > > >>
> > > > > > > > >> Thanks,
> > > > > > > >>
> > > > > > > >> Jiangjie (Becket) Qin
> > > > > > > >>
> > > > > > > >>
> > > > > > > > >> On Tue, Jun 21, 2022 at 8:48 PM Piotr Nowojski <pnowojski@apache.org>
> > > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >>> Hi,
> > > > > > > >>>
> > > > > > > >>> It looks like option 1 wins overall? So let's go with that.
> > > > > > > >>>
> > > > > > > >>> Best,
> > > > > > > >>> Piotrek
> > > > > > > >>>
> > > > > > > > >>> On Wed, Jun 15, 2022 at 04:13, Steven Wu <st...@gmail.com> wrote:
> > > > > > > >>>
> > > > > > > > >>>> Both option 1 (default impl in base interface) and option 2
> > > > > > > > >>>> (decorative interface) are pretty common patterns. I would also be
> > > > > > > > >>>> fine with either. The important thing is that an exception is thrown
> > > > > > > > >>>> if a source doesn't support the alignment capability.
> > > > > > > > >>>>
> > > > > > > > >>>> The other point is that we can validate the source capability if
> > > > > > > > >>>> alignment is enabled in WatermarkStrategy. I believe either option
> > > > > > > > >>>> can achieve this goal too.
> > > > > > > > >>>> public interface WatermarkStrategy<T> {
> > > > > > > > >>>>     WatermarkStrategy<T> withWatermarkAlignment(String watermarkGroup,
> > > > > > > > >>>>             Duration maxAllowedWatermarkDrift);
> > > > > > > > >>>> }
> > > > > > > > >>>>
> > > > > > > > >>>> If I have to pick one, I am slightly favoring option 1 (base
> > > > > > > > >>>> interface). As watermark is already an essential concept of source,
> > > > > > > > >>>> maybe watermark alignment capability can also be a property of the
> > > > > > > > >>>> base source/reader interface.
> > > > > > > >>>>
> > > > > > > > >>>> On Tue, Jun 14, 2022 at 3:44 PM Thomas Weise <thw@apache.org> wrote:
> > > > > > > >>>>
> > > > > > > >>>> > Hi everyone,
> > > > > > > >>>> >
> > > > > > > > >>>> > Thank you for all the effort that went into this discussion. The
> > > > > > > > >>>> > split level watermark alignment will be an important feature for
> > > > > > > > >>>> > Flink that will address operational problems for various use cases.
> > > > > > > > >>>> > From reading through this thread it appears that not too much
> > > > > > > > >>>> > remains to bring this FLIP to acceptance and allow development to
> > > > > > > > >>>> > move forward. I would like to contribute if possible.
> > > > > > > > >>>> >
> > > > > > > > >>>> > Regarding option 1 vs. option 2: I don't have a strong preference,
> > > > > > > > >>>> > perhaps slightly leaning towards option 1.
> > > > > > > > >>>> >
> > > > > > > > >>>> > However, from a user perspective, should the split level alignment
> > > > > > > > >>>> > be an opt-in feature, at least for a few releases? If yes, then we
> > > > > > > > >>>> > would require a knob similar to supportsPausingSplits(), which I
> > > > > > > > >>>> > understand won't be part of the revised FLIP. Such control may be
> > > > > > > > >>>> > beneficial:
> > > > > > > > >>>> >
> > > > > > > > >>>> > * Compare runtime behavior with split level alignment on/off
> > > > > > > > >>>> > * Allow use of sources that don't implement pausing splits yet
> > > > > > > > >>>> >
> > > > > > > > >>>> > The second point would, from the user's perspective, be necessary
> > > > > > > > >>>> > for backward compatibility? While the interface aspect and source
> > > > > > > > >>>> > compatibility has been discussed in great detail, I don't think it
> > > > > > > > >>>> > would be desirable if an application that already uses alignment
> > > > > > > > >>>> > fails after upgrading to the new Flink version, forcing users to
> > > > > > > > >>>> > lock step modify sources for the new non-optional split level
> > > > > > > > >>>> > alignment.
> > > > > > > > >>>> >
> > > > > > > > >>>> > So I think clarification of the compatibility aspect on the FLIP
> > > > > > > > >>>> > page would be necessary.
> > > > > > > >>>> >
> > > > > > > >>>> > Thanks,
> > > > > > > >>>> > Thomas
> > > > > > > >>>> >
> > > > > > > > >>>> > On Mon, May 30, 2022 at 3:29 AM Piotr Nowojski <piotr.nowojski@gmail.com>
> > > > > > > > >>>> > wrote:
> > > > > > > >>>> > >
> > > > > > > >>>> > > Hi Becket,
> > > > > > > >>>> > >
> > > > > > > > >>>> > > Thanks for summing this up. Just one correction:
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > > Piotr prefers option 2, his opinions are:
> > > > > > > > >>>> > > >   e) It is OK that the code itself in option 2 indicates to the
> > > > > > > > >>>> > > > developers that a feature is optional. We will rely on the
> > > > > > > > >>>> > > > documentation to correct that and clarify that the feature is
> > > > > > > > >>>> > > > actually obligatory.
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > I would say based on a) and b) that feature would be still
> > > > > > > > >>>> > > optional. So both the implementation and the documentation would
> > > > > > > > >>>> > > be saying that. We could add a mention to the docs and release
> > > > > > > > >>>> > > notes, that this feature will be obligatory in the next major
> > > > > > > > >>>> > > release and plan such a release accordingly.
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > Re the option 1., as you mentioned:
> > > > > > > > >>>> > > > As for option 1: For developers, the feature is still optional
> > > > > > > > >>>> > > > due to the default implementation in the interface, regardless
> > > > > > > > >>>> > > > of what the default implementation does, because the code
> > > > > > > > >>>> > > > compiles without overriding these methods
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > Also importantly, the code will work in most cases.
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > > Obligatory: Jobs may fail if these methods are not implemented
> > > > > > > > >>>> > > > properly. e.g. SourceReader#pauseOrResumeSplits(). This is a
> > > > > > > > >>>> > > > common pattern in Java, e.g. Iterator.remove() by default throws
> > > > > > > > >>>> > > > "UnsupportedOperationException", informing the implementation
> > > > > > > > >>>> > > > that things may go wrong if this method is not implemented.
> > > > > > > > >>>> > >
> > > > > > > > >>>> > > For me `Iterator#remove()` is an optional feature. Personally, I
> > > > > > > > >>>> > > don't remember if I have ever implemented it.
> > > > > > > >>>> > >
> > > > > > > >>>> > > Best,
> > > > > > > >>>> > > Piotrek
> > > > > > > >>>> > >
> > > > > > > > >>>> > > On Fri, May 27, 2022 at 10:14, Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > >>>> > >
> > > > > > > > >>>> > > > I had an offline discussion with Piotr and here is the summary.
> > > > > > > > >>>> > > > Please correct me if I miss something, Piotr.
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > There are two things we would like to seek more opinions from the
> > > > > > > > >>>> > > > community, so we can make progress on this FLIP.
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > 1. The General pattern to add obligatory features to existing
> > > > > > > > >>>> > > > interfaces.
> > > > > > > >>>> > > >
> > > > > > > >>>> > > >
> > > > > > > >>>> >
> > > > > > > >>>>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> ***********************************************************************************
> > > > > > > > >>>> > > > For interfaces exposed to the developers for implementation, they are
> > > > > > > > >>>> > > > either intended to be *optional* or *obligatory*. While it is quite clear
> > > > > > > > >>>> > > > how to convey that intention when creating the interfaces, there is less
> > > > > > > > >>>> > > > agreement on how to do so when adding new features to an existing
> > > > > > > > >>>> > > > interface. In general, Flink uses decorative interfaces when adding
> > > > > > > > >>>> > > > optional features to existing interfaces. Both Piotr and I agree that
> > > > > > > > >>>> > > > looks good.
> > > > > > > >>>> > > >
> > > > > > > > >>>> > > > Different opinions are mainly about how to add obligatory features to
> > > > > > > > >>>> > > > the existing interfaces, probably due to different understandings of
> > > > > > > > >>>> > > > "obligatory".
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > We have discussed four options:
> > > > > > > >>>> > > >
> > > > > > > >>>> > > > *Option 1:*
> > > > > > > >>>> > > >
> > > > > > > > >>>> > > >    - Just add a new method to the existing interface.
> > > > > > > > >>>> > > >    - For backwards compatibility, the method would have a default
> > > > > > > > >>>> > > >    implementation throwing "UnsupportedOperationException".
> > > > > > > > >>>> > > >    - In the next major version, remove the default implementation.
> > > > > > > > >>>> > > >    - For the developers, any method with a default implementation
> > > > > > > > >>>> > > >    throwing an "UnsupportedOperationException" should be taken as
> > > > > > > > >>>> > > >    obligatory.
> > > > > > > >>>> > > >
> > > > > > > >>>> > > > *Option 2:*
> > > > > > > >>>> > > >
> > > > > > > > >>>> > > >    - Always make the features optional by adding a decorative interface,
> > > > > > > > >>>> > > >    just like ordinary optional features.
> > > > > > > > >>>> > > >    - Inform the developers via documentation that this feature is
> > > > > > > > >>>> > > >    obligatory, although it looks optional from the code.
> > > > > > > > >>>> > > >    - In case the developers did not implement the decorative interface,
> > > > > > > > >>>> > > >    throw an exception.
> > > > > > > > >>>> > > >    - In the next major version, move the methods in the decorative
> > > > > > > > >>>> > > >    interface to the base interface, and deprecate the decorative
> > > > > > > > >>>> > > >    interface.
> > > > > > > >>>> > > >
> > > > > > > >>>> > > > *Option 3:*
> > > > > > > >>>> > > >
> > > > > > > > >>>> > > >    - Always bump the major version when a new obligatory feature is
> > > > > > > > >>>> > > >    added, even if we may have to do it frequently.
> > > > > > > >>>> > > >
> > > > > > > >>>> > > > *Option 4:*
> > > > > > > >>>> > > >
> > > > > > > > >>>> > > >    - Add a V2, V3... of the interface affected by the new obligatory
> > > > > > > > >>>> > > >    feature.
> > > > > > > > >>>> > > >    - In the next major versions, deprecate old versions of the
> > > > > > > > >>>> > > >    interfaces.
> > > > > > > >>>> > > >
> > > > > > > > >>>> > > > Both Piotr and I agreed that option 3 and option 4 have big side effects
> > > > > > > > >>>> > > > and should be avoided. We have different preferences between option 1 and
> > > > > > > > >>>> > > > option 2.
> > > > > > > >>>> > > >
> > > > > > > > >>>> > > > Personally I prefer option 1, the reasons are:
> > > > > > > > >>>> > > >   a) simple and intuitive. Java 8 introduced the default impl in
> > > > > > > > >>>> > > > interfaces exactly for interface evolving, and this is a common pattern in
> > > > > > > > >>>> > > > many projects.
> > > > > > > > >>>> > > >   b) prominent to the developers that the feature is expected to be
> > > > > > > > >>>> > > > implemented, because it explicitly throws an exception in the default impl.
> > > > > > > > >>>> > > >   c) low maintenance overhead - the Flink framework can always assume the
> > > > > > > > >>>> > > > method exists, so no special handling logic is needed.
> > > > > > > > >>>> > > >   d) communicate a clear semantic boundary between optional and obligatory
> > > > > > > > >>>> > > > features in Flink to the developers.
> > > > > > > > >>>> > > >       - Optional: Jobs still run without exception if these methods are
> > > > > > > > >>>> > > > not implemented. e.g. all the SupportsXXXPushDown interfaces.
> > > > > > > > >>>> > > >       - Obligatory: Jobs may fail if these methods are not implemented
> > > > > > > > >>>> > > > properly. e.g. SourceReader#pauseOrResumeSplits(). This is a common pattern
> > > > > > > > >>>> > > > in Java, e.g. Iterator.remove() by default throws
> > > > > > > > >>>> > > > "UnsupportedOperationException", informing the implementation that things
> > > > > > > > >>>> > > > may go wrong if this method is not implemented.
> > > > > > > >>>> > > >
> > > > > > > > >>>> > > > As for option 2, although the API itself sounds clean, it misleads people
> > > > > > > > >>>> > > > into thinking an obligatory feature is optional - from the code the feature
> > > > > > > > >>>> > > > is optional, but the documents say it is obligatory. We probably should
> > > > > > > > >>>> > > > avoid such code-doc inconsistency, as people will be confused. And I would
> > > > > > > > >>>> > > > actually be bewildered that sometimes not implementing an "optional"
> > > > > > > > >>>> > > > feature is fine, but sometimes it causes the jobs to fail.
> > > > > > > >>>> > > >
> > > > > > > > >>>> > > > In response to the argument that the method with a default implementation
> > > > > > > > >>>> > > > is always optional, if that is true, it actually means all the interfaces
> > > > > > > > >>>> > > > should be immutable once they are created. If we want to add a method to an
> > > > > > > > >>>> > > > existing interface, for backwards compatibility, we will have to provide a
> > > > > > > > >>>> > > > default implementation. And the fact it has a default implementation
> > > > > > > > >>>> > > > indicates the method is optional. If that method is optional, it should
> > > > > > > > >>>> > > > reside in a separate decorative interface, otherwise it clogs that existing
> > > > > > > > >>>> > > > interface. Therefore, people should never add a method to an existing
> > > > > > > > >>>> > > > interface. I find this conclusion a bit extreme.
> > > > > > > >>>> > > >
> > > > > > > > >>>> > > > Piotr prefers option 2, his opinions are:
> > > > > > > > >>>> > > >     a) Obligatory methods are the methods that fail the code compilation
> > > > > > > > >>>> > > > if not implemented.
> > > > > > > > >>>> > > >     b) All obligatory methods should reside in the base interface, without
> > > > > > > > >>>> > > > a default implementation. And all the optional methods should be in
> > > > > > > > >>>> > > > decorative interfaces. This is a clean API.
> > > > > > > > >>>> > > >     c) due to b), there isn't a viable solution to add an obligatory
> > > > > > > > >>>> > > > method to an existing interface in a backwards compatible way. Unless we
> > > > > > > > >>>> > > > are OK with breaking backwards compatibility, all the interfaces should be
> > > > > > > > >>>> > > > treated as immutable. As a compromise, we might as well just treat all the
> > > > > > > > >>>> > > > features added later as optional features. This way we keep the API clean.
> > > > > > > > >>>> > > >     d) based on b) and c), option 2 has a clean API, while option 1 does
> > > > > > > > >>>> > > > not.
> > > > > > > > >>>> > > >     e) It is OK that the code itself in option 2 indicates to the
> > > > > > > > >>>> > > > developers that a feature is optional. We will rely on the documentation to
> > > > > > > > >>>> > > > correct that and clarify that the feature is actually obligatory.
> > > > > > > > >>>> > > >     f) Regarding the effectiveness of making people aware that the feature
> > > > > > > > >>>> > > > is obligatory, Option 1 and Option 2 are similar. People who do not read
> > > > > > > > >>>> > > > the release note / documentation will mistake the feature to be optional
> > > > > > > > >>>> > > > anyways.
> > > > > > > >>>> > > >
> > > > > > > > >>>> > > > As for option 1: For developers, the feature is still optional due to the
> > > > > > > > >>>> > > > default implementation in the interface, regardless of what the default
> > > > > > > > >>>> > > > implementation does, because the code compiles without overriding these
> > > > > > > > >>>> > > > methods. Also, another problem of this option is that users who do not
> > > > > > > > >>>> > > > know about the history of the interface may be confused by the default
> > > > > > > > >>>> > > > implementation throwing an exception.
> > > > > > > >>>> > > >
> > > > > > > >>>> > > >
> > > > > > > > >>>> > > > 2. For this particular FLIP, should it be optional or not?
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > ***********************************************************************************
> > > > > > > > >>>> > > > As mentioned in the previous email, I feel this FLIP should be obligatory,
> > > > > > > > >>>> > > > for the following reasons:
> > > > > > > > >>>> > > > 1. The Flink framework exposes the watermark alignment API to the end
> > > > > > > > >>>> > > > users. From the end users' perspective, the feature should be available
> > > > > > > > >>>> > > > regardless of the implementation details in the pluggables. This is true
> > > > > > > > >>>> > > > for any other methods exposed as the Flink API.
> > > > > > > > >>>> > > > 2. If a Source is not pausable, the end user should receive an exception
> > > > > > > > >>>> > > > when enabling watermark alignment (both Piotr and I agree on this). In
> > > > > > > > >>>> > > > that case, it meets my criteria of an obligatory feature because not
> > > > > > > > >>>> > > > implementing the feature causes a framework API to throw an exception and
> > > > > > > > >>>> > > > fail the job.
> > > > > > > > >>>> > > >
> > > > > > > > >>>> > > > On the other hand, Piotr does not have a strong opinion regarding whether
> > > > > > > > >>>> > > > this feature should be optional or not.
> > > > > > > >>>> > > >
> > > > > > > >>>> > > >
> > > > > > > > >>>> > > > Thanks for reading through this long email. So basically in order to make
> > > > > > > > >>>> > > > progress on this FLIP, we would like to hear how people feel about the
> > > > > > > > >>>> > > > above two topics.
> > > > > > > >>>> > > >
> > > > > > > >>>> > > > Thanks,
> > > > > > > >>>> > > >
> > > > > > > >>>> > > > Jiangjie (Becket) Qin
> > > > > > > >>>> > > >
> > > > > > > >>>> > > >
> > > > > > > > >>>> > > > On Thu, May 26, 2022 at 3:06 PM Piotr Nowojski <pnowojski@apache.org> wrote:
> > > > > > > >>>> > > >
> > > > > > > >>>> > > >> Hi Becket,
> > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >> I still stand by what I wrote before:
> > > > > > > > >>>> > > >> > I think I would still vote soft -1 on this option, but I wouldn't block
> > > > > > > > >>>> > > >> > it in case I am out-voted.
> > > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >> > I think it might be helpful to agree on the definition of optional in
> > > > > > > > >>>> > > >> > our case.
> > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >> For me it doesn't matter whether we call a default method throwing an
> > > > > > > > >>>> > > >> exception optional or non-optional. As long as we keep it this way, the
> > > > > > > > >>>> > > >> effect is the same. It's effectively a method that a user doesn't have to
> > > > > > > > >>>> > > >> implement. If an interface/system allows some methods to be not
> > > > > > > > >>>> > > >> implemented, some users will do just that, regardless of whether we call
> > > > > > > > >>>> > > >> and document it as non-optional. And at the same time it's clogging the
> > > > > > > > >>>> > > >> base interface.
> > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >> By the way, just the need for a java-doc/documentation explaining the
> > > > > > > > >>>> > > >> existence of some construct is a bad smell (code should be
> > > > > > > > >>>> > > >> self-documenting, and a default method throwing an
> > > > > > > > >>>> > > >> UnsupportedOperationException is not).
> > > > > > > >>>> > > >>
> > > > > > > >>>> > > >> > Please note that so far we do not assume whether
> > the
> > > > > > feature
> > > > > > > >>>> is in
> > > > > > > >>>> > > >> > the original API or it is added later. A newly
> > added
> > > > > > feature
> > > > > > > >>>> can
> > > > > > > >>>> > also be
> > > > > > > >>>> > > >> > non-optional, although it might take some time
> for
> > > all
> > > > > the
> > > > > > > >>>> pluggable
> > > > > > > >>>> > > >> > developers to catch up, and they should still
> work
> > if
> > > > the
> > > > > > new
> > > > > > > >>>> > feature is
> > > > > > > >>>> > > >> > not used until they catch up. In contrast, we may
> > > never
> > > > > > > expect
> > > > > > > >>>> an
> > > > > > > >>>> > > >> optional
> > > > > > > >>>> > > >> > feature to catch up, because leaving it
> > unimplemented
> > > > is
> > > > > > also
> > > > > > > >>>> > blessed.
> > > > > > > >>>> > > >> >
> > > > > > > >>>> > > >> > Let's take the checkpointing as an example.
> Imagine
> > > > Flink
> > > > > > did
> > > > > > > >>>> not
> > > > > > > >>>> > > >> support
> > > > > > > >>>> > > >> > checkpointing before release 1.16. And now we are
> > > > trying
> > > > > to
> > > > > > > add
> > > > > > > >>>> > > >> > checkpointing to Flink. So we exposed the
> > checkpoint
> > > > > > > >>>> configuration
> > > > > > > >>>> > to
> > > > > > > >>>> > > >> the
> > > > > > > >>>> > > >> > end users. In the meantime, will we tell the
> > > pluggable
> > > > > > (e.g.
> > > > > > > >>>> > operators,
> > > > > > > > >>>> > > >> > connectors) developers that methods like "snapshotState()" are
> > > > > > > > >>>> > > >> > optional? If
> > > > > > > >>>> > > >> > we do that, the availability of checkpointing in
> > > Flink
> > > > > > would
> > > > > > > be
> > > > > > > >>>> > severely
> > > > > > > >>>> > > >> > weakened. But apparently we should still allow
> the
> > > > > existing
> > > > > > > >>>> > > >> implementations
> > > > > > > >>>> > > >> > to work without checkpointing. It looks to me
> that
> > > > adding
> > > > > > the
> > > > > > > >>>> > method to
> > > > > > > >>>> > > >> the
> > > > > > > >>>> > > >> > pluggable interfaces with a default
> implementation
> > > > > throwing
> > > > > > > >>>> > > >> > "UnsupportedOperationException" would be the
> > solution
> > > > > here.
> > > > > > > >>>> Please
> > > > > > > >>>> > note
> > > > > > > >>>> > > >> > that in this case, having the default
> > implementation
> > > > does
> > > > > > not
> > > > > > > >>>> mean
> > > > > > > >>>> > this
> > > > > > > >>>> > > >> is
> > > > > > > >>>> > > >> > optional. It is just the technique to support
> > > backwards
> > > > > > > >>>> > compatibility in
> > > > > > > >>>> > > >> > the feature evolution. The fact that this method
> is
> > > in
> > > > > the
> > > > > > > base
> > > > > > > >>>> > > >> interface
> > > > > > > >>>> > > >> > suggests it is not optional, so the developers
> > SHOULD
> > > > > > > >>>> implement it.
> > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >> I would soft vote -1 for having the default method throwing
> > > > > > > > >>>> > > >> UnsupportedOperationException as a one-off for this (FLIP-217) special
> > > > > > > > >>>> > > >> circumstance.
> > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >> At the moment, without thinking this over too much, I would vote harder -1
> > > > > > > > >>>> > > >> for having this as a general rule when adding new features. If we ever end
> > > > > > > > >>>> > > >> up with an API that is littered with default methods throwing
> > > > > > > > >>>> > > >> UnsupportedOperationException that are documented as "non optional" it
> > > > > > > > >>>> > > >> would be IMO a big design failure. I would be ok-ish with that only if
> > > > > > > > >>>> > > >> that was a temporary thing and we had an aggressive plan to release new
> > > > > > > > >>>> > > >> major Flink versions (2.x.y, 3.x.y, ...) more often, breaking API
> > > > > > > > >>>> > > >> compatibility, that would get rid of those default methods. Adding
> > > > > > > > >>>> > > >> checkpointing and methods like "snapshotState()" would IMO easily justify
> > > > > > > > >>>> > > >> a new major Flink release. In that case we could add those methods with a
> > > > > > > > >>>> > > >> default implementation for some transition period, one or two minor
> > > > > > > > >>>> > > >> releases, followed by a clean up in a major release. However I would still
> > > > > > > > >>>> > > >> argue that it would be cleaner/better to add a decorative interface like
> > > > > > > > >>>> > > >> `CheckpointedOperator` instead of adding those default methods to the base
> > > > > > > > >>>> > > >> `Operator` interface.
> > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >> I think I can sum up our disagreement as follows: I would like to keep
> > > > > > > > >>>> > > >> the interfaces simpler, with only obligatory methods/features on one side
> > > > > > > > >>>> > > >> and clearly optional features on the other, while you would like to add
> > > > > > > > >>>> > > >> an extra third state in between those two?
> > > > > > > >>>> > > >>
> > > > > > > >>>> > > >> Best,
> > > > > > > >>>> > > >> Piotrek
> > > > > > > >>>> > > >>
> > > > > > > >>>> > > >>
> > > > > > > >>>> > > >>
> > > > > > > > >>>> > > >> On Thu, 12 May 2022 at 04:25, Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > >>>> > > >>
> > > > > > > >>>> > > >> > Thanks for the clarification, Piotr and
> Sebastian.
> > > > > > > >>>> > > >> >
> > > > > > > >>>> > > >> > It looks like the key problem is still whether
> the
> > > > > > > >>>> implementation of
> > > > > > > >>>> > > >> > pausable splits in the Sources should be optional
> > or
> > > > not.
> > > > > > > >>>> > > >> >
> > > > > > > >>>> > > >> > I think it might be helpful to agree on the
> > > definition
> > > > of
> > > > > > > >>>> optional
> > > > > > > >>>> > in
> > > > > > > >>>> > > >> our
> > > > > > > >>>> > > >> > case. To me:
> > > > > > > >>>> > > >> > Optional = "You CAN leave the method
> unimplemented,
> > > and
> > > > > > that
> > > > > > > is
> > > > > > > >>>> > fine."
> > > > > > > >>>> > > >> > Non-Optional = "You CAN leave the method
> > > unimplemented,
> > > > > but
> > > > > > > you
> > > > > > > >>>> > SHOULD
> > > > > > > >>>> > > >> NOT,
> > > > > > > >>>> > > >> > because people assume this works."
> > > > > > > >>>> > > >> >
> > > > > > > >>>> > > >> > I think one sufficient condition of a
> Non-Optional
> > > > > feature
> > > > > > is
> > > > > > > >>>> that
> > > > > > > >>>> > if
> > > > > > > >>>> > > >> the
> > > > > > > >>>> > > >> > feature is exposed through the framework API,
> Flink
> > > > > should
> > > > > > > >>>> expect
> > > > > > > >>>> > the
> > > > > > > >>>> > > >> > pluggables to support this feature by default.
> > > > Otherwise
> > > > > > the
> > > > > > > >>>> > > >> availability
> > > > > > > >>>> > > >> > of that feature becomes undefined.
> > > > > > > >>>> > > >> >
> > > > > > > >>>> > > >> > Please note that so far we do not assume whether
> > the
> > > > > > feature
> > > > > > > >>>> is in
> > > > > > > >>>> > > >> > the original API or it is added later. A newly
> > added
> > > > > > feature
> > > > > > > >>>> can
> > > > > > > >>>> > also be
> > > > > > > >>>> > > >> > non-optional, although it might take some time
> for
> > > all
> > > > > the
> > > > > > > >>>> pluggable
> > > > > > > >>>> > > >> > developers to catch up, and they should still
> work
> > if
> > > > the
> > > > > > new
> > > > > > > >>>> > feature is
> > > > > > > >>>> > > >> > not used until they catch up. In contrast, we may
> > > never
> > > > > > > expect
> > > > > > > >>>> an
> > > > > > > >>>> > > >> optional
> > > > > > > >>>> > > >> > feature to catch up, because leaving it
> > unimplemented
> > > > is
> > > > > > also
> > > > > > > >>>> > blessed.
> > > > > > > >>>> > > >> >
> > > > > > > >>>> > > >> > Let's take the checkpointing as an example.
> Imagine
> > > > Flink
> > > > > > did
> > > > > > > >>>> not
> > > > > > > >>>> > > >> support
> > > > > > > >>>> > > >> > checkpointing before release 1.16. And now we are
> > > > trying
> > > > > to
> > > > > > > add
> > > > > > > >>>> > > >> > checkpointing to Flink. So we exposed the
> > checkpoint
> > > > > > > >>>> configuration
> > > > > > > >>>> > to
> > > > > > > >>>> > > >> the
> > > > > > > >>>> > > >> > end users. In the meantime, will we tell the
> > > pluggable
> > > > > > (e.g.
> > > > > > > >>>> > operators,
> > > > > > > > >>>> > > >> > connectors) developers that methods like "snapshotState()" are
> > > > > > > > >>>> > > >> > optional? If
> > > > > > > >>>> > > >> > we do that, the availability of checkpointing in
> > > Flink
> > > > > > would
> > > > > > > be
> > > > > > > >>>> > severely
> > > > > > > >>>> > > >> > weakened. But apparently we should still allow
> the
> > > > > existing
> > > > > > > >>>> > > >> implementations
> > > > > > > >>>> > > >> > to work without checkpointing. It looks to me
> that
> > > > adding
> > > > > > the
> > > > > > > >>>> > method to
> > > > > > > >>>> > > >> the
> > > > > > > >>>> > > >> > pluggable interfaces with a default
> implementation
> > > > > throwing
> > > > > > > >>>> > > >> > "UnsupportedOperationException" would be the
> > solution
> > > > > here.
> > > > > > > >>>> Please
> > > > > > > >>>> > note
> > > > > > > >>>> > > >> > that in this case, having the default
> > implementation
> > > > does
> > > > > > not
> > > > > > > >>>> mean
> > > > > > > >>>> > this
> > > > > > > >>>> > > >> is
> > > > > > > >>>> > > >> > optional. It is just the technique to support
> > > backwards
> > > > > > > >>>> > compatibility in
> > > > > > > >>>> > > >> > the feature evolution. The fact that this method
> is
> > > in
> > > > > the
> > > > > > > base
> > > > > > > >>>> > > >> interface
> > > > > > > >>>> > > >> > suggests it is not optional, so the developers
> > SHOULD
> > > > > > > >>>> implement it.
> > > > > > > >>>> > > >> >
> > > > > > > >>>> > > >> > When it comes to this FLIP, I think it meets the
> > > > criteria
> > > > > > of
> > > > > > > >>>> > > >> non-optional
> > > > > > > >>>> > > >> > features, so we should just use the evolution
> path
> > of
> > > > > > > >>>> non-optional
> > > > > > > >>>> > > >> > features.
> > > > > > > >>>> > > >> >
> > > > > > > >>>> > > >> > Thanks,
> > > > > > > >>>> > > >> >
> > > > > > > >>>> > > >> > Jiangjie (Becket) Qin
> > > > > > > >>>> > > >> >
> > > > > > > >>>> > > >> >
> > > > > > > >>>> > > >> >
> > > > > > > >>>> > > >> > On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski <
> > > > > > > >>>> > pnowojski@apache.org>
> > > > > > > >>>> > > >> > wrote:
> > > > > > > >>>> > > >> >
> > > > > > > >>>> > > >> > > Hi,
> > > > > > > >>>> > > >> > >
> > > > > > > >>>> > > >> > > Actually previously I thought about having a
> > > > decorative
> > > > > > > >>>> interface
> > > > > > > >>>> > and
> > > > > > > >>>> > > >> > > whenever watermark alignment is enabled,
> checking
> > > > that
> > > > > > the
> > > > > > > >>>> source
> > > > > > > >>>> > > >> > > implements the decorative interface. If not,
> > > throwing
> > > > > an
> > > > > > > >>>> > exception.
> > > > > > > >>>> > > >> > >
> > > > > > > >>>> > > >> > > The option with default methods in the source
> > > > > interfaces
> > > > > > > >>>> throwing
> > > > > > > >>>> > > >> > > `UnsupportedOperationException` I think still
> > > suffers
> > > > > > from
> > > > > > > >>>> the
> > > > > > > >>>> > same
> > > > > > > >>>> > > >> > > problems I mentioned before. It's still an
> > optional
> > > > > > > >>>> implementation
> > > > > > > >>>> > > >> and at
> > > > > > > >>>> > > >> > > the same time it's clogging the base
> interface. I
> > > > > think I
> > > > > > > >>>> would
> > > > > > > >>>> > still
> > > > > > > >>>> > > >> > vote
> > > > > > > >>>> > > >> > > soft -1 on this option, but I wouldn't block it
> > in
> > > > > case I
> > > > > > > am
> > > > > > > >>>> > > >> out-voted.
> > > > > > > >>>> > > >> > >
> > > > > > > >>>> > > >> > > Best,
> > > > > > > >>>> > > >> > > Piotrek
> > > > > > > >>>> > > >> > >
> > > > > > > > >>>> > > >> > > On Wed, 11 May 2022 at 14:22, Sebastian Mattheis <sebastian@ververica.com> wrote:
> > > > > > > >>>> > > >> > >
> > > > > > > >>>> > > >> > > > Hi Becket,
> > > > > > > >>>> > > >> > > >
> > > > > > > >>>> > > >> > > > Thanks a lot for your fast and detailed
> > response.
> > > > For
> > > > > > me,
> > > > > > > >>>> it
> > > > > > > >>>> > > >> converges
> > > > > > > >>>> > > >> > > and
> > > > > > > >>>> > > >> > > > dropping the supportsX method sounds very
> > > > reasonable
> > > > > to
> > > > > > > me.
> > > > > > > >>>> > (Side
> > > > > > > >>>> > > >> note:
> > > > > > > >>>> > > >> > > > With "pausable splits" enabled as "default" I
> > > think
> > > > > we
> > > > > > > >>>> > > >> misunderstood.
> > > > > > > >>>> > > >> > As
> > > > > > > >>>> > > >> > > > you described now "default" I understand as
> > that
> > > it
> > > > > > > should
> > > > > > > >>>> be
> > > > > > > >>>> > the
> > > > > > > >>>> > > >> new
> > > > > > > >>>> > > >> > > > recommended way of implementation, and I
> think
> > > that
> > > > > is
> > > > > > > >>>> fully
> > > > > > > >>>> > valid.
> > > > > > > >>>> > > >> > > Before,
> > > > > > > >>>> > > >> > > > I understood "default" here as the default
> > > > > > > implementation,
> > > > > > > >>>> i.e.,
> > > > > > > >>>> > > >> > throwing
> > > > > > > >>>> > > >> > > > UnsupportedOperationException, which is the
> > exact
> > > > > > > >>>> opposite. :) )
> > > > > > > >>>> > > >> > > >
> > > > > > > >>>> > > >> > > > Nevertheless: As mentioned, an open question
> > for
> > > me
> > > > > is
> > > > > > if
> > > > > > > >>>> > watermark
> > > > > > > >>>> > > >> > > > alignment should enforce pausable splits. For
> > > > > > > >>>> clarification, the
> > > > > > > >>>> > > >> > current
> > > > > > > >>>> > > >> > > > documentation [1] says:
> > > > > > > >>>> > > >> > > >
> > > > > > > >>>> > > >> > > > *Note:* As of 1.15, Flink supports aligning
> > > across
> > > > > > tasks
> > > > > > > >>>> of the
> > > > > > > >>>> > same
> > > > > > > >>>> > > >> > > >> source and/or different sources. It does not
> > > > support
> > > > > > > >>>> aligning
> > > > > > > >>>> > > >> > > >> splits/partitions/shards in the same task.
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >> In a case where there are e.g. two Kafka
> > > > partitions
> > > > > > that
> > > > > > > >>>> > produce
> > > > > > > >>>> > > >> > > >> watermarks at different pace, that get
> > assigned
> > > to
> > > > > the
> > > > > > > >>>> same
> > > > > > > >>>> > task
> > > > > > > >>>> > > >> > > watermark
> > > > > > > >>>> > > >> > > >> might not behave as expected. Fortunately,
> > worst
> > > > > case
> > > > > > it
> > > > > > > >>>> > should not
> > > > > > > >>>> > > >> > > perform
> > > > > > > >>>> > > >> > > >> worse than without alignment.
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >> Given the limitation above, we suggest
> > applying
> > > > > > > watermark
> > > > > > > >>>> > > >> alignment in
> > > > > > > >>>> > > >> > > >> two situations:
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >>    1. You have two different sources (e.g.
> > Kafka
> > > > and
> > > > > > > >>>> File) that
> > > > > > > >>>> > > >> > produce
> > > > > > > >>>> > > >> > > >>    watermarks at different speeds
> > > > > > > >>>> > > >> > > >>    2. You run your source with parallelism
> > equal
> > > > to
> > > > > > the
> > > > > > > >>>> number
> > > > > > > >>>> > of
> > > > > > > >>>> > > >> > > >>    splits/shards/partitions, which results
> in
> > > > every
> > > > > > > >>>> subtask
> > > > > > > >>>> > being
> > > > > > > >>>> > > >> > > assigned a
> > > > > > > >>>> > > >> > > >>    single unit of work.
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >> I personally see no issue in implementing
> and
> > I
> > > > see
> > > > > no
> > > > > > > >>>> reason
> > > > > > > >>>> > > >> against
> > > > > > > >>>> > > >> > > > implementing this dependency of watermark
> > > alignment
> > > > > and
> > > > > > > >>>> pausable
> > > > > > > >>>> > > >> > splits.
> > > > > > > >>>> > > >> > > (I
> > > > > > > >>>> > > >> > > > think this would even be a good path towards
> > > > shaping
> > > > > > > >>>> watermark
> > > > > > > >>>> > > >> > alignment
> > > > > > > >>>> > > >> > > in
> > > > > > > >>>> > > >> > > > 1.16.) However, "I don't see" means that I
> > would
> > > be
> > > > > > happy
> > > > > > > >>>> to
> > > > > > > >>>> > hear
> > > > > > > >>>> > > >> > Dawid's
> > > > > > > >>>> > > >> > > > and Piotrek's opinions as they implemented
> > > > watermark
> > > > > > > >>>> alignment
> > > > > > > >>>> > > >> based on
> > > > > > > >>>> > > >> > > > FLIP-182 [2] and I don't want to miss
> relevant
> > > > > > > >>>> > rationale/background
> > > > > > > >>>> > > >> > info
> > > > > > > >>>> > > >> > > > from their side.
> > > > > > > >>>> > > >> > > >
> > > > > > > >>>> > > >> > > > *@Piotrek* *@Dawid* What do you think?
> > > > > > > >>>> > > >> > > >
> > > > > > > >>>> > > >> > > > Regards,
> > > > > > > >>>> > > >> > > > Sebastian
> > > > > > > >>>> > > >> > > >
> > > > > > > >>>> > > >> > > > [1] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
> > > > > > > >>>> > > >> > > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
> > > > > > > >>>> > > >> > > >
> > > > > > > >>>> > > >> > > > On Wed, May 11, 2022 at 1:30 PM Becket Qin <
> > > > > > > >>>> > becket.qin@gmail.com>
> > > > > > > >>>> > > >> > wrote:
> > > > > > > >>>> > > >> > > >
> > > > > > > >>>> > > >> > > >> +dev
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >> Hi Sebastian,
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >> Thank you for the summary. Please see the
> > > detailed
> > > > > > > replies
> > > > > > > >>>> > inline.
> > > > > > > >>>> > > >> As
> > > > > > > >>>> > > >> > a
> > > > > > > >>>> > > >> > > >> recap of my suggestions.
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >> 1. Pausable splits API.
> > > > > > > >>>> > > >> > > >>   a) Add default implementations to methods
> > > > > > > >>>> > "pauseOrResumeSplits"
> > > > > > > >>>> > > >> in
> > > > > > > >>>> > > >> > > both
> > > > > > > >>>> > > >> > > >> SourceReader and SplitReader where both
> > default
> > > > > > > >>>> implementations
> > > > > > > >>>> > > >> throw
> > > > > > > >>>> > > >> > > >>  UnsupportedOperationException.
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >> 2. User story.
> > > > > > > >>>> > > >> > > >>     a) We tell users to enable the watermark
> > > > > alignment
> > > > > > > as
> > > > > > > >>>> they
> > > > > > > >>>> > > >> like.
> > > > > > > >>>> > > >> > > This
> > > > > > > >>>> > > >> > > >> is exactly what the current Flink API is.
> > > > > > > >>>> > > >> > > >>     b) We tell the source developers, please
> > > > > implement
> > > > > > > >>>> pausable
> > > > > > > >>>> > > >> > splits,
> > > > > > > >>>> > > >> > > >> otherwise bad things may happen. Think of it
> > > like
> > > > > you
> > > > > > > are
> > > > > > > >>>> > expected
> > > > > > > >>>> > > >> to
> > > > > > > >>>> > > >> > > >> implement SourceReader#snapshotState()
> > properly,
> > > > > > > otherwise
> > > > > > > >>>> > > >> exceptions
> > > > > > > >>>> > > >> > > will
> > > > > > > >>>> > > >> > > >> be thrown when users enable checkpointing.
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >> Thanks,
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >> Jiangjie (Becket) Qin
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >> On Wed, May 11, 2022 at 4:45 PM Sebastian
> > > > Mattheis <
> > > > > > > >>>> > > >> > > >> sebastian@ververica.com> wrote:
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >>> Hi Becket, Hi everybody,
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >>> I'm sorry if I misread the messages but I
> > could
> > > > not
> > > > > > > >>>> derive an
> > > > > > > >>>> > > >> > agreement
> > > > > > > >>>> > > >> > > >>> from the mailing list. Nevertheless, if I
> > > > > understand
> > > > > > > you
> > > > > > > >>>> > right the
> > > > > > > >>>> > > >> > > >>> suggestion is:
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >>> * Add default implementations to methods
> > > > > > > >>>> > "pauseOrResumeSplits" in
> > > > > > > >>>> > > >> > both
> > > > > > > >>>> > > >> > > >>> SourceReader and SplitReader where both
> > default
> > > > > > > >>>> > implementations
> > > > > > > >>>> > > >> throw
> > > > > > > >>>> > > >> > > >>> UnsupportedOperationException.
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >> Yes.
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >> * Add "supportsPauseOrResumeSplits" to the
> > > Source
> > > > > > > >>>> interface.
> > > > > > > >>>> > (In
> > > > > > > >>>> > > >> the
> > > > > > > >>>> > > >> > > >>> following, I refer to supporting this as
> > > > "pausable
> > > > > > > >>>> splits".)
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >> We may no longer need this if pausable
> splits
> > > are
> > > > > > > >>>> expected to
> > > > > > > >>>> > be
> > > > > > > >>>> > > >> > > >> implemented by the source developers, i.e.
> > > > > > non-optional.
> > > > > > > >>>> Having
> > > > > > > >>>> > > >> this
> > > > > > > >>>> > > >> > > method
> > > > > > > >>>> > > >> > > >> would then be somewhat misleading as it
> looks
> > > like
> > > > > the
> > > > > > > >>>> sources
> > > > > > > >>>> > > >> that do
> > > > > > > >>>> > > >> > > not
> > > > > > > >>>> > > >> > > >> support pausable splits are also acceptable
> in
> > > the
> > > > > > long
> > > > > > > >>>> term.
> > > > > > > >>>> > So
> > > > > > > >>>> > > >> API
> > > > > > > >>>> > > >> > > wise,
> > > > > > > >>>> > > >> > > >> I'd say maybe we should remove this for this
> > > FLIP,
> > > > > > > >>>> although I
> > > > > > > >>>> > > >> believe
> > > > > > > >>>> > > >> > > this
> > > > > > > >>>> > > >> > > >> supportXXX pattern itself is still
> attractive
> > > for
> > > > > > > optional
> > > > > > > >>>> > > >> features.
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >>> To make the conclusions explicit:
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >>> 1. The implementation of
> pauseOrResumeSplits
> > in
> > > > > both
> > > > > > > >>>> > interfaces
> > > > > > > >>>> > > >> > > >>> SourceReader and SplitReader is optional
> > where
> > > > the
> > > > > > > >>>> default is
> > > > > > > >>>> > > >> that
> > > > > > > >>>> > > >> > it
> > > > > > > >>>> > > >> > > >>> doesn't support it. (--> This means that
> the
> > > > > > > >>>> implementation is
> > > > > > > >>>> > > >> still
> > > > > > > >>>> > > >> > > >>> optional for the source developer.)
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >> It is optional for backwards compatibility
> > with
> > > > > > existing
> > > > > > > >>>> > sources,
> > > > > > > >>>> > > >> as
> > > > > > > >>>> > > >> > > they
> > > > > > > >>>> > > >> > > >> may still compile without code change. But
> > > > starting
> > > > > > from
> > > > > > > >>>> this
> > > > > > > >>>> > FLIP,
> > > > > > > >>>> > > >> > > Flink
> > > > > > > >>>> > > >> > > >> will always optimistically assume that all
> the
> > > > > sources
> > > > > > > >>>> support
> > > > > > > >>>> > > >> > pausable
> > > > > > > >>>> > > >> > > >> splits. If a source does not support
> pausable
> > > > > splits,
> > > > > > it
> > > > > > > >>>> goes
> > > > > > > >>>> > to an
> > > > > > > >>>> > > >> > > error
> > > > > > > >>>> > > >> > > >> handling path when watermark alignment is
> > > enabled
> > > > on
> > > > > > it.
> > > > > > > >>>> This
> > > > > > > >>>> > is
> > > > > > > >>>> > > >> > > different
> > > > > > > >>>> > > >> > > >> from a usual optional feature, where no
> error
> > is
> > > > > > > expected.
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >>> 2. If watermark alignment is enabled in the
> > > > > > application
> > > > > > > >>>> code
> > > > > > > >>>> > by
> > > > > > > >>>> > > >> > adding
> > > > > > > >>>> > > >> > > >>> withWatermarkAlignment to the
> > WatermarkStrategy
> > > > > while
> > > > > > > >>>> > > >> SourceReader or
> > > > > > > >>>> > > >> > > >>> SplitReader do not support pausableSplits,
> we
> > > > throw
> > > > > > an
> > > > > > > >>>> > > >> > > >>> UnsupportedOperationException.
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >> Yes.
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >>> 3. With regard to your statement:
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >>>> [...] basically means watermark alignment
> is
> > > > a
> > > > > > > >>>> non-optional
> > > > > > > >>>> > > >> feature
> > > > > > > >>>> > > >> > > to
> > > > > > > >>>> > > >> > > >>>> the end users.
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >>> You actually mean that "pausable splits"
> are
> > > > > > > >>>> non-optional for
> > > > > > > >>>> > the
> > > > > > > >>>> > > >> app
> > > > > > > >>>> > > >> > > >>> developer if watermark alignment is
> enabled.
> > > > > However,
> > > > > > > >>>> > watermark
> > > > > > > >>>> > > >> > > alignment
> > > > > > > >>>> > > >> > > >>> is optional and can be enabled/disabled.
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >> Yes, watermark alignment can be
> > enabled/disabled
> > > > in
> > > > > > > >>>> individual
> > > > > > > >>>> > > >> sources
> > > > > > > >>>> > > >> > > in
> > > > > > > >>>> > > >> > > >> Flink jobs, which basically means the code
> > > > > supporting
> > > > > > > >>>> watermark
> > > > > > > >>>> > > >> > > alignment
> > > > > > > >>>> > > >> > > >> has to already be there. That again means
> the
> > > > Source
> > > > > > > >>>> > developers are
> > > > > > > >>>> > > >> > also
> > > > > > > >>>> > > >> > > >> expected to support pausable splits by
> > default.
> > > So
> > > > > > this
> > > > > > > >>>> way we
> > > > > > > >>>> > > >> > > essentially
> > > > > > > >>>> > > >> > > >> tell the end users that you may enable /
> > disable
> > > > > this
> > > > > > > >>>> feature
> > > > > > > >>>> > as
> > > > > > > >>>> > > >> you
> > > > > > > >>>> > > >> > > wish,
> > > > > > > >>>> > > >> > > >> and tell the source developers that you
> SHOULD
> > > > > > implement
> > > > > > > >>>> this
> > > > > > > >>>> > > >> because
> > > > > > > >>>> > > >> > > the
> > > > > > > >>>> > > >> > > >> end users may turn it on/off at will. And if
> > the
> > > > > > source
> > > > > > > >>>> does
> > > > > > > >>>> > not
> > > > > > > >>>> > > >> > support
> > > > > > > >>>> > > >> > > >> pausable splits, that goes to an error
> > handling
> > > > path
> > > > > > > when
> > > > > > > >>>> > watermark
> > > > > > > >>>> > > >> > > >> alignment is enabled on it. So users know
> they
> > > > have
> > > > > to
> > > > > > > >>>> > explicitly
> > > > > > > >>>> > > >> > > exclude
> > > > > > > >>>> > > >> > > >> this source.
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >>> So far it's totally clear to me and I hope
> > this
> > > > is
> > > > > > what
> > > > > > > >>>> you
> > > > > > > >>>> > mean.
> > > > > > > >>>> > > >> I
> > > > > > > >>>> > > >> > > also
> > > > > > > >>>> > > >> > > >>> agree with both statements:
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >>> So making that expectation aligned with the
> > > > source
> > > > > > > >>>> developers
> > > > > > > >>>> > > >> seems
> > > > > > > >>>> > > >> > > >>>> reasonable.
> > > > > > > >>>> > > >> > > >>>>
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >>> I think this is a simple and clean solution
> > > from
> > > > > both
> > > > > > > >>>> the end
> > > > > > > >>>> > user
> > > > > > > >>>> > > >> > and
> > > > > > > >>>> > > >> > > >>>> source developers' standpoint.
> > > > > > > >>>> > > >> > > >>>>
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >>> However, a last conclusion derives from 3.
> > and
> > > is
> > > > > an
> > > > > > > open
> > > > > > > >>>> > question
> > > > > > > >>>> > > >> > for
> > > > > > > >>>> > > >> > > >>> me:
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >>> 4. The feature of "pausable splits" is now
> > > > tightly
> > > > > > > bound
> > > > > > > >>>> to
> > > > > > > >>>> > > >> watermark
> > > > > > > >>>> > > >> > > >>> alignment, i.e., if sources do not support
> > > > > "pausable
> > > > > > > >>>> splits"
> > > > > > > >>>> > one
> > > > > > > >>>> > > >> can
> > > > > > > >>>> > > >> > > not
> > > > > > > >>>> > > >> > > >>> enable watermark alignment for these
> sources.
> > > > This
> > > > > > > >>>> dependency
> > > > > > > >>>> > is
> > > > > > > >>>> > > >> not
> > > > > > > >>>> > > >> > > the
> > > > > > > >>>> > > >> > > >>> current status of watermark alignment
> > > > > implementation
> > > > > > > >>>> because
> > > > > > > >>>> > it
> > > > > > > >>>> > > >> > is/was
> > > > > > > >>>> > > >> > > >>> implemented without pausable splits. Do we
> > want
> > > > to
> > > > > > > >>>> introduce
> > > > > > > >>>> > this
> > > > > > > >>>> > > >> > > >>> dependency? (This is an open question. I
> > cannot
> > > > > judge
> > > > > > > >>>> that.)
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >> The watermark alignment basically relies on
> > the
> > > > > > pausable
> > > > > > > >>>> > splits,
> > > > > > > >>>> > > >> > right?
> > > > > > > >>>> > > >> > > >> So personally I found it quite reasonable
> that
> > > if
> > > > > the
> > > > > > > >>>> source
> > > > > > > >>>> > does
> > > > > > > >>>> > > >> not
> > > > > > > >>>> > > >> > > >> support pausable splits, end users cannot
> > enable
> > > > > > > watermark
> > > > > > > >>>> > > >> alignment
> > > > > > > >>>> > > >> > on
> > > > > > > >>>> > > >> > > it.
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >>
> > > > > > > >>>> > > >> > > >>> If something is wrong, please correct me.
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >>> Regards,
> > > > > > > >>>> > > >> > > >>> Sebastian
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >>> On Wed, May 11, 2022 at 9:05 AM Becket Qin
> <
> > > > > > > >>>> > becket.qin@gmail.com>
> > > > > > > >>>> > > >> > > wrote:
> > > > > > > >>>> > > >> > > >>>
> > > > > > > >>>> > > >> > > >>>> Hi Sebastian,
> > > > > > > >>>> > > >> > > >>>>
> > > > > > > >>>> > > >> > > >>>> Thanks for the reply and patient
> > discussion. I
> > > > > agree
> > > > > > > >>>> this is
> > > > > > > >>>> > a
> > > > > > > >>>> > > >> > tricky
> > > > > > > >>>> > > >> > > >>>> decision.
> > > > > > > >>>> > > >> > > >>>>
> > > > > > > >>>> > > >> > > >>>>
> > > > > > > >>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns
> > about
> > > > > Option
> > > > > > > c)
> > > > > > > >>>> > which I
> > > > > > > >>>> > > >> see
> > > > > > > >>>> > > >> > as
> > > > > > > >>>> > > >> > > >>>>> follows:
> > > > > > > >>>> > > >> > > >>>>> (1) An interface with default NOOP
> > > > implementation
> > > > > > > >>>> makes the
> > > > > > > >>>> > > >> > > >>>>> implementation optional. And in my
> > opinion, a
> > > > > > default
> > > > > > > >>>> > > >> > implementation
> > > > > > > >>>> > > >> > > is and
> > > > > > > >>>> > > >> > > >>>>> will remain a way of making
> implementation
> > > > > optional
> > > > > > > >>>> because
> > > > > > > >>>> > > >> even in
> > > > > > > >>>> > > >> > > future
> > > > > > > >>>> > > >> > > >>>>> a developer can decide to implement the
> > "old
> > > > > > flavor"
> > > > > > > >>>> without
> > > > > > > >>>> > > >> > support
> > > > > > > >>>> > > >> > > for
> > > > > > > >>>> > > >> > > >>>>> pausable splits.
> > > > > > > >>>> > > >> > > >>>>> (2) It may not be too critical but I also
> > > find
> > > > it
> > > > > > > >>>> suboptimal
> > > > > > > >>>> > > >> that
> > > > > > > >>>> > > >> > > with
> > > > > > > >>>> > > >> > > >>>>> a NOOP default implementation there is no
> > way
> > > > to
> > > > > > > check
> > > > > > > >>>> at
> > > > > > > >>>> > > >> runtime
> > > > > > > >>>> > > >> > if
> > > > > > > >>>> > > >> > > >>>>> SourceReader or SplitReader actually
> > support
> > > > > > pausing.
> > > > > > > >>>> (To
> > > > > > > >>>> > do so,
> > > > > > > >>>> > > >> > one
> > > > > > > >>>> > > >> > > would
> > > > > > > >>>> > > >> > > >>>>> need a supportsX method which makes it
> > again
> > > > more
> > > > > > > >>>> > complicated.)
> > > > > > > >>>> > > >> > > >>>>
> > > > > > > >>>> > > >> > > >>>>
> > > > > > > >>>> > > >> > > >>>> Based on the last few messages in the
> > mailing
> > > > > > list,
> > > > > > > >>>> Piotr
> > > > > > > >>>> > and I
> > > > > > > >>>> > > >> > > agreed
> > > > > > > >>>> > > >> > > >>>> that the default implementation should
> just
> > > > throw
> > > > > an
> > > > > > > >>>> > > >> > > >>>> UnsupportedOperationException if the
> source
> > is
> > > > > > > >>>> unpausable. So
> > > > > > > >>>> > > >> this
> > > > > > > >>>> > > >> > > >>>> basically tells the Source developers that
> > > this
> > > > > > > feature
> > > > > > > >>>> is
> > > > > > > >>>> > > >> expected
> > > > > > > >>>> > > >> > > to be
> > > > > > > >>>> > > >> > > >>>> supported. Because we cannot prevent end
> > users
> > > > > from
> > > > > > > >>>> putting
> > > > > > > >>>> > an
> > > > > > > >>>> > > >> > > unpausable
> > > > > > > >>>> > > >> > > >>>> source into the watermark alignment group,
> > > that
> > > > > > > >>>> basically
> > > > > > > >>>> > means
> > > > > > > >>>> > > >> > > watermark
> > > > > > > >>>> > > >> > > >>>> alignment is a non-optional feature to
> the
> > > end
> > > > > > users.
> > > > > > > >>>> So
> > > > > > > >>>> > making
> > > > > > > >>>> > > >> > that
> > > > > > > >>>> > > >> > > >>>> expectation aligned with the source
> > developers
> > > > > seems
> > > > > > > >>>> > reasonable.
> > > > > > > >>>> > > >> > And
> > > > > > > >>>> > > >> > > if a
> > > > > > > >>>> > > >> > > >>>> source does not support this feature, the
> > end
> > > > > users
> > > > > > > >>>> should
> > > > > > > >>>> > > >> > explicitly
> > > > > > > >>>> > > >> > > >>>> remove that source from the watermark
> > > alignment
> > > > > > group.
> > > > > > > >>>> > > >> > > >>>>
> > > > > > > >>>> > > >> > > >>>> Personally speaking I think this is a
> simple
> > > and
> > > > > > clean
> > > > > > > >>>> > solution
> > > > > > > >>>> > > >> from
> > > > > > > >>>> > > >> > > >>>> both the end user and source developers'
> > > > > standpoint.
> > > > > > > >>>> > > >> > > >>>>
> > > > > > > >>>> > > >> > > >>>> Does this address your concerns?
> > > > > > > >>>> > > >> > > >>>>
> > > > > > > >>>> > > >> > > >>>> Thanks,
> > > > > > > >>>> > > >> > > >>>>
> > > > > > > >>>> > > >> > > >>>> Jiangjie (Becket) Qin
> > > > > > > >>>> > > >> > > >>>>
> > > > > > > >>>> > > >> > > >>>> On Wed, May 11, 2022 at 2:52 PM Sebastian
> > > > > Mattheis <
> > > > > > > >>>> > > >> > > >>>> sebastian@ververica.com> wrote:
> > > > > > > >>>> > > >> > > >>>>
> > > > > > > >>>> > > >> > > >>>>> Hi Piotr, Hi Becket, Hi everybody,
> > > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>> we, Dawid and I, discussed the various
> > > > > > > >>>> suggestions/options
> > > > > > > >>>> > and
> > > > > > > >>>> > > >> we
> > > > > > > >>>> > > >> > > >>>>> would be okay either way because we find
> > > > neither
> > > > > > > >>>> solution is
> > > > > > > >>>> > > >> > perfect
> > > > > > > >>>> > > >> > > just
> > > > > > > >>>> > > >> > > >>>>> because of the already present
> complexity.
> > > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>> Option c) Adding methods to the
> interfaces
> > of
> > > > > > > >>>> SourceReader
> > > > > > > >>>> > and
> > > > > > > >>>> > > >> > > >>>>> SplitReader
> > > > > > > >>>> > > >> > > >>>>> Option a) Adding decorative interfaces to
> > be
> > > > used
> > > > > > by
> > > > > > > >>>> > > >> SourceReader
> > > > > > > >>>> > > >> > and
> > > > > > > >>>> > > >> > > >>>>> SplitReader
> > > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>> As of the current status (v. 12) of the
> > FLIP
> > > > [1],
> > > > > > it
> > > > > > > is
> > > > > > > >>>> > based on
> > > > > > > >>>> > > >> > > >>>>> Option c) which we find acceptable
> because
> > > the
> > > > > > > >>>> complexity
> > > > > > > >>>> > added
> > > > > > > >>>> > > >> is
> > > > > > > >>>> > > >> > > only a
> > > > > > > >>>> > > >> > > >>>>> single method.
> > > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns
> > about
> > > > > Option
> > > > > > > c)
> > > > > > > >>>> > which I
> > > > > > > >>>> > > >> see
> > > > > > > >>>> > > >> > as
> > > > > > > >>>> > > >> > > >>>>> follows:
> > > > > > > >>>> > > >> > > >>>>> (1) An interface with default NOOP
> > > > implementation
> > > > > > > >>>> makes the
> > > > > > > >>>> > > >> > > >>>>> implementation optional. And in my
> > opinion, a
> > > > > > default
> > > > > > > >>>> > > >> > implementation
> > > > > > > >>>> > > >> > > is and
> > > > > > > >>>> > > >> > > >>>>> will remain a way of making
> implementation
> > > > > optional
> > > > > > > >>>> because
> > > > > > > >>>> > > >> even in
> > > > > > > >>>> > > >> > > future
> > > > > > > >>>> > > >> > > >>>>> a developer can decide to implement the
> > "old
> > > > > > flavor"
> > > > > > > >>>> without
> > > > > > > >>>> > > >> > support
> > > > > > > >>>> > > >> > > for
> > > > > > > >>>> > > >> > > >>>>> pausable splits.
> > > > > > > >>>> > > >> > > >>>>> (2) It may not be too critical but I also
> > > find
> > > > it
> > > > > > > >>>> suboptimal
> > > > > > > >>>> > > >> that
> > > > > > > >>>> > > >> > > with
> > > > > > > >>>> > > >> > > >>>>> a NOOP default implementation there is no
> > way
> > > > to
> > > > > > > check
> > > > > > > >>>> at
> > > > > > > >>>> > > >> runtime
> > > > > > > >>>> > > >> > if
> > > > > > > >>>> > > >> > > >>>>> SourceReader or SplitReader actually
> > support
> > > > > > pausing.
> > > > > > > >>>> (To
> > > > > > > >>>> > do so,
> > > > > > > >>>> > > >> > one
> > > > > > > >>>> > > >> > > would
> > > > > > > >>>> > > >> > > >>>>> need a supportsX method which makes it
> > again
> > > > more
> > > > > > > >>>> > complicated.)
> > > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>> However, we haven't changed it because
> > Option
> > > > a)
> > > > > is
> > > > > > > >>>> also not
> > > > > > > >>>> > > >> > optimal
> > > > > > > >>>> > > >> > > >>>>> or straightforward:
> > > > > > > >>>> > > >> > > >>>>> (1) We need to add two distinct yet
> similar
> > > > > > > decorative
> > > > > > > >>>> > > >> interfaces
> > > > > > > >>>> > > >> > > >>>>> since, as mentioned, the signatures of
> the
> > > > > methods
> > > > > > > are
> > > > > > > >>>> > > >> different.
> > > > > > > >>>> > > >> > For
> > > > > > > >>>> > > >> > > >>>>> example, we would need decorative
> > interfaces
> > > > like
> > > > > > > >>>> > > >> > > >>>>> `SplitReaderWithPausableSplits` and
> > > > > > > >>>> > > >> > `SourceReaderWithPausableSplits`.
> > > > > > > >>>> > > >> > > >>>>> (2) As a consequence, we would need to
> > > somehow
> > > > > > > document
> > > > > > > >>>> > > >> how/where
> > > > > > > >>>> > > >> > to
> > > > > > > >>>> > > >> > > >>>>> implement both interfaces and how this
> > > relates
> > > > to
> > > > > > > each
> > > > > > > >>>> > other.
> > > > > > > >>>> > > >> This
> > > > > > > >>>> > > >> > > we could
> > > > > > > >>>> > > >> > > >>>>> solve by adding a note in the interface
> of
> > > > > > > >>>> SourceReader and
> > > > > > > >>>> > > >> > > SplitReader and
> > > > > > > >>>> > > >> > > >>>>> reference to the decorative interfaces
> but
> > it
> > > > > still
> > > > > > > >>>> > increases
> > > > > > > >>>> > > >> > > complexity
> > > > > > > >>>> > > >> > > >>>>> too.
> > > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>> In summary, we see both as acceptable and
> > > > > preferred
> > > > > > > >>>> over
> > > > > > > >>>> > other
> > > > > > > >>>> > > >> > > >>>>> options. The question is if we can find a
> > > > > solution
> > > > > > or
> > > > > > > >>>> > compromise
> > > > > > > >>>> > > >> > > that is
> > > > > > > >>>> > > >> > > >>>>> acceptable for everybody to reach
> > consensus.
> > > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>> Please let us know what you think because
> > we
> > > > > would
> > > > > > be
> > > > > > > >>>> happy
> > > > > > > >>>> > if
> > > > > > > >>>> > > >> we
> > > > > > > >>>> > > >> > can
> > > > > > > >>>> > > >> > > >>>>> conclude the discussion to avoid dropping
> > the
> > > > > > > >>>> initiative on
> > > > > > > >>>> > this
> > > > > > > >>>> > > >> > > FLIP.
> > > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>> Regards,
> > > > > > > >>>> > > >> > > >>>>> Sebastian
> > > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
> > > > > > > >>>> > > >> > > >>>>> (v. 12)
> > > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>> On Thu, May 5, 2022 at 10:13 AM Piotr
> > > Nowojski
> > > > <
> > > > > > > >>>> > > >> > pnowojski@apache.org
> > > > > > > >>>> > > >> > > >
> > > > > > > >>>> > > >> > > >>>>> wrote:
> > > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>>> Hi Guowei,
> > > > > > > >>>> > > >> > > >>>>>>
> > > > > > > >>>> > > >> > > >>>>>> as Dawid wrote a couple of messages
> back:
> > > > > > > >>>> > > >> > > >>>>>>
> > > > > > > >>>> > > >> > > >>>>>> > This is covered in the previous
> FLIP[1]
> > > > which
> > > > > > has
> > > > > > > >>>> been
> > > > > > > >>>> > > >> already
> > > > > > > >>>> > > >> > > >>>>>> implemented in 1.15. In short, it must
> be
> > > > > enabled
> > > > > > > >>>> with the
> > > > > > > >>>> > > >> > watermark
> > > > > > > >>>> > > >> > > >>>>>> strategy which also configures drift and
> > > > update
> > > > > > > >>>> interval
> > > > > > > >>>> > > >> > > >>>>>>
> > > > > > > >>>> > > >> > > >>>>>> So by default watermark alignment is
> > > disabled,
> > > > > > > >>>> regardless
> > > > > > > >>>> > if a
> > > > > > > >>>> > > >> > > source
> > > > > > > >>>> > > >> > > >>>>>> supports it or not.
> > > > > > > >>>> > > >> > > >>>>>>
> > > > > > > >>>> > > >> > > >>>>>> Best,
> > > > > > > >>>> > > >> > > >>>>>> Piotrek
> > > > > > > >>>> > > >> > > >>>>>>
> > > > > > > >>>> > > >> > > >>>>>> On Thu, May 5, 2022 at 9:56 AM Guowei Ma <
> > > > > > > >>>> guowei.mgw@gmail.com>
> > > > > > > >>>> > > >> > > wrote:
> > > > > > > >>>> > > >> > > >>>>>>
> > > > > > > >>>> > > >> > > >>>>>>> Hi,
> > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > >>>> > > >> > > >>>>>>> We know that in the case of Bounded
> input
> > > > Flink
> > > > > > > >>>> supports
> > > > > > > >>>> > the
> > > > > > > >>>> > > >> > Batch
> > > > > > > >>>> > > >> > > >>>>>>> execution mode. Currently in Batch
> > > execution
> > > > > > mode,
> > > > > > > >>>> Flink
> > > > > > > >>>> > is
> > > > > > > >>>> > > >> > > executed
> > > > > > > >>>> > > >> > > >>>>>>> on a
> > > > > > > >>>> > > >> > > >>>>>>> stage-by-stage basis. In this way,
> > perhaps
> > > > > > > watermark
> > > > > > > >>>> > alignment
> > > > > > > >>>> > > >> > > might
> > > > > > > >>>> > > >> > > >>>>>>> not
> > > > > > > >>>> > > >> > > >>>>>>> gain much.
> > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > >>>> > > >> > > >>>>>>> So my question is: Is watermark
> alignment
> > > the
> > > > > > > default
> > > > > > > >>>> > > >> > behavior (for
> > > > > > > >>>> > > >> > > >>>>>>> implemented source only)? If so, have
> you
> > > > > > > considered
> > > > > > > >>>> > > >> evaluating
> > > > > > > >>>> > > >> > the
> > > > > > > >>>> > > >> > > >>>>>>> impact
> > > > > > > >>>> > > >> > > >>>>>>> of this behavior on the Batch execution
> > > mode?
> > > > > Or
> > > > > > > >>>> do you think
> > > > > > > >>>> > it is
> > > > > > > >>>> > > >> not
> > > > > > > >>>> > > >> > > >>>>>>> necessary?
> > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > >>>> > > >> > > >>>>>>> Correct me if I miss something.
> > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > >>>> > > >> > > >>>>>>> Best,
> > > > > > > >>>> > > >> > > >>>>>>> Guowei
> > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > >>>> > > >> > > >>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr
> > > > Nowojski <
> > > > > > > >>>> > > >> > > >>>>>>> piotr.nowojski@gmail.com>
> > > > > > > >>>> > > >> > > >>>>>>> wrote:
> > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > >>>> > > >> > > >>>>>>> > Hi Becket and Dawid,
> > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > >>>> > > >> > > >>>>>>> > > I feel that no matter which option
> we
> > > > > choose
> > > > > > > >>>> this can
> > > > > > > >>>> > not
> > > > > > > >>>> > > >> be
> > > > > > > >>>> > > >> > > >>>>>>> solved
> > > > > > > >>>> > > >> > > >>>>>>> > entirely in either of the options,
> > > because
> > > > of
> > > > > > the
> > > > > > > >>>> point
> > > > > > > >>>> > > >> above
> > > > > > > >>>> > > >> > and
> > > > > > > >>>> > > >> > > >>>>>>> because
> > > > > > > >>>> > > >> > > >>>>>>> > the signatures of
> > > > > > SplitReader#pauseOrResumeSplits
> > > > > > > >>>> and
> > > > > > > >>>> > > >> > > >>>>>>> > SourceReader#pauseOrResumeSplits are
> > > > slightly
> > > > > > > >>>> different
> > > > > > > >>>> > (one
> > > > > > > >>>> > > >> > > >>>>>>> identifies
> > > > > > > >>>> > > >> > > >>>>>>> > splits with splitId the other one
> > passes
> > > > the
> > > > > > > splits
> > > > > > > >>>> > > >> directly).
> > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > >>>> > > >> > > >>>>>>> > Yes, that's a good point in this case
> > and
> > > > for
> > > > > > > >>>> features
> > > > > > > >>>> > that
> > > > > > > >>>> > > >> > need
> > > > > > > >>>> > > >> > > >>>>>>> to be
> > > > > > > >>>> > > >> > > >>>>>>> > implemented in more than one place.
> > > > > > > >>>> > > >> > > >>>>>>> >
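
The signature mismatch mentioned above can be illustrated with a minimal sketch. Everything here is hypothetical and simplified for illustration (`DemoSplit`, `DemoSourceReader`, `DemoSplitReader`, `BridgingSourceReader`); these are not the real Flink interfaces. The sketch also assumes that the reader-level method is the one that takes split IDs and the split-reader-level method is the one that takes the split objects, which the thread does not state explicitly.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical split type, identified by a string ID.
final class DemoSplit {
    final String id;
    DemoSplit(String id) { this.id = id; }
}

// Assumed reader-level shape: splits are identified by their IDs.
interface DemoSourceReader {
    void pauseOrResumeSplits(Collection<String> idsToPause, Collection<String> idsToResume);
}

// Assumed split-reader-level shape: the split objects are passed directly.
interface DemoSplitReader {
    void pauseOrResumeSplits(Collection<DemoSplit> toPause, Collection<DemoSplit> toResume);
}

// Toy split reader that only records which split IDs are currently paused.
final class RecordingSplitReader implements DemoSplitReader {
    final List<String> paused = new ArrayList<>();

    @Override
    public void pauseOrResumeSplits(Collection<DemoSplit> toPause, Collection<DemoSplit> toResume) {
        for (DemoSplit s : toPause) {
            if (!paused.contains(s.id)) {
                paused.add(s.id);
            }
        }
        for (DemoSplit s : toResume) {
            paused.remove(s.id);
        }
    }
}

// The feature spans two components: this reader resolves IDs to splits
// and delegates to the split reader.
final class BridgingSourceReader implements DemoSourceReader {
    private final Map<String, DemoSplit> assigned = new HashMap<>();
    private final DemoSplitReader splitReader;

    BridgingSourceReader(DemoSplitReader splitReader) { this.splitReader = splitReader; }

    void addSplit(DemoSplit split) { assigned.put(split.id, split); }

    @Override
    public void pauseOrResumeSplits(Collection<String> idsToPause, Collection<String> idsToResume) {
        splitReader.pauseOrResumeSplits(resolve(idsToPause), resolve(idsToResume));
    }

    private List<DemoSplit> resolve(Collection<String> ids) {
        List<DemoSplit> result = new ArrayList<>();
        for (String id : ids) {
            DemoSplit split = assigned.get(id);
            if (split == null) {
                throw new IllegalArgumentException("Unknown split: " + id);
            }
            result.add(split);
        }
        return result;
    }
}
```

Because the two methods differ in parameter type, a single decorative interface could not be shared by both components in this sketch; each side would need its own.
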
> > > > > > > >>>> > > >> > > >>>>>>> > > Is there any reason for pausing
> > reading
> > > > > from
> > > > > > a
> > > > > > > >>>> split
> > > > > > > >>>> > an
> > > > > > > >>>> > > >> > > optional
> > > > > > > >>>> > > >> > > >>>>>>> feature,
> > > > > > > >>>> > > >> > > >>>>>>> > > other than that this was not
> included
> > > in
> > > > > the
> > > > > > > >>>> original
> > > > > > > >>>> > > >> > > interface?
> > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > >>>> > > >> > > >>>>>>> > An additional argument in favor of
> > making
> > > > it
> > > > > > > >>>> optional
> > > > > > > >>>> > is to
> > > > > > > >>>> > > >> > > >>>>>>> simplify source
> > > > > > > >>>> > > >> > > >>>>>>> > implementation. But on its own I'm
> not
> > > sure
> > > > > if
> > > > > > > that
> > > > > > > >>>> > would be
> > > > > > > >>>> > > >> > > >>>>>>> enough to
> > > > > > > >>>> > > >> > > >>>>>>> > justify making this feature optional.
> > > > Maybe.
> > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > >>>> > > >> > > >>>>>>> > > I think it would be way simpler and
> > > > clearer
> > > > > > to
> > > > > > > >>>> just
> > > > > > > >>>> > let
> > > > > > > >>>> > > >> end
> > > > > > > >>>> > > >> > > >>>>>>> users and
> > > > > > > >>>> > > >> > > >>>>>>> > Flink
> > > > > > > >>>> > > >> > > >>>>>>> > > assume all the connectors will
> > > implement
> > > > > this
> > > > > > > >>>> feature.
> > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > >>>> > > >> > > >>>>>>> > As I wrote above that would be an
> > > > interesting
> > > > > > > >>>> choice to
> > > > > > > >>>> > make
> > > > > > > >>>> > > >> > > (ease
> > > > > > > >>>> > > >> > > >>>>>>> of
> > > > > > > >>>> > > >> > > >>>>>>> > implementation for new users, vs
> system
> > > > > > > >>>> consistency).
> > > > > > > >>>> > > >> > Regardless
> > > > > > > >>>> > > >> > > >>>>>>> of that,
> > > > > > > >>>> > > >> > > >>>>>>> > yes, for me the main argument is the
> > API
> > > > > > backward
> > > > > > > >>>> > > >> > compatibility.
> > > > > > > >>>> > > >> > > >>>>>>> But let's
> > > > > > > >>>> > > >> > > >>>>>>> > clear a couple of points:
> > > > > > > >>>> > > >> > > >>>>>>> > - The current proposal adding methods
> > to
> > > > the
> > > > > > base
> > > > > > > >>>> > interface
> > > > > > > >>>> > > >> > with
> > > > > > > >>>> > > >> > > >>>>>>> default
> > > > > > > >>>> > > >> > > >>>>>>> > implementations is an OPTIONAL
> feature.
> > > > Same
> > > > > as
> > > > > > > the
> > > > > > > >>>> > > >> decorative
> > > > > > > >>>> > > >> > > >>>>>>> version
> > > > > > > >>>> > > >> > > >>>>>>> > would be.
> > > > > > > >>>> > > >> > > >>>>>>> > - Decorative version could implement
> > > "throw
> > > > > > > >>>> > > >> > > >>>>>>> UnsupportedOperationException"
> > > > > > > >>>> > > >> > > >>>>>>> > if user enabled watermark alignment
> > just
> > > as
> > > > > > well
> > > > > > > >>>> and I
> > > > > > > >>>> > agree
> > > > > > > >>>> > > >> > > >>>>>>> that's a
> > > > > > > >>>> > > >> > > >>>>>>> > better option compared to logging a
> > > > warning.
> > > > > > > >>>> > > >> > > >>>>>>> >
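
The two options compared in this thread can be sketched side by side. This is only an illustration under stated assumptions: all type names (`SketchSourceReader`, `SketchBaseReader`, `SketchWithSplitsAlignment`, `AlignmentSketch`) are hypothetical stand-ins and not the real Flink API, and the string return values exist only to make the behaviour observable.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Option A (hypothetical): the method lives in the base interface with a
// default implementation, so old implementations still compile.
interface SketchSourceReader {
    default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        throw new UnsupportedOperationException(
                "This reader does not support pausing or resuming splits.");
    }
}

// Option B (hypothetical): the base interface stays untouched and the
// capability is expressed by an additional decorative interface.
interface SketchBaseReader {}

interface SketchWithSplitsAlignment {
    void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume);
}

// Old implementations that know nothing about the new feature.
class LegacyReaderA implements SketchSourceReader {}

class LegacyReaderB implements SketchBaseReader {}

// A new implementation that opts in through the decorative interface.
class AlignedReaderB implements SketchBaseReader, SketchWithSplitsAlignment {
    final List<String> paused = new ArrayList<>();

    @Override
    public void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        paused.addAll(toPause);
        paused.removeAll(toResume);
    }
}

// Framework-side handling for both options.
class AlignmentSketch {
    static String alignWithDefaultMethod(SketchSourceReader reader) {
        try {
            reader.pauseOrResumeSplits(List.of("split-0"), List.of());
            return "paused";
        } catch (UnsupportedOperationException e) {
            // Support is only discovered when the call fails at runtime.
            return "unsupported";
        }
    }

    static String alignWithDecorator(SketchBaseReader reader) {
        if (reader instanceof SketchWithSplitsAlignment) {
            ((SketchWithSplitsAlignment) reader).pauseOrResumeSplits(List.of("split-0"), List.of());
            return "paused";
        }
        // Support can be checked up front; the framework decides whether to throw.
        return "unsupported";
    }
}
```

In both variants the feature remains optional for the implementer. The difference in this sketch is where the missing support is detected: inside the call for the default method, or by a type check before the call for the decorative interface.
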
> > > > > > > >>>> > > >> > > >>>>>>> > Best,
> > > > > > > >>>> > > >> > > >>>>>>> > Piotrek
> > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > >>>> > > >> > > >>>>>>> > On Wed, 4 May 2022 at 15:40, Becket Qin <
> > > > > > > >>>> > becket.qin@gmail.com>
> > > > > > > >>>> > > >> > > >>>>>>> wrote:
> > > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > >>>> > > >> > > >>>>>>> > > Thanks for the reply and patient
> > > > > discussion,
> > > > > > > >>>> Piotr and
> > > > > > > >>>> > > >> Dawid.
> > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > Is there any reason for pausing
> > reading
> > > > > from
> > > > > > a
> > > > > > > >>>> split
> > > > > > > >>>> > an
> > > > > > > >>>> > > >> > > optional
> > > > > > > >>>> > > >> > > >>>>>>> feature,
> > > > > > > >>>> > > >> > > >>>>>>> > > other than that this was not
> included
> > > in
> > > > > the
> > > > > > > >>>> original
> > > > > > > >>>> > > >> > > interface?
> > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > To be honest I am really worried
> > about
> > > > the
> > > > > > > >>>> complexity
> > > > > > > >>>> > of
> > > > > > > >>>> > > >> the
> > > > > > > >>>> > > >> > > >>>>>>> user story
> > > > > > > >>>> > > >> > > >>>>>>> > > here. Optional features like this
> > have
> > > a
> > > > > high
> > > > > > > >>>> > overhead.
> > > > > > > >>>> > > >> > Imagine
> > > > > > > >>>> > > >> > > >>>>>>> this
> > > > > > > >>>> > > >> > > >>>>>>> > > feature is optional, now a user
> > enabled
> > > > > > > watermark
> > > > > > > >>>> > > >> alignment
> > > > > > > >>>> > > >> > and
> > > > > > > >>>> > > >> > > >>>>>>> defined a
> > > > > > > >>>> > > >> > > >>>>>>> > > few watermark groups. Would it
> work?
> > > Hmm,
> > > > > > that
> > > > > > > >>>> > depends on
> > > > > > > >>>> > > >> > > >>>>>>> whether the
> > > > > > > >>>> > > >> > > >>>>>>> > > involved Source has implemented
> this
> > > > > > feature.
> > > > > > > >>>> If the
> > > > > > > >>>> > > >> Sources
> > > > > > > >>>> > > >> > > >>>>>>> are well
> > > > > > > >>>> > > >> > > >>>>>>> > > documented, good luck. Otherwise
> end
> > > > users
> > > > > > may
> > > > > > > >>>> have to
> > > > > > > >>>> > > >> look
> > > > > > > >>>> > > >> > > into
> > > > > > > >>>> > > >> > > >>>>>>> the code
> > > > > > > >>>> > > >> > > >>>>>>> > > of the Source to see whether the
> > > feature
> > > > is
> > > > > > > >>>> supported.
> > > > > > > >>>> > > >> Which
> > > > > > > >>>> > > >> > is
> > > > > > > >>>> > > >> > > >>>>>>> something
> > > > > > > >>>> > > >> > > >>>>>>> > > they shouldn't have to do.
> > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > I think it would be way simpler and
> > > > clearer
> > > > > > to
> > > > > > > >>>> just
> > > > > > > >>>> > let
> > > > > > > >>>> > > >> end
> > > > > > > >>>> > > >> > > >>>>>>> users and
> > > > > > > >>>> > > >> > > >>>>>>> > Flink
> > > > > > > >>>> > > >> > > >>>>>>> > > assume all the connectors will
> > > implement
> > > > > this
> > > > > > > >>>> feature.
> > > > > > > >>>> > > >> After
> > > > > > > >>>> > > >> > > all
> > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > >>>> > > >> > > >>>>>>> > > watermark group is not optional to
> > the
> > > > end
> > > > > > > >>>> users. If
> > > > > > > >>>> > in
> > > > > > > >>>> > > >> some
> > > > > > > >>>> > > >> > > >>>>>>> rare cases,
> > > > > > > >>>> > > >> > > >>>>>>> > > the feature cannot be supported, a
> > > clear
> > > > > > > >>>> > > >> > > >>>>>>> UnsupportedOperationException
> > > > > > > >>>> > > >> > > >>>>>>> > will
> > > > > > > >>>> > > >> > > >>>>>>> > > be thrown to tell users to
> explicitly
> > > > > remove
> > > > > > > this
> > > > > > > >>>> > Source
> > > > > > > >>>> > > >> from
> > > > > > > >>>> > > >> > > the
> > > > > > > >>>> > > >> > > >>>>>>> > watermark
> > > > > > > >>>> > > >> > > >>>>>>> > > group. I don't think we should
> have a
> > > > > warning
> > > > > > > >>>> message
> > > > > > > >>>> > > >> here,
> > > > > > > >>>> > > >> > as
> > > > > > > >>>> > > >> > > >>>>>>> they tend
> > > > > > > >>>> > > >> > > >>>>>>> > to
> > > > > > > >>>> > > >> > > >>>>>>> > > be ignored in many cases. If we do
> > > this,
> > > > we
> > > > > > > >>>> don't even
> > > > > > > >>>> > > >> need
> > > > > > > >>>> > > >> > the
> > > > > > > >>>> > > >> > > >>>>>>> > supportXXX
> > > > > > > >>>> > > >> > > >>>>>>> > > method in the Source for this
> > feature.
> > > In
> > > > > > fact
> > > > > > > >>>> this is
> > > > > > > >>>> > > >> > exactly
> > > > > > > >>>> > > >> > > >>>>>>> how many
> > > > > > > >>>> > > >> > > >>>>>>> > > interfaces work today. For
> example,
> > > > > > > >>>> > > >> > > >>>>>>> SplitEnumerator#addSplitsBack() is
> > > > > > > >>>> > > >> > > >>>>>>> > not
> > > > > > > >>>> > > >> > > >>>>>>> > > supported by Pravega source because
> > it
> > > > does
> > > > > > not
> > > > > > > >>>> > support
> > > > > > > >>>> > > >> > partial
> > > > > > > >>>> > > >> > > >>>>>>> failover.
> > > > > > > >>>> > > >> > > >>>>>>> > > In that case, it simply throws an
> > > > exception
> > > > > > to
> > > > > > > >>>> > trigger a
> > > > > > > >>>> > > >> > global
> > > > > > > >>>> > > >> > > >>>>>>> recovery.
> > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > The reason we add a default
> > > > implementation
> > > > > in
> > > > > > > >>>> this
> > > > > > > >>>> > case
> > > > > > > >>>> > > >> would
> > > > > > > >>>> > > >> > > >>>>>>> be just for
> > > > > > > >>>> > > >> > > >>>>>>> > the
> > > > > > > >>>> > > >> > > >>>>>>> > > sake of backwards compatibility so
> > the
> > > > old
> > > > > > > >>>> source can
> > > > > > > >>>> > > >> still
> > > > > > > >>>> > > >> > > >>>>>>> compile.
> > > > > > > >>>> > > >> > > >>>>>>> > Sure,
> > > > > > > >>>> > > >> > > >>>>>>> > > in short term, this feature might
> not
> > > be
> > > > > > > >>>> supported by
> > > > > > > >>>> > many
> > > > > > > >>>> > > >> > > >>>>>>> existing
> > > > > > > >>>> > > >> > > >>>>>>> > > sources. That is OK, and it is
> quite
> > > > > visible
> > > > > > to
> > > > > > > >>>> the
> > > > > > > >>>> > source
> > > > > > > >>>> > > >> > > >>>>>>> developers
> > > > > > > >>>> > > >> > > >>>>>>> > that
> > > > > > > >>>> > > >> > > >>>>>>> > > they did not override the default
> > impl
> > > > > which
> > > > > > > >>>> throws an
> > > > > > > >>>> > > >> > > >>>>>>> > > UnsupportedOperationException.
> > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > @Dawid,
> > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > the Java doc of the SupportXXX()
> > method
> > > > in
> > > > > > the
> > > > > > > >>>> Source
> > > > > > > >>>> > > >> would
> > > > > > > >>>> > > >> > be
> > > > > > > >>>> > > >> > > >>>>>>> the single
> > > > > > > >>>> > > >> > > >>>>>>> > > >> source of truth regarding how to
> > > > > implement
> > > > > > > >>>> this
> > > > > > > >>>> > > >> feature.
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > I also don't find it entirely true.
> > > Half
> > > > of
> > > > > > the
> > > > > > > >>>> > classes
> > > > > > > >>>> > > >> are
> > > > > > > >>>> > > >> > > >>>>>>> theoretically
> > > > > > > >>>> > > >> > > >>>>>>> > > > optional and are utility classes
> > from
> > > > the
> > > > > > > >>>> point of
> > > > > > > >>>> > view
> > > > > > > >>>> > > >> how
> > > > > > > >>>> > > >> > > the
> > > > > > > >>>> > > >> > > >>>>>>> > > interfaces
> > > > > > > >>>> > > >> > > >>>>>>> > > > are organized. Theoretically
> users
> > do
> > > > not
> > > > > > > need
> > > > > > > >>>> to
> > > > > > > >>>> > use
> > > > > > > >>>> > > >> any
> > > > > > > >>>> > > >> > of
> > > > > > > >>>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader.
> > Would
> > > > be
> > > > > > > weird
> > > > > > > >>>> to
> > > > > > > >>>> > list
> > > > > > > >>>> > > >> > their
> > > > > > > >>>> > > >> > > >>>>>>> methods in
> > > > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > > > >>>> > > >> > > >>>>>>> > > > Source interface.
> > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > I think the ultimate goal of java
> > docs
> > > is
> > > > > to
> > > > > > > >>>> guide
> > > > > > > >>>> > users
> > > > > > > >>>> > > >> to
> > > > > > > >>>> > > >> > > >>>>>>> implement the
> > > > > > > >>>> > > >> > > >>>>>>> > > Source. If SourceReaderBase is the
> > > > > preferred
> > > > > > > way
> > > > > > > >>>> to
> > > > > > > >>>> > > >> > implement a
> > > > > > > >>>> > > >> > > >>>>>>> > > SourceReader, it seems worth
> > mentioning
> > > > > that.
> > > > > > > >>>> Even the
> > > > > > > >>>> > > >> Java
> > > > > > > >>>> > > >> > > >>>>>>> language
> > > > > > > >>>> > > >> > > >>>>>>> > > documentation interfaces lists the
> > > known
> > > > > > > >>>> > implementations
> > > > > > > >>>> > > >> [1]
> > > > > > > >>>> > > >> > so
> > > > > > > >>>> > > >> > > >>>>>>> people
> > > > > > > >>>> > > >> > > >>>>>>> > can
> > > > > > > >>>> > > >> > > >>>>>>> > > leverage them. But for this
> > particular
> > > > > case,
> > > > > > if
> > > > > > > >>>> we
> > > > > > > >>>> > make
> > > > > > > >>>> > > >> the
> > > > > > > >>>> > > >> > > >>>>>>> feature
> > > > > > > >>>> > > >> > > >>>>>>> > > non-optional, we don't even need
> the
> > > > > > > supportXXX()
> > > > > > > >>>> > method
> > > > > > > >>>> > > >> for
> > > > > > > >>>> > > >> > > now.
> > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > Thanks,
> > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > Jiangjie (Becket) Qin
> > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > On Wed, May 4, 2022 at 4:37 PM
> Dawid
> > > > > > > Wysakowicz <
> > > > > > > >>>> > > >> > > >>>>>>> dwysakowicz@apache.org>
> > > > > > > >>>> > > >> > > >>>>>>> > > wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > Hey Piotr and Becket,
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > First of all, let me say I am
> happy
> > > > with
> > > > > > > >>>> whichever
> > > > > > > >>>> > > >> option
> > > > > > > >>>> > > >> > is
> > > > > > > >>>> > > >> > > >>>>>>> agreed in
> > > > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > > > >>>> > > >> > > >>>>>>> > > > discussion.
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > I wanted to clarify a few points
> > from
> > > > the
> > > > > > > >>>> discussion
> > > > > > > >>>> > > >> > though:
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > @Becket:
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > The main argument for adding the
> > > > methods
> > > > > to
> > > > > > > the
> > > > > > > >>>> > > >> > SourceReader
> > > > > > > >>>> > > >> > > >>>>>>> is that
> > > > > > > >>>> > > >> > > >>>>>>> > > these
> > > > > > > >>>> > > >> > > >>>>>>> > > > methods are effectively
> > NON-OPTIONAL
> > > to
> > > > > > > >>>> SourceReader
> > > > > > > >>>> > > >> impl,
> > > > > > > >>>> > > >> > > i.e.
> > > > > > > >>>> > > >> > > >>>>>>> > starting
> > > > > > > >>>> > > >> > > >>>>>>> > > > from this FLIP, all the
> > SourceReaders
> > > > > impl
> > > > > > > are
> > > > > > > >>>> > expected
> > > > > > > >>>> > > >> to
> > > > > > > >>>> > > >> > > >>>>>>> support this
> > > > > > > >>>> > > >> > > >>>>>>> > > > method, although some old
> > > > implementations
> > > > > > may
> > > > > > > >>>> not
> > > > > > > >>>> > have
> > > > > > > >>>> > > >> > > >>>>>>> implemented this
> > > > > > > >>>> > > >> > > >>>>>>> > > > feature. I think we should
> > > distinguish
> > > > > the
> > > > > > > new
> > > > > > > >>>> > features
> > > > > > > >>>> > > >> > from
> > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > >>>> > > >> > > >>>>>>> > optional
> > > > > > > >>>> > > >> > > >>>>>>> > > > features. While the public
> > decorative
> > > > > > > >>>> interface is a
> > > > > > > >>>> > > >> > solution
> > > > > > > >>>> > > >> > > >>>>>>> to the
> > > > > > > >>>> > > >> > > >>>>>>> > > > optional features. We should not
> > use
> > > it
> > > > > for
> > > > > > > the
> > > > > > > >>>> > features
> > > > > > > >>>> > > >> > that
> > > > > > > >>>> > > >> > > >>>>>>> are
> > > > > > > >>>> > > >> > > >>>>>>> > > > non-optional.
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > I don't think that this feature
> is
> > > > > > > >>>> NON-OPTIONAL.
> > > > > > > >>>> > Even
> > > > > > > >>>> > > >> > though
> > > > > > > >>>> > > >> > > >>>>>>> > preferred, I
> > > > > > > >>>> > > >> > > >>>>>>> > > > still think it can be simply
> > > optional.
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > the Java doc of the SupportXXX()
> > > method
> > > > > in
> > > > > > > the
> > > > > > > >>>> > Source
> > > > > > > >>>> > > >> would
> > > > > > > >>>> > > >> > > be
> > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > >>>> > > >> > > >>>>>>> > single
> > > > > > > >>>> > > >> > > >>>>>>> > > > source of truth regarding how to
> > > > > implement
> > > > > > > this
> > > > > > > >>>> > feature.
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > I also don't find it entirely
> true.
> > > > Half
> > > > > of
> > > > > > > the
> > > > > > > >>>> > classes
> > > > > > > >>>> > > >> are
> > > > > > > >>>> > > >> > > >>>>>>> > theoretically
> > > > > > > >>>> > > >> > > >>>>>>> > > > optional and are utility classes
> > from
> > > > the
> > > > > > > >>>> point of
> > > > > > > >>>> > view
> > > > > > > >>>> > > >> how
> > > > > > > >>>> > > >> > > the
> > > > > > > >>>> > > >> > > >>>>>>> > > interfaces
> > > > > > > >>>> > > >> > > >>>>>>> > > > are organized. Theoretically
> users
> > do
> > > > not
> > > > > > > need
> > > > > > > >>>> to
> > > > > > > >>>> > use
> > > > > > > >>>> > > >> any
> > > > > > > >>>> > > >> > of
> > > > > > > >>>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader.
> > Would
> > > > be
> > > > > > > weird
> > > > > > > >>>> to
> > > > > > > >>>> > list
> > > > > > > >>>> > > >> > their
> > > > > > > >>>> > > >> > > >>>>>>> methods in
> > > > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > > > >>>> > > >> > > >>>>>>> > > > Source interface.
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > @Piotr
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > If we have all of the methods
> with
> > > > > default
> > > > > > > >>>> > > >> implementation
> > > > > > > >>>> > > >> > in
> > > > > > > >>>> > > >> > > >>>>>>> the base
> > > > > > > >>>> > > >> > > >>>>>>> > > > interface, the API doesn't give
> any
> > > > clue
> > > > > to
> > > > > > > >>>> the user
> > > > > > > >>>> > > >> which
> > > > > > > >>>> > > >> > > set
> > > > > > > >>>> > > >> > > >>>>>>> of
> > > > > > > >>>> > > >> > > >>>>>>> > methods
> > > > > > > >>>> > > >> > > >>>>>>> > > > are required to be implemented at
> > the
> > > > > same
> > > > > > > >>>> time.
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > I feel that no matter which
> option
> > we
> > > > > > choose
> > > > > > > >>>> this
> > > > > > > >>>> > can
> > > > > > > >>>> > > >> not
> > > > > > > >>>> > > >> > be
> > > > > > > >>>> > > >> > > >>>>>>> solved
> > > > > > > >>>> > > >> > > >>>>>>> > > > entirely in either of the
> options,
> > > > > because
> > > > > > of
> > > > > > > >>>> the
> > > > > > > >>>> > point
> > > > > > > >>>> > > >> > above
> > > > > > > >>>> > > >> > > >>>>>>> and
> > > > > > > >>>> > > >> > > >>>>>>> > because
> > > > > > > >>>> > > >> > > >>>>>>> > > > the signature of
> > > > > > > >>>> SplitReader#pauseOrResumeSplits and
> > > > > > > >>>> > > >> > > >>>>>>> > > > SourceReader#pauseOrResumeSplits
> > are
> > > > > > slightly
> > > > > > > >>>> > different
> > > > > > > >>>> > > >> > (one
> > > > > > > >>>> > > >> > > >>>>>>> identifies
> > > > > > > >>>> > > >> > > >>>>>>> > > > splits with splitId the other one
> > > > passes
> > > > > > the
> > > > > > > >>>> splits
> > > > > > > >>>> > > >> > > directly).
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > Best,
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > Dawid
> > > > > > > >>>> > > >> > > >>>>>>> > > > On 03/05/2022 14:30, Becket Qin
> > > wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > Hi Piotr,
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > Thanks for the comment.
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > Just to clarify, I am not against
> > the
> > > > > > > >>>> decorative
> > > > > > > >>>> > > >> > interfaces,
> > > > > > > >>>> > > >> > > >>>>>>> but I do
> > > > > > > >>>> > > >> > > >>>>>>> > > > think we should use them with
> > > caution.
> > > > > The
> > > > > > > main
> > > > > > > >>>> > argument
> > > > > > > >>>> > > >> > for
> > > > > > > >>>> > > >> > > >>>>>>> adding the
> > > > > > > >>>> > > >> > > >>>>>>> > > > methods to the SourceReader is
> that
> > > > these
> > > > > > > >>>> methods
> > > > > > > >>>> > are
> > > > > > > >>>> > > >> > > >>>>>>> > > > effectively NON-OPTIONAL to
> > > > SourceReader
> > > > > > > impl,
> > > > > > > >>>> i.e.
> > > > > > > >>>> > > >> > starting
> > > > > > > >>>> > > >> > > >>>>>>> from this
> > > > > > > >>>> > > >> > > >>>>>>> > > > FLIP, all the SourceReaders impl
> > are
> > > > > > expected
> > > > > > > >>>> to
> > > > > > > >>>> > support
> > > > > > > >>>> > > >> > this
> > > > > > > >>>> > > >> > > >>>>>>> > > > method, although some old
> > > > implementations
> > > > > > may
> > > > > > > >>>> not
> > > > > > > >>>> > have
> > > > > > > >>>> > > >> > > >>>>>>> implemented this
> > > > > > > >>>> > > >> > > >>>>>>> > > > feature. I think we should
> > > distinguish
> > > > > the
> > > > > > > new
> > > > > > > >>>> > features
> > > > > > > >>>> > > >> > from
> > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > >>>> > > >> > > >>>>>>> > optional
> > > > > > > >>>> > > >> > > >>>>>>> > > > features. While the public
> > decorative
> > > > > > > >>>> interface is a
> > > > > > > >>>> > > >> > solution
> > > > > > > >>>> > > >> > > >>>>>>> to the
> > > > > > > >>>> > > >> > > >>>>>>> > > > optional features. We should not
> > use
> > > it
> > > > > for
> > > > > > > the
> > > > > > > >>>> > features
> > > > > > > >>>> > > >> > that
> > > > > > > >>>> > > >> > > >>>>>>> are
> > > > > > > >>>> > > >> > > >>>>>>> > > > non-optional.
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > That said, this feature is
> optional
> > > for
> > > > > > > >>>> > SplitReaders.
> > > > > > > >>>> > > >> > > Arguably
> > > > > > > >>>> > > >> > > >>>>>>> we can
> > > > > > > >>>> > > >> > > >>>>>>> > > have
> > > > > > > >>>> > > >> > > >>>>>>> > > > a decorative interface for that,
> > but
> > > > for
> > > > > > > >>>> simplicity
> > > > > > > >>>> > and
> > > > > > > >>>> > > >> > > >>>>>>> symmetry of the
> > > > > > > >>>> > > >> > > >>>>>>> > > > interface, personally I prefer
> just
> > > > > adding
> > > > > > a
> > > > > > > >>>> new
> > > > > > > >>>> > method.
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > Regarding the advantages you
> > > mentioned
> > > > > > about
> > > > > > > >>>> the
> > > > > > > >>>> > > >> decorative
> > > > > > > >>>> > > >> > > >>>>>>> interfaces,
> > > > > > > >>>> > > >> > > >>>>>>> > > > they would make sense if:
> > > > > > > >>>> > > >> > > >>>>>>> > > > 1. The feature is optional.
> > > > > > > >>>> > > >> > > >>>>>>> > > > 2. There is only one decorative
> > > > interface
> > > > > > > >>>> involved
> > > > > > > >>>> > for a
> > > > > > > >>>> > > >> > > >>>>>>> feature.
> > > > > > > >>>> > > >> > > >>>>>>> > > > Otherwise the argument that all
> the
> > > > > methods
> > > > > > > are
> > > > > > > >>>> > grouped
> > > > > > > >>>> > > >> > > >>>>>>> together will
> > > > > > > >>>> > > >> > > >>>>>>> > not
> > > > > > > >>>> > > >> > > >>>>>>> > > > stand.
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > Compared with that, I think the
> > > current
> > > > > > > >>>> solution
> > > > > > > >>>> > works
> > > > > > > >>>> > > >> fine
> > > > > > > >>>> > > >> > > in
> > > > > > > >>>> > > >> > > >>>>>>> all
> > > > > > > >>>> > > >> > > >>>>>>> > cases,
> > > > > > > >>>> > > >> > > >>>>>>> > > > i.e. "having supportXXX() method
> in
> > > > > Source,
> > > > > > > and
> > > > > > > >>>> > default
> > > > > > > >>>> > > >> > > >>>>>>> methods /
> > > > > > > >>>> > > >> > > >>>>>>> > > > decorative interfaces in base
> > > > > interfaces.".
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > The advantages are:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> - clean and easy to implement
> base
> > > > > > interface
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > In the current approach, the Java
> > doc
> > > > of
> > > > > > the
> > > > > > > >>>> > > >> SupportXXX()
> > > > > > > >>>> > > >> > > >>>>>>> method in the
> > > > > > > >>>> > > >> > > >>>>>>> > > > Source would be the single source
> > of
> > > > > truth
> > > > > > > >>>> regarding
> > > > > > > >>>> > > >> how to
> > > > > > > >>>> > > >> > > >>>>>>> implement
> > > > > > > >>>> > > >> > > >>>>>>> > > this
> > > > > > > >>>> > > >> > > >>>>>>> > > > feature. It lists the method that
> > has
> > > > to
> > > > > be
> > > > > > > >>>> > implemented
> > > > > > > >>>> > > >> to
> > > > > > > >>>> > > >> > > >>>>>>> support this
> > > > > > > >>>> > > >> > > >>>>>>> > > > feature, regardless of how many
> > > > classes /
> > > > > > > >>>> > interfaces are
> > > > > > > >>>> > > >> > > >>>>>>> involved.
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > When implementing the base
> > interface,
> > > > > users
> > > > > > > do
> > > > > > > >>>> not
> > > > > > > >>>> > need
> > > > > > > >>>> > > >> to
> > > > > > > >>>> > > >> > > >>>>>>> implement a
> > > > > > > >>>> > > >> > > >>>>>>> > > > method with default
> implementation.
> > > If
> > > > > they
> > > > > > > are
> > > > > > > >>>> > curious
> > > > > > > >>>> > > >> > what
> > > > > > > >>>> > > >> > > >>>>>>> the method
> > > > > > > >>>> > > >> > > >>>>>>> > > is
> > > > > > > >>>> > > >> > > >>>>>>> > > > for, the java doc of that method
> > > simply
> > > > > > > points
> > > > > > > >>>> > users to
> > > > > > > >>>> > > >> the
> > > > > > > >>>> > > >> > > >>>>>>> > SupportXXX()
> > > > > > > >>>> > > >> > > >>>>>>> > > > method in the Source. It seems
> not
> > > > adding
> > > > > > > work
> > > > > > > >>>> to
> > > > > > > >>>> > the
> > > > > > > >>>> > > >> users
> > > > > > > >>>> > > >> > > >>>>>>> compared
> > > > > > > >>>> > > >> > > >>>>>>> > with
> > > > > > > >>>> > > >> > > >>>>>>> > > > decorative interfaces, but gives
> > much
> > > > > > better
> > > > > > > >>>> > > >> > discoverability.
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > - all of the methods from a
> single
> > > > > feature
> > > > > > > are
> > > > > > > >>>> > grouped
> > > > > > > >>>> > > >> in a
> > > > > > > >>>> > > >> > > >>>>>>> single
> > > > > > > >>>> > > >> > > >>>>>>> > > >> decorator interface, together
> with
> > > > their
> > > > > > > >>>> dedicated
> > > > > > > >>>> > java
> > > > > > > >>>> > > >> > doc.
> > > > > > > >>>> > > >> > > >>>>>>> It's also
> > > > > > > >>>> > > >> > > >>>>>>> > > >> easier to google search for help
> > > using
> > > > > the
> > > > > > > >>>> > decorator
> > > > > > > >>>> > > >> name
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > - if an optional feature requires
> > two
> > > > > > methods
> > > > > > > >>>> to be
> > > > > > > >>>> > > >> > > >>>>>>> implemented at
> > > > > > > >>>> > > >> > > >>>>>>> > once,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> decorator can guarantee that
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > These two points are not true
> when
> > > > > multiple
> > > > > > > >>>> > components
> > > > > > > >>>> > > >> and
> > > > > > > >>>> > > >> > > >>>>>>> classes are
> > > > > > > >>>> > > >> > > >>>>>>> > > > involved collaboratively to
> > provide a
> > > > > > > feature.
> > > > > > > >>>> In
> > > > > > > >>>> > our
> > > > > > > >>>> > > >> case,
> > > > > > > >>>> > > >> > > we
> > > > > > > >>>> > > >> > > >>>>>>> have
> > > > > > > >>>> > > >> > > >>>>>>> > both
> > > > > > > >>>> > > >> > > >>>>>>> > > > SourceReader and SplitReader
> > > involved.
> > > > > And
> > > > > > > >>>> there
> > > > > > > >>>> > might
> > > > > > > >>>> > > >> be
> > > > > > > >>>> > > >> > > other
> > > > > > > >>>> > > >> > > >>>>>>> > > interfaces
> > > > > > > >>>> > > >> > > >>>>>>> > > > on the JM side involved for some
> > > future
> > > > > > > >>>> features.
> > > > > > > >>>> > So the
> > > > > > > >>>> > > >> > > >>>>>>> relevant
> > > > > > > >>>> > > >> > > >>>>>>> > methods
> > > > > > > >>>> > > >> > > >>>>>>> > > > can actually be scattered over
> the
> > > > > places.
> > > > > > > That
> > > > > > > >>>> > said, we
> > > > > > > >>>> > > >> > may
> > > > > > > >>>> > > >> > > >>>>>>> still use
> > > > > > > >>>> > > >> > > >>>>>>> > > > decorative interfaces for each
> > > > component,
> > > > > > if
> > > > > > > >>>> the
> > > > > > > >>>> > > >> feature is
> > > > > > > >>>> > > >> > > >>>>>>> optional,
> > > > > > > >>>> > > >> > > >>>>>>> > > given
> > > > > > > >>>> > > >> > > >>>>>>> > > > there is a single source of truth
> > for
> > > > the
> > > > > > > >>>> feature.
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > Here I would strongly lean
> towards
> > > > making
> > > > > > > life
> > > > > > > >>>> > easier
> > > > > > > >>>> > > >> for
> > > > > > > >>>> > > >> > new
> > > > > > > >>>> > > >> > > >>>>>>> users,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at
> the
> > > > (imo)
> > > > > > > >>>> slight
> > > > > > > >>>> > expense
> > > > > > > >>>> > > >> > for
> > > > > > > >>>> > > >> > > >>>>>>> the power
> > > > > > > >>>> > > >> > > >>>>>>> > > >> users.
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > I actually think the current
> > approach
> > > > is
> > > > > > > >>>> simpler,
> > > > > > > >>>> > more
> > > > > > > >>>> > > >> > > >>>>>>> extensible and
> > > > > > > >>>> > > >> > > >>>>>>> > > more
> > > > > > > >>>> > > >> > > >>>>>>> > > > general for all the users. Can
> you
> > > > > > articulate
> > > > > > > >>>> a bit
> > > > > > > >>>> > > >> more on
> > > > > > > >>>> > > >> > > >>>>>>> which part
> > > > > > > >>>> > > >> > > >>>>>>> > > you
> > > > > > > >>>> > > >> > > >>>>>>> > > > think makes it harder for users to
> > > understand?
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > There is another benefit of the
> > > > > decorative
> > > > > > > >>>> > interfaces
> > > > > > > >>>> > > >> which
> > > > > > > >>>> > > >> > > is
> > > > > > > >>>> > > >> > > >>>>>>> not
> > > > > > > >>>> > > >> > > >>>>>>> > > > mentioned, but might be worth
> > > > considering
> > > > > > > here.
> > > > > > > >>>> > Usually
> > > > > > > >>>> > > >> the
> > > > > > > >>>> > > >> > > >>>>>>> decorative
> > > > > > > >>>> > > >> > > >>>>>>> > > > interfaces give slightly better
> > > > backwards
> > > > > > > >>>> > compatibility
> > > > > > > >>>> > > >> > than
> > > > > > > >>>> > > >> > > >>>>>>> the new
> > > > > > > >>>> > > >> > > >>>>>>> > > > default method in the interfaces.
> > > That
> > > > is
> > > > > > > when
> > > > > > > >>>> > users are
> > > > > > > >>>> > > >> > > using
> > > > > > > >>>> > > >> > > >>>>>>> a jar
> > > > > > > >>>> > > >> > > >>>>>>> > that
> > > > > > > >>>> > > >> > > >>>>>>> > > > was compiled with an older
> version
> > of
> > > > > Flink
> > > > > > > >>>> which
> > > > > > > >>>> > does
> > > > > > > >>>> > > >> not
> > > > > > > >>>> > > >> > > >>>>>>> have the
> > > > > > > >>>> > > >> > > >>>>>>> > > default
> > > > > > > >>>> > > >> > > >>>>>>> > > > method in the interfaces in
> > > question. A
> > > > > > > >>>> decorative
> > > > > > > >>>> > > >> > interface
> > > > > > > >>>> > > >> > > >>>>>>> may still
> > > > > > > >>>> > > >> > > >>>>>>> > > > provide backwards compatibility
> in
> > > that
> > > > > > case,
> > > > > > > >>>> while
> > > > > > > >>>> > > >> default
> > > > > > > >>>> > > >> > > >>>>>>> method impl
> > > > > > > >>>> > > >> > > >>>>>>> > > > cannot.
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > I think in Flink we in general do
> > not
> > > > > > > guarantee
> > > > > > > >>>> > custom
> > > > > > > >>>> > > >> > > >>>>>>> components
> > > > > > > >>>> > > >> > > >>>>>>> > > compiled
> > > > > > > >>>> > > >> > > >>>>>>> > > > with an older version can run
> with
> > a
> > > > > newer
> > > > > > > >>>> version
> > > > > > > >>>> > of
> > > > > > > >>>> > > >> > Flink.
> > > > > > > >>>> > > >> > > A
> > > > > > > >>>> > > >> > > >>>>>>> > recompile
> > > > > > > >>>> > > >> > > >>>>>>> > > > with a newer version would be
> > > required.
> > > > > > That
> > > > > > > >>>> said,
> > > > > > > >>>> > if
> > > > > > > >>>> > > >> we do
> > > > > > > >>>> > > >> > > >>>>>>> care about
> > > > > > > >>>> > > >> > > >>>>>>> > > > this, we can just change the
> > > > > "supportXXX()"
> > > > > > > >>>> method
> > > > > > > >>>> > in
> > > > > > > >>>> > > >> the
> > > > > > > >>>> > > >> > > >>>>>>> Source
> > > > > > > >>>> > > >> > > >>>>>>> > > interface
> > > > > > > >>>> > > >> > > >>>>>>> > > > to use decorative interfaces, and
> > > leave
> > > > > the
> > > > > > > >>>> other
> > > > > > > >>>> > parts
> > > > > > > >>>> > > >> > > >>>>>>> unchanged.
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > Thanks,
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > Jiangjie (Becket) Qin
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM
> > Piotr
> > > > > > > Nowojski <
> > > > > > > >>>> > > >> > > >>>>>>> pnowojski@apache.org>
> > > > > > > >>>> > > >> > > >>>>>>> > > > wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> Hi,
> > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> Sorry for chipping in so late,
> > but I
> > > > was
> > > > > > OoO
> > > > > > > >>>> for
> > > > > > > >>>> > the
> > > > > > > >>>> > > >> last
> > > > > > > >>>> > > >> > > two
> > > > > > > >>>> > > >> > > >>>>>>> weeks.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> Regarding the interfaces, I
> would
> > be
> > > > > > > actually
> > > > > > > >>>> > against
> > > > > > > >>>> > > >> > adding
> > > > > > > >>>> > > >> > > >>>>>>> those
> > > > > > > >>>> > > >> > > >>>>>>> > > methods
> > > > > > > >>>> > > >> > > >>>>>>> > > >> to the base interfaces for the
> > > reasons
> > > > > > > >>>> mentioned
> > > > > > > >>>> > above.
> > > > > > > >>>> > > >> > > >>>>>>> Clogging the
> > > > > > > >>>> > > >> > > >>>>>>> > > base
> > > > > > > >>>> > > >> > > >>>>>>> > > >> interface for new users with
> tons
> > of
> > > > > > methods
> > > > > > > >>>> that
> > > > > > > >>>> > they
> > > > > > > >>>> > > >> do
> > > > > > > >>>> > > >> > > not
> > > > > > > >>>> > > >> > > >>>>>>> need, do
> > > > > > > >>>> > > >> > > >>>>>>> > > not
> > > > > > > >>>> > > >> > > >>>>>>> > > >> understand and do not know what
> to
> > > do
> > > > > with
> > > > > > > >>>> them.
> > > > > > > >>>> > > >> Moreover,
> > > > > > > >>>> > > >> > > >>>>>>> such
> > > > > > > >>>> > > >> > > >>>>>>> > > decorative
> > > > > > > >>>> > > >> > > >>>>>>> > > >> interfaces are solving a problem
> > if
> > > a
> > > > > > > feature
> > > > > > > >>>> > requires
> > > > > > > >>>> > > >> two
> > > > > > > >>>> > > >> > > or
> > > > > > > >>>> > > >> > > >>>>>>> more
> > > > > > > >>>> > > >> > > >>>>>>> > > methods
> > > > > > > >>>> > > >> > > >>>>>>> > > >> to be implemented at the same
> > time.
> > > If
> > > > > we
> > > > > > > >>>> have all
> > > > > > > >>>> > of
> > > > > > > >>>> > > >> the
> > > > > > > >>>> > > >> > > >>>>>>> methods with
> > > > > > > >>>> > > >> > > >>>>>>> > > >> default implementation in the
> base
> > > > > > > interface,
> > > > > > > >>>> the
> > > > > > > >>>> > API
> > > > > > > >>>> > > >> > > doesn't
> > > > > > > >>>> > > >> > > >>>>>>> give any
> > > > > > > >>>> > > >> > > >>>>>>> > > >> clue
> > > > > > > >>>> > > >> > > >>>>>>> > > >> to the user which set of methods
> > are
> > > > > > > required
> > > > > > > >>>> to be
> > > > > > > >>>> > > >> > > >>>>>>> implemented at the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> same
> > > > > > > >>>> > > >> > > >>>>>>> > > >> time.
> > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > a) I feel the biggest drawback
> > of
> > > > > > > decorative
> > > > > > > >>>> > > >> interfaces
> > > > > > > >>>> > > >> > is
> > > > > > > >>>> > > >> > > >>>>>>> which
> > > > > > > >>>> > > >> > > >>>>>>> > > >> interface
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > they can decorate and which
> > > > > combinations
> > > > > > > of
> > > > > > > >>>> > multiple
> > > > > > > >>>> > > >> > > >>>>>>> decorative
> > > > > > > >>>> > > >> > > >>>>>>> > > >> interfaces
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > are valid. (...)
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > In the future, if there is a
> new
> > > > > feature
> > > > > > > >>>> added
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > (e.g. sorted or
> pre-partitioned
> > > data
> > > > > > > >>>> aware), are
> > > > > > > >>>> > we
> > > > > > > >>>> > > >> > going
> > > > > > > >>>> > > >> > > >>>>>>> to create
> > > > > > > >>>> > > >> > > >>>>>>> > > >> another
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > interface of SplitReader such
> as
> > > > > > > >>>> > SortedSplitReader or
> > > > > > > >>>> > > >> > > >>>>>>> > > >> PrePartitionedAware?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > Can they be combined? So I
> think
> > > the
> > > > > > > >>>> additional
> > > > > > > >>>> > > >> > decorative
> > > > > > > >>>> > > >> > > >>>>>>> interface
> > > > > > > >>>> > > >> > > >>>>>>> > > >> like
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > withSplitsAlignment actually
> > > > increases
> > > > > > the
> > > > > > > >>>> > > >> understanding
> > > > > > > >>>> > > >> > > >>>>>>> cost of
> > > > > > > >>>> > > >> > > >>>>>>> > users
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > because they have to know what
> > > > > > decorative
> > > > > > > >>>> > interfaces
> > > > > > > >>>> > > >> are
> > > > > > > >>>> > > >> > > >>>>>>> there,
> > > > > > > >>>> > > >> > > >>>>>>> > which
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > interface they can decorate
> and
> > > > which
> > > > > > > >>>> > combinations of
> > > > > > > >>>> > > >> > the
> > > > > > > >>>> > > >> > > >>>>>>> decorative
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > interfaces are valid and which
> > are
> > > > > not.
> > > > > > > >>>> Ideally
> > > > > > > >>>> > we
> > > > > > > >>>> > > >> want
> > > > > > > >>>> > > >> > to
> > > > > > > >>>> > > >> > > >>>>>>> avoid
> > > > > > > >>>> > > >> > > >>>>>>> > that.
> > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> I'm not sure if I understand how
> > > > > embedding
> > > > > > > >>>> default
> > > > > > > >>>> > > >> methods
> > > > > > > >>>> > > >> > > in
> > > > > > > >>>> > > >> > > >>>>>>> the base
> > > > > > > >>>> > > >> > > >>>>>>> > > >> interface is solving the
> problem:
> > > what
> > > > > can
> > > > > > > be
> > > > > > > >>>> > combined
> > > > > > > >>>> > > >> or
> > > > > > > >>>> > > >> > > >>>>>>> not? If
> > > > > > > >>>> > > >> > > >>>>>>> > there
> > > > > > > >>>> > > >> > > >>>>>>> > > >> are
> > > > > > > >>>> > > >> > > >>>>>>> > > >> two conflicting features, having
> > > > > > decorative
> > > > > > > >>>> > interfaces
> > > > > > > >>>> > > >> > that
> > > > > > > >>>> > > >> > > >>>>>>> can not be
> > > > > > > >>>> > > >> > > >>>>>>> > > >> mixed together actually makes
> much
> > > > more
> > > > > > > sense
> > > > > > > >>>> to me
> > > > > > > >>>> > > >> rather
> > > > > > > >>>> > > >> > > >>>>>>> than having
> > > > > > > >>>> > > >> > > >>>>>>> > > >> them
> > > > > > > >>>> > > >> > > >>>>>>> > > >> all in one base class. How would
> > you
> > > > > allow
> > > > > > > >>>> users to
> > > > > > > >>>> > > >> > > implement
> > > > > > > >>>> > > >> > > >>>>>>> only one
> > > > > > > >>>> > > >> > > >>>>>>> > > of
> > > > > > > >>>> > > >> > > >>>>>>> > > >> those two features?
> > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> To reiterate on the issue. Yes,
> > > there
> > > > > are
> > > > > > > >>>> > drawbacks:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> - how can a user discover what
> > > > > decorators
> > > > > > > are
> > > > > > > >>>> > there?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> - how can a user know where the
> > > > > decorator
> > > > > > > can
> > > > > > > >>>> be
> > > > > > > >>>> > > >> applied
> > > > > > > >>>> > > >> > to?
> > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> However those are drawbacks for
> > more
> > > > > power
> > > > > > > >>>> users,
> > > > > > > >>>> > that
> > > > > > > >>>> > > >> can
> > > > > > > >>>> > > >> > > be
> > > > > > > >>>> > > >> > > >>>>>>> > mitigated
> > > > > > > >>>> > > >> > > >>>>>>> > > by
> > > > > > > >>>> > > >> > > >>>>>>> > > >> the documentation. For example
> > > listing
> > > > > all
> > > > > > > of
> > > > > > > >>>> the
> > > > > > > >>>> > > >> > decorators
> > > > > > > >>>> > > >> > > >>>>>>> with
> > > > > > > >>>> > > >> > > >>>>>>> > > >> detailed explanation both in the
> > > docs
> > > > > and
> > > > > > in
> > > > > > > >>>> the
> > > > > > > >>>> > java
> > > > > > > >>>> > > >> > docs.
> > > > > > > >>>> > > >> > > >>>>>>> More
> > > > > > > >>>> > > >> > > >>>>>>> > > >> experienced users will be able
> to
> > > deal
> > > > > > with
> > > > > > > >>>> those
> > > > > > > >>>> > > >> issues
> > > > > > > >>>> > > >> > > >>>>>>> easier, as
> > > > > > > >>>> > > >> > > >>>>>>> > they
> > > > > > > >>>> > > >> > > >>>>>>> > > >> will already have some basic
> > > > > understanding
> > > > > > > of
> > > > > > > >>>> > Flink.
> > > > > > > >>>> > > >> Also
> > > > > > > >>>> > > >> > if
> > > > > > > >>>> > > >> > > >>>>>>> a user has
> > > > > > > >>>> > > >> > > >>>>>>> > a
> > > > > > > >>>> > > >> > > >>>>>>> > > >> problem that he wants to solve,
> he
> > > > will
> > > > > > > google
> > > > > > > >>>> > search a
> > > > > > > >>>> > > >> > > >>>>>>> potential
> > > > > > > >>>> > > >> > > >>>>>>> > > solution
> > > > > > > >>>> > > >> > > >>>>>>> > > >> to his problem anyway, and while
> > > doing
> > > > > > that
> > > > > > > >>>> he is
> > > > > > > >>>> > very
> > > > > > > >>>> > > >> > > likely
> > > > > > > >>>> > > >> > > >>>>>>> to
> > > > > > > >>>> > > >> > > >>>>>>> > > discover
> > > > > > > >>>> > > >> > > >>>>>>> > > >> the decorator that he needs
> anyway
> > > in
> > > > > the
> > > > > > > >>>> docs.
> > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> The advantages are:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> - clean and easy to implement
> base
> > > > > > interface
> > > > > > > >>>> > > >> > > >>>>>>> > > >> - all of the methods from a
> single
> > > > > feature
> > > > > > > are
> > > > > > > >>>> > grouped
> > > > > > > >>>> > > >> in
> > > > > > > >>>> > > >> > a
> > > > > > > >>>> > > >> > > >>>>>>> single
> > > > > > > >>>> > > >> > > >>>>>>> > > >> decorator interface, together
> with
> > > > their
> > > > > > > >>>> dedicated
> > > > > > > >>>> > java
> > > > > > > >>>> > > >> > doc.
> > > > > > > >>>> > > >> > > >>>>>>> It's also
> > > > > > > >>>> > > >> > > >>>>>>> > > >> easier to google search for help
> > > using
> > > > > the
> > > > > > > >>>> > decorator
> > > > > > > >>>> > > >> name
> > > > > > > >>>> > > >> > > >>>>>>> > > >> - if an optional feature
> requires
> > > two
> > > > > > > methods
> > > > > > > >>>> to be
> > > > > > > >>>> > > >> > > >>>>>>> implemented at
> > > > > > > >>>> > > >> > > >>>>>>> > once,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> decorator can guarantee that
> > > > > > > >>>> > > >> > > >>>>>>> > > >>
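A minimal sketch of the last advantage in the list above, assuming simplified stand-in types rather than Flink's actual interfaces. The names Reader, Pausable, BasicReader and PausableReader are hypothetical and exist only for illustration. The point it tries to show is that a decorative interface makes the compiler enforce that both methods of an optional feature are implemented together, while the base interface stays small.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class DecoratorSketch {

    /** Hypothetical minimal base interface: the only method a new user must implement. */
    interface Reader {
        String poll();
    }

    /** Hypothetical decorative interface: the two methods of the optional feature come as a pair. */
    interface Pausable {
        void pause(Collection<String> splitIds);

        void resume(Collection<String> splitIds);
    }

    /** A basic reader that implements only the base interface. */
    static class BasicReader implements Reader {
        public String poll() {
            return "record";
        }
    }

    /** A reader that opts in; it does not compile unless both pause and resume are implemented. */
    static class PausableReader implements Reader, Pausable {
        final List<String> paused = new ArrayList<>();

        // Simplification: emits nothing while any split is paused.
        public String poll() {
            return paused.isEmpty() ? "record" : null;
        }

        public void pause(Collection<String> splitIds) {
            paused.addAll(splitIds);
        }

        public void resume(Collection<String> splitIds) {
            paused.removeAll(splitIds);
        }
    }

    /** The framework can discover the capability with a plain type check. */
    static boolean supportsPausing(Reader reader) {
        return reader instanceof Pausable;
    }

    public static void main(String[] args) {
        System.out.println(supportsPausing(new BasicReader()));
        System.out.println(supportsPausing(new PausableReader()));
    }
}
```

With default methods on the base interface instead, a reader could override pause but forget resume and still compile, which is the gap the decorator closes.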
> > > > > > > >>>> > > >> > > >>>>>>> > > >> Here I would strongly lean
> towards
> > > > > making
> > > > > > > life
> > > > > > > >>>> > easier
> > > > > > > >>>> > > >> for
> > > > > > > >>>> > > >> > > new
> > > > > > > >>>> > > >> > > >>>>>>> users,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at
> the
> > > > (imo)
> > > > > > > >>>> slight
> > > > > > > >>>> > expense
> > > > > > > >>>> > > >> > for
> > > > > > > >>>> > > >> > > >>>>>>> the power
> > > > > > > >>>> > > >> > > >>>>>>> > > >> users.
> > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> Best,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> Piotrek
> > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> On Tue, 26 Apr 2022 at 15:32, Becket
> > Qin
> > > <
> > > > > > > >>>> > > >> becket.qin@gmail.com
> > > > > > > >>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > Thanks for the reply Sebastian
> > and
> > > > > > Dawid,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > I think Sebastian has a good
> > > > summary.
> > > > > > This
> > > > > > > >>>> is a
> > > > > > > >>>> > > >> really
> > > > > > > >>>> > > >> > > >>>>>>> helpful
> > > > > > > >>>> > > >> > > >>>>>>> > > >> discussion.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > Thinking a bit more, I feel
> that
> > > it
> > > > > > might
> > > > > > > >>>> still
> > > > > > > >>>> > be
> > > > > > > >>>> > > >> > better
> > > > > > > >>>> > > >> > > >>>>>>> to add the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > supportsXXX() method in the
> > Source
> > > > > > rather
> > > > > > > >>>> than
> > > > > > > >>>> > > >> > > SourceReader.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > Generally speaking, what we
> are
> > > > trying
> > > > > > to
> > > > > > > do
> > > > > > > >>>> > here is
> > > > > > > >>>> > > >> to
> > > > > > > >>>> > > >> > > let
> > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > >>>> > > >> > > >>>>>>> > Flink
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > framework know what the Source
> > is
> > > > > > capable
> > > > > > > >>>> of. In
> > > > > > > >>>> > this
> > > > > > > >>>> > > >> > > FLIP,
> > > > > > > >>>> > > >> > > >>>>>>> it
> > > > > > > >>>> > > >> > > >>>>>>> > happens
> > > > > > > >>>> > > >> > > >>>>>>> > > >> to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > be the capability that only
> > > involves
> > > > > > > >>>> > SourceReader.
> > > > > > > >>>> > > >> But
> > > > > > > >>>> > > >> > in
> > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > >>>> > > >> > > >>>>>>> > future,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> it is
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > possible that another
> > > functionality
> > > > > > > involves
> > > > > > > >>>> > both the
> > > > > > > >>>> > > >> > > >>>>>>> > SplitEnumerator
> > > > > > > >>>> > > >> > > >>>>>>> > > >> and
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > SourceReader. In that case,
> > > > following
> > > > > > the
> > > > > > > >>>> current
> > > > > > > >>>> > > >> > > approach,
> > > > > > > >>>> > > >> > > >>>>>>> we
> > > > > > > >>>> > > >> > > >>>>>>> > should
> > > > > > > >>>> > > >> > > >>>>>>> > > >> put
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > the "supportsXXX()" method in
> > both
> > > > > > > >>>> > SplitEnumerator
> > > > > > > >>>> > > >> and
> > > > > > > >>>> > > >> > > >>>>>>> SourceReader.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > Because if we only put this in
> > the
> > > > > > > >>>> SourceReader,
> > > > > > > >>>> > then
> > > > > > > >>>> > > >> > the
> > > > > > > >>>> > > >> > > >>>>>>> JM would
> > > > > > > >>>> > > >> > > >>>>>>> > > have
> > > > > > > >>>> > > >> > > >>>>>>> > > >> to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > create a SourceReader in order
> > to
> > > > know
> > > > > > > >>>> whether
> > > > > > > >>>> > this
> > > > > > > >>>> > > >> > > feature
> > > > > > > >>>> > > >> > > >>>>>>> is
> > > > > > > >>>> > > >> > > >>>>>>> > > >> supported,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > which is a little ugly. But if
> > we
> > > > put
> > > > > > the
> > > > > > > >>>> > > >> > "supportsXXX()"
> > > > > > > >>>> > > >> > > >>>>>>> method in
> > > > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > Source, we will break the
> > > > "symmetric"
> > > > > > > design
> > > > > > > >>>> > because
> > > > > > > >>>> > > >> > this
> > > > > > > >>>> > > >> > > >>>>>>> FLIP
> > > > > > > >>>> > > >> > > >>>>>>> > chose a
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > different way.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > This is also why I think
> > > > supportsXXX()
> > > > > > > >>>> method
> > > > > > > >>>> > seems a
> > > > > > > >>>> > > >> > good
> > > > > > > >>>> > > >> > > >>>>>>> thing to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> have,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > because when there are a few
> > > > > interfaces
> > > > > > /
> > > > > > > >>>> methods
> > > > > > > >>>> > > >> that
> > > > > > > >>>> > > >> > are
> > > > > > > >>>> > > >> > > >>>>>>> expected
> > > > > > > >>>> > > >> > > >>>>>>> > to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> be
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > implemented at the same time
> in
> > > > order
> > > > > to
> > > > > > > >>>> deliver
> > > > > > > >>>> > a
> > > > > > > >>>> > > >> > > feature,
> > > > > > > >>>> > > >> > > >>>>>>> it is
> > > > > > > >>>> > > >> > > >>>>>>> > > always
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > good to have a single source
> of
> > > > truth
> > > > > to
> > > > > > > >>>> tell the
> > > > > > > >>>> > > >> > > framework
> > > > > > > >>>> > > >> > > >>>>>>> what to
> > > > > > > >>>> > > >> > > >>>>>>> > > do,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> so
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > the framework can do
> consistent
> > > > things
> > > > > > in
> > > > > > > >>>> > different
> > > > > > > >>>> > > >> > parts.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> >
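A rough sketch of the "single source of truth" argument above, assuming simplified stand-in types. Source, Reader, PausingSource and LegacySource here are hypothetical and are not Flink's real interfaces; supportsPausingSplits is the name proposed later in this thread. It illustrates that if the capability flag lives on the Source, the framework can decide what to do without first creating a reader.

```java
public class CapabilitySketch {

    /** Hypothetical stand-in for a reader; creating one may be expensive. */
    interface Reader {
    }

    /** Hypothetical stand-in for a source that declares its capability up front. */
    interface Source {
        Reader createReader();

        // Defaults to false so that existing implementations keep compiling.
        default boolean supportsPausingSplits() {
            return false;
        }
    }

    /** A source that opts in and counts how many readers were created. */
    static class PausingSource implements Source {
        int readersCreated = 0;

        public Reader createReader() {
            readersCreated++;
            return new Reader() {
            };
        }

        @Override
        public boolean supportsPausingSplits() {
            return true;
        }
    }

    /** A source written before the flag existed; it inherits the default. */
    static class LegacySource implements Source {
        public Reader createReader() {
            return new Reader() {
            };
        }
    }

    /** The framework consults the flag only; no reader is instantiated here. */
    static String plan(Source source) {
        return source.supportsPausingSplits()
                ? "split alignment enabled"
                : "split alignment disabled";
    }

    public static void main(String[] args) {
        PausingSource source = new PausingSource();
        System.out.println(plan(source));
        System.out.println(plan(new LegacySource()));
        System.out.println("readers created: " + source.readersCreated);
    }
}
```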
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > @Sebastian Mattheis <
> > > > > > > >>>> sebastian@ververica.com>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > Regarding interface flavor b),
> > > i.e.
> > > > > > > >>>> > > >> AlignedSourceReader
> > > > > > > >>>> > > >> > +
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > AlignedSplitReader, what I
> feel
> > > > > awkward
> > > > > > > >>>> about is
> > > > > > > >>>> > > >> that we
> > > > > > > >>>> > > >> > > are
> > > > > > > >>>> > > >> > > >>>>>>> > > essentially
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > expecting almost all the
> > > > SourceReader
> > > > > > > >>>> > > >> implementations to
> > > > > > > >>>> > > >> > > >>>>>>> extend
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > SourceReaderBase, which
> > > effectively
> > > > > > makes
> > > > > > > >>>> the
> > > > > > > >>>> > > >> > SourceReader
> > > > > > > >>>> > > >> > > >>>>>>> interface
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > without the pausing support
> > > useless.
> > > > > So
> > > > > > > this
> > > > > > > >>>> > > >> indicates
> > > > > > > >>>> > > >> > > that
> > > > > > > >>>> > > >> > > >>>>>>> public
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > decorative interfaces (or
> > > > > sub-interfaces
> > > > > > > >>>> for the
> > > > > > > >>>> > same
> > > > > > > >>>> > > >> > > >>>>>>> purpose) only
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > make sense if the original
> > > interface
> > > > > is
> > > > > > > also
> > > > > > > >>>> > > >> expected to
> > > > > > > >>>> > > >> > > be
> > > > > > > >>>> > > >> > > >>>>>>> used.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > Otherwise, it seems to make more
> > > sense
> > > > > to
> > > > > > > add
> > > > > > > >>>> the
> > > > > > > >>>> > > >> method
> > > > > > > >>>> > > >> > to
> > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > >>>> > > >> > > >>>>>>> > original
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > interface itself.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > Cheers,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > Jiangjie (Becket) Qin
> > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05
> PM
> > > > Dawid
> > > > > > > >>>> Wysakowicz
> > > > > > > >>>> > <
> > > > > > > >>>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Thanks @Sebastian for the
> nice
> > > > > > summary.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > I think most of your points
> > > > aligned
> > > > > > with
> > > > > > > >>>> the
> > > > > > > >>>> > > >> > suggestions
> > > > > > > >>>> > > >> > > >>>>>>> I made to
> > > > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > FLIP, while you were writing
> > > your
> > > > > > reply
> > > > > > > (I
> > > > > > > >>>> > believe
> > > > > > > >>>> > > >> we
> > > > > > > >>>> > > >> > > hit
> > > > > > > >>>> > > >> > > >>>>>>> enter
> > > > > > > >>>> > > >> > > >>>>>>> > > >> nearly at
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > the same time ;) )
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Two points after we synced
> > > offline
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > 1. I also changed the
> > > > > > > >>>> > > >> supportsWatermarksSplitAlignment
> > > > > > > >>>> > > >> > > to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > supportsPausingSplits to
> > express
> > > > the
> > > > > > > >>>> general
> > > > > > > >>>> > > >> > capability
> > > > > > > >>>> > > >> > > of
> > > > > > > >>>> > > >> > > >>>>>>> > pausing.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > 2. As for whether we should add
> > > > > > > >>>> > > >> > > >>>>>>> PausingSourceReader/PausingSplitReader
> > > > > > > >>>> > > >> > > >>>>>>> > > (option
> > > > > > > >>>> > > >> > > >>>>>>> > > >> b)
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > or if we should just add the
> > > > methods
> > > > > > > >>>> (option
> > > > > > > >>>> > c), I
> > > > > > > >>>> > > >> > > >>>>>>> suggest to
> > > > > > > >>>> > > >> > > >>>>>>> > simply
> > > > > > > >>>> > > >> > > >>>>>>> > > >> add
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > the two methods as I felt
> this
> > > is the
> > > > > much
> > > > > > > >>>> > preferred
> > > > > > > >>>> > > >> > > approach of
> > > > > > > >>>> > > >> > > >>>>>>> Becket,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> to which
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > others do not object. Unless
> > > there
> > > > > is
> > > > > > an
> > > > > > > >>>> > opposition
> > > > > > > >>>> > > >> > > let's
> > > > > > > >>>> > > >> > > >>>>>>> go with
> > > > > > > >>>> > > >> > > >>>>>>> > > this
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > option c.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Best,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Dawid
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > On 26/04/2022 10:06,
> Sebastian
> > > > > > Mattheis
> > > > > > > >>>> wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Hi folks,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Sorry for being a bit
> silent.
> > > Many
> > > > > > > thanks
> > > > > > > >>>> for
> > > > > > > >>>> > all
> > > > > > > >>>> > > >> the
> > > > > > > >>>> > > >> > > >>>>>>> input and
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > suggestions. As I'm a bit
> > new, I
> > > > > > needed
> > > > > > > >>>> some
> > > > > > > >>>> > time
> > > > > > > >>>> > > >> to
> > > > > > > >>>> > > >> > > >>>>>>> catch up and
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > structure
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > (for myself) the discussion
> > and
> > > I
> > > > > > wanted
> > > > > > > >>>> to
> > > > > > > >>>> > find a
> > > > > > > >>>> > > >> way
> > > > > > > >>>> > > >> > > to
> > > > > > > >>>> > > >> > > >>>>>>> > structure
> > > > > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > conclusions. (Also because I
> > had
> > > > the
> > > > > > > >>>> feeling
> > > > > > > >>>> > that
> > > > > > > >>>> > > >> some
> > > > > > > >>>> > > >> > > >>>>>>> concerns
> > > > > > > >>>> > > >> > > >>>>>>> > got
> > > > > > > >>>> > > >> > > >>>>>>> > > >> lost
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > in
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > the discussion.) This is my
> > > > attempt
> > > > > > and
> > > > > > > >>>> please
> > > > > > > >>>> > > >> correct
> > > > > > > >>>> > > >> > > me
> > > > > > > >>>> > > >> > > >>>>>>> if
> > > > > > > >>>> > > >> > > >>>>>>> > > >> something is
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > wrong or misunderstood. I
> > tried
> > > to
> > > > > > > >>>> collect and
> > > > > > > >>>> > > >> > assemble
> > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > >>>> > > >> > > >>>>>>> > > opinions,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > suggestions, and conclusions
> > (to
> > > > the
> > > > > > > best
> > > > > > > >>>> of my
> > > > > > > >>>> > > >> > > >>>>>>> knowledge):
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Top A: Should split
> > alignment
> > > > > > > >>>> (pause/resume
> > > > > > > >>>> > > >> > behavior)
> > > > > > > >>>> > > >> > > >>>>>>> be a
> > > > > > > >>>> > > >> > > >>>>>>> > general
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > capability?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > I personally don't see any
> > > reason
> > > > not
> > > > > > to
> > > > > > > >>>> have
> > > > > > > >>>> > it a
> > > > > > > >>>> > > >> > > general
> > > > > > > >>>> > > >> > > >>>>>>> > capability
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > because for the alignSplit
> > > method
> > > > it
> > > > > > is
> > > > > > > >>>> > actually
> > > > > > > >>>> > > >> > > >>>>>>> independent of
> > > > > > > >>>> > > >> > > >>>>>>> > the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > watermarks. If we agree here
> > to
> > > > have
> > > > > > it
> > > > > > > a
> > > > > > > >>>> > general
> > > > > > > >>>> > > >> > > >>>>>>> capability, we
> > > > > > > >>>> > > >> > > >>>>>>> > > >> should
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > also agree on the right
> > wording.
> > > > > Does
> > > > > > > >>>> > > >> > > >>>>>>> "alignSplits(splitsToResume,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > splitsToPause)" refer to
> what
> > is
> > > > > then
> > > > > > > >>>> actually
> > > > > > > >>>> > > >> meant?
> > > > > > > >>>> > > >> > (I
> > > > > > > >>>> > > >> > > >>>>>>> see it as
> > > > > > > >>>> > > >> > > >>>>>>> > > >> okay.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > I
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > don't have any better idea
> > > whilst
> > > > > > Arvid
> > > > > > > >>>> > suggested
> > > > > > > >>>> > > >> > > >>>>>>> > > >> "pauseOrResumeSplits".)
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Top B: Should it be
> possible
> > > to
> > > > > > > >>>> > enable/disable
> > > > > > > >>>> > > >> split
> > > > > > > >>>> > > >> > > >>>>>>> alignment?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > I would personally not
> disable
> > > the
> > > > > > split
> > > > > > > >>>> > alignment
> > > > > > > >>>> > > >> on
> > > > > > > >>>> > > >> > > the
> > > > > > > >>>> > > >> > > >>>>>>> source
> > > > > > > >>>> > > >> > > >>>>>>> > > >> reader
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > side because if split
> > alignment
> > > is
> > > > > > used
> > > > > > > >>>> for
> > > > > > > >>>> > some
> > > > > > > >>>> > > >> other
> > > > > > > >>>> > > >> > > >>>>>>> use case
> > > > > > > >>>> > > >> > > >>>>>>> > (see
> > > > > > > >>>> > > >> > > >>>>>>> > > >> A)
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > it
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > could have nasty side
> effects
> > on
> > > > > > > >>>> other/future
> > > > > > > >>>> > use
> > > > > > > >>>> > > >> > cases.
> > > > > > > >>>> > > >> > > >>>>>>> Instead,
> > > > > > > >>>> > > >> > > >>>>>>> > I
> > > > > > > >>>> > > >> > > >>>>>>> > > >> would
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > disable "watermark split
> > > > alignment"
> > > > > > > where
> > > > > > > >>>> I
> > > > > > > >>>> > think
> > > > > > > >>>> > > >> it
> > > > > > > >>>> > > >> > > >>>>>>> should
> > > > > > > >>>> > > >> > > >>>>>>> > disable
> > > > > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > watermark-dependent trigger
> > for
> > > > > split
> > > > > > > >>>> > alignment.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Top C: Should we add a
> > > supportsX
> > > > > > > method?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > I find it difficult to
> define
> > > the
> > > > > > scope
> > > > > > > >>>> of a
> > > > > > > >>>> > > >> supportsX
> > > > > > > >>>> > > >> > > >>>>>>> method
> > > > > > > >>>> > > >> > > >>>>>>> > w.r.t.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > the following questions: a)
> > > Where
> > > > is
> > > > > > it
> > > > > > > >>>> used?
> > > > > > > >>>> > and
> > > > > > > >>>> > > >> b)
> > > > > > > >>>> > > >> > > What
> > > > > > > >>>> > > >> > > >>>>>>> is the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> expected
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > output? To b), it's not
> > > > > > straight-forward
> > > > > > > >>>> to
> > > > > > > >>>> > > >> provide a
> > > > > > > >>>> > > >> > > >>>>>>> meaningful
> > > > > > > >>>> > > >> > > >>>>>>> > > >> output,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > e.g., if SourceReader
> supports
> > > > split
> > > > > > > >>>> alignment
> > > > > > > >>>> > but
> > > > > > > >>>> > > >> > > >>>>>>> SplitReader
> > > > > > > >>>> > > >> > > >>>>>>> > does not.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> This
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > is
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > because with the current
> > > > > > implementation,
> > > > > > > >>>> we can
> > > > > > > >>>> > > >> > > determine
> > > > > > > >>>> > > >> > > >>>>>>> whether
> > > > > > > >>>> > > >> > > >>>>>>> > > >> split
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > alignment is fully supported
> > > only
> > > > > > during
> > > > > > > >>>> > runtime
> > > > > > > >>>> > > >> and
> > > > > > > >>>> > > >> > > >>>>>>> specifically
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > actually
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > only when calling
> alignSplits
> > > down
> > > > > the
> > > > > > > >>>> call
> > > > > > > >>>> > > >> hierarchy
> > > > > > > >>>> > > >> > up
> > > > > > > >>>> > > >> > > >>>>>>> to the
> > > > > > > >>>> > > >> > > >>>>>>> > > actual
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > SplitReaders.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Therefore, I would suggest
> to
> > > > either
> > > > > > > >>>> raise an
> > > > > > > >>>> > > >> error or
> > > > > > > >>>> > > >> > > >>>>>>> warning if
> > > > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > alignment is called but not
> > > > > supported
> > > > > > at
> > > > > > > >>>> some
> > > > > > > >>>> > > >> point. I
> > > > > > > >>>> > > >> > > >>>>>>> know we
> > > > > > > >>>> > > >> > > >>>>>>> > > should
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > carefully think about when
> > this
> > > > > could
> > > > > > be
> > > > > > > >>>> the
> > > > > > > >>>> > case
> > > > > > > >>>> > > >> > > because
> > > > > > > >>>> > > >> > > >>>>>>> we don't
> > > > > > > >>>> > > >> > > >>>>>>> > > >> want
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > flood anybody with such
> > > warnings.
> > > > > > > However,
> > > > > > > >>>> > warnings
> > > > > > > >>>> > > >> > > could
> > > > > > > >>>> > > >> > > >>>>>>> be an
> > > > > > > >>>> > > >> > > >>>>>>> > > >> indicator
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > for the user that for
> > watermark
> > > > > split
> > > > > > > >>>> > alignment use
> > > > > > > >>>> > > >> > case
> > > > > > > >>>> > > >> > > >>>>>>> split
> > > > > > > >>>> > > >> > > >>>>>>> > > >> reading is
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > imbalanced with the
> conclusion
> > > to
> > > > > > either
> > > > > > > >>>> > disable
> > > > > > > >>>> > > >> the
> > > > > > > >>>> > > >> > > >>>>>>> trigger for
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > watermark
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > split alignment (see Top B)
> or
> > > to
> > > > > > > >>>> > use/implement a
> > > > > > > >>>> > > >> > source
> > > > > > > >>>> > > >> > > >>>>>>> and
> > > > > > > >>>> > > >> > > >>>>>>> > reader
> > > > > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > fully supports split
> > alignment.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
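For illustration only (not part of the quoted message): a minimal Java sketch of the "warn instead of supportsX" idea from Top C, assuming that support can only be detected at the point where the alignment call reaches the actual reader. All names below (SplitFetcherSketch, warningCount, and the simplified SplitReader and AlignedSplitReader interfaces) are hypothetical stand-ins, not Flink's real classes.

```java
import java.util.Collection;

// All names here are hypothetical stand-ins, not Flink's actual classes.
interface SplitReader {}

// Optional capability: only readers implementing this can pause/resume splits.
interface AlignedSplitReader extends SplitReader {
    void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

final class SplitFetcherSketch {
    private final SplitReader reader;
    private int warningCount;

    SplitFetcherSketch(SplitReader reader) {
        this.reader = reader;
    }

    /** Returns true if the request was forwarded, false if it was ignored. */
    boolean alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        if (reader instanceof AlignedSplitReader) {
            ((AlignedSplitReader) reader).alignSplits(splitsToPause, splitsToResume);
            return true;
        }
        // Warn only on the first unsupported call so that logs are not flooded.
        if (warningCount == 0) {
            warningCount++;
            System.err.println("Split alignment requested, but "
                    + reader.getClass().getName() + " does not support it; ignoring.");
        }
        return false;
    }

    int warningCount() {
        return warningCount;
    }
}
```

Warning once per fetcher is only one possible policy; raising an error instead, as the message also suggests, would replace the println with an exception.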
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Top D: How to design
> > > interfaces?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Thanks for structuring the
> > > > > discussion
> > > > > > > >>>> with the
> > > > > > > >>>> > > >> > > various
> > > > > > > >>>> > > >> > > >>>>>>> > > >> possibilities
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > (a-d). From the discussion
> and
> > > > > > emails, I
> > > > > > > >>>> would
> > > > > > > >>>> > > >> like to
> > > > > > > >>>> > > >> > > >>>>>>> summarize
> > > > > > > >>>> > > >> > > >>>>>>> > the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > following requirements:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > - Interfaces should be
> > > consistent
> > > > > > > >>>> > ("symmetric"),
> > > > > > > >>>> > > >> i.e.,
> > > > > > > >>>> > > >> > > >>>>>>> similar
> > > > > > > >>>> > > >> > > >>>>>>> > > >> semantics
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > should have similar
> interfaces
> > > > with
> > > > > > > >>>> similar
> > > > > > > >>>> > usage.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > - Make explicit which
> > > > > implementations
> > > > > > > >>>> implement
> > > > > > > >>>> > > >> > > >>>>>>> interfaces/support
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > behavior.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > - Make clear what are
> default
> > > > > > > >>>> implementations
> > > > > > > >>>> > and
> > > > > > > >>>> > > >> how
> > > > > > > >>>> > > >> > to
> > > > > > > >>>> > > >> > > >>>>>>> implement
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > interfaces with desired
> > > behavior.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > This is a simplified view of
> > the
> > > > > > > relations
> > > > > > > >>>> > between
> > > > > > > >>>> > > >> > > >>>>>>> relevant
> > > > > > > >>>> > > >> > > >>>>>>> > classes
> > > > > > > >>>> > > >> > > >>>>>>> > > of
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > PoC implementation:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > SourceReader (Public) <|--
> > > > > > > >>>> SourceReaderBase
> > > > > > > >>>> > > >> (Internal)
> > > > > > > >>>> > > >> > > >>>>>>> <|-- ..
> > > > > > > >>>> > > >> > > >>>>>>> > <|--
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > MySourceReader
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > MySourceReader <>--
> > > > > > SplitFetcherManager
> > > > > > > >>>> > (Internal)
> > > > > > > >>>> > > >> > <>--
> > > > > > > >>>> > > >> > > >>>>>>> > SplitFetcher
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > (Internal) <>-- SplitReader
> > > > (Public)
> > > > > > > <|--
> > > > > > > >>>> > > >> > MySplitReader
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > (A <|-- B: B inherits from
> A;
> > A
> > > > <>--
> > > > > > B:
> > > > > > > A
> > > > > > > >>>> "has
> > > > > > > >>>> > a"
> > > > > > > >>>> > > >> B)
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Note that SourceReaderBase
> and
> > > > > > > >>>> > SplitFetcherManager
> > > > > > > >>>> > > >> > > >>>>>>> implement most
> > > > > > > >>>> > > >> > > >>>>>>> > of
> > > > > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > "logic" for split alignment
> > just
> > > > > > because
> > > > > > > >>>> we
> > > > > > > >>>> > wanted
> > > > > > > >>>> > > >> to
> > > > > > > >>>> > > >> > > >>>>>>> implement
> > > > > > > >>>> > > >> > > >>>>>>> > > split
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > alignment and wanted it to
> be
> > > > > > available
> > > > > > > as
> > > > > > > >>>> > kind of
> > > > > > > >>>> > > >> a
> > > > > > > >>>> > > >> > > >>>>>>> default. As a
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > consequence, we have a
> > "default
> > > > > > > >>>> > implementation" for
> > > > > > > >>>> > > >> > > >>>>>>> SourceReader
> > > > > > > >>>> > > >> > > >>>>>>> > > that
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > implements the actual logic
> > for
> > > > > split
> > > > > > > >>>> > alignment.
> > > > > > > >>>> > > >> For
> > > > > > > >>>> > > >> > > that
> > > > > > > >>>> > > >> > > >>>>>>> reason,
> > > > > > > >>>> > > >> > > >>>>>>> > I
> > > > > > > >>>> > > >> > > >>>>>>> > > >> find
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > it
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > very confusing to have a
> NOOP
> > > > > default
> > > > > > > >>>> > > >> implementation
> > > > > > > >>>> > > >> > in
> > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > >>>> > > >> > > >>>>>>> > > interface
> > > > > > > >>>> > > >> > > >>>>>>> > > >> for
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > the SourceReader. As a
> > > > consequence,
> > > > > > > >>>> interface
> > > > > > > >>>> > > >> strategy
> > > > > > > >>>> > > >> > > c)
> > > > > > > >>>> > > >> > > >>>>>>> is
> > > > > > > >>>> > > >> > > >>>>>>> > > difficult
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > because this would require
> > NOOP
> > > > > > default
> > > > > > > >>>> > > >> > implementations
> > > > > > > >>>> > > >> > > >>>>>>> in the
> > > > > > > >>>> > > >> > > >>>>>>> > > public
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > interfaces of SourceReader
> and
> > > > > > > >>>> SplitReader.
> > > > > > > >>>> > This is
> > > > > > > >>>> > > >> > the
> > > > > > > >>>> > > >> > > >>>>>>> same for
> > > > > > > >>>> > > >> > > >>>>>>> > > >> strategy
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > d) because it would require
> > NOOP
> > > > > > default
> > > > > > > >>>> > > >> > implementation
> > > > > > > >>>> > > >> > > >>>>>>> in the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > SourceReader. Further, as
> > Dawid
> > > > > > > described,
> > > > > > > >>>> > method
> > > > > > > >>>> > > >> > > >>>>>>> signatures of
> > > > > > > >>>> > > >> > > >>>>>>> > > >> alignSplit
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > for SourceReader and
> > SplitReader
> > > > > > differ
> > > > > > > >>>> and it
> > > > > > > >>>> > > >> would
> > > > > > > >>>> > > >> > be
> > > > > > > >>>> > > >> > > >>>>>>> extremely
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > difficult
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > to make the signatures the
> > same
> > > > > (with
> > > > > > > even
> > > > > > > >>>> > > >> potential
> > > > > > > >>>> > > >> > > >>>>>>> performance
> > > > > > > >>>> > > >> > > >>>>>>> > > >> impact
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > because of additional
> look-ups
> > > of
> > > > > > split
> > > > > > > >>>> ids).
> > > > > > > >>>> > > >> > Therefore,
> > > > > > > >>>> > > >> > > >>>>>>> having a
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > symmetric
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > decorative interface as of
> > > > strategy
> > > > > a)
> > > > > > > is
> > > > > > > >>>> > actually
> > > > > > > >>>> > > >> not
> > > > > > > >>>> > > >> > > >>>>>>> possible
> > > > > > > >>>> > > >> > > >>>>>>> > and
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > having
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > two decorative interfaces
> with
> > > > > > different
> > > > > > > >>>> method
> > > > > > > >>>> > > >> > > >>>>>>> signatures is
> > > > > > > >>>> > > >> > > >>>>>>> > > >> confusing.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > My
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > conclusion is that we are
> best
> > > > with
> > > > > > > >>>> strategy b)
> > > > > > > >>>> > > >> which
> > > > > > > >>>> > > >> > > >>>>>>> means to
> > > > > > > >>>> > > >> > > >>>>>>> > have
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > specializing sub-interfaces
> > that
> > > > > > inherit
> > > > > > > >>>> from
> > > > > > > >>>> > the
> > > > > > > >>>> > > >> > parent
> > > > > > > >>>> > > >> > > >>>>>>> > interface:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > SourceReader <|--
> > > > > AlignedSourceReader,
> > > > > > > >>>> > SplitReader
> > > > > > > >>>> > > >> > <|--
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > AlignedSplitReader
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > With this option, I'm not
> 100%
> > > > sure
> > > > > > what
> > > > > > > >>>> the
> > > > > > > >>>> > > >> > > implications
> > > > > > > >>>> > > >> > > >>>>>>> are and
> > > > > > > >>>> > > >> > > >>>>>>> > if
> > > > > > > >>>> > > >> > > >>>>>>> > > >> this
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > could get nasty. I would
> > suggest
> > > > > that
> > > > > > > >>>> Dawid
> > > > > > > >>>> > and I
> > > > > > > >>>> > > >> just
> > > > > > > >>>> > > >> > > >>>>>>> try to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> implement
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > and
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > see if we like it. :)
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
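For illustration only (not part of the quoted message): a minimal Java sketch of strategy b), in which specializing sub-interfaces inherit from the parent interfaces. The interfaces are simplified stand-ins for the ones under discussion. The differing parameter types (split IDs on the source reader, split objects on the split reader) are an assumption made here to show why the two signatures may not be symmetric; MySplit and isPaused are hypothetical helpers.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Simplified stand-ins for the public base interfaces.
interface SourceReader {}

interface SplitReader<SplitT> {}

// Strategy b): sub-interfaces that add alignment on top of the base interfaces.
interface AlignedSourceReader extends SourceReader {
    // Assumption: the source reader level addresses splits by ID.
    void alignSplits(Collection<String> splitIdsToPause, Collection<String> splitIdsToResume);
}

interface AlignedSplitReader<SplitT> extends SplitReader<SplitT> {
    // Assumption: the split reader level works on split objects directly.
    void alignSplits(Collection<SplitT> splitsToPause, Collection<SplitT> splitsToResume);
}

// Hypothetical split type used only for this sketch.
final class MySplit {
    final String id;

    MySplit(String id) {
        this.id = id;
    }
}

// A connector opts in by implementing the sub-interface instead of the base one.
final class MySplitReader implements AlignedSplitReader<MySplit> {
    private final Set<String> pausedIds = new HashSet<>();

    @Override
    public void alignSplits(Collection<MySplit> splitsToPause, Collection<MySplit> splitsToResume) {
        for (MySplit split : splitsToPause) {
            pausedIds.add(split.id);
        }
        for (MySplit split : splitsToResume) {
            pausedIds.remove(split.id);
        }
    }

    boolean isPaused(String splitId) {
        return pausedIds.contains(splitId);
    }
}
```

With this shape the type itself states which implementations support alignment, and the base interfaces need no NOOP default methods.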
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Summary
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > In conclusion, please let me
> > > know
> > > > > your
> > > > > > > >>>> > > >> perspectives.
> > > > > > > >>>> > > >> > > >>>>>>> Please
> > > > > > > >>>> > > >> > > >>>>>>> > correct
> > > > > > > >>>> > > >> > > >>>>>>> > > >> me,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > if
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > something is wrong or if I
> > > > > > misunderstood
> > > > > > > >>>> > > >> something. My
> > > > > > > >>>> > > >> > > >>>>>>> perspective
> > > > > > > >>>> > > >> > > >>>>>>> > > >> would
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > be:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Top A: Yes
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Top B: Yes (but disable
> > > watermark
> > > > > > > trigger
> > > > > > > >>>> for
> > > > > > > >>>> > split
> > > > > > > >>>> > > >> > > >>>>>>> alignment)
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Top C: No
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Top D: b)
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Best,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > Sebastian
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55
> > AM
> > > > > Dawid
> > > > > > > >>>> > Wysakowicz <
> > > > > > > >>>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Arvid:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's
> > > > > capability
> > > > > > > >>>> > approach, I
> > > > > > > >>>> > > >> > fear
> > > > > > > >>>> > > >> > > >>>>>>> that it
> > > > > > > >>>> > > >> > > >>>>>>> > > >> doesn't
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > work
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> for this particular use
> case:
> > > > > Sources
> > > > > > > can
> > > > > > > >>>> > always
> > > > > > > >>>> > > >> be
> > > > > > > >>>> > > >> > > >>>>>>> aligned
> > > > > > > >>>> > > >> > > >>>>>>> > > >> cross-task
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > and
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> this is just about
> intra-task
> > > > > > > alignment.
> > > > > > > >>>> So
> > > > > > > >>>> > it's
> > > > > > > >>>> > > >> > > >>>>>>> plausible to put
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > sources
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> into an alignment group
> even
> > > > though
> > > > > > > they
> > > > > > > >>>> do
> > > > > > > >>>> > not
> > > > > > > >>>> > > >> use
> > > > > > > >>>> > > >> > any
> > > > > > > >>>> > > >> > > >>>>>>> of the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> presented
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They
> should
> > > just
> > > > > > > issue a
> > > > > > > >>>> > > >> warning, if
> > > > > > > >>>> > > >> > > >>>>>>> they handle
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > multiple
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> splits (see motivation
> > > section).
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Yes, but the "supportXXX"
> > > method
> > > > > > would
> > > > > > > >>>> be for
> > > > > > > >>>> > > >> telling
> > > > > > > >>>> > > >> > > if
> > > > > > > >>>> > > >> > > >>>>>>> it
> > > > > > > >>>> > > >> > > >>>>>>> > > supports
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > that
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> intra-task alignment.
> > > Cross-task
> > > > > > > >>>> alignment
> > > > > > > >>>> > would
> > > > > > > >>>> > > >> > always
> > > > > > > >>>> > > >> > > >>>>>>> be
> > > > > > > >>>> > > >> > > >>>>>>> > > supported.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I updated interfaces to
> what
> > I
> > > > > > believe
> > > > > > > >>>> to be
> > > > > > > >>>> > > >> closest
> > > > > > > >>>> > > >> > > to a
> > > > > > > >>>> > > >> > > >>>>>>> > consensus
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> between all participants.
> Do
> > > you
> > > > > mind
> > > > > > > >>>> taking a
> > > > > > > >>>> > > >> look?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Sebastian Do you mind
> > > addressing
> > > > > the
> > > > > > > >>>> nits?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid
> > > Heise
> > > > > > wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for pushing this
> > effort.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I'd actually be in favor of
> > > 1b).
> > > > I
> > > > > > > fully
> > > > > > > >>>> agree
> > > > > > > >>>> > > >> that
> > > > > > > >>>> > > >> > > >>>>>>> decorator
> > > > > > > >>>> > > >> > > >>>>>>> > > >> interfaces
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> should be avoided but I'm
> > also
> > > > not
> > > > > a
> > > > > > > big
> > > > > > > >>>> fan
> > > > > > > >>>> > of
> > > > > > > >>>> > > >> > > >>>>>>> overloading the
> > > > > > > >>>> > > >> > > >>>>>>> > > base
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interfaces (they are hard
> to
> > > > > > implement
> > > > > > > as
> > > > > > > >>>> > is). The
> > > > > > > >>>> > > >> > > usual
> > > > > > > >>>> > > >> > > >>>>>>> feedback
> > > > > > > >>>> > > >> > > >>>>>>> > > to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Source-related interfaces
> are
> > > > > always
> > > > > > > that
> > > > > > > >>>> > they are
> > > > > > > >>>> > > >> > > >>>>>>> overwhelming
> > > > > > > >>>> > > >> > > >>>>>>> > and
> > > > > > > >>>> > > >> > > >>>>>>> > > >> too
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> hard to implement. However,
> > I'd
> > > > > also
> > > > > > > not
> > > > > > > >>>> > oppose
> > > > > > > >>>> > > >> 1c)
> > > > > > > >>>> > > >> > as
> > > > > > > >>>> > > >> > > >>>>>>> scattered
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > interfaces
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> also have drawbacks. I'd
> just
> > > > > dislike
> > > > > > > >>>> 1a) and
> > > > > > > >>>> > 1d).
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's
> > > > > capability
> > > > > > > >>>> > approach, I
> > > > > > > >>>> > > >> > fear
> > > > > > > >>>> > > >> > > >>>>>>> that it
> > > > > > > >>>> > > >> > > >>>>>>> > > >> doesn't
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > work
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> for this particular use
> case:
> > > > > Sources
> > > > > > > can
> > > > > > > >>>> > always
> > > > > > > >>>> > > >> be
> > > > > > > >>>> > > >> > > >>>>>>> aligned
> > > > > > > >>>> > > >> > > >>>>>>> > > >> cross-task
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > and
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> this is just about
> intra-task
> > > > > > > alignment.
> > > > > > > >>>> So
> > > > > > > >>>> > it's
> > > > > > > >>>> > > >> > > >>>>>>> plausible to put
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > sources
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> into an alignment group
> even
> > > > though
> > > > > > > they
> > > > > > > >>>> do
> > > > > > > >>>> > not
> > > > > > > >>>> > > >> use
> > > > > > > >>>> > > >> > any
> > > > > > > >>>> > > >> > > >>>>>>> of the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> presented
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They
> should
> > > just
> > > > > > > issue a
> > > > > > > >>>> > > >> warning, if
> > > > > > > >>>> > > >> > > >>>>>>> they handle
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > multiple
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> splits (see motivation
> > > section).
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think renaming
> alignSplits
> > to
> > > > > > > >>>> facilitate
> > > > > > > >>>> > future
> > > > > > > >>>> > > >> use
> > > > > > > >>>> > > >> > > >>>>>>> cases makes
> > > > > > > >>>> > > >> > > >>>>>>> > > >> sense
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > but
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> then all interfaces (if 1c)
> > is
> > > > > > chosen)
> > > > > > > >>>> should
> > > > > > > >>>> > be
> > > > > > > >>>> > > >> > > adjusted
> > > > > > > >>>> > > >> > > >>>>>>> > > >> accordingly.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSourceReader could
> be
> > > > > > > >>>> > PausingSourceReader
> > > > > > > >>>> > > >> and
> > > > > > > >>>> > > >> > > I'd
> > > > > > > >>>> > > >> > > >>>>>>> go for
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits
> (Becket's
> > > > > > proposal
> > > > > > > >>>> > afaik). We
> > > > > > > >>>> > > >> > could
> > > > > > > >>>> > > >> > > >>>>>>> also
> > > > > > > >>>> > > >> > > >>>>>>> > split
> > > > > > > >>>> > > >> > > >>>>>>> > > it
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > into
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> pauseSplit and resumeSplit.
> > > While
> > > > > > > >>>> > > >> pauseOrResumeSplits
> > > > > > > >>>> > > >> > > >>>>>>> may allow
> > > > > > > >>>> > > >> > > >>>>>>> > > >> Sources
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> just use 1 instead of 2
> > library
> > > > > calls
> > > > > > > (as
> > > > > > > >>>> > written
> > > > > > > >>>> > > >> in
> > > > > > > >>>> > > >> > > the
> > > > > > > >>>> > > >> > > >>>>>>> > Javadoc),
> > > > > > > >>>> > > >> > > >>>>>>> > > >> both
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Kafka and Pulsar can't use
> it
> > > and
> > > > > I'm
> > > > > > > not
> > > > > > > >>>> > sure if
> > > > > > > >>>> > > >> > there
> > > > > > > >>>> > > >> > > >>>>>>> is a
> > > > > > > >>>> > > >> > > >>>>>>> > system
> > > > > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> can.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Some nits for the FLIP:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - Please replace "stop"
> with
> > > > > "pause".
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - Not sure if it's worth it
> > in
> > > > the
> > > > > > > >>>> capability
> > > > > > > >>>> > > >> > section:
> > > > > > > >>>> > > >> > > >>>>>>> Sources
> > > > > > > >>>> > > >> > > >>>>>>> > that
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > adopt
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> this interface cannot be
> used
> > > in
> > > > > > > earlier
> > > > > > > >>>> > > >> versions. So
> > > > > > > >>>> > > >> > > it
> > > > > > > >>>> > > >> > > >>>>>>> feels
> > > > > > > >>>> > > >> > > >>>>>>> > like
> > > > > > > >>>> > > >> > > >>>>>>> > > >> we
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > are
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> only forward compatible
> (old
> > > > > sources
> > > > > > > can
> > > > > > > >>>> be
> > > > > > > >>>> > used
> > > > > > > >>>> > > >> > after
> > > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > > >>>> > > >> > > >>>>>>> > change);
> > > > > > > >>>> > > >> > > >>>>>>> > > >> but
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > I
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> guess this holds for any
> API
> > > > > > addition.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - You might want to add
> what
> > > > > happens
> > > > > > > >>>> when all
> > > > > > > >>>> > > >> splits
> > > > > > > >>>> > > >> > > are
> > > > > > > >>>> > > >> > > >>>>>>> paused.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - You may want to describe
> > how
> > > > the
> > > > > 3
> > > > > > > >>>> flavors
> > > > > > > >>>> > of
> > > > > > > >>>> > > >> > > >>>>>>> SourceReaderBase
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > interact
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> with the interface.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - I'm not sure if it makes
> > > sense
> > > > to
> > > > > > > >>>> include
> > > > > > > >>>> > Kafka
> > > > > > > >>>> > > >> and
> > > > > > > >>>> > > >> > > >>>>>>> Pulsar in
> > > > > > > >>>> > > >> > > >>>>>>> > the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > FLIP.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> For me, this is rather
> > > immediate
> > > > > > > >>>> follow-up
> > > > > > > >>>> > work.
> > > > > > > >>>> > > >> > (could
> > > > > > > >>>> > > >> > > >>>>>>> be in the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> same
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> umbrella ticket)
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Arvid
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at
> 12:52
> > > PM
> > > > > > Dawid
> > > > > > > >>>> > Wysakowicz
> > > > > > > >>>> > > >> <
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > dwysakowicz@apache.org>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader
> implements
> > > > > > > >>>> SourceReader,
> > > > > > > >>>> > > >> > > >>>>>>> WithSplitsAlignment",
> > > > > > > >>>> > > >> > > >>>>>>> > > >> along
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> with "MySplitReader
> > implements
> > > > > > > >>>> SplitReader,
> > > > > > > >>>> > > >> > > >>>>>>> WithSplitsAlignment",
> > > > > > > >>>> > > >> > > >>>>>>> > > or
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader
> implements
> > > > > > > >>>> > AlignedSourceReader"
> > > > > > > >>>> > > >> > and
> > > > > > > >>>> > > >> > > >>>>>>> > > "MySplitReader
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements
> > AlignedSplitReader",
> > > > or
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader
> implements
> > > > > > > >>>> SourceReader"
> > > > > > > >>>> > and
> > > > > > > >>>> > > >> > > >>>>>>> "MySplitReader
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > implements
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReader".
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal
> > > > > according
> > > > > > > to
> > > > > > > >>>> Dawid
> > > > > > > >>>> > > >> would
> > > > > > > >>>> > > >> > > be:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader
> implements
> > > > > > > >>>> SourceReader"
> > > > > > > >>>> > and
> > > > > > > >>>> > > >> > > >>>>>>> "MySplitReader
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > implements
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API,
> > > although
> > > > > > > >>>> personally
> > > > > > > >>>> > > >> > speaking I
> > > > > > > >>>> > > >> > > >>>>>>> think it
> > > > > > > >>>> > > >> > > >>>>>>> > is
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > simpler
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> to just add a new method to
> > the
> > > > > split
> > > > > > > >>>> reader
> > > > > > > >>>> > with
> > > > > > > >>>> > > >> > > >>>>>>> default impl.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think it is a good idea
> > to
> > > > have
> > > > > > it
> > > > > > > >>>> > aligned as
> > > > > > > >>>> > > >> > much
> > > > > > > >>>> > > >> > > as
> > > > > > > >>>> > > >> > > >>>>>>> > possible.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> I'd
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > be
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> +1 for your option c). We
> can
> > > > merge
> > > > > > > >>>> > > >> > AlignedSplitReader
> > > > > > > >>>> > > >> > > >>>>>>> with
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > SplitReader. We
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> will update the FLIP
> shortly.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 12:43, Becket
> > Qin
> > > > > > wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the comment,
> Jark.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used
> to
> > > > align
> > > > > > > other
> > > > > > > >>>> > things in
> > > > > > > >>>> > > >> > the
> > > > > > > >>>> > > >> > > >>>>>>> future?
> > > > > > > >>>> > > >> > > >>>>>>> > For
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > example,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> align read speed, I have
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> seen users requesting
> global
> > > rate
> > > > > > > >>>> limits. This
> > > > > > > >>>> > > >> > feature
> > > > > > > >>>> > > >> > > >>>>>>> may also
> > > > > > > >>>> > > >> > > >>>>>>> > > need
> > > > > > > >>>> > > >> > > >>>>>>> > > >> an
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface like this.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend
> > this
> > > > > > > >>>> interface to
> > > > > > > >>>> > > >> support
> > > > > > > >>>> > > >> > > >>>>>>> aligning other
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > things, I
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> suggest explicitly
> declaring
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the purpose of the methods,
> > > such
> > > > as
> > > > > > > >>>> > > >> > > >>>>>>> `alignWatermarksForSplits`
> > > > > > > >>>> > > >> > > >>>>>>> > > >> instead
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > of
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> This is a good point. Naming-wise, it is usually more extensible to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> just describe what the method actually does, instead of assuming the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> purpose of doing it. For example, in this case, pauseOrResumeSplits()
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> would be more extensible because it can be used for any kind of flow
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> control, be it watermark alignment or simple rate limiting.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
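A minimal sketch of the naming argument above, under loudly stated assumptions: the class, the method signatures and the "slowest watermark plus allowed drift" rule are illustrative simplifications, not the actual Flink API or algorithm. The point is only that a pause/resume decision takes a threshold as input, so the same shape could be driven by watermark alignment or by something else such as a rate limiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Flink: decides which splits to pause or resume.
final class PauseOrResumeSketch {

    /** Hypothetical holder for the outcome of one decision round. */
    static final class Decision {
        final List<String> toPause = new ArrayList<>();
        final List<String> toResume = new ArrayList<>();
    }

    /**
     * Assumed rule for illustration: the threshold is the slowest split's
     * watermark plus the allowed drift. With no splits there is nothing to limit.
     */
    static long maxAllowedWatermark(Map<String, Long> splitWatermarks, long maxDrift) {
        if (splitWatermarks.isEmpty()) {
            return Long.MAX_VALUE;
        }
        long min = Long.MAX_VALUE;
        for (long watermark : splitWatermarks.values()) {
            min = Math.min(min, watermark);
        }
        return min + maxDrift;
    }

    /**
     * Splits that are ahead of the threshold are paused, all others are resumed.
     * Nothing here is specific to watermarks beyond the numbers passed in.
     */
    static Decision decide(Map<String, Long> splitWatermarks, long threshold) {
        Decision decision = new Decision();
        for (Map.Entry<String, Long> entry : splitWatermarks.entrySet()) {
            if (entry.getValue() > threshold) {
                decision.toPause.add(entry.getKey());
            } else {
                decision.toResume.add(entry.getKey());
            }
        }
        return decision;
    }
}
```

A reader-side method named pauseOrResumeSplits() would then only need to apply the two lists, regardless of which policy produced them.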
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think they have their own
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> advantages.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use interfaces for extending abilities
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink, and I prefer interfaces
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> rather than methods in this case. When you have a bunch of abilities
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> and each ability has more than one method, interfaces can help to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> organize them and make it clear to users which methods they need to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implement when they want to have an ability.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I am OK with decorative interfaces if this is a general design pattern
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> in the other components in Flink. But it looks like the current API
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> proposal is not symmetric.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> The current proposal is essentially "MySourceReader implements
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader, WithSplitsAlignment", along with "MySplitReader
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitsReader".
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Should we make the API symmetric? I'd consider any one of the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> following as symmetric.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader, WithSplitsAlignment",
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> along with "MySplitReader implements SplitReader,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> WithSplitsAlignment", or
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and "MySplitReader
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and "MySplitReader
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements SplitReader".
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according to Dawid would be:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and "MySplitReader
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader".
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although personally speaking I think it is
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> simpler to just add a new method with a default implementation to the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> split reader.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
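To make the two shapes being compared here concrete, the following is a simplified sketch. The interfaces are cut-down stand-ins, not the real Flink SourceReader/SplitReader types; using String split ids, the fetch() method and the alignSplits signature are all assumptions made only for illustration.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-ins; none of these are the actual Flink interfaces.
final class DecorativeVsDefaultSketch {

    /** Base interface with only the methods every reader needs. */
    interface SplitReader {
        String fetch();
    }

    /** Decorative style: alignment lives in its own opt-in interface. */
    interface AlignedSplitReader extends SplitReader {
        void alignSplits(Collection<String> toPause, Collection<String> toResume);
    }

    /** Default-method style: one interface, alignment is a no-op unless overridden. */
    interface SplitReaderWithDefault {
        String fetch();

        default void alignSplits(Collection<String> toPause, Collection<String> toResume) {
            // Intentionally empty: readers that do not care simply inherit this.
        }
    }

    /** A reader that ignores alignment under the decorative style. */
    static final class PlainReader implements SplitReader {
        public String fetch() { return "record"; }
    }

    /** A reader that opts in by implementing the decorative interface. */
    static final class PausableReader implements AlignedSplitReader {
        final Set<String> paused = new HashSet<>();

        public String fetch() { return "record"; }

        public void alignSplits(Collection<String> toPause, Collection<String> toResume) {
            paused.addAll(toPause);
            paused.removeAll(toResume);
        }
    }

    /** A reader under the default-method style that never mentions alignment. */
    static final class LegacyReader implements SplitReaderWithDefault {
        public String fetch() { return "record"; }
    }

    /** With decorative interfaces, support is visible in the type itself. */
    static boolean supportsAlignment(SplitReader reader) {
        return reader instanceof AlignedSplitReader;
    }
}
```

With the decorative style a caller can tell from the type whether alignment is supported. With the default method every reader accepts the call, so a separate "supportsXXX" flag would be needed to distinguish a real implementation from the inherited no-op.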
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Dawid Wysakowicz <dwysakowicz@apache.org>, thanks for the reply.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a preference and I agree most of the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> sources will support the alignment, I am fine following your
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> suggestion to have the SourceReader extending from
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> WithWatermarksSplitsAlignment, but would put the "supportsXXX" there,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> not in the Source, to keep the two methods together.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> One benefit of having the "supportsXXX" in Source is that it allows
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> some compile-time checks. For example, if a user enabled watermark
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> alignment while it is not supported by the Source, an exception can be
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> thrown at compile time. That seems useful in general. That said, I
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> agree that, in terms of API cleanliness, it is better to put the two
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> methods together.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <imjark@gmail.com> wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks, Dawid, for the reminder on FLIP-182. Sorry, I missed it.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't have other concerns then.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jark
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Jark:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with watermarks when the source
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment even if Kafka
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface, and this will affect the throughput somehow.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I agree with Becket that we may need a `supportSplitsAlignment()`
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> method for users to configure the source to enable/disable the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> alignment.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly affect
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> throughput if the reader is constantly switching between pause and
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> resume. Can users configure the alignment offset?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> This is covered in the previous FLIP[1], which has already been
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implemented in 1.15. In short, it must be enabled with the watermark
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> strategy, which also configures the drift and update interval.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to support aligning other
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> things, I suggest explicitly declaring the purpose of the methods, such
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> as `alignWatermarksForSplits` instead of `alignSplits`.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Sure, let's rename it.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Becket:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I understand your point. On the other hand, putting all methods into
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the base interface, even with "supportsXXX" methods for enabling
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> certain features, makes the entry threshold for writing a new source
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> higher. Instead of focusing on the basic and required properties of the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Source, the person implementing a source must bother with, and figure
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> out, what all of the extra features are about and how to deal with
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> them. It also makes it harder to organize methods in coupled groups, as
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jark said.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a preference and I agree most of the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> sources will support the alignment, I am fine following your
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> suggestion to have the SourceReader extending from
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> WithWatermarksSplitsAlignment, but would put the "supportsXXX" there,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> not in the Source, to keep the two methods together.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Lastly, I agree it is really unfortunate that the "alignSplits" methods
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> differ slightly for SourceReader and SplitReader. The reason is that
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase deals only with SplitIds, whereas SplitReader needs
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the actual splits to pause them. I found the discrepancy acceptable for
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the sake of simplifying the changes significantly, especially as the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> alternative would very likely impact performance, since we would have
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> to perform additional lookups. Moreover, the SplitReader is a secondary
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the effort, Dawid and Sebastian!
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I just have some minor questions (maybe I missed something).
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with watermarks when the source
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment even if Kafka
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface, and this will affect the throughput somehow.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I agree with Becket that we may need a `supportSplitsAlignment()`
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> method for users to configure the source to enable/disable the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> alignment.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly affect
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> throughput if the reader is constantly switching between pause and
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> resume. Can users configure the alignment offset?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align other things in the future? For
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> example, to align read speed: I have seen users requesting global rate
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> limits. That feature may also need an interface like this.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to support aligning other
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> things, I suggest explicitly declaring the purpose of the methods, such
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> as `alignWatermarksForSplits` instead of `alignSplits`.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think they have their own
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> advantages.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use interfaces for extending abilities
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink, and I prefer interfaces
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> rather than methods in this case. When you have a bunch of abilities
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> and each ability has more than one method, interfaces can help to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> organize them and make it clear to users which methods they need to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implement when they want to have an ability.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jark
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Hi Dawid,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the explanation. Apologies that I somehow misread a bunch of
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> "align" and thought they were "assign".
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Regarding 1, by default implementation, I was thinking of the default
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> no-op implementation. I am a little worried about the proliferation of
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> decorative interfaces. I think the most important thing about
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interfaces is that they are easy to understand. In this case, I prefer
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> adding a new method to the existing interface for the following
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> reasons:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> a) I feel the biggest drawback of decorative interfaces is which
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface they can decorate and which combinations of multiple
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> decorative interfaces are valid. In the current FLIP, the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> withSplitsAlignment interface is only applicable to the SourceReader,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> which means it can't decorate any other interface. From an interface
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> design perspective, a natural question is why not let
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> "AlignedSplitReader" extend "withSplitsAlignment"? And it is also
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> natural to assume that a split reader implementing both SplitReader and
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> WithSplitAlignment would
> > work,
> > > > > > because
> > > > > > > a
> > > > > > > >>>> > source
> > > > > > > >>>> > > >> > reader
> > > > > > > >>>> > > >> > > >>>>>>> > implementing
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader and
> > > > > withSplitsAlignment
> > > > > > > >>>> works.
> > > > > > > >>>> > So why
> > > > > > > >>>> > > >> > > isn't
> > > > > > > >>>> > > >> > > >>>>>>> there an
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> of AlignedSourceReader? In
> > the
> > > > > > future,
> > > > > > > if
> > > > > > > >>>> > there
> > > > > > > >>>> > > >> is a
> > > > > > > >>>> > > >> > > new
> > > > > > > >>>> > > >> > > >>>>>>> feature
> > > > > > > >>>> > > >> > > >>>>>>> > > >> added
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> (e.g. sorted or
> > pre-partitioned
> > > > > data
> > > > > > > >>>> aware),
> > > > > > > >>>> > are
> > > > > > > >>>> > > >> we
> > > > > > > >>>> > > >> > > >>>>>>> going to
> > > > > > > >>>> > > >> > > >>>>>>> > create
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> another
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface of SplitReader
> such
> > > as
> > > > > > > >>>> > > >> SortedSplitReader or
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> PrePartitionedAware?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Can they be combined? So I
> > > think
> > > > > the
> > > > > > > >>>> > additional
> > > > > > > >>>> > > >> > > >>>>>>> decorative
> > > > > > > >>>> > > >> > > >>>>>>> > > interface
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > like
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> withSplitsAlignment
> actually
> > > > > > increases
> > > > > > > >>>> the
> > > > > > > >>>> > > >> > > understanding
> > > > > > > >>>> > > >> > > >>>>>>> cost of
> > > > > > > >>>> > > >> > > >>>>>>> > > >> users
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> because they have to know
> > what
> > > > > > > decorative
> > > > > > > >>>> > > >> interfaces
> > > > > > > >>>> > > >> > > are
> > > > > > > >>>> > > >> > > >>>>>>> there,
> > > > > > > >>>> > > >> > > >>>>>>> > > which
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface they can decorate
> > and
> > > > > which
> > > > > > > >>>> > > >> combinations of
> > > > > > > >>>> > > >> > > the
> > > > > > > >>>> > > >> > > >>>>>>> > > decorative
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interfaces are valid and
> > which
> > > > are
> > > > > > not.
> > > > > > > >>>> > Ideally we
> > > > > > > >>>> > > >> > want
> > > > > > > >>>> > > >> > > >>>>>>> to avoid
> > > > > > > >>>> > > >> > > >>>>>>> > > >> that.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > To
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> be clear, I am not opposing
> > > > having
> > > > > an
> > > > > > > >>>> > interface of
> > > > > > > >>>> > > >> > > >>>>>>> > > >> withSplitsAlignment,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> it
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> is completely OK to have it
> > as
> > > an
> > > > > > > >>>> internal
> > > > > > > >>>> > > >> interface
> > > > > > > >>>> > > >> > > and
> > > > > > > >>>> > > >> > > >>>>>>> let
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > SourceReader
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> and SplitReader both extend
> > it.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> b) Adding a new method to the SourceReader with a default implementation
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> of no-op would help avoid logic branching in the source logic, especially
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> given that we agree that the vast majority of the SourceReader
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implementations, if not all, would just extend from the SourceReaderBase.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> That means adding a new method to the interface would effectively give
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the same user experience, but simpler.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> c) A related design principle that may be worth discussing is how we let
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the Source implementations tell Flink what capability is supported and
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> what is not. Personally speaking, I feel the most intuitive place is in
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the Source itself, because that is the entry point of the entire Source
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> connector logic.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Based on the above thoughts, I am wondering if the following interface
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> would be easier for users to understand.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - Change "withSplitsAlignment" to internal interface, let both
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>   SourceReader and SplitReader extend from it, with a default no-op
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>   implementation.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - Add a new method "boolean supportSplitsAlignment()" to the Source
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>   interface, with a default implementation returning false. Sources that
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>   have implemented the alignment logic can change this to return true,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>   and override the alignSplits() methods in the SourceReader /
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>   SplitReader if needed.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - In the future, if a new optional feature is going to be added to the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>   Source, and that feature requires the awareness from Flink, we can add
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>   more such methods to the Source.
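
The three bullets above can be sketched roughly as follows. This is only an illustration under assumptions: every type in it is a hypothetical stand-in rather than one of Flink's real interfaces, and the parameters of alignSplits (split IDs to pause and to resume) are assumed, since the thread does not spell out the signature.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for the proposal; none of these are Flink's real types.
final class CapabilityFlagSketch {

    // First bullet: an internal interface whose method is a no-op by default.
    interface WithSplitsAlignment {
        // The parameter shape is an assumption made for this sketch.
        default void alignSplits(Collection<String> splitsToPause,
                                 Collection<String> splitsToResume) {
            // no-op unless a reader overrides it
        }
    }

    interface SketchSourceReader extends WithSplitsAlignment {}

    interface SketchSource {
        // Second bullet: capability flag on the Source, false unless overridden.
        default boolean supportSplitsAlignment() {
            return false;
        }

        SketchSourceReader createReader();
    }

    // A source that ignores alignment: inherits both defaults.
    static final class PlainSource implements SketchSource {
        @Override
        public SketchSourceReader createReader() {
            return new SketchSourceReader() {};
        }
    }

    // A reader that actually tracks which splits are paused.
    static final class AligningReader implements SketchSourceReader {
        final List<String> pausedSplits = new ArrayList<>();

        @Override
        public void alignSplits(Collection<String> splitsToPause,
                                Collection<String> splitsToResume) {
            pausedSplits.removeAll(splitsToResume);
            pausedSplits.addAll(splitsToPause);
        }
    }

    static final class AligningSource implements SketchSource {
        @Override
        public boolean supportSplitsAlignment() {
            return true;
        }

        @Override
        public SketchSourceReader createReader() {
            return new AligningReader();
        }
    }

    // What a framework might do: consult the flag once, then call the reader.
    static boolean requestPause(SketchSource source, SketchSourceReader reader, String splitId) {
        if (!source.supportSplitsAlignment()) {
            return false;
        }
        reader.alignSplits(Collections.singletonList(splitId), Collections.<String>emptyList());
        return true;
    }

    public static void main(String[] args) {
        SketchSource plain = new PlainSource();
        SketchSource aligning = new AligningSource();
        System.out.println(requestPause(plain, plain.createReader(), "split-0"));
        System.out.println(requestPause(aligning, aligning.createReader(), "split-0"));
    }
}
```

Note the trade-off this makes visible: the flag lives on the Source while the behavior lives on the reader, so nothing in the types stops a source from returning true without overriding alignSplits.
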
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> What do you think?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dw...@apache.org>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Konstantin:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Correct. As far as I know, though, those are the only sources which
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> consume concurrently from multiple splits, and thus the only ones to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> which alignment applies.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Thomas:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase and then
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> doing nothing if the split reader does not implement AlignedSplitReader
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> could be misleading? Perhaps WithSplitsAlignment can instead be added to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the specific source reader (e.g. KafkaSourceReader) to make it explicit
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> that the source actually supports it.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I understand your concern. Hmm, I think we could actually do that. Given
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the actual implementation of the SourceReaderBase#alignSplits is rather
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> short (just a forward to the corresponding method of SplitFetcher), we
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> could reimplement it in the actual source implementations. This solution
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> has a downside though. Authors of new sources would have to do two
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> things: extend from AlignedSplitReader and implement WithSplitsAlignment,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> instead of just extending AlignedSplitReader. I would be fine with such a
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> tradeoff though. What do others think?
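
For comparison, here is a minimal sketch of the opt-in variant discussed above, in which only readers that explicitly implement the decorative interface are aligned. All types are hypothetical stand-ins, not Flink classes, and the alignSplits signature is assumed.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins; these are not Flink's real interfaces.
final class OptInAlignmentSketch {

    interface SketchSourceReader {
        String name();
    }

    // Decorative interface: a reader opts in by implementing it.
    interface WithSplitsAlignment {
        // The parameter shape is an assumption made for this sketch.
        void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
    }

    // A reader that does not opt in.
    static final class SimpleReader implements SketchSourceReader {
        @Override
        public String name() {
            return "simple";
        }
    }

    // A reader that opts in, in the spirit of adding the interface to a specific reader.
    static final class KafkaLikeReader implements SketchSourceReader, WithSplitsAlignment {
        final Set<String> pausedSplits = new HashSet<>();

        @Override
        public String name() {
            return "kafka-like";
        }

        @Override
        public void alignSplits(Collection<String> splitsToPause,
                                Collection<String> splitsToResume) {
            pausedSplits.removeAll(splitsToResume);
            pausedSplits.addAll(splitsToPause);
        }
    }

    // The caller has to branch on the reader's type; this is the "logic branching"
    // that the default-method proposal tries to avoid.
    static boolean tryPause(SketchSourceReader reader, Collection<String> splitsToPause) {
        if (reader instanceof WithSplitsAlignment) {
            ((WithSplitsAlignment) reader)
                    .alignSplits(splitsToPause, Collections.<String>emptyList());
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Collection<String> toPause = Collections.singletonList("split-0");
        System.out.println(tryPause(new SimpleReader(), toPause));
        System.out.println(tryPause(new KafkaLikeReader(), toPause));
    }
}
```

Here support is visible in the type of the reader itself, at the cost of a type check on the calling side.
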
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Steven:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> For this part from the motivation section, is it accurate? Let's assume
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> one source task consumes from 3 partitions and one of the partitions is
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> significantly slower. In this situation, the watermark for this source
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> task won't be held back, as it is reading recent data from the other two
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Kafka partitions. As a result, it won't hold back the overall watermark.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I thought the problem is that we may have late data for this slow
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> partition.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> It will hold back the watermark. The watermark of an operator is the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> minimum of the watermarks of all its splits [1].
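
The rule stated above, that the operator watermark is the minimum over its splits, can be illustrated with a small sketch. The class name, method name, and timestamps are made up for illustration, and this is not how Flink implements the combination internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: names and numbers are assumptions, not Flink code.
final class MinWatermarkSketch {

    // Combine per-split watermarks by taking the minimum.
    // For an empty map this sketch returns Long.MAX_VALUE.
    static long combinedWatermark(Map<String, Long> perSplitWatermarks) {
        long min = Long.MAX_VALUE;
        for (long watermark : perSplitWatermarks.values()) {
            min = Math.min(min, watermark);
        }
        return min;
    }

    public static void main(String[] args) {
        Map<String, Long> perSplit = new LinkedHashMap<>();
        perSplit.put("partition-0", 10_000L); // reading recent data
        perSplit.put("partition-1", 9_950L);  // reading recent data
        perSplit.put("partition-2", 1_200L);  // the significantly slower partition

        // The slow partition determines the result, so it holds the watermark back.
        System.out.println(combinedWatermark(perSplit));
    }
}
```

With these numbers the two fast partitions run far ahead of the combined watermark, which is the gap that split-level alignment is meant to bound.
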
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I have another question about the restart. Say split alignment is
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> triggered, a checkpoint is completed, and the job fails and is restored
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> from the last checkpoint. Because the alignment decision is not
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> checkpointed, initially alignment won't be enforced until we get a cycle
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> of watermark aggregation and propagation, right? Not saying this corner
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> case is a problem. Just want to understand it more.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Your understanding is correct.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Becket:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use case, so should
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> we just add the related methods to SourceReader directly instead of
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> introducing the new interface of WithSplitAssignment? We can provide
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> default implementations, so backwards compatibility won't be an issue.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't think we can provide a default implementation. How would we do
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> that? Would it be just a no-op? Is it better than having an opt-in
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface? The default implementation would have to be added exclusively
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> in a *Public* SourceReader interface. By the way, notice that
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase does extend from WithSplitsAlignment, so effectively all
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implementations do handle the alignment case. To be honest, I think it is
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> impossible for end users to implement the SourceReader interface
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> directly.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably also needs some
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> change to support throttling at the split granularity. Can you add that
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface change into the public interface section as well?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> It has been added from the beginning. See *AlignedSplitReader.*
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits here, given that
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> it is not actually changing the split assignments? It seems something
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> like pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> The method's called *alignSplits*, not assign. Do you still prefer a
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> different name for that? Personally, I am open to suggestions here.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> [1]
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for driving the effort, Sebastian. I think the motivation makes a
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> lot of sense. Just a few suggestions / questions.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use case, so should
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> we just add the related methods to SourceReader directly instead of
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> introducing the new interface of WithSplitAssignment? We can provide
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> default implementations, so backwards compatibility won't be an issue.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the
> > > > > SplitReader
> > > > > > > >>>> interface
> > > > > > > >>>> > > >> > probably
> > > > > > > >>>> > > >> > > >>>>>>> also
> > > > > > > >>>> > > >> > > >>>>>>> > needs
> > > > > > > >>>> > > >> > > >>>>>>> > > >> some
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> change to support
> throttling
> > at
> > > > the
> > > > > > > split
> > > > > > > >>>> > > >> > granularity.
> > > > > > > >>>> > > >> > > >>>>>>> Can you
> > > > > > > >>>> > > >> > > >>>>>>> > add
> > > > > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface change into the
> > > public
> > > > > > > >>>> interface
> > > > > > > >>>> > > >> section as
> > > > > > > >>>> > > >> > > >>>>>>> well?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using
> > the
> > > > > method
> > > > > > > >>>> name
> > > > > > > >>>> > > >> > assignSplits
> > > > > > > >>>> > > >> > > >>>>>>> here,
> > > > > > > >>>> > > >> > > >>>>>>> > given
> > > > > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> it
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> is not actually changing
> the
> > > > split
> > > > > > > >>>> > assignments? It
> > > > > > > >>>> > > >> > > seems
> > > > > > > >>>> > > >> > > >>>>>>> > something
> > > > > > > >>>> > > >> > > >>>>>>> > > >> like
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or
> > > > > > > >>>> > > >> adjustSplitsThrottling() is
> > > > > > > >>>> > > >> > > >>>>>>> more
> > > > > > > >>>> > > >> > > >>>>>>> > > accurate.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <stevenz3wu@gmail.com> wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> However, a single source
> > > operator
> > > > > may
> > > > > > > >>>> read
> > > > > > > >>>> > data
> > > > > > > >>>> > > >> from
> > > > > > > >>>> > > >> > > >>>>>>> multiple
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> splits/partitions, e.g.,
> > > multiple
> > > > > > Kafka
> > > > > > > >>>> > > >> partitions,
> > > > > > > >>>> > > >> > > such
> > > > > > > >>>> > > >> > > >>>>>>> that
> > > > > > > >>>> > > >> > > >>>>>>> > even
> > > > > > > >>>> > > >> > > >>>>>>> > > >> with
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> watermark alignment the
> > source
> > > > > > operator
> > > > > > > >>>> may
> > > > > > > >>>> > need
> > > > > > > >>>> > > >> to
> > > > > > > >>>> > > >> > > >>>>>>> buffer
> > > > > > > >>>> > > >> > > >>>>>>> > > excessive
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> amount
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> of data if one split emits
> > data
> > > > > > faster
> > > > > > > >>>> than
> > > > > > > >>>> > > >> another.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> For this part from the
> > > motivation
> > > > > > > >>>> section, is
> > > > > > > >>>> > it
> > > > > > > >>>> > > >> > > >>>>>>> accurate? Let's
> > > > > > > >>>> > > >> > > >>>>>>> > > >> assume
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> one
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> source task consumes from 3
> > > > > > partitions
> > > > > > > >>>> and
> > > > > > > >>>> > one of
> > > > > > > >>>> > > >> the
> > > > > > > >>>> > > >> > > >>>>>>> partition
> > > > > > > >>>> > > >> > > >>>>>>> > is
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> significantly slower. In
> this
> > > > > > > situation,
> > > > > > > >>>> > watermark
> > > > > > > >>>> > > >> > for
> > > > > > > >>>> > > >> > > >>>>>>> this
> > > > > > > >>>> > > >> > > >>>>>>> > source
> > > > > > > >>>> > > >> > > >>>>>>> > > >> task
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> won't hold back as it is
> > > reading
> > > > > > recent
> > > > > > > >>>> data
> > > > > > > >>>> > from
> > > > > > > >>>> > > >> > other
> > > > > > > >>>> > > >> > > >>>>>>> two Kafka
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> partitions. As a result, it
> > > won't
> > > > > > hold
> > > > > > > >>>> back
> > > > > > > >>>> > the
> > > > > > > >>>> > > >> > overall
> > > > > > > >>>> > > >> > > >>>>>>> > watermark.
> > > > > > > >>>> > > >> > > >>>>>>> > > I
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> thought the problem is that
> > we
> > > > may
> > > > > > have
> > > > > > > >>>> late
> > > > > > > >>>> > data
> > > > > > > >>>> > > >> for
> > > > > > > >>>> > > >> > > >>>>>>> this slow
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> partition.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I have another question
> about
> > > the
> > > > > > > >>>> restart. Say
> > > > > > > >>>> > > >> split
> > > > > > > >>>> > > >> > > >>>>>>> alignment is
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> triggered. checkpoint is
> > > > completed.
> > > > > > job
> > > > > > > >>>> > failed and
> > > > > > > >>>> > > >> > > >>>>>>> restored from
> > > > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > last
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> checkpoint. because
> alignment
> > > > > > decision
> > > > > > > >>>> is not
> > > > > > > >>>> > > >> > > >>>>>>> checkpointed,
> > > > > > > >>>> > > >> > > >>>>>>> > > initially
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> alignment won't be enforced
> > > until
> > > > > we
> > > > > > > get
> > > > > > > >>>> a
> > > > > > > >>>> > cycle
> > > > > > > >>>> > > >> of
> > > > > > > >>>> > > >> > > >>>>>>> watermark
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > aggregation
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> and propagation, right? Not
> > > > saying
> > > > > > this
> > > > > > > >>>> > corner is
> > > > > > > >>>> > > >> a
> > > > > > > >>>> > > >> > > >>>>>>> problem. Just
> > > > > > > >>>> > > >> > > >>>>>>> > > >> want
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> understand it more.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <thw@apache.org> wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for working on this!
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting"
> > split
> > > > > > > alignment
> > > > > > > >>>> in
> > > > > > > >>>> > > >> > > >>>>>>> SourceReaderBase and
> > > > > > > >>>> > > >> > > >>>>>>> > > then
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> doing
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> nothing if the split reader
> > > does
> > > > > not
> > > > > > > >>>> implement
> > > > > > > >>>> > > >> > > >>>>>>> AlignedSplitReader
> > > > > > > >>>> > > >> > > >>>>>>> > > >> could
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> be
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> misleading? Perhaps
> > > > > > WithSplitsAlignment
> > > > > > > >>>> can
> > > > > > > >>>> > > >> instead
> > > > > > > >>>> > > >> > be
> > > > > > > >>>> > > >> > > >>>>>>> added to
> > > > > > > >>>> > > >> > > >>>>>>> > the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> specific source reader
> (i.e.
> > > > > > > >>>> > KafkaSourceReader) to
> > > > > > > >>>> > > >> > make
> > > > > > > >>>> > > >> > > >>>>>>> it
> > > > > > > >>>> > > >> > > >>>>>>> > explicit
> > > > > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the source actually
> supports
> > > it.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thomas
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <knaufk@apache.org> wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the
> > > > > > > >>>> `AlignedSplitReader`
> > > > > > > >>>> > > >> > > interface
> > > > > > > >>>> > > >> > > >>>>>>> (aka the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> stop
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> &
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> resume behavior) will be
> > > > > implemented
> > > > > > > for
> > > > > > > >>>> > Kafka and
> > > > > > > >>>> > > >> > > >>>>>>> Pulsar only,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> correct?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> +1 in general. I believe it
> > is
> > > > > > valuable
> > > > > > > >>>> to
> > > > > > > >>>> > > >> complete
> > > > > > > >>>> > > >> > the
> > > > > > > >>>> > > >> > > >>>>>>> watermark
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> aligned
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> story with this FLIP.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Cheers,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Konstantin
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> To be explicit, having
> worked
> > > on
> > > > > it,
> > > > > > I
> > > > > > > >>>> > support it
> > > > > > > >>>> > > >> ;)
> > > > > > > >>>> > > >> > I
> > > > > > > >>>> > > >> > > >>>>>>> think we
> > > > > > > >>>> > > >> > > >>>>>>> > can
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> start a vote thread
> soonish,
> > as
> > > > > there
> > > > > > > >>>> are no
> > > > > > > >>>> > > >> concerns
> > > > > > > >>>> > > >> > > so
> > > > > > > >>>> > > >> > > >>>>>>> far.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 13/04/2022 11:27,
> > Sebastian
> > > > > > Mattheis
> > > > > > > >>>> wrote:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dear Flink developers,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I would like to open a
> > > discussion
> > > > > on
> > > > > > > >>>> FLIP 217
> > > > > > > >>>> > [1]
> > > > > > > >>>> > > >> for
> > > > > > > >>>> > > >> > > an
> > > > > > > >>>> > > >> > > >>>>>>> > extension
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> of
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Watermark Alignment to
> > perform
> > > > > > > alignment
> > > > > > > >>>> also
> > > > > > > >>>> > in
> > > > > > > >>>> > > >> > > >>>>>>> SplitReaders. To
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> do
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> so,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReaders must be able
> to
> > > > > suspend
> > > > > > > and
> > > > > > > >>>> > resume
> > > > > > > >>>> > > >> > reading
> > > > > > > >>>> > > >> > > >>>>>>> from
> > > > > > > >>>> > > >> > > >>>>>>> > split
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> sources
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> where the SourceOperator
> > > > > coordinates
> > > > > > > and
> > > > > > > >>>> > controls
> > > > > > > >>>> > > >> > > >>>>>>> suspend and
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> resume.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> To
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> gather information about
> > > current
> > > > > > > >>>> watermarks
> > > > > > > >>>> > of the
> > > > > > > >>>> > > >> > > >>>>>>> SplitReaders,
> > > > > > > >>>> > > >> > > >>>>>>> > we
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> extend
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the internal
> > > > > > WatermarkOutputMultiplexer
> > > > > > > >>>> and
> > > > > > > >>>> > report
> > > > > > > >>>> > > >> > > >>>>>>> watermarks to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceOperator.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> There is a PoC for this
> FLIP
> > > [2],
> > > > > > > >>>> prototyped
> > > > > > > >>>> > by
> > > > > > > >>>> > > >> Arvid
> > > > > > > >>>> > > >> > > >>>>>>> Heise and
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> revised
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> and
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> reworked by Dawid
> Wysakowicz
> > > (He
> > > > > did
> > > > > > > >>>> most of
> > > > > > > >>>> > the
> > > > > > > >>>> > > >> > work.)
> > > > > > > >>>> > > >> > > >>>>>>> and me.
> > > > > > > >>>> > > >> > > >>>>>>> > The
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> changes
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> are backwards compatible
> in a
> > > way
> > > > > > that
> > > > > > > if
> > > > > > > >>>> > affected
> > > > > > > >>>> > > >> > > >>>>>>> components do
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> not
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> support split alignment the
> > > > > behavior
> > > > > > is
> > > > > > > >>>> as
> > > > > > > >>>> > before.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Sebastian
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> [1]
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> [2]
> > > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > >>>> https://github.com/dawidwys/flink/tree/aligned-splits
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> --
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Konstantin Knauf
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> https://twitter.com/snntrable
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> https://github.com/knaufk
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Becket Qin <be...@gmail.com>.
Thanks for the explanation, Piotr.

So it looks like we have a conclusion here.

1. Regarding the supportsPausingSplits() method, I feel it brings more
confusion while the benefit is marginal, so I prefer not having that if
possible. It would be good to also hear @Thomas Weise <th...@apache.org>'s
opinion as he mentioned some concern earlier.
2. Let's add the feature knob then. In the future we can simply ignore the
configuration when deprecating it.
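
To make sure I read the agreed behavior correctly, here is a rough sketch of
it in plain Java. It is not the actual Flink code: SketchSplitReader,
AlignmentSketch and AlignmentOutcome are made-up names, and the boolean
parameter only stands in for the proposed
"allow.coarse-grained.watermark.alignment" option. The assumptions are that
the default pauseOrResumeSplits() throws UnsupportedOperationException, that
a reader with a single split can simply be paused as a whole, and that a
reader with multiple splits fails unless the knob is turned on.

```java
import java.util.Collection;
import java.util.List;

/** Hypothetical stand-in for a split reader; not the real Flink interface. */
interface SketchSplitReader {
    // Readers that do not override this method cannot pause individual splits.
    default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        throw new UnsupportedOperationException("split pausing is not implemented");
    }
}

/** Illustrative result of one alignment request. */
enum AlignmentOutcome { PAUSED_PER_SPLIT, PAUSED_WHOLE_READER }

final class AlignmentSketch {
    private AlignmentSketch() {}

    /**
     * Tries per-split pausing first. If the reader does not support it, falls
     * back to pausing the whole reader when that is safe (one split) or when
     * the user explicitly allowed coarse-grained alignment.
     */
    static AlignmentOutcome align(
            SketchSplitReader reader,
            List<String> assignedSplits,
            List<String> splitsToPause,
            boolean allowCoarseGrainedAlignment) {
        try {
            reader.pauseOrResumeSplits(splitsToPause, List.of());
            return AlignmentOutcome.PAUSED_PER_SPLIT;
        } catch (UnsupportedOperationException e) {
            // Case 1: a single split, so pausing the reader equals pausing the split.
            // Case 2: multiple splits, tolerated only when the knob is enabled.
            if (assignedSplits.size() <= 1 || allowCoarseGrainedAlignment) {
                return AlignmentOutcome.PAUSED_WHOLE_READER;
            }
            throw e;
        }
    }
}
```

With this shape nothing has to be kept in sync with a separate
supportsPausingSplits() method, and dropping the knob later only means
passing false unconditionally.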

Thanks,

Jiangjie (Becket) Qin

On Wed, Jun 29, 2022 at 10:19 PM Piotr Nowojski <pn...@apache.org>
wrote:

> Hi,
>
> I mean I'm fine with throwing an exception by default in Flink 1.16 in
> "Case 2", but I think we need to provide a way to work around it, for example
> via a feature toggle, if it's an easy thing to do. And it seems to be a
> simple thing.
>
> However this is orthogonal to the `supportsPausingSplits()` issue. I don't
> have a big preference whether
>   a) the exception should originate on JM, using `default boolean
> supportsPausingSplits() { return false; }` (as currently proposed in the
> FLIP),
>   b) or on the TM from `pauseOrResumeSplits()` throwing
> `UnsupportedOperationException` as you are proposing.
>
> a) fails earlier, so it's more user friendly from this perspective, but it
> provides more possibilities for bugs/inconsistencies for connector
> developers, since `supportsPausingSplits()` would have to be kept in sync
> with `pauseOrResumeSplits()`.
>
> Best,
> Piotrek
>
> On Wed, Jun 29, 2022 at 15:27 Becket Qin <be...@gmail.com> wrote:
>
> > Hi Piotr,
> >
> > Just to make sure we are on the same page. There are two cases for the
> > existing FLIP-182 users:
> >
> > Case 1: Each source reader only has one split assigned. This is the
> > targeted case for FLIP-182.
> > Case 2: Each source reader has multiple splits assigned. This is the flaky
> > case that may or may not work.
> >
> > With solution 1, the users of case 1 won't be impacted. The users in case 2
> > will receive an exception which they won't get at the moment.
> >
> > Do you mean we should not throw an exception in case 2? Personally I feel
> > that is OK and could have been done in FLIP-182 itself because it's not a
> > designed use case. As a user I may see a big variation of the job state
> > sizes from time to time and I am not able to rely on this feature to plan
> > my resources and uphold the SLA.
> >
> > That said, if you have a strong opinion on this, I am fine with having the
> > configuration like "allow.coarse-grained.watermark.alignment" with the
> > default value set to false, given that a configuration is much easier to
> > deprecate than a method.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> > On Wed, Jun 29, 2022 at 8:02 PM Piotr Nowojski <pn...@apache.org>
> > wrote:
> >
> > > Thanks for the explanation.
> > >
> > > > 2. It is fully compatible with FLIP-182, if we consider it as the right
> > > > thing to throw an exception for readers reading from multiple splits
> > > > without supporting split pausing.
> > >
> > > I think that's fine. But the question is should we provide a workaround for
> > > existing users? IMO if it's easy to do, we should.
> > >
> > > > I actually think neither solution 1 or 2 breaks FLIP-182 users,
> > >
> > > They do. A user currently has a working Flink 1.15 deployment, where
> > > watermark alignment maybe is not behaving ideally, but it's working to some
> > > extent, and you are proposing to throw them an exception after upgrading
> > > Flink, without any workaround (short of implementing a feature, which is a
> > > very problematic requirement). Given that costly upgrades are one of the
> > > major complaints, I would definitely be in favor of option 2. Given the
> > > most likely small actually affected user base, I would be +1 for solution 2
> > > with throwing an exception by default.
> > >
> > > Best,
> > > Piotrek
> > >
> > >
> > > On Wed, Jun 29, 2022 at 12:55 Becket Qin <be...@gmail.com> wrote:
> > >
> > > > Hi Piotr,
> > > >
> > > > Please see the reply inline below:
> > > >
> > > > On Wed, Jun 29, 2022 at 5:11 PM Piotr Nowojski <pnowojski@apache.org>
> > > > wrote:
> > > >
> > > > > Hi Becket,
> > > > >
> > > > > > My main concern of having a supportsPausingSplits() knob
> > > > >
> > > > > What is the problem with `supportsPausingSplits()` that you see? Do you
> > > > > want to remove it?
> > > > >
> > > > Just to make sure we are on the same page, I assume we are talking about
> > > > this supportsPausingSplits() method in the Source interface. If we go
> > > > with the obligatory features addition path, having this method seems
> > > > misleading. And also, later on at some point when we see all the sources
> > > > have implemented this feature, we will have to worry about deprecating this
> > > > method, which is backwards incompatible.
> > > >
> > > >
> > > > > Also I don't understand your proposal for Solution 1. How do you want to
> > > > > decide whether to throw an exception? For that we would need to have
> > > > > `supportsPausingSplits()`, right?
> > > > >
> > > >
> > > > What I am thinking is the following:
> > > >
> > > > 1. The Flink framework always assumes split pausing is supported and just
> > > > naively invokes SourceReader#pauseOrResumeSplits().
> > > > 2. The SourceReaderBase will basically again try to ask the SplitReader to
> > > > pause the splits.
> > > > 3. Because the default implementation throws an
> > > > UnsupportedOperationException, if the source developer did not override it,
> > > > this exception will be thrown and bubbled up.
> > > > 4. After catching this exception, the SourceReaderBase will just check if
> > > > there is only one split that is currently assigned to the split reader. If
> > > > so, it swallows the exception, stops polling the split reader and returns
> > > > NOTHING_AVAILABLE. This is the same as the current logic in the
> > > > SourceOperator. If we are not comfortable with moving this logic to the
> > > > SourceReaderBase, we can also just keep the logic there and simply let
> > > > SourceOperator remember if there is more than one split assigned to the
> > > > source reader, when SourceOperator.handleAddSplitsEvent() is invoked.
> > > >
> > > > This way the existing FLIP-182 users won't be impacted by this FLIP. For
> > > > those source readers that only have one split assigned, it works fine
> > > > without any change. For those source readers with multiple splits assigned,
> > > > they are already in a limp state with unpredictable side effects. We might
> > > > as well let them know this instead of pretending the
> > > > coarse-grained watermark alignment works fine for them.
> > > >
> > > > The advantage of this solution is that we don't have to do anything after
> > > > this. That would work fine as the final state, as in:
> > > > 1. We have already done the best we can do for the Sources that do not
> > > > support split pausing.
> > > > 2. It is fully compatible with FLIP-182, if we consider it as the right
> > > > thing to throw an exception for readers reading from multiple splits
> > > > without supporting split pausing.
> > > > 3. There is nothing to deprecate in the future.
> > > >
> > > >
> > > > >
> > > > > If so, I would prefer solution 2, to provide a graceful migration path for
> > > > > any users that are already using FLIP-182 with multiple splits per
> > > > > operator. I don't think there are many of those, but such a flag seems easy
> > > > > to implement while making migration easier. Having said that, technically
> > > > > we could change the behaviour and start throwing an exception always in
> > > > > such a case, as this feature is marked as Experimental.
> > > > >
> > > >
> > > > I actually think neither solution 1 nor 2 breaks FLIP-182 users, but
> > > > solution 2 needs a deprecation process for the option in the future.
> > > >
> > > >
> > > > >
> > > > > Best,
> > > > > Piotrek
> > > > >
> > > > > On Wed, Jun 29, 2022 at 02:54 Becket Qin <be...@gmail.com> wrote:
> > > > >
> > > > > > Hi Sebastian,
> > > > > >
> > > > > > Regarding the question,
> > > > > >
> > > > > > >
> > > > > > > > @Becket: I'm not sure about the intention of solution 1. Can you explain
> > > > > > > > that a bit more? In particular, I don't understand: "The reasoning behind
> > > > > > > > this solution is that existing users should only use the
> > > > > > > > coarse watermark alignment when a source reader only reads from a single
> > > > > > > > split." Why should a user not use coarse watermark alignment when source
> > > > > > > > reader reads from multiple splits? The split alignment uses the "coarse
> > > > > > > > watermark", i.e., maxDesiredWatermark, as described in the FLIP for
> > > > > > > > alignment.
> > > > > >
> > > > > >
> > > > > > > Imagine you have a source reader reading from two splits, and the
> > > > > > > watermarks look like the following:
> > > > > > > 1. Watermark of Split 1: 10:00 AM Jun 29
> > > > > > > 2. Watermark of Split 2: 11:00 AM Jun 29
> > > > > > > 3. maxDesiredWatermark: 10:30 AM Jun 29
> > > > > > >
> > > > > > > At this point, the source reader's watermark is 10:00 AM which is lower
> > > > > > > than the maxDesiredWatermark, so the source reader won't be paused from
> > > > > > > reading. However, because the source reader cannot specify which split to
> > > > > > > read from, if it continues to read, the watermark gap between the two
> > > > > > > splits may become even bigger. This essentially fails the main purpose of
> > > > > > > watermark alignment - to reduce the number of records buffered in the
> > > > > > > state. This does not necessarily happen, but this is not what FLIP-182 was
> > > > > > > designed for to begin with. So I'd rather avoid extending the feature
> > > > > > > to that case.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jiangjie (Becket) Qin
> > > > > >
> > > > > > > On Tue, Jun 28, 2022 at 6:53 PM Sebastian Mattheis <sebastian@ververica.com>
> > > > > > > wrote:
> > > > > >
> > > > > > > > (Sorry I didn't send to the list but only to Becket. My bad and thanks
> > > > > > > > Piotr. Next attempt:)
> > > > > > > >
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > Thanks for pushing the FLIP. I would drive it and would be happy to get
> > > > > > > > back to you, @Thomas, for reviews. (Sorry for low responsiveness, there
> > > > > > > > were several efforts with high priority on my side ...) As next step, I
> > > > > > > > would revise the FLIP to get the discussion concluded.
> > > > > > > >
> > > > > > > > However, as Becket mentioned I feel that some things are still not clear
> > > > > > > > yet:
> > > > > > >
> > > > > > > Re: Thomas
> > > > > > >>
> > > > > > >> However, from a user perspective, should the split level
> > alignment
> > > > be
> > > > > > >>> an opt-in feature, at least for a few releases? If yes, then
> we
> > > > would
> > > > > > >>> require a knob similar to supportsPausingSplits(), which I
> > > > understand
> > > > > > >>> won't be part of the revised FLIP. Such control may be
> > > beneficial:
> > > > > > >>
> > > > > > >>
> > > > > > >>> * Compare runtime behavior with split level alignment on/off
> > > > > > >>> * Allow use of sources that don't implement pausing splits
> yet
> > > > > > >>
> > > > > > >>
> > > > > > >>> The second point would, from the user's perspective, be
> > necessary
> > > > for
> > > > > > >>> backward compatibility? While the interface aspect and source
> > > > > > >>> compatibility has been discussed in great detail, I don't
> think
> > > it
> > > > > > >>> would be desirable if an application that already uses
> > alignment
> > > > > fails
> > > > > > >>> after upgrading to the new Flink version, forcing users to
> lock
> > > > step
> > > > > > >>> modify sources for the new non-optional split level
> alignment.
> > > > > > >>
> > > > > > >>
> > > > > > >>> So I think clarification of the compatibility aspect on the
> > FLIP
> > > > page
> > > > > > >>> would be necessary.
> > > > > > >>
> > > > > > >>
> > > > > > >> That is a good point. Currently FLIP-182 is marked as
> > > experimental.
> > > > So
> > > > > > >> technically speaking it could change. That said, I agree that
> it
> > > > would
> > > > > > be
> > > > > > >> good to not break the existing sources.
> > > > > > >>
> > > > > > >> My main concern of having a supportsPausingSplits() knob is
> that
> > > > this
> > > > > > >> might be something requiring code change on future
> deprecation.
> > I
> > > am
> > > > > > >> thinking of two potential solutions:
> > > > > > >>
> > > > > > >> Solution 1:
> > > > > > >> In the SourceReaderBase, when pauseOrResumeSplits() is
> invoked,
> > if
> > > > the
> > > > > > >> source reader only has one split assigned, the source reader
> > > simply
> > > > > > stops
> > > > > > > >> polling but just returns NOTHING_AVAILABLE. If there is more
> > than
> > > > one
> > > > > > > >> split assigned, it throws an exception with a message such as
> > > "The
> > > > > > >> unpausable SplitReader CLASS_NAME only works with watermark
> > > > alignment
> > > > > > > >> when assigned a single split. There is more than one split
> > > assigned
> > > > > to
> > > > > > the
> > > > > > >> SplitReader".
> > > > > > >> The reasoning behind this solution is that existing users
> should
> > > > only
> > > > > > use
> > > > > > >> the coarse watermark alignment when a source reader only reads
> > > from
> > > > a
> > > > > > >> single split. Reading from more than one split might have
> > unwanted
> > > > > side
> > > > > > >> effects, so we might as well throw an exception in this case.
> > > > > > >>
> > > > > > >> Solution 2:
> > > > > > >> Having a configuration
> > > "enable.coarse-grained.watermark.alignment",
> > > > > the
> > > > > > >> default value is false. Once it is set to true, we will allow
> > > > > > >> coarse-grained watermark alignment if a SplitReader is
> pausable.
> > > > > > >> This solution allows users to keep the current FLIP-182
> > behavior,
> > > > with
> > > > > > >> the risk of side effects.
> > > > > > >>
> > > > > > >> Personally speaking, I feel solution 1 seems better because
> > > > > > >> coarse-grained watermark alignment could be frustrating to the
> > > users
> > > > > > >> when more than one split is assigned. So we might as well not
> > > > support
> > > > > > it at
> > > > > > >> all. And also there is nothing to deprecate in the future with
> > > this
> > > > > > >> solution.
> > > > > > >>
> > > > > > >> What do you think?
> > > > > > >>
> > > > > > >
> > > > > > > @Thomas: My understanding is that you intend a simple
> switch/knob
> > > to
> > > > > test
> > > > > > > w/ and w/o (split) watermark alignment, right? Isn't the
> > > > coarse-grained
> > > > > > w/
> > > > > > > vs w/o watermark alignment sufficient for that? Or do you think
> > > that
> > > > > > > > switching watermark alignment explicitly on split level is
> > required?
> > > > > > >
> > > > > > > @Becket: I'm not sure about the intention of solution 1. Can
> you
> > > > > explain
> > > > > > > that a bit more? In particular, I don't understand: "The
> > reasoning
> > > > > behind
> > > > > > > this solution is that existing users should only use the coarse
> > > > > watermark
> > > > > > > alignment when a source reader only reads from a single split."
> > Why
> > > > > > > should a user not use coarse watermark alignment when source
> > reader
> > > > > reads
> > > > > > > from multiple splits? The split alignment uses the "coarse
> > > > watermark",
> > > > > > > i.e., maxDesiredWatermark, as described in the FLIP for
> > alignment.
> > > > > > >
> > > > > > > Could you please clarify?
> > > > > > >
> > > > > > > Regards,
> > > > > > > Sebastian
> > > > > > >
> > > > > > > On Wed, Jun 22, 2022 at 4:23 AM Becket Qin <
> becket.qin@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > >> Thanks for the feedback, Thomas and Steve. And thanks Piotr
> for
> > > the
> > > > > > >> patient and detailed discussion.
> > > > > > >>
> > > > > > >> Let's move forward with option 1 then.
> > > > > > >>
> > > > > > >> Re: Thomas
> > > > > > >>
> > > > > > >> However, from a user perspective, should the split level
> > alignment
> > > > be
> > > > > > >>> an opt-in feature, at least for a few releases? If yes, then
> we
> > > > would
> > > > > > >>> require a knob similar to supportsPausingSplits(), which I
> > > > understand
> > > > > > >>> won't be part of the revised FLIP. Such control may be
> > > beneficial:
> > > > > > >>
> > > > > > >>
> > > > > > >>> * Compare runtime behavior with split level alignment on/off
> > > > > > >>> * Allow use of sources that don't implement pausing splits
> yet
> > > > > > >>
> > > > > > >>
> > > > > > >>> The second point would, from the user's perspective, be
> > necessary
> > > > for
> > > > > > >>> backward compatibility? While the interface aspect and source
> > > > > > >>> compatibility has been discussed in great detail, I don't
> think
> > > it
> > > > > > >>> would be desirable if an application that already uses
> > alignment
> > > > > fails
> > > > > > >>> after upgrading to the new Flink version, forcing users to
> lock
> > > > step
> > > > > > >>> modify sources for the new non-optional split level
> alignment.
> > > > > > >>
> > > > > > >>
> > > > > > >>> So I think clarification of the compatibility aspect on the
> > FLIP
> > > > page
> > > > > > >>> would be necessary.
> > > > > > >>
> > > > > > >>
> > > > > > >> That is a good point. Currently FLIP-182 is marked as
> > > experimental.
> > > > So
> > > > > > >> technically speaking it could change. That said, I agree that
> it
> > > > would
> > > > > > be
> > > > > > >> good to not break the existing sources.
> > > > > > >>
> > > > > > >> My main concern of having a supportsPausingSplits() knob is
> that
> > > > this
> > > > > > >> might be something requiring code change on future
> deprecation.
> > I
> > > am
> > > > > > >> thinking of two potential solutions:
> > > > > > >>
> > > > > > >> Solution 1:
> > > > > > >> In the SourceReaderBase, when pauseOrResumeSplits() is
> invoked,
> > if
> > > > the
> > > > > > >> source reader only has one split assigned, the source reader
> > > simply
> > > > > > stops
> > > > > > > >> polling but just returns NOTHING_AVAILABLE. If there is more
> > than
> > > > one
> > > > > > > >> split assigned, it throws an exception with a message such as
> > > "The
> > > > > > >> unpausable SplitReader CLASS_NAME only works with watermark
> > > > alignment
> > > > > > when
> > > > > > > >> assigned a single split. There is more than one split
> assigned
> > to
> > > > the
> > > > > > >> SplitReader".
> > > > > > >> The reasoning behind this solution is that existing users
> should
> > > > only
> > > > > > use
> > > > > > >> the coarse watermark alignment when a source reader only reads
> > > from
> > > > a
> > > > > > >> single split. Reading from more than one split might have
> > unwanted
> > > > > side
> > > > > > >> effects, so we might as well throw an exception in this case.
> > > > > > >>
> > > > > > >> Solution 2:
> > > > > > >> Having a configuration
> > > "enable.coarse-grained.watermark.alignment",
> > > > > the
> > > > > > >> default value is false. Once it is set to true, we will allow
> > > > > > >> coarse-grained watermark alignment if a SplitReader is
> pausable.
> > > > > > >> This solution allows users to keep the current FLIP-182
> > behavior,
> > > > with
> > > > > > >> the risk of side effects.
> > > > > > >>
> > > > > > >> Personally speaking, I feel solution 1 seems better because
> > > > > > >> coarse-grained watermark alignment could be frustrating to the
> > > users
> > > > > > when
> > > > > > >> more than one split is assigned. So we might as well not
> support
> > > it
> > > > at
> > > > > > all.
> > > > > > >> And also there is nothing to deprecate in the future with this
> > > > > solution.
> > > > > > >>
> > > > > > >> What do you think?
> > > > > > >>
> > > > > > > >> Thanks,
> > > > > > >>
> > > > > > >> Jiangjie (Becket) Qin
> > > > > > >>
> > > > > > >>
> > > > > > >> On Tue, Jun 21, 2022 at 8:48 PM Piotr Nowojski <
> > > > pnowojski@apache.org>
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >>> Hi,
> > > > > > >>>
> > > > > > >>> It looks like option 1 wins overall? So let's go with that.
> > > > > > >>>
> > > > > > >>> Best,
> > > > > > >>> Piotrek
> > > > > > >>>
> > > > > > >>> śr., 15 cze 2022 o 04:13 Steven Wu <st...@gmail.com>
> > > > > napisał(a):
> > > > > > >>>
> > > > > > >>>> Both option 1 (default impl in base interface) and option 2
> > > > > > (decorative
> > > > > > >>>> interface) are pretty common patterns. I would also be fine
> > with
> > > > > > either.
> > > > > > >>>> The important thing is that an exception is thrown if a
> source
> > > > > doesn't
> > > > > > >>>> support the alignment capability.
> > > > > > >>>>
> > > > > > >>>> The other point is that we can validate the source
> capability
> > if
> > > > > > >>>> alignment
> > > > > > >>>> is enabled in WatermarkStrategy. I believe either option can
> > > > achieve
> > > > > > >>>> this
> > > > > > >>>> goal too.
> > > > > > >>>> public interface WatermarkStrategy<T> {
> > > > > > > >>>>     WatermarkStrategy<T> withWatermarkAlignment(
> > > > > > > >>>>             String watermarkGroup, Duration maxAllowedWatermarkDrift);
> > > > > > >>>> }
> > > > > > >>>>
> > > > > > >>>> If I have to pick one, I am slightly favoring option 1 (base
> > > > > > >>>> interface). As
> > > > > > >>>> watermark is already an essential concept of source, maybe
> > > > watermark
> > > > > > >>>> alignment capability can also be a property of the base
> > > > > source/reader
> > > > > > >>>> interface.
> > > > > > >>>>
> > > > > > >>>> On Tue, Jun 14, 2022 at 3:44 PM Thomas Weise <
> thw@apache.org>
> > > > > wrote:
> > > > > > >>>>
> > > > > > >>>> > Hi everyone,
> > > > > > >>>> >
> > > > > > >>>> > Thank you for all the effort that went into this
> discussion.
> > > The
> > > > > > split
> > > > > > >>>> > level watermark alignment will be an important feature for
> > > Flink
> > > > > > that
> > > > > > >>>> > will address operational problems for various use cases.
> > From
> > > > > > reading
> > > > > > >>>> > through this thread it appears that not too much remains
> to
> > > > bring
> > > > > > this
> > > > > > >>>> > FLIP to acceptance and allow development to move forward.
> I
> > > > would
> > > > > > like
> > > > > > >>>> > to contribute if possible.
> > > > > > >>>> >
> > > > > > >>>> > Regarding option 1 vs. option 2: I don't have a strong
> > > > preference,
> > > > > > >>>> > perhaps slightly leaning towards option 1.
> > > > > > >>>> >
> > > > > > >>>> > However, from a user perspective, should the split level
> > > > alignment
> > > > > > be
> > > > > > >>>> > an opt-in feature, at least for a few releases? If yes,
> then
> > > we
> > > > > > would
> > > > > > >>>> > require a knob similar to supportsPausingSplits(), which I
> > > > > > understand
> > > > > > >>>> > won't be part of the revised FLIP. Such control may be
> > > > beneficial:
> > > > > > >>>> >
> > > > > > >>>> > * Compare runtime behavior with split level alignment
> on/off
> > > > > > >>>> > * Allow use of sources that don't implement pausing splits
> > yet
> > > > > > >>>> >
> > > > > > >>>> > The second point would, from the user's perspective, be
> > > > necessary
> > > > > > for
> > > > > > >>>> > backward compatibility? While the interface aspect and
> > source
> > > > > > >>>> > compatibility has been discussed in great detail, I don't
> > > think
> > > > it
> > > > > > >>>> > would be desirable if an application that already uses
> > > alignment
> > > > > > fails
> > > > > > >>>> > after upgrading to the new Flink version, forcing users to
> > > lock
> > > > > step
> > > > > > >>>> > modify sources for the new non-optional split level
> > alignment.
> > > > > > >>>> >
> > > > > > >>>> > So I think clarification of the compatibility aspect on
> the
> > > FLIP
> > > > > > page
> > > > > > >>>> > would be necessary.
> > > > > > >>>> >
> > > > > > >>>> > Thanks,
> > > > > > >>>> > Thomas
> > > > > > >>>> >
> > > > > > >>>> > On Mon, May 30, 2022 at 3:29 AM Piotr Nowojski <
> > > > > > >>>> piotr.nowojski@gmail.com>
> > > > > > >>>> > wrote:
> > > > > > >>>> > >
> > > > > > >>>> > > Hi Becket,
> > > > > > >>>> > >
> > > > > > >>>> > > Thanks for summing this up. Just one correction:
> > > > > > >>>> > >
> > > > > > >>>> > > > Piotr prefers option 2, his opinions are:
> > > > > > >>>> > > >   e) It is OK that the code itself in option 2
> indicates
> > > the
> > > > > > >>>> developers
> > > > > > >>>> > > that a feature is optional. We will rely on the
> > > documentation
> > > > to
> > > > > > >>>> correct
> > > > > > >>>> > > that and clarify that the feature is actually
> obligatory.
> > > > > > >>>> > >
> > > > > > >>>> > > I would say based on a) and b) that feature would be
> still
> > > > > > >>>> optional. So
> > > > > > >>>> > > both the implementation and the documentation would be
> > > saying
> > > > > > that.
> > > > > > >>>> We
> > > > > > >>>> > > could add a mention to the docs and release notes, that
> > this
> > > > > > >>>> feature will
> > > > > > >>>> > > be obligatory in the next major release and plan such a
> > > > release
> > > > > > >>>> > accordingly.
> > > > > > >>>> > >
> > > > > > >>>> > > Re the option 1., as you mentioned:
> > > > > > >>>> > > > As for option 1: For developers, the feature is still
> > > > optional
> > > > > > >>>> due to
> > > > > > >>>> > the
> > > > > > >>>> > > default implementation in the interface, regardless of
> > what
> > > > the
> > > > > > >>>> default
> > > > > > >>>> > > implementation does, because the code compiles without
> > > > > overriding
> > > > > > >>>> these
> > > > > > >>>> > > methods
> > > > > > >>>> > >
> > > > > > >>>> > > Also importantly, the code will work in most cases.
> > > > > > >>>> > >
> > > > > > >>>> > > > Obligatory: Jobs may fail if these methods are not
> > > > implemented
> > > > > > >>>> > properly.
> > > > > > > >>>> > > e.g. SourceReader#pauseOrResumeSplits(). This is a
> common
> > > > > pattern
> > > > > > in
> > > > > > >>>> > Java,
> > > > > > >>>> > > e.g. Iterator.remove() by default throws
> > > > > > >>>> "UnsupportedOperationException",
> > > > > > >>>> > > informing the implementation that things may go wrong if
> > > this
> > > > > > >>>> method is
> > > > > > >>>> > not
> > > > > > >>>> > > implemented.
> > > > > > >>>> > >
> > > > > > >>>> > > For me `Iterator#remove()` is an optional feature.
> > > > Personally, I
> > > > > > >>>> don't
> > > > > > >>>> > > remember if I have ever implemented it.
> > > > > > >>>> > >
> > > > > > >>>> > > Best,
> > > > > > >>>> > > Piotrek
> > > > > > >>>> > >
> > > > > > >>>> > > pt., 27 maj 2022 o 10:14 Becket Qin <
> becket.qin@gmail.com
> > >
> > > > > > >>>> napisał(a):
> > > > > > >>>> > >
> > > > > > >>>> > > > I had an offline discussion with Piotr and here is the
> > > > > summary.
> > > > > > >>>> Please
> > > > > > >>>> > > > correct me if I miss something, Piotr.
> > > > > > >>>> > > >
> > > > > > >>>> > > > There are two things we would like to seek more
> opinions
> > > > from
> > > > > > the
> > > > > > >>>> > > > community, so we can make progress on this FLIP.
> > > > > > >>>> > > >
> > > > > > >>>> > > > 1. The General pattern to add obligatory features to
> > > > existing
> > > > > > >>>> > interfaces.
> > > > > > >>>> > > >
> > > > > > >>>> > > >
> > > > > > >>>> >
> > > > > > >>>>
> > > > > >
> > > > >
> > > >
> > >
> >
> ***********************************************************************************
> > > > > > >>>> > > > For interfaces exposed to the developers for
> > > implementation,
> > > > > > they
> > > > > > >>>> are
> > > > > > > >>>> > > > either intended to be *optional* or *obligatory*.
> While
> > it
> > > > is
> > > > > > >>>> quite
> > > > > > >>>> > clear
> > > > > > >>>> > > > about how to convey that intention when creating the
> > > > > interfaces,
> > > > > > >>>> it is
> > > > > > >>>> > not
> > > > > > >>>> > > > as commonly agreed when we are adding new features to
> an
> > > > > > existing
> > > > > > >>>> > > > interface. In general, Flink uses decorative
> interfaces
> > > when
> > > > > > >>>> adding
> > > > > > >>>> > > > optional features to existing interfaces. Both Piotr
> > and I
> > > > > agree
> > > > > > >>>> that
> > > > > > >>>> > looks
> > > > > > >>>> > > > good.
> > > > > > >>>> > > >
> > > > > > >>>> > > > Different opinions are mainly about how to add
> > obligatory
> > > > > > >>>> features to
> > > > > > >>>> > the
> > > > > > >>>> > > > existing interfaces, probably due to different
> > > > understandings
> > > > > of
> > > > > > >>>> > > > "obligatory".
> > > > > > >>>> > > >
> > > > > > > >>>> > > > We have discussed four options:
> > > > > > >>>> > > >
> > > > > > >>>> > > > *Option 1:*
> > > > > > >>>> > > >
> > > > > > >>>> > > >    - Just add a new method to the existing interface.
> > > > > > >>>> > > >    - For backwards compatibility, the method would
> have
> > a
> > > > > > default
> > > > > > >>>> > > >    implementation throwing
> > > "UnsupportedOperationException".
> > > > > > >>>> > > >    - In the next major version, remove the default
> > > > > > implementation.
> > > > > > >>>> > > >    - For the developers, any method with a default
> > > > > > implementation
> > > > > > >>>> > > >    throwing an "UnsupportedOperationException" should
> be
> > > > taken
> > > > > > as
> > > > > > >>>> > obligatory.
> > > > > > >>>> > > >
> > > > > > >>>> > > > *Option 2:*
> > > > > > >>>> > > >
> > > > > > >>>> > > >    - Always make the features optional by adding a
> > > > decorative
> > > > > > >>>> > interface,
> > > > > > >>>> > > >    just like ordinary optional features.
> > > > > > >>>> > > >    - Inform the developers via documentation that this
> > > > feature
> > > > > > is
> > > > > > >>>> > > >    obligatory, although it looks like optional from
> the
> > > > code.
> > > > > > >>>> > > >    - In case the developers did not implement the
> > > decorative
> > > > > > >>>> interface,
> > > > > > >>>> > > >    throw an exception
> > > > > > >>>> > > >    - In the next major version, move the methods in
> the
> > > > > > decorative
> > > > > > >>>> > > >    interface to the base interface, and deprecate the
> > > > > decorative
> > > > > > >>>> > interface.
> > > > > > >>>> > > >
> > > > > > >>>> > > > *Option 3:*
> > > > > > >>>> > > >
> > > > > > >>>> > > >    - Always bump the major version when a new
> obligatory
> > > > > feature
> > > > > > >>>> is
> > > > > > >>>> > > >    added, even if we may have to do it frequently.
> > > > > > >>>> > > >
> > > > > > >>>> > > > *Option 4:*
> > > > > > >>>> > > >
> > > > > > >>>> > > >    - Add a V2, V3... of the interface affected by the
> > new
> > > > > > >>>> obligatory
> > > > > > >>>> > > >    feature.
> > > > > > >>>> > > >    - In the next major versions, deprecate old
> versions
> > of
> > > > the
> > > > > > >>>> > interfaces.
> > > > > > >>>> > > >
> > > > > > > >>>> > > > Both Piotr and I agreed that option 3 and option 4
> > have a
> > > > big
> > > > > > >>>> side
> > > > > > >>>> > effect
> > > > > > > >>>> > > > and should be avoided. We have different preferences
> > > between
> > > > > > >>>> option 1
> > > > > > >>>> > and
> > > > > > >>>> > > > option 2.
> > > > > > >>>> > > >
> > > > > > >>>> > > > Personally I prefer option 1, the reasons are:
> > > > > > >>>> > > >   a) simple and intuitive. Java 8 introduced the
> default
> > > > impl
> > > > > in
> > > > > > >>>> > > > interfaces exactly for interface evolving, and this
> is a
> > > > > common
> > > > > > >>>> > pattern in
> > > > > > >>>> > > > many projects.
> > > > > > >>>> > > >   b) prominent to the developers that the feature is
> > > > expected
> > > > > to
> > > > > > >>>> be
> > > > > > >>>> > > > implemented, because it explicitly throws an exception
> > in
> > > > the
> > > > > > >>>> default
> > > > > > >>>> > impl.
> > > > > > >>>> > > >   c) low maintenance overhead - the Flink framework
> can
> > > > always
> > > > > > >>>> assume
> > > > > > >>>> > the
> > > > > > >>>> > > > method exists, so no special handling logic is needed.
> > > > > > >>>> > > >   d) communicate a clear semantic boundary between
> > > optional
> > > > > and
> > > > > > >>>> > obligatory
> > > > > > >>>> > > > features in the Flink to the developers.
> > > > > > >>>> > > >       - Optional: Jobs still run without exception if
> > > these
> > > > > > >>>> methods are
> > > > > > >>>> > > > not implemented. e.g. all the SupportsXXXPushDown
> > > > interfaces.
> > > > > > >>>> > > >       - Obligatory: Jobs may fail if these methods are
> > not
> > > > > > >>>> implemented
> > > > > > > >>>> > > > properly. e.g. SourceReader#pauseOrResumeSplits().
> This
> > > is a
> > > > > > >>>> common
> > > > > > >>>> > pattern
> > > > > > >>>> > > > in Java, e.g. Iterator.remove() by default throws
> > > > > > >>>> > > > "UnsupportedOperationException", informing the
> > > > implementation
> > > > > > that
> > > > > > >>>> > things
> > > > > > >>>> > > > may go wrong if this method is not implemented.
> > > > > > >>>> > > >
> > > > > > > >>>> > > > As for option 2, although the API itself sounds clean,
> > it
> > > > > > misleads
> > > > > > >>>> > people
> > > > > > >>>> > > > to think of an obligatory feature to be optional -
> from
> > > the
> > > > > code
> > > > > > >>>> the
> > > > > > >>>> > > > feature is optional, but the documents say it is
> > > obligatory.
> > > > > We
> > > > > > >>>> > probably
> > > > > > >>>> > > > should avoid such code-doc inconsistency, as people
> will
> > > be
> > > > > > >>>> confused.
> > > > > > >>>> > And I
> > > > > > >>>> > > > would actually be bewildered that sometimes not
> > > implementing
> > > > > an
> > > > > > >>>> > "optional"
> > > > > > >>>> > > > feature is fine, but sometimes it causes the jobs to
> > fail.
> > > > > > >>>> > > >
> > > > > > >>>> > > > In response to the argument that the method with a
> > default
> > > > > > >>>> > implementation
> > > > > > >>>> > > > is always optional, if that is true, it actually means
> > all
> > > > the
> > > > > > >>>> > interfaces
> > > > > > >>>> > > > should be immutable once they are created. If we want
> to
> > > > add a
> > > > > > >>>> method
> > > > > > >>>> > to an
> > > > > > >>>> > > > existing interface, for backwards compatibility, we
> will
> > > > have
> > > > > to
> > > > > > >>>> > provide a
> > > > > > >>>> > > > default implementation. And the fact it has a default
> > > > > > >>>> implementation
> > > > > > >>>> > > > indicates the method is optional. If that method is
> > > > optional,
> > > > > it
> > > > > > >>>> should
> > > > > > >>>> > > > reside in a separate decorative interface, otherwise
> it
> > > > clogs
> > > > > > that
> > > > > > >>>> > existing
> > > > > > >>>> > > > interface. Therefore, people should never add a method
> > to
> > > an
> > > > > > >>>> existing
> > > > > > >>>> > > > interface. I find this conclusion a bit extreme.
> > > > > > >>>> > > >
> > > > > > >>>> > > > Piotr prefers option 2, his opinions are:
> > > > > > >>>> > > >     a) Obligatory methods are the methods that fail
> the
> > > code
> > > > > > >>>> > compilation
> > > > > > >>>> > > > if not implemented.
> > > > > > >>>> > > >     b) All obligatory methods should reside in the
> base
> > > > > > interface,
> > > > > > >>>> > without
> > > > > > >>>> > > > a default implementation. And all the optional methods
> > > > should
> > > > > be
> > > > > > >>>> in
> > > > > > >>>> > > > decorative interfaces. This is a clean API.
> > > > > > >>>> > > >     c) due to b), there isn't a viable solution to add
> > an
> > > > > > >>>> obligatory
> > > > > > >>>> > > > method to an existing interface in a backwards
> > compatible
> > > > way.
> > > > > > >>>> Unless
> > > > > > >>>> > we
> > > > > > >>>> > > > are OK with breaking backwards compatibility, all the
> > > > > interfaces
> > > > > > >>>> > should be
> > > > > > >>>> > > > treated as immutable. As a compromise, we might as
> well
> > > just
> > > > > > >>>> treat all
> > > > > > >>>> > the
> > > > > > >>>> > > > features added later as optional features. This way we
> > > keep
> > > > > the
> > > > > > >>>> API
> > > > > > >>>> > clean.
> > > > > > >>>> > > >     d) based on b) and c), option 2 has a clean API,
> > while
> > > > > > option
> > > > > > >>>> 1
> > > > > > >>>> > does
> > > > > > >>>> > > > not.
> > > > > > >>>> > > >     e) It is OK that the code itself in option 2
> > indicates
> > > > the
> > > > > > >>>> > developers
> > > > > > >>>> > > > that a feature is optional. We will rely on the
> > > > documentation
> > > > > to
> > > > > > >>>> > correct
> > > > > > >>>> > > > that and clarify that the feature is actually
> > obligatory.
> > > > > > >>>> > > >     f) Regarding the effectiveness of making people
> > aware
> > > > that
> > > > > > the
> > > > > > >>>> > feature
> > > > > > >>>> > > > is obligatory, Option 1 and Option 2 are similar. For
> > > people
> > > > > > that
> > > > > > >>>> do
> > > > > > >>>> > not
> > > > > > >>>> > > > read the release note / documentation, they will
> mistake
> > > the
> > > > > > >>>> feature
> > > > > > >>>> > to be
> > > > > > >>>> > > > optional anyways.
> > > > > > >>>> > > >
> > > > > > >>>> > > > As for option 1: For developers, the feature is still
> > > > optional
> > > > > > >>>> due to
> > > > > > >>>> > the
> > > > > > >>>> > > > default implementation in the interface, regardless of
> > > what
> > > > > the
> > > > > > >>>> default
> > > > > > >>>> > > > implementation does, because the code compiles without
> > > > > > overriding
> > > > > > >>>> these
> > > > > > >>>> > > > methods. Also, another problem of this option is that
> > for
> > > > > users
> > > > > > >>>> that
> > > > > > >>>> > do not
> > > > > > >>>> > > > know about the history of the interface, they may be
> > > > confused
> > > > > by
> > > > > > >>>> the
> > > > > > >>>> > > > default implementation throwing an exception.
> > > > > > >>>> > > >
> > > > > > >>>> > > >
> > > > > > >>>> > > > 2. For this particular FLIP, should it be optional or
> > not?
> > > > > > >>>> > > >
> > > > > > >>>> > > >
> > > > > > >>>> >
> > > > > > >>>>
> > > > > >
> > > > >
> > > >
> > >
> >
> ***********************************************************************************
> > > > > > >>>> > > > As mentioned in the previous email, I feel this FLIP
> > > should
> > > > be
> > > > > > >>>> > obligatory,
> > > > > > >>>> > > > for the following reasons:
> > > > > > >>>> > > > 1. The Flink framework exposes the watermark alignment
> > API
> > > > to
> > > > > > the
> > > > > > >>>> end
> > > > > > >>>> > > > users. From the end users' perspective, the feature
> > should
> > > > be
> > > > > > >>>> available
> > > > > > >>>> > > > regardless of the implementation details in the
> > > pluggables.
> > > > > This
> > > > > > >>>> is
> > > > > > >>>> > true
> > > > > > >>>> > > > for any other methods exposed as the Flink API.
> > > > > > >>>> > > > 2. If a Source is not pausable, the end user should
> > > receive
> > > > an
> > > > > > >>>> > exception
> > > > > > > >>>> > > > when enabling the watermark alignment (both Piotr and
> I
> > > > agree
> > > > > on
> > > > > > >>>> > this). In
> > > > > > >>>> > > > that case, it meets my criteria of obligatory feature
> > > > because
> > > > > > not
> > > > > > >>>> > > > implementing the feature causes a framework API to
> throw
> > > > > > >>>> exception and
> > > > > > >>>> > > > fails the job.
> > > > > > >>>> > > >
> > > > > > >>>> > > > On the other hand, Piotr does not have a strong
> opinion
> > > > > > regarding
> > > > > > >>>> > whether
> > > > > > >>>> > > > this feature should be optional or not.
> > > > > > >>>> > > >
> > > > > > >>>> > > >
> > > > > > >>>> > > > Thanks for reading through this long email. So
> basically
> > > in
> > > > > > order
> > > > > > >>>> to
> > > > > > >>>> > make
> > > > > > > >>>> > > > progress on this FLIP, we want to see what people
> > feel
> > > > > about
> > > > > > >>>> the
> > > > > > >>>> > above
> > > > > > >>>> > > > two topics.
> > > > > > >>>> > > >
> > > > > > >>>> > > > Thanks,
> > > > > > >>>> > > >
> > > > > > >>>> > > > Jiangjie (Becket) Qin
> > > > > > >>>> > > >
> > > > > > >>>> > > >
> > > > > > >>>> > > > On Thu, May 26, 2022 at 3:06 PM Piotr Nowojski <
> > > > > > >>>> pnowojski@apache.org>
> > > > > > >>>> > > > wrote:
> > > > > > >>>> > > >
> > > > > > >>>> > > >> Hi Becket,
> > > > > > >>>> > > >>
> > > > > > >>>> > > >> I still sustain what I wrote before:
> > > > > > >>>> > > >> > I think I would still vote soft -1 on this option,
> > but
> > > I
> > > > > > >>>> wouldn't
> > > > > > >>>> > block
> > > > > > >>>> > > >> it in case I am out-voted.
> > > > > > >>>> > > >>
> > > > > > >>>> > > >> > I think it might be helpful to agree on the
> > definition
> > > of
> > > > > > >>>> optional
> > > > > > >>>> > in
> > > > > > >>>> > > >> our
> > > > > > >>>> > > >> case.
> > > > > > >>>> > > >>
> > > > > > >>>> > > >> For me it doesn't matter whether a default method
> > > throwing
> > > > an
> > > > > > >>>> > exception we
> > > > > > >>>> > > >> call optional or non-optional. As long as we keep it
> > this
> > > > > way,
> > > > > > >>>> the
> > > > > > >>>> > effect
> > > > > > >>>> > > >> is the same. It's effectively a method that a user
> > > doesn't
> > > > > have
> > > > > > >>>> to
> > > > > > >>>> > > >> implement. If interface/system allows some methods to
> > be
> > > > not
> > > > > > >>>> > implemented,
> > > > > > >>>> > > >> some users will do just that, regardless if we call
> it
> > > and
> > > > > > >>>> document as
> > > > > > >>>> > > >> non-optional. And at the same time it's clogging the
> > base
> > > > > > >>>> interface.
> > > > > > >>>> > > >>
> > > > > > >>>> > > >> By the way, just the need for a
> java-doc/documentation
> > > > > > >>>> explaining the
> > > > > > >>>> > > >> existence of some construct is a bad smell (code
> should
> > > be
> > > > > > >>>> > > >> self-documenting
> > > > > > >>>> > > >> and default method throwing an
> > > > UnsupportedOperationException
> > > > > is
> > > > > > >>>> not).
> > > > > > >>>> > > >>
> > > > > > >>>> > > >> > Please note that so far we do not assume whether
> the
> > > > > feature
> > > > > > >>>> is in
> > > > > > >>>> > > >> > the original API or it is added later. A newly
> added
> > > > > feature
> > > > > > >>>> can
> > > > > > >>>> > also be
> > > > > > >>>> > > >> > non-optional, although it might take some time for
> > all
> > > > the
> > > > > > >>>> pluggable
> > > > > > >>>> > > >> > developers to catch up, and they should still work
> if
> > > the
> > > > > new
> > > > > > >>>> > feature is
> > > > > > >>>> > > >> > not used until they catch up. In contrast, we may
> > never
> > > > > > expect
> > > > > > >>>> an
> > > > > > >>>> > > >> optional
> > > > > > >>>> > > >> > feature to catch up, because leaving it
> unimplemented
> > > is
> > > > > also
> > > > > > >>>> > blessed.
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >> > Let's take the checkpointing as an example. Imagine
> > > Flink
> > > > > did
> > > > > > >>>> not
> > > > > > >>>> > > >> support
> > > > > > >>>> > > >> > checkpointing before release 1.16. And now we are
> > > trying
> > > > to
> > > > > > add
> > > > > > >>>> > > >> > checkpointing to Flink. So we exposed the
> checkpoint
> > > > > > >>>> configuration
> > > > > > >>>> > to
> > > > > > >>>> > > >> the
> > > > > > >>>> > > >> > end users. In the meantime, will we tell the
> > pluggable
> > > > > (e.g.
> > > > > > >>>> > operators,
> > > > > > >>>> > > >> > connectors) developers that methods like
> > > > "snapshotState()"
> > > > > are
> > > > > > >>>> > optional?
> > > > > > >>>> > > >> If
> > > > > > >>>> > > >> > we do that, the availability of checkpointing in
> > Flink
> > > > > would
> > > > > > be
> > > > > > >>>> > severely
> > > > > > >>>> > > >> > weakened. But apparently we should still allow the
> > > > existing
> > > > > > >>>> > > >> implementations
> > > > > > >>>> > > >> > to work without checkpointing. It looks to me that
> > > adding
> > > > > the
> > > > > > >>>> > method to
> > > > > > >>>> > > >> the
> > > > > > >>>> > > >> > pluggable interfaces with a default implementation
> > > > throwing
> > > > > > >>>> > > >> > "UnsupportedOperationException" would be the
> solution
> > > > here.
> > > > > > >>>> Please
> > > > > > >>>> > note
> > > > > > >>>> > > >> > that in this case, having the default
> implementation
> > > does
> > > > > not
> > > > > > >>>> mean
> > > > > > >>>> > this
> > > > > > >>>> > > >> is
> > > > > > >>>> > > >> > optional. It is just the technique to support
> > backwards
> > > > > > >>>> > compatibility in
> > > > > > >>>> > > >> > the feature evolution. The fact that this method is
> > in
> > > > the
> > > > > > base
> > > > > > >>>> > > >> interface
> > > > > > >>>> > > >> > suggests it is not optional, so the developers
> SHOULD
> > > > > > >>>> implement it.
> > > > > > >>>> > > >>
> > > > > > >>>> > > >> I would soft vote -1 for having the default method
> > > throwing
> > > > > > >>>> > > >> UnsupportedOperationException as a one-off thing for
> this
> > > > > > (FLIP-217)
> > > > > > >>>> > special
> > > > > > >>>> > > >> circumstances.
> > > > > > >>>> > > >>
> > > > > > >>>> > > >> At the moment, without thinking this over too much, I
> > > would
> > > > > > vote
> > > > > > >>>> > harder -1
> > > > > > >>>> > > >> for having this as a general rule when adding new
> > > features.
> > > > > If
> > > > > > we
> > > > > > >>>> > ever end
> > > > > > >>>> > > >> up with an API that is littered with default methods
> > > > throwing
> > > > > > >>>> > > >> UnsupportedOperationException that are documented as
> > "non
> > > > > > >>>> optional" it
> > > > > > >>>> > > >> would be IMO a big design failure. I would be ok-ish
> > with
> > > > > that,
> > > > > > >>>> only
> > > > > > >>>> > if
> > > > > > >>>> > > >> that was a temporary thing and we had an aggressive
> > plan
> > > to
> > > > > > >>>> release
> > > > > > >>>> > more
> > > > > > >>>> > > >> often new major Flink versions (2.x.y, 3.x.y, ...)
> > > breaking
> > > > > API
> > > > > > >>>> > > >> compatibility, that would get rid of those default
> > > methods.
> > > > > > >>>> Adding
> > > > > > >>>> > > >> checkpointing and methods like "snapshotState()"
> would
> > > IMO
> > > > > > easily
> > > > > > >>>> > justify
> > > > > > >>>> > > >> a
> > > > > > >>>> > > >> new major Flink release. In that case we could add
> > those
> > > > > > methods
> > > > > > >>>> with
> > > > > > >>>> > > >> default implementation for some transition period, a
> > one
> > > or
> > > > > two
> > > > > > >>>> minor
> > > > > > >>>> > > >> releases, followed by a clean up in a major release.
> > > > However
> > > > > I
> > > > > > >>>> would
> > > > > > >>>> > still
> > > > > > >>>> > > >> argue that it would be cleaner/better to add a
> > decorative
> > > > > > >>>> interface
> > > > > > >>>> > like
> > > > > > >>>> > > >> `CheckpointedOperator` interface instead of adding
> > those
> > > > > > default
> > > > > > >>>> > methods
> > > > > > >>>> > > >> to
> > > > > > >>>> > > >> the base `Operator` interface.
> > > > > > >>>> > > >>
> > > > > > >>>> > > >> I think I can sum up our disagreement here: I
> > > would
> > > > > like
> > > > > > >>>> to
> > > > > > >>>> > keep
> > > > > > >>>> > > >> the
> > > > > > >>>> > > >> interfaces simpler, with only obligatory
> > methods/features
> > > > on
> > > > > > one
> > > > > > >>>> side
> > > > > > >>>> > and
> > > > > > >>>> > > >> clearly optional features on the other. While you
> would
> > > > like
> > > > > to
> > > > > > >>>> add an
> > > > > > >>>> > > >> extra third state in between those two?
> > > > > > >>>> > > >>
> > > > > > >>>> > > >> Best,
> > > > > > >>>> > > >> Piotrek
> > > > > > >>>> > > >>
> > > > > > >>>> > > >>
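The two API shapes argued over in the message above can be sketched side by side. This is only a minimal illustration with simplified, hypothetical stand-ins (BaseReaderA, BaseReaderB, PausableReader and the helper class ApiShapes are invented here); it is not the real Flink SourceReader/SplitReader API, and only the method name pauseOrResumeSplits is taken from the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-ins. NOT the real Flink interfaces.

// Shape A: the new method lives in the base interface with a throwing default.
interface BaseReaderA {
    String fetch();

    default void pauseOrResumeSplits(List<String> toPause, List<String> toResume) {
        throw new UnsupportedOperationException("pausing splits is not supported");
    }
}

// Shape B: the base interface stays minimal and the capability is a separate
// "decorative" interface that a reader opts into.
interface BaseReaderB {
    String fetch();
}

interface PausableReader {
    void pauseOrResumeSplits(List<String> toPause, List<String> toResume);
}

// Readers written before the feature existed still compile under both shapes.
final class LegacyReaderA implements BaseReaderA {
    public String fetch() { return "record"; }
}

final class LegacyReaderB implements BaseReaderB {
    public String fetch() { return "record"; }
}

// A reader that supports pausing under shape B.
final class PausingReaderB implements BaseReaderB, PausableReader {
    final List<String> paused = new ArrayList<>();

    public String fetch() { return "record"; }

    public void pauseOrResumeSplits(List<String> toPause, List<String> toResume) {
        paused.addAll(toPause);
        paused.removeAll(toResume);
    }
}

final class ApiShapes {
    // Shape A: support can only be discovered by calling the method and
    // catching the exception thrown by the default implementation.
    static boolean supportsPausingA(BaseReaderA reader) {
        try {
            reader.pauseOrResumeSplits(Collections.emptyList(), Collections.emptyList());
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    // Shape B: support is visible in the type, no call is needed.
    static boolean supportsPausingB(BaseReaderB reader) {
        return reader instanceof PausableReader;
    }

    public static void main(String[] args) {
        System.out.println("A legacy:   " + supportsPausingA(new LegacyReaderA()));
        System.out.println("B legacy:   " + supportsPausingB(new LegacyReaderB()));
        System.out.println("B pausable: " + supportsPausingB(new PausingReaderB()));
    }
}
```

In both shapes a legacy reader keeps compiling. The difference is where the missing capability shows up: in shape A only at call time, in shape B already in the type.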
> > > > > > >>>> > > >>
> > > > > > >>>> > > >> czw., 12 maj 2022 o 04:25 Becket Qin <
> > > becket.qin@gmail.com
> > > > >
> > > > > > >>>> > napisał(a):
> > > > > > >>>> > > >>
> > > > > > >>>> > > >> > Thanks for the clarification, Piotr and Sebastian.
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >> > It looks like the key problem is still whether the
> > > > > > >>>> implementation of
> > > > > > >>>> > > >> > pausable splits in the Sources should be optional
> or
> > > not.
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >> > I think it might be helpful to agree on the
> > definition
> > > of
> > > > > > >>>> optional
> > > > > > >>>> > in
> > > > > > >>>> > > >> our
> > > > > > >>>> > > >> > case. To me:
> > > > > > >>>> > > >> > Optional = "You CAN leave the method unimplemented,
> > and
> > > > > that
> > > > > > is
> > > > > > >>>> > fine."
> > > > > > >>>> > > >> > Non-Optional = "You CAN leave the method
> > unimplemented,
> > > > but
> > > > > > you
> > > > > > >>>> > SHOULD
> > > > > > >>>> > > >> NOT,
> > > > > > >>>> > > >> > because people assume this works."
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >> > I think one sufficient condition of a Non-Optional
> > > > feature
> > > > > is
> > > > > > >>>> that
> > > > > > >>>> > if
> > > > > > >>>> > > >> the
> > > > > > >>>> > > >> > feature is exposed through the framework API, Flink
> > > > should
> > > > > > >>>> expect
> > > > > > >>>> > the
> > > > > > >>>> > > >> > pluggables to support this feature by default.
> > > Otherwise
> > > > > the
> > > > > > >>>> > > >> availability
> > > > > > >>>> > > >> > of that feature becomes undefined.
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >> > Please note that so far we do not assume whether
> the
> > > > > feature
> > > > > > >>>> is in
> > > > > > >>>> > > >> > the original API or it is added later. A newly
> added
> > > > > feature
> > > > > > >>>> can
> > > > > > >>>> > also be
> > > > > > >>>> > > >> > non-optional, although it might take some time for
> > all
> > > > the
> > > > > > >>>> pluggable
> > > > > > >>>> > > >> > developers to catch up, and they should still work
> if
> > > the
> > > > > new
> > > > > > >>>> > feature is
> > > > > > >>>> > > >> > not used until they catch up. In contrast, we may
> > never
> > > > > > expect
> > > > > > >>>> an
> > > > > > >>>> > > >> optional
> > > > > > >>>> > > >> > feature to catch up, because leaving it
> unimplemented
> > > is
> > > > > also
> > > > > > >>>> > blessed.
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >> > Let's take the checkpointing as an example. Imagine
> > > Flink
> > > > > did
> > > > > > >>>> not
> > > > > > >>>> > > >> support
> > > > > > >>>> > > >> > checkpointing before release 1.16. And now we are
> > > trying
> > > > to
> > > > > > add
> > > > > > >>>> > > >> > checkpointing to Flink. So we exposed the
> checkpoint
> > > > > > >>>> configuration
> > > > > > >>>> > to
> > > > > > >>>> > > >> the
> > > > > > >>>> > > >> > end users. In the meantime, will we tell the
> > pluggable
> > > > > (e.g.
> > > > > > >>>> > operators,
> > > > > > >>>> > > >> > connectors) developers that methods like
> > > > "snapshotState()"
> > > > > are
> > > > > > >>>> > optional?
> > > > > > >>>> > > >> If
> > > > > > >>>> > > >> > we do that, the availability of checkpointing in
> > Flink
> > > > > would
> > > > > > be
> > > > > > >>>> > severely
> > > > > > >>>> > > >> > weakened. But apparently we should still allow the
> > > > existing
> > > > > > >>>> > > >> implementations
> > > > > > >>>> > > >> > to work without checkpointing. It looks to me that
> > > adding
> > > > > the
> > > > > > >>>> > method to
> > > > > > >>>> > > >> the
> > > > > > >>>> > > >> > pluggable interfaces with a default implementation
> > > > throwing
> > > > > > >>>> > > >> > "UnsupportedOperationException" would be the
> solution
> > > > here.
> > > > > > >>>> Please
> > > > > > >>>> > note
> > > > > > >>>> > > >> > that in this case, having the default
> implementation
> > > does
> > > > > not
> > > > > > >>>> mean
> > > > > > >>>> > this
> > > > > > >>>> > > >> is
> > > > > > >>>> > > >> > optional. It is just the technique to support
> > backwards
> > > > > > >>>> > compatibility in
> > > > > > >>>> > > >> > the feature evolution. The fact that this method is
> > in
> > > > the
> > > > > > base
> > > > > > >>>> > > >> interface
> > > > > > >>>> > > >> > suggests it is not optional, so the developers
> SHOULD
> > > > > > >>>> implement it.
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >> > When it comes to this FLIP, I think it meets the
> > > criteria
> > > > > of
> > > > > > >>>> > > >> non-optional
> > > > > > >>>> > > >> > features, so we should just use the evolution path
> of
> > > > > > >>>> non-optional
> > > > > > >>>> > > >> > features.
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >> > Thanks,
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >> > Jiangjie (Becket) Qin
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >> > On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski <
> > > > > > >>>> > pnowojski@apache.org>
> > > > > > >>>> > > >> > wrote:
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >> > > Hi,
> > > > > > >>>> > > >> > >
> > > > > > >>>> > > >> > > Actually previously I thought about having a
> > > decorative
> > > > > > >>>> interface
> > > > > > >>>> > and
> > > > > > >>>> > > >> > > whenever watermark alignment is enabled, checking
> > > that
> > > > > the
> > > > > > >>>> source
> > > > > > >>>> > > >> > > implements the decorative interface. If not,
> > throwing
> > > > an
> > > > > > >>>> > exception.
> > > > > > >>>> > > >> > >
> > > > > > >>>> > > >> > > The option with default methods in the source
> > > > interfaces
> > > > > > >>>> throwing
> > > > > > >>>> > > >> > > `UnsupportedOperationException` I think still
> > suffers
> > > > > from
> > > > > > >>>> the
> > > > > > >>>> > same
> > > > > > >>>> > > >> > > problems I mentioned before. It's still an
> optional
> > > > > > >>>> implementation
> > > > > > >>>> > > >> and at
> > > > > > >>>> > > >> > > the same time it's clogging the base interface. I
> > > > think I
> > > > > > >>>> would
> > > > > > >>>> > still
> > > > > > >>>> > > >> > vote
> > > > > > >>>> > > >> > > soft -1 on this option, but I wouldn't block it
> in
> > > > case I
> > > > > > am
> > > > > > >>>> > > >> out-voted.
> > > > > > >>>> > > >> > >
> > > > > > >>>> > > >> > > Best,
> > > > > > >>>> > > >> > > Piotrek
> > > > > > >>>> > > >> > >
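The check described in the message above (when watermark alignment is enabled, verify that the source implements the decorative interface, otherwise throw) could look roughly like the following. All names here (SketchSource, SupportsSplitPausing, AlignmentValidator) are hypothetical, and the choice of IllegalStateException is an assumption, since the message only says "throwing an exception".

```java
// Hypothetical names for illustration only; none of these are Flink classes.

interface SketchSource {
    String name();
}

// Decorative marker interface: a source implements it to declare that its
// splits can be paused and resumed.
interface SupportsSplitPausing {
}

final class LegacySketchSource implements SketchSource {
    public String name() { return "legacy"; }
}

final class PausableSketchSource implements SketchSource, SupportsSplitPausing {
    public String name() { return "pausable"; }
}

final class AlignmentValidator {
    // Fails fast at job setup instead of at the first pause request.
    static void validate(SketchSource source, boolean alignmentEnabled) {
        if (alignmentEnabled && !(source instanceof SupportsSplitPausing)) {
            // Exception type is an assumption; the thread does not name one.
            throw new IllegalStateException(
                    "Watermark alignment is enabled, but source '" + source.name()
                            + "' does not support pausing splits");
        }
    }

    public static void main(String[] args) {
        validate(new PausableSketchSource(), true);  // accepted
        validate(new LegacySketchSource(), false);   // accepted, alignment is off
        try {
            validate(new LegacySketchSource(), true);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this shape, a source without pausing support is rejected only when alignment is actually switched on for it, which matches the behaviour both sides of the thread seem to want; the disagreement is about how the capability is declared.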
> > > > > > >>>> > > >> > > śr., 11 maj 2022 o 14:22 Sebastian Mattheis <
> > > > > > >>>> > sebastian@ververica.com>
> > > > > > >>>> > > >> > > napisał(a):
> > > > > > >>>> > > >> > >
> > > > > > >>>> > > >> > > > Hi Becket,
> > > > > > >>>> > > >> > > >
> > > > > > >>>> > > >> > > > Thanks a lot for your fast and detailed
> response.
> > > For
> > > > > me,
> > > > > > >>>> it
> > > > > > >>>> > > >> converges
> > > > > > >>>> > > >> > > and
> > > > > > >>>> > > >> > > > dropping the supportsX method sounds very
> > > reasonable
> > > > to
> > > > > > me.
> > > > > > >>>> > (Side
> > > > > > >>>> > > >> note:
> > > > > > >>>> > > >> > > > With "pausable splits" enabled as "default" I
> > think
> > > > we
> > > > > > >>>> > > >> misunderstood.
> > > > > > >>>> > > >> > As
> > > > > > >>>> > > >> > > > you described now "default" I understand as
> that
> > it
> > > > > > should
> > > > > > >>>> be
> > > > > > >>>> > the
> > > > > > >>>> > > >> new
> > > > > > >>>> > > >> > > > recommended way of implementation, and I think
> > that
> > > > is
> > > > > > >>>> fully
> > > > > > >>>> > valid.
> > > > > > >>>> > > >> > > Before,
> > > > > > >>>> > > >> > > > I understood "default" here as the default
> > > > > > implementation,
> > > > > > >>>> i.e.,
> > > > > > >>>> > > >> > throwing
> > > > > > >>>> > > >> > > > UnsupportedOperationException, which is the
> exact
> > > > > > >>>> opposite. :) )
> > > > > > >>>> > > >> > > >
> > > > > > >>>> > > >> > > > Nevertheless: As mentioned, an open question
> for
> > me
> > > > is
> > > > > if
> > > > > > >>>> > watermark
> > > > > > >>>> > > >> > > > alignment should enforce pausable splits. For
> > > > > > >>>> clarification, the
> > > > > > >>>> > > >> > current
> > > > > > >>>> > > >> > > > documentation [1] says:
> > > > > > >>>> > > >> > > >
> > > > > > >>>> > > >> > > > *Note:* As of 1.15, Flink supports aligning
> > across
> > > > > tasks
> > > > > > >>>> of the
> > > > > > >>>> > same
> > > > > > >>>> > > >> > > >> source and/or different sources. It does not
> > > support
> > > > > > >>>> aligning
> > > > > > >>>> > > >> > > >> splits/partitions/shards in the same task.
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >> In a case where there are e.g. two Kafka
> > > partitions
> > > > > that
> > > > > > >>>> > produce
> > > > > > >>>> > > >> > > >> watermarks at different pace, that get
> assigned
> > to
> > > > the
> > > > > > >>>> same
> > > > > > >>>> > task
> > > > > > >>>> > > >> > > watermark
> > > > > > >>>> > > >> > > >> might not behave as expected. Fortunately,
> worst
> > > > case
> > > > > it
> > > > > > >>>> > should not
> > > > > > >>>> > > >> > > perform
> > > > > > >>>> > > >> > > >> worse than without alignment.
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >> Given the limitation above, we suggest
> applying
> > > > > > watermark
> > > > > > >>>> > > >> alignment in
> > > > > > >>>> > > >> > > >> two situations:
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >>    1. You have two different sources (e.g.
> Kafka
> > > and
> > > > > > >>>> File) that
> > > > > > >>>> > > >> > produce
> > > > > > >>>> > > >> > > >>    watermarks at different speeds
> > > > > > >>>> > > >> > > >>    2. You run your source with parallelism
> equal
> > > to
> > > > > the
> > > > > > >>>> number
> > > > > > >>>> > of
> > > > > > >>>> > > >> > > >>    splits/shards/partitions, which results in
> > > every
> > > > > > >>>> subtask
> > > > > > >>>> > being
> > > > > > >>>> > > >> > > assigned a
> > > > > > >>>> > > >> > > >>    single unit of work.
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >> I personally see no issue in implementing and
> I
> > > see
> > > > no
> > > > > > >>>> reason
> > > > > > >>>> > > >> against
> > > > > > >>>> > > >> > > > implementing this dependency of watermark
> > alignment
> > > > and
> > > > > > >>>> pausable
> > > > > > >>>> > > >> > splits.
> > > > > > >>>> > > >> > > (I
> > > > > > >>>> > > >> > > > think this would even be a good path towards
> > > shaping
> > > > > > >>>> watermark
> > > > > > >>>> > > >> > alignment
> > > > > > >>>> > > >> > > in
> > > > > > >>>> > > >> > > > 1.16.) However, "I don't see" means that I
> would
> > be
> > > > > happy
> > > > > > >>>> to
> > > > > > >>>> > hear
> > > > > > >>>> > > >> > Dawid's
> > > > > > >>>> > > >> > > > and Piotrek's opinions as they implemented
> > > watermark
> > > > > > >>>> alignment
> > > > > > >>>> > > >> based on
> > > > > > >>>> > > >> > > > FLIP-182 [2] and I don't want to miss relevant
> > > > > > >>>> > rationale/background
> > > > > > >>>> > > >> > info
> > > > > > >>>> > > >> > > > from their side.
> > > > > > >>>> > > >> > > >
> > > > > > >>>> > > >> > > > *@Piotrek* *@Dawid *What do you think?
> > > > > > >>>> > > >> > > >
> > > > > > >>>> > > >> > > > Regards,
> > > > > > >>>> > > >> > > > Sebastian
> > > > > > >>>> > > >> > > >
> > > > > > >>>> > > >> > > > [1]
> > > > > > >>>> > > >> > > >
> > > > > > >>>> > > >> > >
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >>
> > > > > > >>>> >
> > > > > > >>>>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
> > > > > > >>>> > > >> > > > [2]
> > > > > > >>>> > > >> > > >
> > > > > > >>>> > > >> > >
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >>
> > > > > > >>>> >
> > > > > > >>>>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
> > > > > > >>>> > > >> > > >
> > > > > > >>>> > > >> > > > On Wed, May 11, 2022 at 1:30 PM Becket Qin <
> > > > > > >>>> > becket.qin@gmail.com>
> > > > > > >>>> > > >> > wrote:
> > > > > > >>>> > > >> > > >
> > > > > > >>>> > > >> > > >> +dev
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >> Hi Sebastian,
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >> Thank you for the summary. Please see the
> > detailed
> > > > > > replies
> > > > > > >>>> > inline.
> > > > > > >>>> > > >> As
> > > > > > >>>> > > >> > a
> > > > > > >>>> > > >> > > >> recap of my suggestions.
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >> 1. Pausable splits API.
> > > > > > >>>> > > >> > > >>   a) Add default implementations to methods
> > > > > > >>>> > "pauseOrResumeSplits"
> > > > > > >>>> > > >> in
> > > > > > >>>> > > >> > > both
> > > > > > >>>> > > >> > > >> SourceReader and SplitReader where both
> default
> > > > > > >>>> implementations
> > > > > > >>>> > > >> throw
> > > > > > >>>> > > >> > > >>  UnsupportedOperationException.
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >> 2. User story.
> > > > > > >>>> > > >> > > >>     a) We tell users to enable the watermark
> > > > alignment
> > > > > > as
> > > > > > >>>> they
> > > > > > >>>> > > >> like.
> > > > > > >>>> > > >> > > This
> > > > > > >>>> > > >> > > >> is exactly what the current Flink API is.
> > > > > > >>>> > > >> > > >>     b) We tell the source developers, please
> > > > implement
> > > > > > >>>> pausable
> > > > > > >>>> > > >> > splits,
> > > > > > >>>> > > >> > > >> otherwise bad things may happen. Think of it
> > like
> > > > you
> > > > > > are
> > > > > > >>>> > expected
> > > > > > >>>> > > >> to
> > > > > > >>>> > > >> > > >> implement SourceReader#snapshotState()
> properly,
> > > > > > otherwise
> > > > > > >>>> > > >> exceptions
> > > > > > >>>> > > >> > > will
> > > > > > >>>> > > >> > > >> be thrown when users enable checkpointing.
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >> Thanks,
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >> Jiangjie (Becket) Qin
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >> On Wed, May 11, 2022 at 4:45 PM Sebastian
> > > Mattheis <
> > > > > > >>>> > > >> > > >> sebastian@ververica.com> wrote:
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >>> Hi Becket, Hi everybody,
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >>> I'm sorry if I misread the messages but I
> could
> > > not
> > > > > > >>>> derive an
> > > > > > >>>> > > >> > agreement
> > > > > > >>>> > > >> > > >>> from the mailing list. Nevertheless, if I
> > > > understand
> > > > > > you
> > > > > > >>>> > right the
> > > > > > >>>> > > >> > > >>> suggestion is:
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >>> * Add default implementations to methods
> > > > > > >>>> > "pauseOrResumeSplits" in
> > > > > > >>>> > > >> > both
> > > > > > >>>> > > >> > > >>> SourceReader and SplitReader where both
> default
> > > > > > >>>> > implementations
> > > > > > >>>> > > >> throw
> > > > > > >>>> > > >> > > >>> UnsupportedOperationException.
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >> Yes.
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >> * Add "supportsPauseOrResumeSplits" to the
> > Source
> > > > > > >>>> interface.
> > > > > > >>>> > (In
> > > > > > >>>> > > >> the
> > > > > > >>>> > > >> > > >>> following, I refer to supporting this as
> > > "pausable
> > > > > > >>>> splits".)
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >> We may no longer need this if pausable splits
> > are
> > > > > > >>>> expected to
> > > > > > >>>> > be
> > > > > > >>>> > > >> > > >> implemented by the source developers, i.e.
> > > > > non-optional.
> > > > > > >>>> Having
> > > > > > >>>> > > >> this
> > > > > > >>>> > > >> > > method
> > > > > > >>>> > > >> > > >> would then be somewhat misleading as it looks
> > like
> > > > the
> > > > > > >>>> sources
> > > > > > >>>> > > >> that do
> > > > > > >>>> > > >> > > not
> > > > > > >>>> > > >> > > >> support pausable splits are also acceptable in
> > the
> > > > > long
> > > > > > >>>> term.
> > > > > > >>>> > So
> > > > > > >>>> > > >> API
> > > > > > >>>> > > >> > > wise,
> > > > > > >>>> > > >> > > >> I'd say maybe we should remove this for this
> > FLIP,
> > > > > > >>>> although I
> > > > > > >>>> > > >> believe
> > > > > > >>>> > > >> > > this
> > > > > > >>>> > > >> > > >> supportXXX pattern itself is still attractive
> > for
> > > > > > optional
> > > > > > >>>> > > >> features.
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >>> To make the conclusions explicit:
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >>> 1. The implementation of pauseOrResumeSplits
> in
> > > > both
> > > > > > >>>> > interfaces
> > > > > > >>>> > > >> > > >>> SourceReader and SplitReader is optional
> where
> > > the
> > > > > > >>>> default is
> > > > > > >>>> > > >> that
> > > > > > >>>> > > >> > it
> > > > > > >>>> > > >> > > >>> doesn't support it. (--> This means that the
> > > > > > >>>> implementation is
> > > > > > >>>> > > >> still
> > > > > > >>>> > > >> > > >>> optional for the source developer.)
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >> It is optional for backwards compatibility
> with
> > > > > existing
> > > > > > >>>> > sources,
> > > > > > >>>> > > >> as
> > > > > > >>>> > > >> > > they
> > > > > > >>>> > > >> > > >> may still compile without code change. But
> > > starting
> > > > > from
> > > > > > >>>> this
> > > > > > >>>> > FLIP,
> > > > > > >>>> > > >> > > Flink
> > > > > > >>>> > > >> > > >> will always optimistically assume that all the
> > > > sources
> > > > > > >>>> support
> > > > > > >>>> > > >> > pausable
> > > > > > >>>> > > >> > > >> splits. If a source does not support pausable
> > > > splits,
> > > > > it
> > > > > > >>>> goes
> > > > > > >>>> > to an
> > > > > > >>>> > > >> > > error
> > > > > > >>>> > > >> > > >> handling path when watermark alignment is
> > enabled
> > > on
> > > > > it.
> > > > > > >>>> This
> > > > > > >>>> > is
> > > > > > >>>> > > >> > > different
> > > > > > >>>> > > >> > > >> from a usual optional feature, where no error
> is
> > > > > > expected.
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >>> 2. If watermark alignment is enabled in the
> > > > > application
> > > > > > >>>> code
> > > > > > >>>> > by
> > > > > > >>>> > > >> > adding
> > > > > > >>>> > > >> > > >>> withWatermarkAlignment to the
> WatermarkStrategy
> > > > while
> > > > > > >>>> > > >> SourceReader or
> > > > > > >>>> > > >> > > >>> SplitReader do not support pausableSplits, we
> > > throw
> > > > > an
> > > > > > >>>> > > >> > > >>> UnsupportedOperationException.
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >> Yes.
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >>> 3. With regard to your statement:
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >>>> [...] basically means watermark alignment is
> > a
> > > > > > >>>> non-optional
> > > > > > >>>> > > >> feature
> > > > > > >>>> > > >> > > to
> > > > > > >>>> > > >> > > >>>> the end users.
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >>> You actually mean that "pausable splits" are
> > > > > > >>>> non-optional for
> > > > > > >>>> > the
> > > > > > >>>> > > >> app
> > > > > > >>>> > > >> > > >>> developer if watermark alignment is enabled.
> > > > However,
> > > > > > >>>> > watermark
> > > > > > >>>> > > >> > > alignment
> > > > > > >>>> > > >> > > >>> is optional and can be enabled/disabled.
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >> Yes, watermark alignment can be
> enabled/disabled
> > > in
> > > > > > >>>> individual
> > > > > > >>>> > > >> sources
> > > > > > >>>> > > >> > > in
> > > > > > >>>> > > >> > > >> Flink jobs, which basically means the code
> > > > supporting
> > > > > > >>>> watermark
> > > > > > >>>> > > >> > > alignment
> > > > > > >>>> > > >> > > >> has to already be there. That again means the
> > > Source
> > > > > > >>>> > developers are
> > > > > > >>>> > > >> > also
> > > > > > >>>> > > >> > > >> expected to support pausable splits by
> default.
> > So
> > > > > this
> > > > > > >>>> way we
> > > > > > >>>> > > >> > > essentially
> > > > > > >>>> > > >> > > >> tell the end users that you may enable /
> disable
> > > > this
> > > > > > >>>> feature
> > > > > > >>>> > as
> > > > > > >>>> > > >> you
> > > > > > >>>> > > >> > > wish,
> > > > > > >>>> > > >> > > >> and tell the source developers that you SHOULD
> > > > > implement
> > > > > > >>>> this
> > > > > > >>>> > > >> because
> > > > > > >>>> > > >> > > the
> > > > > > >>>> > > >> > > >> end users may turn it on/off at will. And if
> the
> > > > > source
> > > > > > >>>> does
> > > > > > >>>> > not
> > > > > > >>>> > > >> > support
> > > > > > >>>> > > >> > > >> pausable splits, that goes to an error
> handling
> > > path
> > > > > > when
> > > > > > >>>> > watermark
> > > > > > >>>> > > >> > > >> alignment is enabled on it. So users know they
> > > have
> > > > to
> > > > > > >>>> > explicitly
> > > > > > >>>> > > >> > > exclude
> > > > > > >>>> > > >> > > >> this source.
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >>> So far it's totally clear to me and I hope
> this
> > > is
> > > > > what
> > > > > > >>>> you
> > > > > > >>>> > mean.
> > > > > > >>>> > > >> I
> > > > > > >>>> > > >> > > also
> > > > > > >>>> > > >> > > >>> agree with both statements:
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >>> So making that expectation aligned with the
> > > source
> > > > > > >>>> developers
> > > > > > >>>> > > >> seems
> > > > > > >>>> > > >> > > >>>> reasonable.
> > > > > > >>>> > > >> > > >>>>
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >>> I think this is a simple and clean solution
> > from
> > > > both
> > > > > > >>>> the end
> > > > > > >>>> > user
> > > > > > >>>> > > >> > and
> > > > > > >>>> > > >> > > >>>> source developers' standpoint.
> > > > > > >>>> > > >> > > >>>>
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >>> However, a last conclusion derives from 3.
> and
> > is
> > > > an
> > > > > > open
> > > > > > >>>> > question
> > > > > > >>>> > > >> > for
> > > > > > >>>> > > >> > > >>> me:
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >>> 4. The feature of "pausable splits" is now
> > > tightly
> > > > > > bound
> > > > > > >>>> to
> > > > > > >>>> > > >> watermark
> > > > > > >>>> > > >> > > >>> alignment, i.e., if sources do not support
> > > > "pausable
> > > > > > >>>> splits"
> > > > > > >>>> > one
> > > > > > >>>> > > >> can
> > > > > > >>>> > > >> > > not
> > > > > > >>>> > > >> > > >>> enable watermark alignment for these sources.
> > > This
> > > > > > >>>> dependency
> > > > > > >>>> > is
> > > > > > >>>> > > >> not
> > > > > > >>>> > > >> > > the
> > > > > > >>>> > > >> > > >>> current status of watermark alignment
> > > > implementation
> > > > > > >>>> because
> > > > > > >>>> > it
> > > > > > >>>> > > >> > is/was
> > > > > > >>>> > > >> > > >>> implemented without pausable splits. Do we
> want
> > > to
> > > > > > >>>> introduce
> > > > > > >>>> > this
> > > > > > >>>> > > >> > > >>> dependency? (This is an open question. I
> cannot
> > > > judge
> > > > > > >>>> that.)
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >> The watermark alignment basically relies on
> the
> > > > > pausable
> > > > > > >>>> > splits,
> > > > > > >>>> > > >> > right?
> > > > > > >>>> > > >> > > >> So personally I found it quite reasonable that
> > if
> > > > the
> > > > > > >>>> source
> > > > > > >>>> > does
> > > > > > >>>> > > >> not
> > > > > > >>>> > > >> > > >> support pausable splits, end users cannot
> enable
> > > > > > watermark
> > > > > > >>>> > > >> alignment
> > > > > > >>>> > > >> > on
> > > > > > >>>> > > >> > > it.
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >>
> > > > > > >>>> > > >> > > >>> If something is wrong, please correct me.
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >>> Regards,
> > > > > > >>>> > > >> > > >>> Sebastian
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >>> On Wed, May 11, 2022 at 9:05 AM Becket Qin <
> > > > > > >>>> > becket.qin@gmail.com>
> > > > > > >>>> > > >> > > wrote:
> > > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >>>> Hi Sebastian,
> > > > > > >>>> > > >> > > >>>>
> > > > > > >>>> > > >> > > >>>> Thanks for the reply and patient
> discussion. I
> > > > agree
> > > > > > >>>> this is
> > > > > > >>>> > a
> > > > > > >>>> > > >> > tricky
> > > > > > >>>> > > >> > > >>>> decision.
> > > > > > >>>> > > >> > > >>>>
> > > > > > >>>> > > >> > > >>>>
> > > > > > > >>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about Option c)
> > > > > > > >>>> > > >> > > >>>>> which I see as follows:
> > > > > > > >>>> > > >> > > >>>>> (1) An interface with default NOOP implementation makes
> > > > > > > >>>> > > >> > > >>>>> the implementation optional. And in my opinion, a default
> > > > > > > >>>> > > >> > > >>>>> implementation is and will remain a way of making
> > > > > > > >>>> > > >> > > >>>>> implementation optional because even in future a developer
> > > > > > > >>>> > > >> > > >>>>> can decide to implement the "old flavor" without support
> > > > > > > >>>> > > >> > > >>>>> for pausable splits.
> > > > > > > >>>> > > >> > > >>>>> (2) It may not be too critical but I also find it
> > > > > > > >>>> > > >> > > >>>>> suboptimal that with a NOOP default implementation there
> > > > > > > >>>> > > >> > > >>>>> is no way to check at runtime if SourceReader or
> > > > > > > >>>> > > >> > > >>>>> SplitReader actually support pausing. (To do so, one would
> > > > > > > >>>> > > >> > > >>>>> need a supportsX method which makes it again more
> > > > > > > >>>> > > >> > > >>>>> complicated.)
> > > > > > >>>> > > >> > > >>>>
> > > > > > >>>> > > >> > > >>>>
> > > > > > > >>>> > > >> > > >>>> Based on the last few messages in the mailing list,
> > > > > > > >>>> > > >> > > >>>> Piotr and I agreed that the default implementation should
> > > > > > > >>>> > > >> > > >>>> just throw an UnsupportedOperationException if the source
> > > > > > > >>>> > > >> > > >>>> is unpausable. So this basically tells the Source
> > > > > > > >>>> > > >> > > >>>> developers that this feature is expected to be supported.
> > > > > > > >>>> > > >> > > >>>> Because we cannot prevent end users from putting an
> > > > > > > >>>> > > >> > > >>>> unpausable source into the watermark alignment group, that
> > > > > > > >>>> > > >> > > >>>> basically means watermark alignment is a non-optional
> > > > > > > >>>> > > >> > > >>>> feature for the end users. So making that expectation
> > > > > > > >>>> > > >> > > >>>> aligned with the source developers seems reasonable. And if
> > > > > > > >>>> > > >> > > >>>> a source does not support this feature, the end users
> > > > > > > >>>> > > >> > > >>>> should explicitly remove that source from the watermark
> > > > > > > >>>> > > >> > > >>>> alignment group.
> > > > > > >>>> > > >> > > >>>>
> > > > > > > >>>> > > >> > > >>>> Personally speaking I think this is a simple and clean
> > > > > > > >>>> > > >> > > >>>> solution from both the end user and source developers'
> > > > > > > >>>> > > >> > > >>>> standpoint.
> > > > > > >>>> > > >> > > >>>>
> > > > > > >>>> > > >> > > >>>> Does this address your concerns?
> > > > > > >>>> > > >> > > >>>>
> > > > > > >>>> > > >> > > >>>> Thanks,
> > > > > > >>>> > > >> > > >>>>
> > > > > > >>>> > > >> > > >>>> Jiangjie (Becket) Qin
> > > > > > >>>> > > >> > > >>>>
> > > > > > > >>>> > > >> > > >>>> On Wed, May 11, 2022 at 2:52 PM Sebastian Mattheis <sebastian@ververica.com> wrote:
> > > > > > >>>> > > >> > > >>>>
> > > > > > >>>> > > >> > > >>>>> Hi Piotr, Hi Becket, Hi everybody,
> > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>> we, Dawid and I, discussed the various
> > > > > > > >>>> > > >> > > >>>>> suggestions/options and we would be okay either way
> > > > > > > >>>> > > >> > > >>>>> because we find neither solution is perfect just because
> > > > > > > >>>> > > >> > > >>>>> of the already present complexity.
> > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>> Option c) Adding methods to the interfaces of
> > > > > > > >>>> > > >> > > >>>>> SourceReader and SplitReader
> > > > > > > >>>> > > >> > > >>>>> Option a) Adding decorative interfaces to be used by
> > > > > > > >>>> > > >> > > >>>>> SourceReader and SplitReader
> > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>> As of the current status (v. 12) of the FLIP [1], it is
> > > > > > > >>>> > > >> > > >>>>> based on Option c) which we find acceptable because the
> > > > > > > >>>> > > >> > > >>>>> complexity added is only a single method.
> > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about Option c)
> > > > > > > >>>> > > >> > > >>>>> which I see as follows:
> > > > > > > >>>> > > >> > > >>>>> (1) An interface with default NOOP implementation makes
> > > > > > > >>>> > > >> > > >>>>> the implementation optional. And in my opinion, a default
> > > > > > > >>>> > > >> > > >>>>> implementation is and will remain a way of making
> > > > > > > >>>> > > >> > > >>>>> implementation optional because even in future a developer
> > > > > > > >>>> > > >> > > >>>>> can decide to implement the "old flavor" without support
> > > > > > > >>>> > > >> > > >>>>> for pausable splits.
> > > > > > > >>>> > > >> > > >>>>> (2) It may not be too critical but I also find it
> > > > > > > >>>> > > >> > > >>>>> suboptimal that with a NOOP default implementation there
> > > > > > > >>>> > > >> > > >>>>> is no way to check at runtime if SourceReader or
> > > > > > > >>>> > > >> > > >>>>> SplitReader actually support pausing. (To do so, one would
> > > > > > > >>>> > > >> > > >>>>> need a supportsX method which makes it again more
> > > > > > > >>>> > > >> > > >>>>> complicated.)
> > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>> However, we haven't changed it because Option a) is also
> > > > > > > >>>> > > >> > > >>>>> not optimal or straightforward:
> > > > > > > >>>> > > >> > > >>>>> (1) We need to add two distinct yet similar decorative
> > > > > > > >>>> > > >> > > >>>>> interfaces since, as mentioned, the signatures of the
> > > > > > > >>>> > > >> > > >>>>> methods are different. For example, we would need
> > > > > > > >>>> > > >> > > >>>>> decorative interfaces like `SplitReaderWithPausableSplits`
> > > > > > > >>>> > > >> > > >>>>> and `SourceReaderWithPausableSplits`.
> > > > > > > >>>> > > >> > > >>>>> (2) As a consequence, we would need to somehow document
> > > > > > > >>>> > > >> > > >>>>> how/where to implement both interfaces and how they relate
> > > > > > > >>>> > > >> > > >>>>> to each other. This we could solve by adding a note in the
> > > > > > > >>>> > > >> > > >>>>> interface of SourceReader and SplitReader and a reference
> > > > > > > >>>> > > >> > > >>>>> to the decorative interfaces but it still increases
> > > > > > > >>>> > > >> > > >>>>> complexity too.
> > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>> In summary, we see both as acceptable and preferred over
> > > > > > > >>>> > > >> > > >>>>> other options. The question is if we can find a solution
> > > > > > > >>>> > > >> > > >>>>> or compromise that is acceptable for everybody to reach
> > > > > > > >>>> > > >> > > >>>>> consensus.
> > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>> Please let us know what you think because we would be
> > > > > > > >>>> > > >> > > >>>>> happy if we can conclude the discussion to avoid dropping
> > > > > > > >>>> > > >> > > >>>>> the initiative on this FLIP.
> > > > > > >>>> > > >> > > >>>>>
> > > > > > >>>> > > >> > > >>>>> Regards,
> > > > > > >>>> > > >> > > >>>>> Sebastian
> > > > > > >>>> > > >> > > >>>>>
> > > > > > >>>> > > >> > > >>>>> [1]
> > > > > > >>>> > > >> > > >>>>>
> > > > > > >>>> > > >> > >
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >>
> > > > > > >>>> >
> > > > > > >>>>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
> > > > > > >>>> > > >> > > >>>>> (v. 12)
> > > > > > >>>> > > >> > > >>>>>
> > > > > > > >>>> > > >> > > >>>>> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski <pnowojski@apache.org> wrote:
> > > > > > >>>> > > >> > > >>>>>
> > > > > > >>>> > > >> > > >>>>>> Hi Guowei,
> > > > > > >>>> > > >> > > >>>>>>
> > > > > > >>>> > > >> > > >>>>>> as Dawid wrote a couple of messages back:
> > > > > > >>>> > > >> > > >>>>>>
> > > > > > > >>>> > > >> > > >>>>>> > This is covered in the previous FLIP[1] which has been
> > > > > > > >>>> > > >> > > >>>>>> > already implemented in 1.15. In short, it must be
> > > > > > > >>>> > > >> > > >>>>>> > enabled with the watermark strategy which also
> > > > > > > >>>> > > >> > > >>>>>> > configures drift and update interval
> > > > > > >>>> > > >> > > >>>>>>
> > > > > > > >>>> > > >> > > >>>>>> So by default watermark alignment is disabled, regardless
> > > > > > > >>>> > > >> > > >>>>>> of whether a source supports it or not.
> > > > > > >>>> > > >> > > >>>>>>
> > > > > > >>>> > > >> > > >>>>>> Best,
> > > > > > >>>> > > >> > > >>>>>> Piotrek
> > > > > > >>>> > > >> > > >>>>>>
> > > > > > > >>>> > > >> > > >>>>>> On Thu, May 5, 2022 at 09:56 Guowei Ma <guowei.mgw@gmail.com> wrote:
> > > > > > >>>> > > >> > > >>>>>>
> > > > > > >>>> > > >> > > >>>>>>> Hi,
> > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > >>>> > > >> > > >>>>>>> We know that in the case of Bounded input Flink supports
> > > > > > > >>>> > > >> > > >>>>>>> the Batch execution mode. Currently in Batch execution
> > > > > > > >>>> > > >> > > >>>>>>> mode, Flink is executed on a stage-by-stage basis. In
> > > > > > > >>>> > > >> > > >>>>>>> this way, perhaps watermark alignment might not gain much.
> > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > >>>> > > >> > > >>>>>>> So my question is: Is watermark alignment the default
> > > > > > > >>>> > > >> > > >>>>>>> behavior (for implemented source only)? If so, have you
> > > > > > > >>>> > > >> > > >>>>>>> considered evaluating the impact of this behavior on the
> > > > > > > >>>> > > >> > > >>>>>>> Batch execution mode? Or do you think it is not necessary?
> > > > > > >>>> > > >> > > >>>>>>>
> > > > > > >>>> > > >> > > >>>>>>> Correct me if I miss something.
> > > > > > >>>> > > >> > > >>>>>>>
> > > > > > >>>> > > >> > > >>>>>>> Best,
> > > > > > >>>> > > >> > > >>>>>>> Guowei
> > > > > > >>>> > > >> > > >>>>>>>
> > > > > > >>>> > > >> > > >>>>>>>
> > > > > > > >>>> > > >> > > >>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <piotr.nowojski@gmail.com> wrote:
> > > > > > >>>> > > >> > > >>>>>>>
> > > > > > >>>> > > >> > > >>>>>>> > Hi Becket and Dawid,
> > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > >>>> > > >> > > >>>>>>> > > I feel that no matter which option we choose this can
> > > > > > > >>>> > > >> > > >>>>>>> > > not be solved entirely in either of the options,
> > > > > > > >>>> > > >> > > >>>>>>> > > because of the point above and because the signature
> > > > > > > >>>> > > >> > > >>>>>>> > > of SplitReader#pauseOrResumeSplits and
> > > > > > > >>>> > > >> > > >>>>>>> > > SourceReader#pauseOrResumeSplits are slightly
> > > > > > > >>>> > > >> > > >>>>>>> > > different (one identifies splits with splitId the
> > > > > > > >>>> > > >> > > >>>>>>> > > other one passes the splits directly).
> > > > > > > >>>> > > >> > > >>>>>>> > Yes, that's a good point in this case and for features
> > > > > > > >>>> > > >> > > >>>>>>> > that need to be implemented in more than one place.
> > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > >>>> > > >> > > >>>>>>> > > Is there any reason for pausing reading from a split
> > > > > > > >>>> > > >> > > >>>>>>> > > an optional feature, other than that this was not
> > > > > > > >>>> > > >> > > >>>>>>> > > included in the original interface?
> > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > >>>> > > >> > > >>>>>>> > An additional argument in favor of making it optional
> > > > > > > >>>> > > >> > > >>>>>>> > is to simplify source implementation. But on its own
> > > > > > > >>>> > > >> > > >>>>>>> > I'm not sure if that would be enough to justify making
> > > > > > > >>>> > > >> > > >>>>>>> > this feature optional. Maybe.
> > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > >>>> > > >> > > >>>>>>> > > I think it would be way simpler and clearer to just
> > > > > > > >>>> > > >> > > >>>>>>> > > let end users and Flink assume all the connectors
> > > > > > > >>>> > > >> > > >>>>>>> > > will implement this feature.
> > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > >>>> > > >> > > >>>>>>> > As I wrote above that would be an interesting choice to
> > > > > > > >>>> > > >> > > >>>>>>> > make (ease of implementation for new users, vs system
> > > > > > > >>>> > > >> > > >>>>>>> > consistency). Regardless of that, yes, for me the main
> > > > > > > >>>> > > >> > > >>>>>>> > argument is the API backward compatibility. But let's
> > > > > > > >>>> > > >> > > >>>>>>> > clear up a couple of points:
> > > > > > > >>>> > > >> > > >>>>>>> > - The current proposal adding methods to the base
> > > > > > > >>>> > > >> > > >>>>>>> > interface with default implementations is an OPTIONAL
> > > > > > > >>>> > > >> > > >>>>>>> > feature. Same as the decorative version would be.
> > > > > > > >>>> > > >> > > >>>>>>> > - Decorative version could implement "throw
> > > > > > > >>>> > > >> > > >>>>>>> > UnsupportedOperationException" if user enabled
> > > > > > > >>>> > > >> > > >>>>>>> > watermark alignment just as well and I agree that's a
> > > > > > > >>>> > > >> > > >>>>>>> > better option compared to logging a warning.
> > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > >>>> > > >> > > >>>>>>> > Best,
> > > > > > >>>> > > >> > > >>>>>>> > Piotrek
> > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > >>>> > > >> > > >>>>>>> > On Wed, May 4, 2022 at 15:40 Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > > >>>> > > >> > > >>>>>>> > > Thanks for the reply and patient discussion, Piotr and
> > > > > > > >>>> > > >> > > >>>>>>> > > Dawid.
> > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > Is there any reason for pausing reading from a split an
> > > > > > > >>>> > > >> > > >>>>>>> > > optional feature, other than that this was not included
> > > > > > > >>>> > > >> > > >>>>>>> > > in the original interface?
> > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > To be honest I am really worried about the complexity
> > > > > > > >>>> > > >> > > >>>>>>> > > of the user story here. Optional features like this
> > > > > > > >>>> > > >> > > >>>>>>> > > have a high overhead. Imagine this feature is optional,
> > > > > > > >>>> > > >> > > >>>>>>> > > now a user enabled watermark alignment and defined a
> > > > > > > >>>> > > >> > > >>>>>>> > > few watermark groups. Would it work? Hmm, that depends
> > > > > > > >>>> > > >> > > >>>>>>> > > on whether the involved Source has implemented this
> > > > > > > >>>> > > >> > > >>>>>>> > > feature. If the Sources are well documented, good luck.
> > > > > > > >>>> > > >> > > >>>>>>> > > Otherwise end users may have to look into the code of
> > > > > > > >>>> > > >> > > >>>>>>> > > the Source to see whether the feature is supported.
> > > > > > > >>>> > > >> > > >>>>>>> > > Which is something they shouldn't have to do.
> > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > I think it would be way simpler and clearer to just let
> > > > > > > >>>> > > >> > > >>>>>>> > > end users and Flink assume all the connectors will
> > > > > > > >>>> > > >> > > >>>>>>> > > implement this feature. After all the watermark group
> > > > > > > >>>> > > >> > > >>>>>>> > > is not optional to the end users. If in some rare
> > > > > > > >>>> > > >> > > >>>>>>> > > cases, the feature cannot be supported, a clear
> > > > > > > >>>> > > >> > > >>>>>>> > > UnsupportedOperationException will be thrown to tell
> > > > > > > >>>> > > >> > > >>>>>>> > > users to explicitly remove this Source from the
> > > > > > > >>>> > > >> > > >>>>>>> > > watermark group. I don't think we should have a warning
> > > > > > > >>>> > > >> > > >>>>>>> > > message here, as they tend to be ignored in many cases.
> > > > > > > >>>> > > >> > > >>>>>>> > > If we do this, we don't even need the supportXXX method
> > > > > > > >>>> > > >> > > >>>>>>> > > in the Source for this feature. In fact this is exactly
> > > > > > > >>>> > > >> > > >>>>>>> > > how many interfaces work today. For example,
> > > > > > > >>>> > > >> > > >>>>>>> > > SplitEnumerator#addSplitsBack() is not supported by
> > > > > > > >>>> > > >> > > >>>>>>> > > Pravega source because it does not support partial
> > > > > > > >>>> > > >> > > >>>>>>> > > failover. In that case, it simply throws an exception
> > > > > > > >>>> > > >> > > >>>>>>> > > to trigger a global recovery.
> > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > The reason we add a default implementation in this case
> > > > > > > >>>> > > >> > > >>>>>>> > > would just be for the sake of backwards compatibility
> > > > > > > >>>> > > >> > > >>>>>>> > > so the old source can still compile. Sure, in short
> > > > > > > >>>> > > >> > > >>>>>>> > > term, this feature might not be supported by many
> > > > > > > >>>> > > >> > > >>>>>>> > > existing sources. That is OK, and it is quite visible
> > > > > > > >>>> > > >> > > >>>>>>> > > to the source developers that they did not override the
> > > > > > > >>>> > > >> > > >>>>>>> > > default impl which throws an
> > > > > > > >>>> > > >> > > >>>>>>> > > UnsupportedOperationException.
> > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > >>>> > > >> > > >>>>>>> > > @Dawid,
> > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> the Java doc of the SupportXXX() method in the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> Source would be the single source of truth regarding
> > > > > > > >>>> > > >> > > >>>>>>> > > >> how to implement this feature.
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > I also don't find it entirely true. Half of the
> > > > > > > >>>> > > >> > > >>>>>>> > > > classes are theoretically optional and are utility
> > > > > > > >>>> > > >> > > >>>>>>> > > > classes from the point of view how the interfaces are
> > > > > > > >>>> > > >> > > >>>>>>> > > > organized. Theoretically users do not need to use any
> > > > > > > >>>> > > >> > > >>>>>>> > > > of SourceReaderBase & SplitReader. Would be weird to
> > > > > > > >>>> > > >> > > >>>>>>> > > > list their methods in the Source interface.
> > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > I think the ultimate goal of java docs is to guide
> > > > > > > >>>> > > >> > > >>>>>>> > > users to implement the Source. If SourceReaderBase is
> > > > > > > >>>> > > >> > > >>>>>>> > > the preferred way to implement a SourceReader, it seems
> > > > > > > >>>> > > >> > > >>>>>>> > > worth mentioning that. Even the Java language
> > > > > > > >>>> > > >> > > >>>>>>> > > documentation for interfaces lists the known
> > > > > > > >>>> > > >> > > >>>>>>> > > implementations [1] so people can leverage them. But
> > > > > > > >>>> > > >> > > >>>>>>> > > for this particular case, if we make the feature
> > > > > > > >>>> > > >> > > >>>>>>> > > non-optional, we don't even need the supportXXX()
> > > > > > > >>>> > > >> > > >>>>>>> > > method for now.
> > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > >>>> > > >> > > >>>>>>> > > Thanks,
> > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > >>>> > > >> > > >>>>>>> > > Jiangjie (Becket) Qin
> > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > >>>> > > >> > > >>>>>>> > > > Hey Piotr and Becket,
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > First of all, let me say I am happy with whichever
> > > > > > > >>>> > > >> > > >>>>>>> > > > option is agreed in the discussion.
> > > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > I wanted to clarify a few points from the discussion
> > > > > > > >>>> > > >> > > >>>>>>> > > > though:
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > >>>> > > >> > > >>>>>>> > > > @Becket:
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > The main argument for adding the methods to the
> > > > > > > >>>> > > >> > > >>>>>>> > > > SourceReader is that these methods are effectively
> > > > > > > >>>> > > >> > > >>>>>>> > > > NON-OPTIONAL to SourceReader impl, i.e. starting from
> > > > > > > >>>> > > >> > > >>>>>>> > > > this FLIP, all the SourceReaders impl are expected to
> > > > > > > >>>> > > >> > > >>>>>>> > > > support this method, although some old
> > > > > > > >>>> > > >> > > >>>>>>> > > > implementations may not have implemented this
> > > > > > > >>>> > > >> > > >>>>>>> > > > feature. I think we should distinguish the new
> > > > > > > >>>> > > >> > > >>>>>>> > > > features from the optional features. While the public
> > > > > > > >>>> > > >> > > >>>>>>> > > > decorative interface is a solution to the optional
> > > > > > > >>>> > > >> > > >>>>>>> > > > features. We should not use it for the features that
> > > > > > > >>>> > > >> > > >>>>>>> > > > are non-optional.
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > I don't think that this feature is NON-OPTIONAL. Even
> > > > > > > >>>> > > >> > > >>>>>>> > > > though preferred, I still think it can be simply
> > > > > > > >>>> > > >> > > >>>>>>> > > > optional.
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > the Java doc of the SupportXXX() method in the Source
> > > > > > > >>>> > > >> > > >>>>>>> > > > would be the single source of truth regarding how to
> > > > > > > >>>> > > >> > > >>>>>>> > > > implement this feature.
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > I also don't find it entirely true. Half of the
> > > > > > > >>>> > > >> > > >>>>>>> > > > classes are theoretically optional and are utility
> > > > > > > >>>> > > >> > > >>>>>>> > > > classes from the point of view how the interfaces are
> > > > > > > >>>> > > >> > > >>>>>>> > > > organized. Theoretically users do not need to use any
> > > > > > > >>>> > > >> > > >>>>>>> > > > of SourceReaderBase & SplitReader. Would be weird to
> > > > > > > >>>> > > >> > > >>>>>>> > > > list their methods in the Source interface.
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > >>>> > > >> > > >>>>>>> > > > @Piotr
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > If we have all of the methods with default
> > > > > > > >>>> > > >> > > >>>>>>> > > > implementation in the base interface, the API doesn't
> > > > > > > >>>> > > >> > > >>>>>>> > > > give any clue to the user which set of methods are
> > > > > > > >>>> > > >> > > >>>>>>> > > > required to be implemented at the same time.
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > I feel that no matter which option we choose this can
> > > > > > > >>>> > > >> > > >>>>>>> > > > not be solved entirely in either of the options,
> > > > > > > >>>> > > >> > > >>>>>>> > > > because of the point above and because the signature
> > > > > > > >>>> > > >> > > >>>>>>> > > > of SplitReader#pauseOrResumeSplits and
> > > > > > > >>>> > > >> > > >>>>>>> > > > SourceReader#pauseOrResumeSplits are slightly
> > > > > > > >>>> > > >> > > >>>>>>> > > > different (one identifies splits with splitId the
> > > > > > > >>>> > > >> > > >>>>>>> > > > other one passes the splits directly).
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > >>>> > > >> > > >>>>>>> > > > Best,
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > >>>> > > >> > > >>>>>>> > > > Dawid
> > > > > > > >>>> > > >> > > >>>>>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > >>>> > > >> > > >>>>>>> > > > Hi Piotr,
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > >>>> > > >> > > >>>>>>> > > > Thanks for the comment.
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > Just to clarify, I am not against the decorative interfaces, but I do think
> > > > > > > >>>> > > >> > > >>>>>>> > > > we should use them with caution. The main argument for adding the methods
> > > > > > > >>>> > > >> > > >>>>>>> > > > to the SourceReader is that these methods are effectively NON-OPTIONAL to
> > > > > > > >>>> > > >> > > >>>>>>> > > > SourceReader impl, i.e. starting from this FLIP, all SourceReader impls are
> > > > > > > >>>> > > >> > > >>>>>>> > > > expected to support this method, although some old implementations may not
> > > > > > > >>>> > > >> > > >>>>>>> > > > have implemented this feature. I think we should distinguish the new
> > > > > > > >>>> > > >> > > >>>>>>> > > > features from the optional features. The public decorative interface is a
> > > > > > > >>>> > > >> > > >>>>>>> > > > solution for optional features; we should not use it for features that are
> > > > > > > >>>> > > >> > > >>>>>>> > > > non-optional.
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > That said, this feature is optional for SplitReaders. Arguably we can have
> > > > > > > >>>> > > >> > > >>>>>>> > > > a decorative interface for that, but for simplicity and symmetry of the
> > > > > > > >>>> > > >> > > >>>>>>> > > > interface, personally I prefer just adding a new method.
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > Regarding the advantages you mentioned about the decorative interfaces,
> > > > > > > >>>> > > >> > > >>>>>>> > > > they would make sense if:
> > > > > > > >>>> > > >> > > >>>>>>> > > > 1. The feature is optional.
> > > > > > > >>>> > > >> > > >>>>>>> > > > 2. There is only one decorative interface involved for a feature.
> > > > > > > >>>> > > >> > > >>>>>>> > > > Otherwise the argument that all the methods are grouped together will not
> > > > > > > >>>> > > >> > > >>>>>>> > > > stand.
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > Compared with that, I think the current solution works fine in all cases,
> > > > > > > >>>> > > >> > > >>>>>>> > > > i.e. "having supportXXX() method in Source, and default methods /
> > > > > > > >>>> > > >> > > >>>>>>> > > > decorative interfaces in base interfaces.".
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > >>>> > > >> > > >>>>>>> > > > The advantages are:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> - clean and easy to implement base interface
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > In the current approach, the Java doc of the SupportXXX() method in the
> > > > > > > >>>> > > >> > > >>>>>>> > > > Source would be the single source of truth regarding how to implement this
> > > > > > > >>>> > > >> > > >>>>>>> > > > feature. It lists the method that has to be implemented to support this
> > > > > > > >>>> > > >> > > >>>>>>> > > > feature, regardless of how many classes / interfaces are involved.
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > When implementing the base interface, users do not need to implement a
> > > > > > > >>>> > > >> > > >>>>>>> > > > method with default implementation. If they are curious what the method is
> > > > > > > >>>> > > >> > > >>>>>>> > > > for, the java doc of that method simply points users to the SupportXXX()
> > > > > > > >>>> > > >> > > >>>>>>> > > > method in the Source. It seems not adding work to the users compared with
> > > > > > > >>>> > > >> > > >>>>>>> > > > decorative interfaces, but gives much better discoverability.
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> - all of the methods from a single feature are grouped in a single
> > > > > > > >>>> > > >> > > >>>>>>> > > >> decorator interface, together with their dedicated java doc. It's also
> > > > > > > >>>> > > >> > > >>>>>>> > > >> easier to google search for help using the decorator name
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> - if an optional feature requires two methods to be implemented at once,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> decorator can guarantee that
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > These two points are not true when multiple components and classes are
> > > > > > > >>>> > > >> > > >>>>>>> > > > involved collaboratively to provide a feature. In our case, we have both
> > > > > > > >>>> > > >> > > >>>>>>> > > > SourceReader and SplitReader involved. And there might be other interfaces
> > > > > > > >>>> > > >> > > >>>>>>> > > > on the JM side involved for some future features. So the relevant methods
> > > > > > > >>>> > > >> > > >>>>>>> > > > can actually be scattered over the places. That said, we may still use
> > > > > > > >>>> > > >> > > >>>>>>> > > > decorative interfaces for each component, if the feature is optional, given
> > > > > > > >>>> > > >> > > >>>>>>> > > > there is a single source of truth for the feature.
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> Here I would strongly lean towards making life easier for new users,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo) slight expense for the power
> > > > > > > >>>> > > >> > > >>>>>>> > > >> users.
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > I actually think the current approach is simpler, more extensible and more
> > > > > > > >>>> > > >> > > >>>>>>> > > > general for all the users. Can you articulate a bit more which part you
> > > > > > > >>>> > > >> > > >>>>>>> > > > think is harder for users to understand?
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > There is another benefit of the decorative interfaces which is not
> > > > > > > >>>> > > >> > > >>>>>>> > > > mentioned, but might be worth considering here. Usually the decorative
> > > > > > > >>>> > > >> > > >>>>>>> > > > interfaces give slightly better backwards compatibility than the new
> > > > > > > >>>> > > >> > > >>>>>>> > > > default method in the interfaces. That is when users are using a jar that
> > > > > > > >>>> > > >> > > >>>>>>> > > > was compiled with an older version of Flink which does not have the default
> > > > > > > >>>> > > >> > > >>>>>>> > > > method in the interfaces in question. A decorative interface may still
> > > > > > > >>>> > > >> > > >>>>>>> > > > provide backwards compatibility in that case, while default method impl
> > > > > > > >>>> > > >> > > >>>>>>> > > > cannot.
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > I think in Flink we in general do not guarantee custom components compiled
> > > > > > > >>>> > > >> > > >>>>>>> > > > with an older version can run with a newer version of Flink. A recompile
> > > > > > > >>>> > > >> > > >>>>>>> > > > with a newer version would be required. That said, if we do care about
> > > > > > > >>>> > > >> > > >>>>>>> > > > this, we can just change the "supportXXX()" method in the Source interface
> > > > > > > >>>> > > >> > > >>>>>>> > > > to use decorative interfaces, and leave the other parts unchanged.
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > >>>> > > >> > > >>>>>>> > > > Thanks,
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > >>>> > > >> > > >>>>>>> > > > Jiangjie (Becket) Qin
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > > >>>> > > >> > > >>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <pnowojski@apache.org> wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> Hi,
> > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> Sorry for chipping in so late, but I was OoO for the last two weeks.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> Regarding the interfaces, I would be actually against adding those methods
> > > > > > > >>>> > > >> > > >>>>>>> > > >> to the base interfaces for the reasons mentioned above. Clogging the base
> > > > > > > >>>> > > >> > > >>>>>>> > > >> interface for new users with tons of methods that they do not need, do not
> > > > > > > >>>> > > >> > > >>>>>>> > > >> understand and do not know what to do with them. Moreover, such decorative
> > > > > > > >>>> > > >> > > >>>>>>> > > >> interfaces are solving a problem if a feature requires two or more methods
> > > > > > > >>>> > > >> > > >>>>>>> > > >> to be implemented at the same time. If we have all of the methods with
> > > > > > > >>>> > > >> > > >>>>>>> > > >> default implementation in the base interface, the API doesn't give any clue
> > > > > > > >>>> > > >> > > >>>>>>> > > >> to the user which set of methods are required to be implemented at the same
> > > > > > > >>>> > > >> > > >>>>>>> > > >> time.
> > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > a) I feel the biggest drawback of decorative interfaces is which interface
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > they can decorate and which combinations of multiple decorative interfaces
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > are valid. (...)
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > In the future, if there is a new feature added
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > (e.g. sorted or pre-partitioned data aware), are we going to create another
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > interface of SplitReader such as SortedSplitReader or PrePartitionedAware?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > Can they be combined? So I think the additional decorative interface like
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > withSplitsAlignment actually increases the understanding cost of users
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > because they have to know what decorative interfaces are there, which
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > interface they can decorate and which combinations of the decorative
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > interfaces are valid and which are not. Ideally we want to avoid that.
> > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> I'm not sure if I understand how embedding default methods in the base
> > > > > > > >>>> > > >> > > >>>>>>> > > >> interface is solving the problem: what can be combined or not? If there are
> > > > > > > >>>> > > >> > > >>>>>>> > > >> two conflicting features, having decorative interfaces that can not be
> > > > > > > >>>> > > >> > > >>>>>>> > > >> mixed together actually makes much more sense to me rather than having them
> > > > > > > >>>> > > >> > > >>>>>>> > > >> all in one base class. How would you allow users to implement only one of
> > > > > > > >>>> > > >> > > >>>>>>> > > >> those two features?
> > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> To reiterate on the issue. Yes, there are drawbacks:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> - how can a user discover what decorators are there?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> - how can a user know where the decorator can be applied to?
> > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> However those are drawbacks for more power users, that can be mitigated by
> > > > > > > >>>> > > >> > > >>>>>>> > > >> the documentation. For example listing all of the decorators with
> > > > > > > >>>> > > >> > > >>>>>>> > > >> detailed explanation both in the docs and in the java docs. More
> > > > > > > >>>> > > >> > > >>>>>>> > > >> experienced users will be able to deal with those issues easier, as they
> > > > > > > >>>> > > >> > > >>>>>>> > > >> will already have some basic understanding of Flink. Also if user has a
> > > > > > > >>>> > > >> > > >>>>>>> > > >> problem that he wants to solve, he will google search a potential solution
> > > > > > > >>>> > > >> > > >>>>>>> > > >> to his problem anyway, and while doing that he is very likely to discover
> > > > > > > >>>> > > >> > > >>>>>>> > > >> the decorator that he needs anyway in the docs.
> > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> The advantages are:
> > > > > > > >>>> > > >> > > >>>>>>> > > >> - clean and easy to implement base interface
> > > > > > > >>>> > > >> > > >>>>>>> > > >> - all of the methods from a single feature are grouped in a single
> > > > > > > >>>> > > >> > > >>>>>>> > > >> decorator interface, together with their dedicated java doc. It's also
> > > > > > > >>>> > > >> > > >>>>>>> > > >> easier to google search for help using the decorator name
> > > > > > > >>>> > > >> > > >>>>>>> > > >> - if an optional feature requires two methods to be implemented at once,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> decorator can guarantee that
> > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> Here I would strongly lean towards making life easier for new users,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo) slight expense for the power
> > > > > > > >>>> > > >> > > >>>>>>> > > >> users.
> > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> Best,
> > > > > > >>>> > > >> > > >>>>>>> > > >> Piotrek
> > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> On Tue, 26 Apr 2022 at 15:32, Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > Thanks for the reply Sebastian and Dawid,
> > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > I think Sebastian has a good summary. This is a really helpful discussion.
> > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > Thinking a bit more, I feel that it might still be better to add the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > supportsXXX() method in the Source rather than SourceReader.
> > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > Generally speaking, what we are trying to do here is to let the Flink
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > framework know what the Source is capable of. In this FLIP, it happens to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > be the capability that only involves SourceReader. But in the future, it is
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > possible that another functionality involves both the SplitEnumerator and
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > SourceReader. In that case, following the current approach, we should put
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > the "supportsXXX()" method in both SplitEnumerator and SourceReader.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > Because if we only put this in the SourceReader, then the JM would have to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > create a SourceReader in order to know whether this feature is supported,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > which is a little ugly. But if we put the "supportsXXX()" method in the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > Source, we will break the "symmetric" design because this FLIP chose a
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > different way.
> > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > This is also why I think supportsXXX() method seems a good thing to have,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > because when there are a few interfaces / methods that are expected to be
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > implemented at the same time in order to deliver a feature, it is always
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > good to have a single source of truth to tell the framework what to do, so
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > the framework can do consistent things in different parts.
> > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > @Sebastian Mattheis <sebastian@ververica.com>
> > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > Regarding interface flavor b), i.e. AlignedSourceReader +
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > AlignedSplitReader, what I feel awkward about is that we are essentially
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > expecting almost all the SourceReader implementations to extend
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > SourceReaderBase, which effectively makes the SourceReader interface
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > without the pausing support useless. So this indicates that public
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > decorative interfaces (or sub-interfaces for the same purpose) only
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > make sense if the original interface is also expected to be used.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > Otherwise, it seems to make more sense to add the method to the original
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > interface itself.
> > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > Cheers,
> > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > Jiangjie (Becket) Qin
> > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <dwysakowicz@apache.org>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Thanks @Sebastian for the nice
> > > > > summary.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > I think most of your points
> > > aligned
> > > > > with
> > > > > > >>>> the
> > > > > > >>>> > > >> > suggestions
> > > > > > >>>> > > >> > > >>>>>>> I made to
> > > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > FLIP, while you were writing
> > your
> > > > > reply
> > > > > > (I
> > > > > > >>>> > believe
> > > > > > >>>> > > >> we
> > > > > > >>>> > > >> > > hit
> > > > > > >>>> > > >> > > >>>>>>> enter
> > > > > > >>>> > > >> > > >>>>>>> > > >> nearly at
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > the same time ;) )
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Two points after we synced
> > offline
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > 1. I also changed the
> > > > > > >>>> > > >> supportsWatermarksSplitAlignment
> > > > > > >>>> > > >> > > to
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > supportsPausingSplits to
> express
> > > the
> > > > > > >>>> general
> > > > > > >>>> > > >> > capability
> > > > > > >>>> > > >> > > of
> > > > > > >>>> > > >> > > >>>>>>> > pausing.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > 2. As for whether we should add
> > > > > > >>>> > > >> > > >>>>>>> PausingSourceReader/PausingSplitReader
> > > > > > >>>> > > >> > > >>>>>>> > > (option
> > > > > > >>>> > > >> > > >>>>>>> > > >> b)
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > or if we should just add the
> > > methods
> > > > > > >>>> (option
> > > > > > >>>> > c), I
> > > > > > >>>> > > >> > > >>>>>>> suggest to
> > > > > > >>>> > > >> > > >>>>>>> > simply
> > > > > > >>>> > > >> > > >>>>>>> > > >> add
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > the two methods as I felt this
> > is the
> > > > much
> > > > > > > >>>> > preferred
> > > > > > > >>>> > > >> > > approach
> > > > > > > >>>> > > >> > > >>>>>>> of Becket, to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> which
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > others do not object. Unless
> > there
> > > > is
> > > > > an
> > > > > > >>>> > opposition
> > > > > > >>>> > > >> > > let's
> > > > > > >>>> > > >> > > >>>>>>> go with
> > > > > > >>>> > > >> > > >>>>>>> > > this
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > option c.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Best,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Dawid
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > On 26/04/2022 10:06, Sebastian
> > > > > Mattheis
> > > > > > >>>> wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Hi folks,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Sorry for being a bit silent.
> > Many
> > > > > > thanks
> > > > > > >>>> for
> > > > > > >>>> > all
> > > > > > >>>> > > >> the
> > > > > > >>>> > > >> > > >>>>>>> input and
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > suggestions. As I'm a bit
> new, I
> > > > > needed
> > > > > > >>>> some
> > > > > > >>>> > time
> > > > > > >>>> > > >> to
> > > > > > >>>> > > >> > > >>>>>>> catch up and
> > > > > > >>>> > > >> > > >>>>>>> > > >> > structure
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > (for myself) the discussion
> and
> > I
> > > > > wanted
> > > > > > >>>> to
> > > > > > >>>> > find a
> > > > > > >>>> > > >> way
> > > > > > >>>> > > >> > > to
> > > > > > >>>> > > >> > > >>>>>>> > structure
> > > > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > conclusions. (Also because I
> had
> > > the
> > > > > > >>>> feeling
> > > > > > >>>> > that
> > > > > > >>>> > > >> some
> > > > > > >>>> > > >> > > >>>>>>> concerns
> > > > > > >>>> > > >> > > >>>>>>> > got
> > > > > > >>>> > > >> > > >>>>>>> > > >> lost
> > > > > > >>>> > > >> > > >>>>>>> > > >> > in
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > the discussion.) This is my
> > > attempt
> > > > > and
> > > > > > >>>> please
> > > > > > >>>> > > >> correct
> > > > > > >>>> > > >> > > me
> > > > > > >>>> > > >> > > >>>>>>> if
> > > > > > >>>> > > >> > > >>>>>>> > > >> something is
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > wrong or misunderstood. I
> tried
> > to
> > > > > > >>>> collect and
> > > > > > >>>> > > >> > assemble
> > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > >>>> > > >> > > >>>>>>> > > opinions,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > suggestions, and conclusions
> (to
> > > the
> > > > > > best
> > > > > > >>>> of my
> > > > > > >>>> > > >> > > >>>>>>> knowledge):
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Top A: Should split
> alignment
> > > > > > >>>> (pause/resume
> > > > > > >>>> > > >> > behavior)
> > > > > > >>>> > > >> > > >>>>>>> be a
> > > > > > >>>> > > >> > > >>>>>>> > general
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > capability?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > I personally don't see any
> > reason
> > > not
> > > > > to
> > > > > > >>>> have
> > > > > > >>>> > it a
> > > > > > >>>> > > >> > > general
> > > > > > >>>> > > >> > > >>>>>>> > capability
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > because for the alignSplit
> > method
> > > it
> > > > > is
> > > > > > >>>> > actually
> > > > > > >>>> > > >> > > >>>>>>> independent of
> > > > > > >>>> > > >> > > >>>>>>> > the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > watermarks. If we agree here
> to
> > > have
> > > > > it
> > > > > > a
> > > > > > >>>> > general
> > > > > > >>>> > > >> > > >>>>>>> capability, we
> > > > > > >>>> > > >> > > >>>>>>> > > >> should
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > also agree on the right
> wording.
> > > > Does
> > > > > > >>>> > > >> > > >>>>>>> "alignSplits(splitsToResume,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > splitsToPause)" refer to what
> is
> > > > then
> > > > > > >>>> actually
> > > > > > >>>> > > >> meant?
> > > > > > >>>> > > >> > (I
> > > > > > >>>> > > >> > > >>>>>>> see it as
> > > > > > >>>> > > >> > > >>>>>>> > > >> okay.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > I
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > don't have any better idea
> > whilst
> > > > > Arvid
> > > > > > >>>> > suggested
> > > > > > >>>> > > >> > > >>>>>>> > > >> "pauseOrResumeSplits".)
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Top B: Should it be possible
> > to
> > > > > > >>>> > enable/disable
> > > > > > >>>> > > >> split
> > > > > > >>>> > > >> > > >>>>>>> alignment?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > I would personally not disable
> > the
> > > > > split
> > > > > > >>>> > alignment
> > > > > > >>>> > > >> on
> > > > > > >>>> > > >> > > the
> > > > > > >>>> > > >> > > >>>>>>> source
> > > > > > >>>> > > >> > > >>>>>>> > > >> reader
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > side because if split
> alignment
> > is
> > > > > used
> > > > > > >>>> for
> > > > > > >>>> > some
> > > > > > >>>> > > >> other
> > > > > > >>>> > > >> > > >>>>>>> use case
> > > > > > >>>> > > >> > > >>>>>>> > (see
> > > > > > >>>> > > >> > > >>>>>>> > > >> A)
> > > > > > >>>> > > >> > > >>>>>>> > > >> > it
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > could have nasty side effects
> on
> > > > > > >>>> other/future
> > > > > > >>>> > use
> > > > > > >>>> > > >> > cases.
> > > > > > >>>> > > >> > > >>>>>>> Instead,
> > > > > > >>>> > > >> > > >>>>>>> > I
> > > > > > >>>> > > >> > > >>>>>>> > > >> would
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > disable "watermark split
> > > alignment"
> > > > > > where
> > > > > > >>>> I
> > > > > > >>>> > think
> > > > > > >>>> > > >> it
> > > > > > >>>> > > >> > > >>>>>>> should
> > > > > > >>>> > > >> > > >>>>>>> > disable
> > > > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > watermark-dependent trigger
> for
> > > > split
> > > > > > >>>> > alignment.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Top C: Should we add a
> > supportsX
> > > > > > method?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > I find it difficult to define
> > the
> > > > > scope
> > > > > > >>>> of a
> > > > > > >>>> > > >> supportsX
> > > > > > >>>> > > >> > > >>>>>>> method
> > > > > > > >>>> > > >> > > >>>>>>> > with respect
> > > > > > >>>> > > >> > > >>>>>>> > > >> to
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > the following questions: a)
> > Where
> > > is
> > > > > it
> > > > > > >>>> used?
> > > > > > >>>> > and
> > > > > > >>>> > > >> b)
> > > > > > >>>> > > >> > > What
> > > > > > >>>> > > >> > > >>>>>>> is the
> > > > > > >>>> > > >> > > >>>>>>> > > >> expected
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > output? To b), it's not
> > > > > straight-forward
> > > > > > >>>> to
> > > > > > >>>> > > >> provide a
> > > > > > >>>> > > >> > > >>>>>>> meaningful
> > > > > > >>>> > > >> > > >>>>>>> > > >> output,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > e.g., if SourceReader supports
> > > split
> > > > > > >>>> alignment
> > > > > > >>>> > but
> > > > > > >>>> > > >> > > >>>>>>> SplitReader
> > > > > > > >>>> > > >> > > >>>>>>> > does not.
> > > > > > >>>> > > >> > > >>>>>>> > > >> This
> > > > > > >>>> > > >> > > >>>>>>> > > >> > is
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > because with the current
> > > > > implementation,
> > > > > > >>>> we can
> > > > > > >>>> > > >> > > determine
> > > > > > >>>> > > >> > > >>>>>>> whether
> > > > > > >>>> > > >> > > >>>>>>> > > >> split
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > alignment is fully supported
> > only
> > > > > during
> > > > > > >>>> > runtime
> > > > > > >>>> > > >> and
> > > > > > >>>> > > >> > > >>>>>>> specifically
> > > > > > >>>> > > >> > > >>>>>>> > > >> > actually
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > only when calling alignSplits
> > down
> > > > the
> > > > > > >>>> call
> > > > > > >>>> > > >> hierarchy
> > > > > > >>>> > > >> > up
> > > > > > >>>> > > >> > > >>>>>>> to the
> > > > > > >>>> > > >> > > >>>>>>> > > actual
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > SplitReaders.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Therefore, I would suggest to
> > > either
> > > > > > >>>> raise an
> > > > > > >>>> > > >> error or
> > > > > > >>>> > > >> > > >>>>>>> warning if
> > > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > alignment is called but not
> > > > supported
> > > > > at
> > > > > > >>>> some
> > > > > > >>>> > > >> point. I
> > > > > > >>>> > > >> > > >>>>>>> know we
> > > > > > >>>> > > >> > > >>>>>>> > > should
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > carefully think about when
> this
> > > > could
> > > > > be
> > > > > > >>>> the
> > > > > > >>>> > case
> > > > > > >>>> > > >> > > because
> > > > > > >>>> > > >> > > >>>>>>> we don't
> > > > > > >>>> > > >> > > >>>>>>> > > >> want
> > > > > > >>>> > > >> > > >>>>>>> > > >> > to
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > flood anybody with such
> > warnings.
> > > > > > However,
> > > > > > >>>> > warnings
> > > > > > >>>> > > >> > > could
> > > > > > >>>> > > >> > > >>>>>>> be an
> > > > > > >>>> > > >> > > >>>>>>> > > >> indicator
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > for the user that for
> watermark
> > > > split
> > > > > > >>>> > alignment use
> > > > > > >>>> > > >> > case
> > > > > > >>>> > > >> > > >>>>>>> split
> > > > > > >>>> > > >> > > >>>>>>> > > >> reading is
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > imbalanced with the conclusion
> > to
> > > > > either
> > > > > > >>>> > disable
> > > > > > >>>> > > >> the
> > > > > > >>>> > > >> > > >>>>>>> trigger for
> > > > > > >>>> > > >> > > >>>>>>> > > >> > watermark
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > split alignment (see Top B) or
> > to
> > > > > > >>>> > use/implement a
> > > > > > >>>> > > >> > source
> > > > > > >>>> > > >> > > >>>>>>> and
> > > > > > >>>> > > >> > > >>>>>>> > reader
> > > > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > fully supports split
> alignment.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Top D: How to design
> > interfaces?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Thanks for structuring the
> > > > discussion
> > > > > > > >>>> with the
> > > > > > >>>> > > >> > > various
> > > > > > >>>> > > >> > > >>>>>>> > > >> possibilities
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > (a-d). From the discussion and
> > > > > emails, I
> > > > > > >>>> would
> > > > > > >>>> > > >> like to
> > > > > > >>>> > > >> > > >>>>>>> summarize
> > > > > > >>>> > > >> > > >>>>>>> > the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > following requirements:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > - Interfaces should be
> > consistent
> > > > > > >>>> > ("symmetric"),
> > > > > > >>>> > > >> i.e.,
> > > > > > >>>> > > >> > > >>>>>>> similar
> > > > > > >>>> > > >> > > >>>>>>> > > >> semantics
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > should have similar interfaces
> > > with
> > > > > > >>>> similar
> > > > > > >>>> > usage.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > - Make explicit which
> > > > implementations
> > > > > > >>>> implement
> > > > > > >>>> > > >> > > >>>>>>> interfaces/support
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > behavior.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > - Make clear what are default
> > > > > > >>>> implementations
> > > > > > >>>> > and
> > > > > > >>>> > > >> how
> > > > > > >>>> > > >> > to
> > > > > > >>>> > > >> > > >>>>>>> implement
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > interfaces with desired
> > behavior.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > This is a simplified view of
> the
> > > > > > relations
> > > > > > >>>> > between
> > > > > > >>>> > > >> > > >>>>>>> relevant
> > > > > > >>>> > > >> > > >>>>>>> > classes
> > > > > > >>>> > > >> > > >>>>>>> > > of
> > > > > > >>>> > > >> > > >>>>>>> > > >> > the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > PoC implementation:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > SourceReader (Public) <|--
> > > > > > >>>> SourceReaderBase
> > > > > > >>>> > > >> (Internal)
> > > > > > >>>> > > >> > > >>>>>>> <|-- ..
> > > > > > >>>> > > >> > > >>>>>>> > <|--
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > MySourceReader
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > MySourceReader <>--
> > > > > SplitFetcherManager
> > > > > > >>>> > (Internal)
> > > > > > >>>> > > >> > <>--
> > > > > > >>>> > > >> > > >>>>>>> > SplitFetcher
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > (Internal) <>-- SplitReader
> > > (Public)
> > > > > > <|--
> > > > > > >>>> > > >> > MySplitReader
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > (A <|-- B: B inherits from A;
> A
> > > <>--
> > > > > B:
> > > > > > A
> > > > > > >>>> "has
> > > > > > >>>> > a"
> > > > > > >>>> > > >> B)
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Note that SourceReaderBase and
> > > > > > >>>> > SplitFetcherManager
> > > > > > >>>> > > >> > > >>>>>>> implement most
> > > > > > >>>> > > >> > > >>>>>>> > of
> > > > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > "logic" for split alignment
> just
> > > > > because
> > > > > > >>>> we
> > > > > > >>>> > wanted
> > > > > > >>>> > > >> to
> > > > > > >>>> > > >> > > >>>>>>> implement
> > > > > > >>>> > > >> > > >>>>>>> > > split
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > alignment and wanted it to be
> > > > > available
> > > > > > as
> > > > > > >>>> > kind of
> > > > > > >>>> > > >> a
> > > > > > >>>> > > >> > > >>>>>>> default. As a
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > consequence, we have a
> "default
> > > > > > >>>> > implementation" for
> > > > > > >>>> > > >> > > >>>>>>> SourceReader
> > > > > > >>>> > > >> > > >>>>>>> > > that
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > implements the actual logic
> for
> > > > split
> > > > > > >>>> > alignment.
> > > > > > >>>> > > >> For
> > > > > > >>>> > > >> > > that
> > > > > > >>>> > > >> > > >>>>>>> reason,
> > > > > > >>>> > > >> > > >>>>>>> > I
> > > > > > >>>> > > >> > > >>>>>>> > > >> find
> > > > > > >>>> > > >> > > >>>>>>> > > >> > it
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > very confusing to have a NOOP
> > > > default
> > > > > > >>>> > > >> implementation
> > > > > > >>>> > > >> > in
> > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > >>>> > > >> > > >>>>>>> > > interface
> > > > > > >>>> > > >> > > >>>>>>> > > >> for
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > the SourceReader. As a
> > > consequence,
> > > > > > >>>> interface
> > > > > > >>>> > > >> strategy
> > > > > > >>>> > > >> > > c)
> > > > > > >>>> > > >> > > >>>>>>> is
> > > > > > >>>> > > >> > > >>>>>>> > > difficult
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > because this would require
> NOOP
> > > > > default
> > > > > > >>>> > > >> > implementations
> > > > > > >>>> > > >> > > >>>>>>> in the
> > > > > > >>>> > > >> > > >>>>>>> > > public
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > interfaces of SourceReader and
> > > > > > >>>> SplitReader.
> > > > > > >>>> > This is
> > > > > > >>>> > > >> > the
> > > > > > >>>> > > >> > > >>>>>>> same for
> > > > > > >>>> > > >> > > >>>>>>> > > >> strategy
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > d) because it would require
> NOOP
> > > > > default
> > > > > > >>>> > > >> > implementation
> > > > > > >>>> > > >> > > >>>>>>> in the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > SourceReader. Further, as
> Dawid
> > > > > > described
> > > > > > >>>> > method
> > > > > > >>>> > > >> > > >>>>>>> signatures of
> > > > > > >>>> > > >> > > >>>>>>> > > >> alignSplit
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > for SourceReader and
> SplitReader
> > > > > differ
> > > > > > >>>> and it
> > > > > > >>>> > > >> would
> > > > > > >>>> > > >> > be
> > > > > > >>>> > > >> > > >>>>>>> extremely
> > > > > > >>>> > > >> > > >>>>>>> > > >> > difficult
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > to make the signatures the
> same
> > > > (with
> > > > > > even
> > > > > > >>>> > > >> potential
> > > > > > >>>> > > >> > > >>>>>>> performance
> > > > > > >>>> > > >> > > >>>>>>> > > >> impact
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > > because of additional look-ups
> > of
> > > > > split
> > > > > > >>>> ids).
> > > > > > >>>> > > >> > Therefore,
> > > > > > >>>> > > >> > > >>>>>>> having a
> > > > > > >>>> > > >> > > >>>>>>> > > >> > symmetric
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > decorative interface as of
> > > strategy
> > > > a)
> > > > > > is
> > > > > > >>>> > actually
> > > > > > >>>> > > >> not
> > > > > > >>>> > > >> > > >>>>>>> possible
> > > > > > >>>> > > >> > > >>>>>>> > and
> > > > > > >>>> > > >> > > >>>>>>> > > >> > having
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > two decorative interfaces with
> > > > > different
> > > > > > >>>> method
> > > > > > >>>> > > >> > > >>>>>>> signatures is
> > > > > > >>>> > > >> > > >>>>>>> > > >> confusing.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > My
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > conclusion is that we are best
> > > off with
> > > > > > >>>> strategy b)
> > > > > > >>>> > > >> which
> > > > > > >>>> > > >> > > >>>>>>> means to
> > > > > > >>>> > > >> > > >>>>>>> > have
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > specializing sub-interfaces
> that
> > > > > inherit
> > > > > > >>>> from
> > > > > > >>>> > the
> > > > > > >>>> > > >> > parent
> > > > > > >>>> > > >> > > >>>>>>> > interface:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > SourceReader <|--
> > > > AlignedSourceReader,
> > > > > > >>>> > SplitReader
> > > > > > >>>> > > >> > <|--
> > > > > > >>>> > > >> > > >>>>>>> > > >> > AlignedSplitReader
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > With this option, I'm not 100%
> > > sure
> > > > > what
> > > > > > >>>> the
> > > > > > >>>> > > >> > > implications
> > > > > > >>>> > > >> > > >>>>>>> are and
> > > > > > >>>> > > >> > > >>>>>>> > if
> > > > > > >>>> > > >> > > >>>>>>> > > >> this
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > could get nasty. I would
> suggest
> > > > that
> > > > > > >>>> Dawid
> > > > > > >>>> > and I
> > > > > > >>>> > > >> just
> > > > > > >>>> > > >> > > >>>>>>> try to
> > > > > > >>>> > > >> > > >>>>>>> > > >> implement
> > > > > > >>>> > > >> > > >>>>>>> > > >> > and
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > see if we like it. :)
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Summary
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > In conclusion, please let me
> > know
> > > > your
> > > > > > >>>> > > >> perspectives.
> > > > > > >>>> > > >> > > >>>>>>> Please
> > > > > > >>>> > > >> > > >>>>>>> > correct
> > > > > > >>>> > > >> > > >>>>>>> > > >> me,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > if
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > something is wrong or if I
> > > > > misunderstood
> > > > > > >>>> > > >> something. My
> > > > > > >>>> > > >> > > >>>>>>> perspective
> > > > > > >>>> > > >> > > >>>>>>> > > >> would
> > > > > > >>>> > > >> > > >>>>>>> > > >> > be:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Top A: Yes
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Top B: Yes (but disable
> > watermark
> > > > > > trigger
> > > > > > >>>> for
> > > > > > >>>> > split
> > > > > > >>>> > > >> > > >>>>>>> alignment)
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Top C: No
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Top D: b)
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Best,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Sebastian
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55
> AM
> > > > Dawid
> > > > > > >>>> > Wysakowicz <
> > > > > > >>>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Arvid:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's
> > > > capability
> > > > > > >>>> > approach, I
> > > > > > >>>> > > >> > fear
> > > > > > >>>> > > >> > > >>>>>>> that it
> > > > > > >>>> > > >> > > >>>>>>> > > >> doesn't
> > > > > > >>>> > > >> > > >>>>>>> > > >> > work
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> for this particular use case:
> > > > Sources
> > > > > > can
> > > > > > >>>> > always
> > > > > > >>>> > > >> be
> > > > > > >>>> > > >> > > >>>>>>> aligned
> > > > > > >>>> > > >> > > >>>>>>> > > >> cross-task
> > > > > > >>>> > > >> > > >>>>>>> > > >> > and
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> this is just about intra-task
> > > > > > alignment.
> > > > > > >>>> So
> > > > > > >>>> > it's
> > > > > > >>>> > > >> > > >>>>>>> plausible to put
> > > > > > >>>> > > >> > > >>>>>>> > > >> > sources
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> into an alignment group even
> > > though
> > > > > > they
> > > > > > >>>> do
> > > > > > >>>> > not
> > > > > > >>>> > > >> use
> > > > > > >>>> > > >> > any
> > > > > > >>>> > > >> > > >>>>>>> of the
> > > > > > >>>> > > >> > > >>>>>>> > > >> presented
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They should
> > just
> > > > > > issue a
> > > > > > >>>> > > >> warning, if
> > > > > > >>>> > > >> > > >>>>>>> they handle
> > > > > > >>>> > > >> > > >>>>>>> > > >> > multiple
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> splits (see motivation
> > section).
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Yes, but the "supportXXX"
> > method
> > > > > would
> > > > > > >>>> be for
> > > > > > >>>> > > >> telling
> > > > > > >>>> > > >> > > if
> > > > > > >>>> > > >> > > >>>>>>> it
> > > > > > >>>> > > >> > > >>>>>>> > > supports
> > > > > > >>>> > > >> > > >>>>>>> > > >> > that
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> intra-task alignment.
> > Cross-task
> > > > > > >>>> alignment
> > > > > > >>>> > would
> > > > > > >>>> > > >> > always
> > > > > > >>>> > > >> > > >>>>>>> be
> > > > > > >>>> > > >> > > >>>>>>> > > supported.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I updated interfaces to what
> I
> > > > > believe
> > > > > > >>>> to be
> > > > > > >>>> > > >> closest
> > > > > > >>>> > > >> > > to a
> > > > > > >>>> > > >> > > >>>>>>> > consensus
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> between all participants. Do
> > you
> > > > mind
> > > > > > >>>> taking a
> > > > > > >>>> > > >> look?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Sebastian Do you mind
> > addressing
> > > > the
> > > > > > >>>> nits?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid
> > Heise
> > > > > wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for pushing this
> effort.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I'd actually be in favor of
> > 1b).
> > > I
> > > > > > fully
> > > > > > >>>> agree
> > > > > > >>>> > > >> that
> > > > > > >>>> > > >> > > >>>>>>> decorator
> > > > > > >>>> > > >> > > >>>>>>> > > >> interfaces
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> should be avoided but I'm
> also
> > > not
> > > > a
> > > > > > big
> > > > > > >>>> fan
> > > > > > >>>> > of
> > > > > > >>>> > > >> > > >>>>>>> overloading the
> > > > > > >>>> > > >> > > >>>>>>> > > base
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interfaces (they are hard to
> > > > > implement
> > > > > > as
> > > > > > >>>> > is). The
> > > > > > >>>> > > >> > > usual
> > > > > > >>>> > > >> > > >>>>>>> feedback
> > > > > > >>>> > > >> > > >>>>>>> > > to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Source-related interfaces is
> > > > always
> > > > > > that
> > > > > > >>>> > they are
> > > > > > >>>> > > >> > > >>>>>>> overwhelming
> > > > > > >>>> > > >> > > >>>>>>> > and
> > > > > > >>>> > > >> > > >>>>>>> > > >> too
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> hard to implement. However,
> I'd
> > > > also
> > > > > > not
> > > > > > >>>> > oppose
> > > > > > >>>> > > >> 1c)
> > > > > > >>>> > > >> > as
> > > > > > >>>> > > >> > > >>>>>>> scattered
> > > > > > >>>> > > >> > > >>>>>>> > > >> > interfaces
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> also have drawbacks. I'd just
> > > > dislike
> > > > > > >>>> 1a) and
> > > > > > >>>> > 1d).
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's
> > > > capability
> > > > > > >>>> > approach, I
> > > > > > >>>> > > >> > fear
> > > > > > >>>> > > >> > > >>>>>>> that it
> > > > > > >>>> > > >> > > >>>>>>> > > >> doesn't
> > > > > > >>>> > > >> > > >>>>>>> > > >> > work
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> for this particular use case:
> > > > Sources
> > > > > > can
> > > > > > >>>> > always
> > > > > > >>>> > > >> be
> > > > > > >>>> > > >> > > >>>>>>> aligned
> > > > > > >>>> > > >> > > >>>>>>> > > >> cross-task
> > > > > > >>>> > > >> > > >>>>>>> > > >> > and
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> this is just about intra-task
> > > > > > alignment.
> > > > > > >>>> So
> > > > > > >>>> > it's
> > > > > > >>>> > > >> > > >>>>>>> plausible to put
> > > > > > >>>> > > >> > > >>>>>>> > > >> > sources
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> into an alignment group even
> > > though
> > > > > > they
> > > > > > >>>> do
> > > > > > >>>> > not
> > > > > > >>>> > > >> use
> > > > > > >>>> > > >> > any
> > > > > > >>>> > > >> > > >>>>>>> of the
> > > > > > >>>> > > >> > > >>>>>>> > > >> presented
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They should
> > just
> > > > > > issue a
> > > > > > >>>> > > >> warning, if
> > > > > > >>>> > > >> > > >>>>>>> they handle
> > > > > > >>>> > > >> > > >>>>>>> > > >> > multiple
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> splits (see motivation
> > section).
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think renaming alignSplits
> to
> > > > > > >>>> facilitate
> > > > > > >>>> > future
> > > > > > >>>> > > >> use
> > > > > > >>>> > > >> > > >>>>>>> cases makes
> > > > > > >>>> > > >> > > >>>>>>> > > >> sense
> > > > > > >>>> > > >> > > >>>>>>> > > >> > but
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> then all interfaces (if 1c)
> is
> > > > > chosen)
> > > > > > >>>> should
> > > > > > >>>> > be
> > > > > > >>>> > > >> > > adjusted
> > > > > > >>>> > > >> > > >>>>>>> > > >> accordingly.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSourceReader could be
> > > > > > >>>> > PausingSourceReader
> > > > > > >>>> > > >> and
> > > > > > >>>> > > >> > > I'd
> > > > > > >>>> > > >> > > >>>>>>> go for
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits (Becket's
> > > > > proposal
> > > > > > >>>> > afaik). We
> > > > > > >>>> > > >> > could
> > > > > > >>>> > > >> > > >>>>>>> also
> > > > > > >>>> > > >> > > >>>>>>> > split
> > > > > > >>>> > > >> > > >>>>>>> > > it
> > > > > > >>>> > > >> > > >>>>>>> > > >> > into
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> pauseSplit and resumeSplit.
> > While
> > > > > > >>>> > > >> pauseOrResumeSplits
> > > > > > >>>> > > >> > > >>>>>>> may allow
> > > > > > >>>> > > >> > > >>>>>>> > > >> Sources
> > > > > > >>>> > > >> > > >>>>>>> > > >> > to
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> just use 1 instead of 2
> library
> > > > calls
> > > > > > (as
> > > > > > >>>> > written
> > > > > > >>>> > > >> in
> > > > > > >>>> > > >> > > the
> > > > > > >>>> > > >> > > >>>>>>> > Javadoc),
> > > > > > >>>> > > >> > > >>>>>>> > > >> both
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Kafka and Pulsar can't use it
> > and
> > > > I'm
> > > > > > not
> > > > > > >>>> > sure if
> > > > > > >>>> > > >> > there
> > > > > > >>>> > > >> > > >>>>>>> is a
> > > > > > >>>> > > >> > > >>>>>>> > system
> > > > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> can.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Some nits for the FLIP:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - Please replace "stop" with
> > > > "pause".
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - Not sure if it's worth it
> in
> > > the
> > > > > > >>>> capability
> > > > > > >>>> > > >> > section:
> > > > > > >>>> > > >> > > >>>>>>> Sources
> > > > > > >>>> > > >> > > >>>>>>> > that
> > > > > > >>>> > > >> > > >>>>>>> > > >> > adopt
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> this interface cannot be used
> > in
> > > > > > earlier
> > > > > > >>>> > > >> versions. So
> > > > > > >>>> > > >> > > it
> > > > > > >>>> > > >> > > >>>>>>> feels
> > > > > > >>>> > > >> > > >>>>>>> > like
> > > > > > >>>> > > >> > > >>>>>>> > > >> we
> > > > > > >>>> > > >> > > >>>>>>> > > >> > are
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> only forward compatible (old
> > > > sources
> > > > > > can
> > > > > > >>>> be
> > > > > > >>>> > used
> > > > > > >>>> > > >> > after
> > > > > > >>>> > > >> > > >>>>>>> the
> > > > > > >>>> > > >> > > >>>>>>> > change);
> > > > > > >>>> > > >> > > >>>>>>> > > >> but
> > > > > > >>>> > > >> > > >>>>>>> > > >> > I
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> guess this holds for any API
> > > > > addition.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - You might want to add what
> > > > happens
> > > > > > >>>> when all
> > > > > > >>>> > > >> splits
> > > > > > >>>> > > >> > > are
> > > > > > >>>> > > >> > > >>>>>>> paused.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - You may want to describe
> how
> > > the
> > > > 3
> > > > > > >>>> flavors
> > > > > > >>>> > of
> > > > > > >>>> > > >> > > >>>>>>> SourceReaderBase
> > > > > > >>>> > > >> > > >>>>>>> > > >> > interact
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> with the interface.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - I'm not sure if it makes
> > sense
> > > to
> > > > > > >>>> include
> > > > > > >>>> > Kafka
> > > > > > >>>> > > >> and
> > > > > > >>>> > > >> > > >>>>>>> Pulsar in
> > > > > > >>>> > > >> > > >>>>>>> > the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > FLIP.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> For me, this is rather
> > immediate
> > > > > > >>>> follow-up
> > > > > > >>>> > work.
> > > > > > >>>> > > >> > (could
> > > > > > >>>> > > >> > > >>>>>>> be in the
> > > > > > >>>> > > >> > > >>>>>>> > > >> same
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> umbrella ticket)
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Arvid
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> WithSplitsAlignment", along with "MySplitReader implements
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReader, WithSplitsAlignment", or
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> "MySplitReader implements AlignedSplitReader", or
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> "MySplitReader implements SplitReader".
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according to Dawid would be:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> "MySplitReader implements AlignedSplitReader".
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although personally speaking I think
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> it is simpler to just add a new method to the split reader
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> with default impl.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think that is a good idea to have it aligned as much as
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> possible. I'd be +1 for your option c). We can merge
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader with SplitReader. We will update the FLIP
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> shortly.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the comment, Jark.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align other things in the future?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> For example, align read speed, I have seen users requesting
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> global rate limits. This feature may also need an interface
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> like this.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to support align
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> other things, I suggest explicitly declaring the purpose of
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the methods, such as `alignWatermarksForSplits` instead of
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> This is a good point. Naming wise, it would usually be more
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> extensible to just describe what the method actually does,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> instead of assuming the purpose of doing this. For example, in
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> this case, pauseOrResumeSplits() would be more extensible
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> because this can be used for any kind of flow control, be it
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> watermark alignment or simple rate limiting.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think they have their
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> own advantages.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> abilities (SupportsXxxx) for TableSource/TableSink, and I
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> prefer Interfaces rather than methods in this case. When you
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> have a bunch of abilities and each ability has more than one
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> method, Interfaces can help to organize them and make users
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> clear which methods need to implement when you want to have an
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> ability.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I am OK with decorative interfaces if this is a general design
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> pattern in the other components in Flink. But it looks like
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the current API proposal is not symmetric.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> The current proposal is essentially "MySourceReader implements
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader, WithSplitsAlignment", along with "MySplitReader
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitsReader".
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Should we make the API symmetric? I'd consider any one of the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> following as symmetric.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> WithSplitsAlignment", along with "MySplitReader implements
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReader, WithSplitsAlignment", or
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> "MySplitReader implements AlignedSplitReader", or
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> "MySplitReader implements SplitReader".
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according to Dawid would be:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> "MySplitReader implements AlignedSplitReader".
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although personally speaking I think
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> it is simpler to just add a new method to the split reader
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> with default impl.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Dawid Wysakowicz <dwysakowicz@apache.org>, thanks for the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> reply.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a preference and I agree
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> most of the sources will support the alignment I am fine
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> following your suggestion to have the SourceReader extending
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> from WithWatermarksSplitsAlignment, but would put the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> "supportsXXX" there, not in the Source to keep the two methods
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> together.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> One benefit of having the "supportsXXX" in Source is that this
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> allows some compile time check. For example, if a user enabled
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> watermark alignment while it is not supported by the Source,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> an exception can be thrown at compile time. It seems in
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> general useful. That said, I agree that API cleanliness wise
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> it is better to put the two methods together.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <imjark@gmail.com>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks, Dawid, for the reminder on FLIP-182. Sorry I did miss
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> it.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't have other concerns then.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jark
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Jark:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with watermarks when the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> source implements the interface?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment even if
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Kafka implements the interface, and this will affect the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> throughput somehow. I agree with Becket we may need a
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to configure the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> source to enable/disable the alignment.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> affect throughput if the reader is constantly switching
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> between pause and resume. Can users configure the alignment
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> offset?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> This is covered in the previous FLIP[1] which has been already
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implemented in 1.15. In short, it must be enabled with the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> watermark strategy which also configures drift and update
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interval.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to support align
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> other things, I suggest explicitly declaring the purpose of
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the methods, such as `alignWatermarksForSplits` instead of
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Sure let's rename it.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Becket:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I understand your point. On the other hand putting all
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> methods, even with "supportsXXX" methods for enabling certain
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> features, makes the entry threshold for writing a new source
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> higher. Instead of focusing on the basic and required
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> properties of the Source, the person implementing a source
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> must bother with and need to figure out what all of the extra
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> features are about and how to deal with them. It makes it also
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> harder to organize methods in coupled groups as Jark said.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a preference and I agree
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> most of the sources will support the alignment I am fine
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> following your suggestion to have the SourceReader extending
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> from WithWatermarksSplitsAlignment, but would put the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> "supportsXXX" there, not in the Source to keep the two methods
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> together.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Lastly, I agree it is really unfortunate the "alignSplits"
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> methods differ slightly for SourceReader and SplitReader. The
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> reason for that is SourceReaderBase deals only with SplitIds,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> whereas SplitReader needs the actual splits to pause them. I
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> found the discrepancy acceptable for the sake of simplifying
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> changes significantly, especially as they would highly likely
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> impact performance as we would have to perform additional
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> lookups. Moreover the SplitReader is a secondary interface.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the effort, Dawid and Sebastian!
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I just have some minor questions (maybe I missed something).
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with watermarks when the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> source implements the interface?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment even if
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Kafka implements the interface, and this will affect the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> throughput somehow. I agree with Becket we may need a
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to configure the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> source to enable/disable the alignment.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> affect throughput if the reader is constantly switching
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> between pause and resume. Can users configure the alignment
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> offset?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align other things in the future?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> For example, align read speed, I have seen users requesting
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> global rate limits. This feature may also need an interface
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> like this.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to support align
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> other things, I suggest explicitly declaring the purpose of
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the methods, such as `alignWatermarksForSplits` instead of
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think they have their own
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> advantages.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending abilities
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink, and I prefer Interfaces
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> rather than methods in this case. When you have a bunch of abilities
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> and each ability has more than one method, Interfaces can help to
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> organize them and make it clear to users which methods they need to
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implement when they want to have an ability.
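
To illustrate the interface-per-ability style described in point 4, here is a minimal Java sketch. The names `Reader`, `SupportsPauseResume`, `PausableReader`, `PlainReader` and `AbilityDemo` are hypothetical and are not Flink APIs. The sketch only shows that an opt-in interface groups the methods that belong together, and that a framework could detect the ability with an `instanceof` check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a base reader interface; not Flink's actual API.
interface Reader {
    String name();
}

// Hypothetical opt-in "ability" interface: pause() and resume() only make
// sense together, so the interface forces both to be implemented.
interface SupportsPauseResume {
    void pause();
    void resume();
}

// A reader that opts in to the ability.
class PausableReader implements Reader, SupportsPauseResume {
    boolean paused = false;
    public String name() { return "pausable"; }
    public void pause() { paused = true; }
    public void resume() { paused = false; }
}

// A reader that does not opt in and therefore sees no extra methods.
class PlainReader implements Reader {
    public String name() { return "plain"; }
}

class AbilityDemo {
    // Framework-side check: only readers that declare the ability are paused.
    // Returns the names of the readers that were actually paused.
    static List<String> pauseAll(List<Reader> readers) {
        List<String> pausedNames = new ArrayList<>();
        for (Reader r : readers) {
            if (r instanceof SupportsPauseResume) {
                ((SupportsPauseResume) r).pause();
                pausedNames.add(r.name());
            }
        }
        return pausedNames;
    }
}
```

The cost of this style is the runtime type check on the framework side, which is the branching that the method-with-default alternative tries to avoid.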
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jark
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <becket.qin@gmail.com> wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Hi Dawid,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the explanation. Apologies that I somehow misread a bunch
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> of "align" and thought they were "assign".
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Regarding 1, by default implementation, I was thinking of the default
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> no-op implementation. I am a little worried about the proliferation
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> of decorative interfaces. I think the most important thing about
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interfaces is that they are easy to understand. In this case, I
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> prefer adding a new method to the existing interface for the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> following reasons:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> a) I feel the biggest drawback of decorative interfaces is which
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface they can decorate and which combinations of multiple
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> decorative interfaces are valid. In the current FLIP, the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> withSplitsAlignment interface is only applicable to the SourceReader,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> which means it can't decorate any other interface. From an interface
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> design perspective, a natural question is why not let
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> "AlignedSplitReader" extend "withSplitsAlignment"? And it is also
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> natural to assume that a split reader implementing both SplitReader
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> and withSplitsAlignment would work, because a source reader
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implementing SourceReader and withSplitsAlignment works. So why isn't
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> there an interface of AlignedSourceReader? In the future, if there is
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> a new feature added (e.g. sorted or pre-partitioned data aware), are
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> we going to create another interface of SplitReader such as
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SortedSplitReader or PrePartitionedAware? Can they be combined? So I
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> think the additional decorative interface like withSplitsAlignment
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> actually increases the understanding cost of users because they have
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> to know what decorative interfaces are there, which interface they
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> can decorate and which combinations of the decorative interfaces are
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> valid and which are not. Ideally we want to avoid that. To be clear,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I am not opposing having an interface of withSplitsAlignment; it is
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> completely OK to have it as an internal interface and let
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader and SplitReader both extend it.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> b) Adding a new method to the SourceReader with a default
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implementation of no-op would help avoid logic branching in the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> source logic, especially given that we agree that the vast majority
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> of the SourceReader implementations, if not all, would just extend
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> from the SourceReaderBase. That means adding a new method to the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface would effectively give the same user experience, but
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> simpler.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> c) A related design principle that may be worth discussing is how we
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> let the Source implementations tell Flink what capability is
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> supported and what is not. Personally speaking, the most intuitive
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> place to me is the Source itself, because that is the entrance of
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the entire Source connector logic.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Based on the above thoughts, I am wondering if the following
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface would be easier to understand by the users.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - Change "withSplitsAlignment" to an internal interface, let both
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader and SplitReader extend from it, with a default no-op
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implementation.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - Add a new method "boolean supportSplitsAlignment()" to the Source
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface, with a default implementation returning false. Sources
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> that have implemented the alignment logic can change this to return
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> true, and override the alignSplits() methods in the SourceReader /
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReader if needed.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - In the future, if a new optional feature is going to be added to
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the Source, and that feature requires the awareness from Flink, we
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> can add more such methods to the Source.
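
The three bullets above could look roughly like the following Java sketch. It is an illustration under assumptions, not the FLIP's actual code: `Source`, `SourceReader` and `SplitReader` are reduced to near-empty stand-ins for the real interfaces, the `alignSplits` parameter list is assumed, and `AlignedSource`, `PlainSource` and `ProposalDemo` are hypothetical names.

```java
import java.util.Collection;
import java.util.Collections;

// Internal interface from the proposal; the method signature is an assumption.
interface WithSplitsAlignment {
    // Default no-op: implementations that do not align simply inherit this.
    default void alignSplits(Collection<String> splitsToPause,
                             Collection<String> splitsToResume) {}
}

// Simplified stand-ins for the real interfaces, which have many more methods.
interface SourceReader extends WithSplitsAlignment {}
interface SplitReader extends WithSplitsAlignment {}

interface Source {
    SourceReader createReader();
    // Proposed capability flag: false unless a source opts in.
    default boolean supportSplitsAlignment() { return false; }
}

// A source that opts in: it flips the flag and overrides alignSplits().
class AlignedSource implements Source {
    static class Reader implements SourceReader {
        int calls = 0;
        @Override
        public void alignSplits(Collection<String> splitsToPause,
                                Collection<String> splitsToResume) {
            calls++; // a real reader would pause/resume the given splits here
        }
    }
    public SourceReader createReader() { return new Reader(); }
    @Override
    public boolean supportSplitsAlignment() { return true; }
}

// A source that ignores the feature inherits both defaults.
class PlainSource implements Source {
    public SourceReader createReader() { return new SourceReader() {}; }
}

class ProposalDemo {
    // Framework side: consult the flag on the Source, then call the reader
    // method without checking the reader's concrete type.
    static boolean align(Source source, SourceReader reader) {
        if (!source.supportSplitsAlignment()) {
            return false;
        }
        reader.alignSplits(Collections.singletonList("split-0"),
                           Collections.emptyList());
        return true;
    }
}
```

In this shape the capability is declared once on the Source, and a source that inherits both defaults never has alignSplits() called on its reader.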
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> What do you think?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Konstantin:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> stop & resume behavior) will be implemented for Kafka and Pulsar
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> only, correct?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Correct, as far as I know though, those are the only sources which
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> consume concurrently from multiple splits and thus alignment applies.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Thomas:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase and then
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> doing nothing if the split reader does not implement
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader could be misleading? Perhaps WithSplitsAlignment
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> can instead be added to the specific source reader (i.e.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> KafkaSourceReader) to make it explicit that the source actually
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> supports it.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I understand your concern. Hmm, I think we could actually do that.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Given the actual implementation of the SourceReaderBase#alignSplits
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> is rather short (just a forward to the corresponding method of
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SplitFetcher), we could reimplement it in the actual source
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implementations. This solution has a downside though. Authors of new
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> sources would have to do two things: extend from AlignedSplitReader
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> and implement WithSplitsAlignment, instead of just extending
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader. I would be fine with such a tradeoff though.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> What do others think?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Steven:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> For this part from the motivation section, is it accurate? Let's
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> assume one source task consumes from 3 partitions and one of the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> partitions is significantly slower. In this situation, the watermark
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> for this source task won't hold back as it is reading recent data
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> from the other two Kafka partitions. As a result, it won't hold back
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the overall watermark. I thought the problem is that we may have
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> late data for this slow partition.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> It will hold back the watermark. The watermark of an operator is the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> minimum of the watermarks of all splits [1].
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I have another question about the restart. Say split alignment is
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> triggered, a checkpoint is completed, and the job fails and is
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> restored from the last checkpoint. Because the alignment decision is
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> not checkpointed, initially alignment won't be enforced until we get
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> a cycle of watermark aggregation and propagation, right? Not saying
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> this corner case is a problem. Just want to understand it more.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Your understanding is correct.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Becket:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use case, so
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> should we just add the related methods to SourceReader directly
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> instead of introducing the new interface of WithSplitAssignment? We
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> can provide default implementations, so backwards compatibility
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> won't be an issue.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't think we can provide a default implementation. How would we
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> do that? Would it be just a no-op? Is it better than having an opt-in
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface? The default implementation would have to be added
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> exclusively in a *Public* SourceReader interface. By the way notice
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase does extend from WithSplitsAlignment, so effectively
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> all implementations do handle the alignment case. To be honest I
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> think it is impossible to implement the SourceReader interface
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> directly by end users.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably also needs
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> some change to support throttling at the split granularity. Can you
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> add that interface change into the public interface section as well?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> It has been added from the beginning. See *AlignedSplitReader.*
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using
> the
> > > > method
> > > > > > >>>> name
> > > > > > >>>> > > >> > assignSplits
> > > > > > >>>> > > >> > > >>>>>>> here,
> > > > > > >>>> > > >> > > >>>>>>> > given
> > > > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> it
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> is not actually changing the
> > > split
> > > > > > >>>> > assignments? It
> > > > > > >>>> > > >> > > seems
> > > > > > >>>> > > >> > > >>>>>>> > something
> > > > > > >>>> > > >> > > >>>>>>> > > >> like
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or
> > > > > > >>>> > > >> adjustSplitsThrottling() is
> > > > > > >>>> > > >> > > >>>>>>> more
> > > > > > >>>> > > >> > > >>>>>>> > > accurate.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> The method's called
> > > *alignSplits*,
> > > > > not
> > > > > > >>>> > assign. Do
> > > > > > >>>> > > >> you
> > > > > > >>>> > > >> > > >>>>>>> still
> > > > > > >>>> > > >> > > >>>>>>> > prefer
> > > > > > >>>> > > >> > > >>>>>>> > > a
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> different name for that?
> > > > Personally,
> > > > > I
> > > > > > am
> > > > > > >>>> > open for
> > > > > > >>>> > > >> > > >>>>>>> suggestions
> > > > > > >>>> > > >> > > >>>>>>> > > here.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> [1]
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > >>>> > > >> > > >>>>>>>
> > > > > > >>>> > > >> > >
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >>
> > > > > > >>>> >
> > > > > > >>>>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for driving the effort, Sebastian. I think the motivation makes a
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> lot of sense. Just a few suggestions / questions.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use case, so should we
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> just add the related methods to SourceReader directly instead of
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> introducing the new interface of WithSplitAssignment? We can provide
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> default implementations, so backwards compatibility won't be an issue.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably also needs some
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> change to support throttling at the split granularity. Can you add that
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface change into the public interface section as well?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits here, given that it
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> is not actually changing the split assignments? It seems something like
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <stevenz3wu@gmail.com> wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> However, a single source operator may read data from multiple
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> splits/partitions, e.g., multiple Kafka partitions, such that even with
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> watermark alignment the source operator may need to buffer excessive
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> amount of data if one split emits data faster than another.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> For this part from the motivation section, is it accurate? Let's assume
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> one source task consumes from 3 partitions and one of the partitions is
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> significantly slower. In this situation, the watermark for this source task
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> won't hold back, as it is reading recent data from the other two Kafka
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> partitions. As a result, it won't hold back the overall watermark. I
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> thought the problem is that we may have late data for this slow
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> partition.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I have another question about the restart. Say split alignment is
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> triggered. A checkpoint is completed. The job fails and is restored from
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the last checkpoint. Because the alignment decision is not checkpointed,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> initially alignment won't be enforced until we get a cycle of watermark
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> aggregation and propagation, right? Not saying this corner case is a
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> problem. Just want to understand it more.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <thw@apache.org> wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for working on this!
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase and then
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> doing nothing if the split reader does not implement AlignedSplitReader
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> could be misleading? Perhaps WithSplitsAlignment can instead be added to
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the specific source reader (i.e. KafkaSourceReader) to make it explicit
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> that the source actually supports it.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thomas
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <knaufk@apache.org> wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> +1 in general. I believe it is valuable to complete the watermark aligned
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> story with this FLIP.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Cheers,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Konstantin
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> To be explicit, having worked on it, I support it ;) I think we can
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> start a vote thread soonish, as there are no concerns so far.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dear Flink developers,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I would like to open a discussion on FLIP 217 [1] for an extension of
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Watermark Alignment to perform alignment also in SplitReaders. To do so,
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReaders must be able to suspend and resume reading from split
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> sources, where the SourceOperator coordinates and controls suspend and
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> resume. To gather information about the current watermarks of the
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReaders, we extend the internal WatermarkOutputMultiplexer and report
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> watermarks to the SourceOperator.
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> and reworked by Dawid Wysakowicz (he did most of the work) and me. The
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> changes are backwards compatible in the sense that if affected components
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> do not support split alignment, the behavior is as before.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Sebastian
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> [1]
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > >>>> > > >> > > >>>>>>>
> > > > > > >>>> > > >> > >
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >>
> > > > > > >>>> >
> > > > > > >>>>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> [2]
> > > > > > >>>> > > >> > > >>>>>>>
> > > > > > >>>> https://github.com/dawidwys/flink/tree/aligned-splits
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> --
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Konstantin Knauf
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> https://twitter.com/snntrable
> > > > > > > >>>> > > >> > > >>>>>>> > > >> > >> https://github.com/knaufk
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > > >>>> > > >> > > >>>>>>> > >
> > > > > > >>>> > > >> > > >>>>>>> >
> > > > > > >>>> > > >> > > >>>>>>>
> > > > > > >>>> > > >> > > >>>>>>
> > > > > > >>>> > > >> > >
> > > > > > >>>> > > >> >
> > > > > > >>>> > > >>
> > > > > > >>>> > > >
> > > > > > >>>> >
> > > > > > >>>>
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Piotr Nowojski <pn...@apache.org>.
Hi,

I mean I'm fine with throwing an exception by default in Flink 1.16 in
"Case 2", but I think we need to provide a way to work around it, for example
via a feature toggle, if that's easy to do. And it seems to be a simple
thing.
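
To make the toggle idea concrete, here is a minimal sketch of the decision, not
the actual implementation. The class and method names are made up for
illustration, and the option name is only the one suggested in the quoted
message below; it should be read as an assumption, not a final name.

```java
// Hypothetical sketch only: CoarseAlignmentGuard and its methods are made-up
// names for illustration, not Flink APIs.
final class CoarseAlignmentGuard {

    // Option name as suggested in the quoted message below; assumed, not final.
    static final String TOGGLE = "allow.coarse-grained.watermark.alignment";

    /** True if the job should be rejected: "Case 2", no pausing support, no opt-in. */
    static boolean shouldReject(int assignedSplits, boolean canPauseSplits, boolean toggleEnabled) {
        boolean caseTwo = assignedSplits > 1; // reader holds multiple splits
        return caseTwo && !canPauseSplits && !toggleEnabled;
    }

    /** Throws by default in Case 2; the toggle keeps the pre-upgrade behaviour. */
    static void check(int assignedSplits, boolean canPauseSplits, boolean toggleEnabled) {
        if (shouldReject(assignedSplits, canPauseSplits, toggleEnabled)) {
            throw new UnsupportedOperationException(
                    "Reader has " + assignedSplits + " splits but cannot pause them; set "
                            + TOGGLE + "=true to keep coarse-grained alignment.");
        }
    }

    public static void main(String[] args) {
        check(1, false, false); // Case 1: unaffected
        check(3, false, true);  // Case 2 with the toggle: old behaviour is kept
        try {
            check(3, false, false); // Case 2 with defaults: fails
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the toggle defaulting to false, only Case 2 users without pausing support
see the exception, and they can opt back in with a single setting.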

However, this is orthogonal to the `supportsPausingSplits()` issue. I don't
have a strong preference whether
  a) the exception should originate on JM, using `default boolean
supportsPausingSplits() { return false; }` (as currently proposed in the
FLIP),
  b) or on the TM from `pauseOrResumeSplits()` throwing
`UnsupportedOperationException` as you are proposing.

a) fails earlier, so it's more user friendly from this perspective, but it
provides more possibilities for bugs/inconsistencies for connector
developers, since `supportsPausingSplits()` would have to be kept in sync
with `pauseOrResumeSplits()`.
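
A rough sketch of the two options and of the inconsistency I mean. The `Reader`
interface below is a simplified stand-in, not the real `SourceReader`; the
split IDs as plain strings and the helper names are assumptions made only for
this example.

```java
// Hypothetical sketch: Reader is a simplified stand-in, not Flink's SourceReader.
final class PausingSplitsSketch {

    interface Reader {
        // Option (a): a capability flag that can be checked up front (e.g. on the JM).
        default boolean supportsPausingSplits() {
            return false;
        }

        // Option (b): the call itself fails (on the TM) unless it is overridden.
        default void pauseOrResumeSplits(
                java.util.Collection<String> toPause, java.util.Collection<String> toResume) {
            throw new UnsupportedOperationException("pausing splits is not supported");
        }
    }

    /** Overrides the method but forgets the flag: the mismatch option (a) makes possible. */
    static final class OutOfSyncReader implements Reader {
        final java.util.Set<String> paused = new java.util.HashSet<>();

        @Override
        public void pauseOrResumeSplits(
                java.util.Collection<String> toPause, java.util.Collection<String> toResume) {
            paused.addAll(toPause);
            paused.removeAll(toResume);
        }
    }

    /** Option (b) style probe: missing support is only discovered by calling the method. */
    static boolean failsAtRuntime(Reader reader) {
        try {
            reader.pauseOrResumeSplits(
                    java.util.Collections.singletonList("split-0"),
                    java.util.Collections.<String>emptyList());
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

With a), `OutOfSyncReader` would be rejected up front even though it can pause
splits, because the flag was never flipped. With b) there is only one thing to
override, but the failure shows up later.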

Best,
Piotrek

On Wed, Jun 29, 2022 at 15:27, Becket Qin <be...@gmail.com> wrote:

> Hi Piotr,
>
> Just to make sure we are on the same page. There are two cases for the
> existing FLIP-182 users:
>
> Case 1: Each source reader only has one split assigned. This is the
> targeted case for FLIP-182.
> Case 2: Each source reader has multiple splits assigned. This is the flaky
> case that may or may not work.
>
> With solution 1, the users of case 1 won't be impacted. The users in case 2
> will receive an exception which they won't get at the moment.
>
> Do you mean we should not throw an exception in case 2? Personally I feel
> that is OK and could have been done in FLIP-182 itself because it's not a
> designed use case. As a user I may see a big variation of the job state
> sizes from time to time and I am not able to rely on this feature to plan
> my resources and uphold the SLA.
>
> That said, if you have a strong opinion on this, I am fine with having the
> configuration like "allow.coarse-grained.watermark.alignment" with the
> default value set to false, given that a configuration is much easier to
> deprecate than a method.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Wed, Jun 29, 2022 at 8:02 PM Piotr Nowojski <pn...@apache.org>
> wrote:
>
> > Thanks for the explanation.
> >
> > > 2. It is fully compatible with FLIP-182, if we consider it as the right
> > > thing to throw an exception for readers reading from multiple splits
> > > without supporting split pausing.
> >
> > I think that's fine. But the question is: should we provide a workaround
> > for existing users? IMO if it's easy to do, we should.
> >
> > > I actually think neither solution 1 or 2 breaks FLIP-182 users,
> >
> > They do. A user currently has a working Flink 1.15 deployment, where
> > watermark alignment may not be behaving ideally, but it's working to
> > some extent, and you are proposing to throw an exception at them after
> > upgrading Flink, without any workaround (short of implementing a feature,
> > which is a very problematic requirement). Given that costly upgrades are
> > one of the major complaints, I would definitely be in favor of option 2.
> > Given that the affected user base is most likely small, I would be +1 for
> > solution 2 with throwing an exception by default.
> >
> > Best,
> > Piotrek
> >
> >
> > On Wed, Jun 29, 2022 at 12:55, Becket Qin <be...@gmail.com> wrote:
> >
> > > Hi Piotr,
> > >
> > > Please see the reply inline below:
> > >
> > > On Wed, Jun 29, 2022 at 5:11 PM Piotr Nowojski <pn...@apache.org>
> > > wrote:
> > >
> > > > Hi Becket,
> > > >
> > > > > My main concern of having a supportsPausingSplits() knob
> > > >
> > > > What is the problem with `supportsPausingSplits()` that you see? Do
> > > > you want to remove it?
> > > >
> > > Just to make sure we are on the same page, I assume we are talking
> > > about this supportsPausingSplits() method in the Source interface. If we
> > > go with the obligatory features addition path, having this method seems
> > > misleading. Also, later on, at some point when we see that all the
> > > sources have implemented this feature, we will have to worry about
> > > deprecating this method, which is backwards incompatible.
> > >
> > >
> > > > Also I don't understand your proposal for Solution 1. How do you want
> > > > to decide whether to throw an exception? For that we would need to have
> > > > `supportsPausingSplits()`, right?
> > > >
> > >
> > > What I am thinking is the following:
> > >
> > > 1. The Flink framework always assumes split pausing is supported and
> > > just naively invokes SourceReader#pauseOrResumeSplits().
> > > 2. The SourceReaderBase will basically again try to ask the SplitReader
> > > to pause the splits.
> > > 3. Because the default implementation throws an
> > > UnsupportedOperationException, if the source developer did not override
> > > it, this exception will be thrown and bubbled up.
> > > 4. After catching this exception, the SourceReaderBase will just check
> > > if there is only one split that is currently assigned to the split
> > > reader. If so, it swallows the exception, stops polling the split reader
> > > and returns NOTHING_AVAILABLE. This is the same as the current logic in
> > > the SourceOperator. If we are not comfortable with moving this logic to
> > > the SourceReaderBase, we can also just keep the logic there and simply
> > > let SourceOperator remember if there is more than one split assigned to
> > > the source reader, when SourceOperator.handleAddSplitsEvent() is invoked.
> > >
> > > This way the existing FLIP-182 users won't be impacted by this FLIP.
> > > For those source readers that only have one split assigned, it works
> > > fine without any change. For those source readers with multiple splits
> > > assigned, they are already in a limp state with unpredictable side
> > > effects. We might as well let them know this instead of pretending the
> > > coarse-grained watermark alignment works fine for them.
> > >
> > > The advantage of this solution is that we don't have to do anything
> > > after this. That would work fine as the final state, as in:
> > > 1. We have already done the best we can do for the Sources that do not
> > > support split pausing.
> > > 2. It is fully compatible with FLIP-182, if we consider it as the right
> > > thing to throw an exception for readers reading from multiple splits
> > > without supporting split pausing.
> > > 3. There is nothing to deprecate in the future.
> > >
> > >
> > > >
> > > > If so, I would prefer solution 2, to provide a graceful migration
> > > > path for any users that are already using FLIP-182 with multiple splits
> > > > per operator. I don't think there are many of those, but such a flag
> > > > seems easy to implement while making migration easier. Having said
> > > > that, technically we could change the behaviour and start throwing an
> > > > exception always in such a case, as this feature is marked as
> > > > Experimental.
> > > >
> > >
> > > I actually think neither solution 1 or 2 breaks FLIP-182 users, but
> > > solution 2 needs a deprecation process for the option in the future.
> > >
> > >
> > > >
> > > > Best,
> > > > Piotrek
> > > >
> > > > On Wed, Jun 29, 2022 at 02:54, Becket Qin <be...@gmail.com> wrote:
> > > >
> > > > > Hi Sebastian,
> > > > >
> > > > > Regarding the question,
> > > > >
> > > > > >
> > > > > > @Becket: I'm not sure about the intention of solution 1. Can you
> > > > > > explain that a bit more? In particular, I don't understand: "The
> > > > > > reasoning behind this solution is that existing users should only
> > > > > > use the coarse watermark alignment when a source reader only reads
> > > > > > from a single split." Why should a user not use coarse watermark
> > > > > > alignment when source reader reads from multiple splits? The split
> > > > > > alignment uses the "coarse watermark", i.e., maxDesiredWatermark, as
> > > > > > described in the FLIP for alignment.
> > > > >
> > > > >
> > > > > Imagine you have a source reader reading from two splits, and the
> > > > > watermarks look like the following:
> > > > > > 1. Watermark of Split 1: 10:00 AM Jun 29
> > > > > > 2. Watermark of Split 2: 11:00 AM Jun 29
> > > > > > 3. maxDesiredWatermark: 10:30 AM Jun 29
> > > > >
> > > > > At this point, the source reader's watermark is 10:00 AM which is
> > lower
> > > > > than the maxDesiredWatermark, so the source reader won't be paused
> > from
> > > > > reading. However, because the source reader cannot specify which
> > split
> > > to
> > > > > read from, if it continues to read, the watermark gap between the
> two
> > > > > splits may become even bigger. This essentially fails the main
> > purpose
> > > of
> > > > > watermark alignment - to reduce the number of records buffered in
> the
> > > > > state. This does not necessarily happen, but this is not what
> > FLIP-182
> > > > was
> > > > > designed for to begin with. So I'd rather avoid extending the
> feature
> > > > > to that case.
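The two-split scenario above can be sketched in a few lines of Java. This is only an illustration, not Flink code: the class and method names are hypothetical, the reader-level watermark is assumed to be the minimum over the watermarks of its splits, and a pause is assumed to trigger once a watermark is later than maxDesiredWatermark.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Flink class): shows how a reader-level
// watermark can hide the drift of an individual split.
final class CoarseAlignmentSketch {

    // Assumption: the reader's watermark is the minimum over its splits.
    // Throws NoSuchElementException if no split is assigned.
    static LocalTime readerWatermark(Map<String, LocalTime> splitWatermarks) {
        return Collections.min(splitWatermarks.values());
    }

    // Coarse-grained rule: the whole reader is paused only when its own
    // watermark is later than maxDesiredWatermark.
    static boolean readerIsPaused(Map<String, LocalTime> splitWatermarks,
                                  LocalTime maxDesiredWatermark) {
        return readerWatermark(splitWatermarks).isAfter(maxDesiredWatermark);
    }

    // Split-level view: the splits that are already ahead of the limit.
    static List<String> splitsAhead(Map<String, LocalTime> splitWatermarks,
                                    LocalTime maxDesiredWatermark) {
        List<String> ahead = new ArrayList<>();
        for (Map.Entry<String, LocalTime> e : splitWatermarks.entrySet()) {
            if (e.getValue().isAfter(maxDesiredWatermark)) {
                ahead.add(e.getKey());
            }
        }
        return ahead;
    }

    public static void main(String[] args) {
        Map<String, LocalTime> watermarks = new LinkedHashMap<>();
        watermarks.put("split-1", LocalTime.of(10, 0));
        watermarks.put("split-2", LocalTime.of(11, 0));
        LocalTime maxDesiredWatermark = LocalTime.of(10, 30);

        System.out.println("reader watermark: " + readerWatermark(watermarks));
        System.out.println("reader paused:    " + readerIsPaused(watermarks, maxDesiredWatermark));
        System.out.println("splits ahead:     " + splitsAhead(watermarks, maxDesiredWatermark));
    }
}
```

With the numbers from the example, the reader as a whole is not paused even though split 2 is already past the limit, which is the gap that split-level alignment is meant to close.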
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Tue, Jun 28, 2022 at 6:53 PM Sebastian Mattheis <
> > > > > sebastian@ververica.com>
> > > > > wrote:
> > > > >
> > > > > > (Sorry I didn't send to the list but only to Becket. My bad and
> > > thanks
> > > > > > Piotr. Next attempt:)
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Thanks for pushing the FLIP. I would drive it and would be happy
> to
> > > get
> > > > > > back to you, @Thomas, for reviews. (Sorry for low responsiveness,
> > > there
> > > > > > were several efforts with high priority on my side ...) As next
> > > step, I
> > > > > > would revise the FLIP to get the discussion concluded.
> > > > > >
> > > > > > However, as Becket mentioned I feel that some things are still
> not
> > > > clear
> > > > > > yet:
> > > > > >
> > > > > > Re: Thomas
> > > > > >>
> > > > > >> However, from a user perspective, should the split level
> alignment
> > > be
> > > > > >>> an opt-in feature, at least for a few releases? If yes, then we
> > > would
> > > > > >>> require a knob similar to supportsPausingSplits(), which I
> > > understand
> > > > > >>> won't be part of the revised FLIP. Such control may be
> > beneficial:
> > > > > >>
> > > > > >>
> > > > > >>> * Compare runtime behavior with split level alignment on/off
> > > > > >>> * Allow use of sources that don't implement pausing splits yet
> > > > > >>
> > > > > >>
> > > > > >>> The second point would, from the user's perspective, be
> necessary
> > > for
> > > > > >>> backward compatibility? While the interface aspect and source
> > > > > >>> compatibility has been discussed in great detail, I don't think
> > it
> > > > > >>> would be desirable if an application that already uses
> alignment
> > > > fails
> > > > > >>> after upgrading to the new Flink version, forcing users to lock
> > > step
> > > > > >>> modify sources for the new non-optional split level alignment.
> > > > > >>
> > > > > >>
> > > > > >>> So I think clarification of the compatibility aspect on the
> FLIP
> > > page
> > > > > >>> would be necessary.
> > > > > >>
> > > > > >>
> > > > > >> That is a good point. Currently FLIP-182 is marked as
> > experimental.
> > > So
> > > > > >> technically speaking it could change. That said, I agree that it
> > > would
> > > > > be
> > > > > >> good to not break the existing sources.
> > > > > >>
> > > > > >> My main concern of having a supportsPausingSplits() knob is that
> > > this
> > > > > > >> might be something requiring code change on future deprecation.
> I
> > am
> > > > > >> thinking of two potential solutions:
> > > > > >>
> > > > > >> Solution 1:
> > > > > >> In the SourceReaderBase, when pauseOrResumeSplits() is invoked,
> if
> > > the
> > > > > >> source reader only has one split assigned, the source reader
> > simply
> > > > > stops
> > > > > > >> polling but just returns NOTHING_AVAILABLE. If there is more than one
> > > > > > >> split assigned, it throws an exception with a message such as
> > "The
> > > > > >> unpausable SplitReader CLASS_NAME only works with watermark
> > > alignment
> > > > > >> when assigned a single split. There are more than one split
> > assigned
> > > > to
> > > > > the
> > > > > >> SplitReader".
> > > > > >> The reasoning behind this solution is that existing users should
> > > only
> > > > > use
> > > > > >> the coarse watermark alignment when a source reader only reads
> > from
> > > a
> > > > > >> single split. Reading from more than one split might have
> unwanted
> > > > side
> > > > > >> effects, so we might as well throw an exception in this case.
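A minimal sketch of the Solution 1 behaviour described above, written as a stand-alone class rather than as a patch to SourceReaderBase. Everything here is an assumption made for illustration: the class name, the use of String split IDs, the two-collection signature of pauseOrResumeSplits(), the Status enum, and the choice of UnsupportedOperationException (the proposal only says "an exception").

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for a reader whose SplitReader cannot pause
// individual splits; this is not Flink's SourceReaderBase.
final class UnpausableReaderSketch {

    enum Status { MORE_AVAILABLE, NOTHING_AVAILABLE }

    private final String splitReaderClassName;
    private final List<String> assignedSplits = new ArrayList<>();
    private boolean paused;

    UnpausableReaderSketch(String splitReaderClassName) {
        this.splitReaderClassName = splitReaderClassName;
    }

    void addSplits(Collection<String> splitIds) {
        assignedSplits.addAll(splitIds);
    }

    void pauseOrResumeSplits(Collection<String> splitsToPause,
                             Collection<String> splitsToResume) {
        if (assignedSplits.size() > 1) {
            // Message text taken from the proposal; exception type is assumed.
            throw new UnsupportedOperationException(
                    "The unpausable SplitReader " + splitReaderClassName
                            + " only works with watermark alignment when assigned a single split."
                            + " There are more than one split assigned to the SplitReader");
        }
        // With at most one split, pausing that split equals pausing the reader.
        paused = assignedSplits.size() == 1
                && splitsToPause.contains(assignedSplits.get(0));
    }

    // While paused, the reader stops polling and reports that nothing is available.
    Status pollNext() {
        return paused ? Status.NOTHING_AVAILABLE : Status.MORE_AVAILABLE;
    }
}
```

The single-split path needs no cooperation from the SplitReader, which is why existing FLIP-182 users with one split per reader would see no change under this solution.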
> > > > > >>
> > > > > >> Solution 2:
> > > > > >> Having a configuration
> > "enable.coarse-grained.watermark.alignment",
> > > > the
> > > > > >> default value is false. Once it is set to true, we will allow
> > > > > >> coarse-grained watermark alignment if a SplitReader is pausable.
> > > > > >> This solution allows users to keep the current FLIP-182
> behavior,
> > > with
> > > > > >> the risk of side effects.
> > > > > >>
> > > > > >> Personally speaking, I feel solution 1 seems better because
> > > > > >> coarse-grained watermark alignment could be frustrating to the
> > users
> > > > > >> when more than one split is assigned. So we might as well not
> > > support
> > > > > it at
> > > > > >> all. And also there is nothing to deprecate in the future with
> > this
> > > > > >> solution.
> > > > > >>
> > > > > >> What do you think?
> > > > > >>
> > > > > >
> > > > > > @Thomas: My understanding is that you intend a simple switch/knob
> > to
> > > > test
> > > > > > w/ and w/o (split) watermark alignment, right? Isn't the
> > > coarse-grained
> > > > > w/
> > > > > > vs w/o watermark alignment sufficient for that? Or do you think
> > that
> > > > > > > switching watermark alignment explicitly on split level is
> required?
> > > > > >
> > > > > > @Becket: I'm not sure about the intention of solution 1. Can you
> > > > explain
> > > > > > that a bit more? In particular, I don't understand: "The
> reasoning
> > > > behind
> > > > > > this solution is that existing users should only use the coarse
> > > > watermark
> > > > > > alignment when a source reader only reads from a single split."
> Why
> > > > > > should a user not use coarse watermark alignment when source
> reader
> > > > reads
> > > > > > from multiple splits? The split alignment uses the "coarse
> > > watermark",
> > > > > > i.e., maxDesiredWatermark, as described in the FLIP for
> alignment.
> > > > > >
> > > > > > Could you please clarify?
> > > > > >
> > > > > > Regards,
> > > > > > Sebastian
> > > > > >
> > > > > > > On Wed, Jun 22, 2022 at 4:23 AM Becket Qin <becket.qin@gmail.com> wrote:
> > > > > >
> > > > > >> Thanks for the feedback, Thomas and Steve. And thanks Piotr for
> > the
> > > > > >> patient and detailed discussion.
> > > > > >>
> > > > > >> Let's move forward with option 1 then.
> > > > > >>
> > > > > >> Re: Thomas
> > > > > >>
> > > > > >> However, from a user perspective, should the split level
> alignment
> > > be
> > > > > >>> an opt-in feature, at least for a few releases? If yes, then we
> > > would
> > > > > >>> require a knob similar to supportsPausingSplits(), which I
> > > understand
> > > > > >>> won't be part of the revised FLIP. Such control may be
> > beneficial:
> > > > > >>
> > > > > >>
> > > > > >>> * Compare runtime behavior with split level alignment on/off
> > > > > >>> * Allow use of sources that don't implement pausing splits yet
> > > > > >>
> > > > > >>
> > > > > >>> The second point would, from the user's perspective, be
> necessary
> > > for
> > > > > >>> backward compatibility? While the interface aspect and source
> > > > > >>> compatibility has been discussed in great detail, I don't think
> > it
> > > > > >>> would be desirable if an application that already uses
> alignment
> > > > fails
> > > > > >>> after upgrading to the new Flink version, forcing users to lock
> > > step
> > > > > >>> modify sources for the new non-optional split level alignment.
> > > > > >>
> > > > > >>
> > > > > >>> So I think clarification of the compatibility aspect on the
> FLIP
> > > page
> > > > > >>> would be necessary.
> > > > > >>
> > > > > >>
> > > > > >> That is a good point. Currently FLIP-182 is marked as
> > experimental.
> > > So
> > > > > >> technically speaking it could change. That said, I agree that it
> > > would
> > > > > be
> > > > > >> good to not break the existing sources.
> > > > > >>
> > > > > >> My main concern of having a supportsPausingSplits() knob is that
> > > this
> > > > > > >> might be something requiring code change on future deprecation.
> I
> > am
> > > > > >> thinking of two potential solutions:
> > > > > >>
> > > > > >> Solution 1:
> > > > > >> In the SourceReaderBase, when pauseOrResumeSplits() is invoked,
> if
> > > the
> > > > > >> source reader only has one split assigned, the source reader
> > simply
> > > > > stops
> > > > > > >> polling but just returns NOTHING_AVAILABLE. If there is more than one
> > > > > > >> split assigned, it throws an exception with a message such as
> > "The
> > > > > >> unpausable SplitReader CLASS_NAME only works with watermark
> > > alignment
> > > > > when
> > > > > >> assigned a single split. There are more than one split assigned
> to
> > > the
> > > > > >> SplitReader".
> > > > > >> The reasoning behind this solution is that existing users should
> > > only
> > > > > use
> > > > > >> the coarse watermark alignment when a source reader only reads
> > from
> > > a
> > > > > >> single split. Reading from more than one split might have
> unwanted
> > > > side
> > > > > >> effects, so we might as well throw an exception in this case.
> > > > > >>
> > > > > >> Solution 2:
> > > > > >> Having a configuration
> > "enable.coarse-grained.watermark.alignment",
> > > > the
> > > > > >> default value is false. Once it is set to true, we will allow
> > > > > >> coarse-grained watermark alignment if a SplitReader is pausable.
> > > > > >> This solution allows users to keep the current FLIP-182
> behavior,
> > > with
> > > > > >> the risk of side effects.
> > > > > >>
> > > > > >> Personally speaking, I feel solution 1 seems better because
> > > > > >> coarse-grained watermark alignment could be frustrating to the
> > users
> > > > > when
> > > > > >> more than one split is assigned. So we might as well not support
> > it
> > > at
> > > > > all.
> > > > > >> And also there is nothing to deprecate in the future with this
> > > > solution.
> > > > > >>
> > > > > >> What do you think?
> > > > > >>
> > > > > > >> Thanks,
> > > > > >>
> > > > > >> Jiangjie (Becket) Qin
> > > > > >>
> > > > > >>
> > > > > >> On Tue, Jun 21, 2022 at 8:48 PM Piotr Nowojski <
> > > pnowojski@apache.org>
> > > > > >> wrote:
> > > > > >>
> > > > > >>> Hi,
> > > > > >>>
> > > > > >>> It looks like option 1 wins overall? So let's go with that.
> > > > > >>>
> > > > > >>> Best,
> > > > > >>> Piotrek
> > > > > >>>
> > > > > > >>> On Wed, Jun 15, 2022 at 04:13 Steven Wu <st...@gmail.com> wrote:
> > > > > >>>
> > > > > >>>> Both option 1 (default impl in base interface) and option 2
> > > > > (decorative
> > > > > >>>> interface) are pretty common patterns. I would also be fine
> with
> > > > > either.
> > > > > >>>> The important thing is that an exception is thrown if a source
> > > > doesn't
> > > > > >>>> support the alignment capability.
> > > > > >>>>
> > > > > >>>> The other point is that we can validate the source capability
> if
> > > > > >>>> alignment
> > > > > >>>> is enabled in WatermarkStrategy. I believe either option can
> > > achieve
> > > > > >>>> this
> > > > > >>>> goal too.
> > > > > > >>>> public interface WatermarkStrategy<T> {
> > > > > > >>>>     WatermarkStrategy<T> withWatermarkAlignment(
> > > > > > >>>>             String watermarkGroup, Duration maxAllowedWatermarkDrift);
> > > > > > >>>> }
> > > > > >>>>
> > > > > >>>> If I have to pick one, I am slightly favoring option 1 (base
> > > > > >>>> interface). As
> > > > > >>>> watermark is already an essential concept of source, maybe
> > > watermark
> > > > > >>>> alignment capability can also be a property of the base
> > > > source/reader
> > > > > >>>> interface.
> > > > > >>>>
> > > > > >>>> On Tue, Jun 14, 2022 at 3:44 PM Thomas Weise <th...@apache.org>
> > > > wrote:
> > > > > >>>>
> > > > > >>>> > Hi everyone,
> > > > > >>>> >
> > > > > >>>> > Thank you for all the effort that went into this discussion.
> > The
> > > > > split
> > > > > >>>> > level watermark alignment will be an important feature for
> > Flink
> > > > > that
> > > > > >>>> > will address operational problems for various use cases.
> From
> > > > > reading
> > > > > >>>> > through this thread it appears that not too much remains to
> > > bring
> > > > > this
> > > > > >>>> > FLIP to acceptance and allow development to move forward. I
> > > would
> > > > > like
> > > > > >>>> > to contribute if possible.
> > > > > >>>> >
> > > > > >>>> > Regarding option 1 vs. option 2: I don't have a strong
> > > preference,
> > > > > >>>> > perhaps slightly leaning towards option 1.
> > > > > >>>> >
> > > > > >>>> > However, from a user perspective, should the split level
> > > alignment
> > > > > be
> > > > > >>>> > an opt-in feature, at least for a few releases? If yes, then
> > we
> > > > > would
> > > > > >>>> > require a knob similar to supportsPausingSplits(), which I
> > > > > understand
> > > > > >>>> > won't be part of the revised FLIP. Such control may be
> > > beneficial:
> > > > > >>>> >
> > > > > >>>> > * Compare runtime behavior with split level alignment on/off
> > > > > >>>> > * Allow use of sources that don't implement pausing splits
> yet
> > > > > >>>> >
> > > > > >>>> > The second point would, from the user's perspective, be
> > > necessary
> > > > > for
> > > > > >>>> > backward compatibility? While the interface aspect and
> source
> > > > > >>>> > compatibility has been discussed in great detail, I don't
> > think
> > > it
> > > > > >>>> > would be desirable if an application that already uses
> > alignment
> > > > > fails
> > > > > >>>> > after upgrading to the new Flink version, forcing users to
> > lock
> > > > step
> > > > > >>>> > modify sources for the new non-optional split level
> alignment.
> > > > > >>>> >
> > > > > >>>> > So I think clarification of the compatibility aspect on the
> > FLIP
> > > > > page
> > > > > >>>> > would be necessary.
> > > > > >>>> >
> > > > > >>>> > Thanks,
> > > > > >>>> > Thomas
> > > > > >>>> >
> > > > > >>>> > On Mon, May 30, 2022 at 3:29 AM Piotr Nowojski <
> > > > > >>>> piotr.nowojski@gmail.com>
> > > > > >>>> > wrote:
> > > > > >>>> > >
> > > > > >>>> > > Hi Becket,
> > > > > >>>> > >
> > > > > >>>> > > Thanks for summing this up. Just one correction:
> > > > > >>>> > >
> > > > > >>>> > > > Piotr prefers option 2, his opinions are:
> > > > > >>>> > > >   e) It is OK that the code itself in option 2 indicates
> > the
> > > > > >>>> developers
> > > > > >>>> > > that a feature is optional. We will rely on the
> > documentation
> > > to
> > > > > >>>> correct
> > > > > >>>> > > that and clarify that the feature is actually obligatory.
> > > > > >>>> > >
> > > > > >>>> > > I would say based on a) and b) that feature would be still
> > > > > >>>> optional. So
> > > > > >>>> > > both the implementation and the documentation would be
> > saying
> > > > > that.
> > > > > >>>> We
> > > > > >>>> > > could add a mention to the docs and release notes, that
> this
> > > > > >>>> feature will
> > > > > >>>> > > be obligatory in the next major release and plan such a
> > > release
> > > > > >>>> > accordingly.
> > > > > >>>> > >
> > > > > >>>> > > Re the option 1., as you mentioned:
> > > > > >>>> > > > As for option 1: For developers, the feature is still
> > > optional
> > > > > >>>> due to
> > > > > >>>> > the
> > > > > >>>> > > default implementation in the interface, regardless of
> what
> > > the
> > > > > >>>> default
> > > > > >>>> > > implementation does, because the code compiles without
> > > > overriding
> > > > > >>>> these
> > > > > >>>> > > methods
> > > > > >>>> > >
> > > > > >>>> > > Also importantly, the code will work in most cases.
> > > > > >>>> > >
> > > > > >>>> > > > Obligatory: Jobs may fail if these methods are not
> > > implemented
> > > > > >>>> > properly.
> > > > > > >>>> > > e.g. SourceReader#pauseOrResumeSplits(). This is a common
> > > > pattern
> > > > > in
> > > > > >>>> > Java,
> > > > > >>>> > > e.g. Iterator.remove() by default throws
> > > > > >>>> "UnsupportedOperationException",
> > > > > >>>> > > informing the implementation that things may go wrong if
> > this
> > > > > >>>> method is
> > > > > >>>> > not
> > > > > >>>> > > implemented.
> > > > > >>>> > >
> > > > > >>>> > > For me `Iterator#remove()` is an optional feature.
> > > Personally, I
> > > > > >>>> don't
> > > > > >>>> > > remember if I have ever implemented it.
> > > > > >>>> > >
> > > > > >>>> > > Best,
> > > > > >>>> > > Piotrek
> > > > > >>>> > >
> > > > > > >>>> > > On Fri, May 27, 2022 at 10:14 Becket Qin <becket.qin@gmail.com> wrote:
> > > > > >>>> > >
> > > > > >>>> > > > I had an offline discussion with Piotr and here is the
> > > > summary.
> > > > > >>>> Please
> > > > > >>>> > > > correct me if I miss something, Piotr.
> > > > > >>>> > > >
> > > > > >>>> > > > There are two things we would like to seek more opinions
> > > from
> > > > > the
> > > > > >>>> > > > community, so we can make progress on this FLIP.
> > > > > >>>> > > >
> > > > > >>>> > > > 1. The General pattern to add obligatory features to
> > > existing
> > > > > >>>> > interfaces.
> > > > > >>>> > > >
> > > > > >>>> > > >
> > > > > >>>> >
> > > > > >>>>
> > > > >
> > > >
> > >
> >
> ***********************************************************************************
> > > > > >>>> > > > For interfaces exposed to the developers for
> > implementation,
> > > > > they
> > > > > >>>> are
> > > > > > >>>> > > > either intended to be *optional* or *obligatory*. While
> it
> > > is
> > > > > >>>> quite
> > > > > >>>> > clear
> > > > > >>>> > > > about how to convey that intention when creating the
> > > > interfaces,
> > > > > >>>> it is
> > > > > >>>> > not
> > > > > >>>> > > > as commonly agreed when we are adding new features to an
> > > > > existing
> > > > > >>>> > > > interface. In general, Flink uses decorative interfaces
> > when
> > > > > >>>> adding
> > > > > >>>> > > > optional features to existing interfaces. Both Piotr
> and I
> > > > agree
> > > > > >>>> that
> > > > > >>>> > looks
> > > > > >>>> > > > good.
> > > > > >>>> > > >
> > > > > >>>> > > > Different opinions are mainly about how to add
> obligatory
> > > > > >>>> features to
> > > > > >>>> > the
> > > > > >>>> > > > existing interfaces, probably due to different
> > > understandings
> > > > of
> > > > > >>>> > > > "obligatory".
> > > > > >>>> > > >
> > > > > > >>>> > > > We have discussed four options:
> > > > > >>>> > > >
> > > > > >>>> > > > *Option 1:*
> > > > > >>>> > > >
> > > > > >>>> > > >    - Just add a new method to the existing interface.
> > > > > >>>> > > >    - For backwards compatibility, the method would have
> a
> > > > > default
> > > > > >>>> > > >    implementation throwing
> > "UnsupportedOperationException".
> > > > > >>>> > > >    - In the next major version, remove the default
> > > > > implementation.
> > > > > >>>> > > >    - For the developers, any method with a default
> > > > > implementation
> > > > > >>>> > > >    throwing an "UnsupportedOperationException" should be
> > > taken
> > > > > as
> > > > > >>>> > obligatory.
> > > > > >>>> > > >
> > > > > >>>> > > > *Option 2:*
> > > > > >>>> > > >
> > > > > >>>> > > >    - Always make the features optional by adding a
> > > decorative
> > > > > >>>> > interface,
> > > > > >>>> > > >    just like ordinary optional features.
> > > > > >>>> > > >    - Inform the developers via documentation that this
> > > feature
> > > > > is
> > > > > >>>> > > >    obligatory, although it looks like optional from the
> > > code.
> > > > > >>>> > > >    - In case the developers did not implement the
> > decorative
> > > > > >>>> interface,
> > > > > >>>> > > >    throw an exception
> > > > > >>>> > > >    - In the next major version, move the methods in the
> > > > > decorative
> > > > > >>>> > > >    interface to the base interface, and deprecate the
> > > > decorative
> > > > > >>>> > interface.
> > > > > >>>> > > >
> > > > > >>>> > > > *Option 3:*
> > > > > >>>> > > >
> > > > > >>>> > > >    - Always bump the major version when a new obligatory
> > > > feature
> > > > > >>>> is
> > > > > >>>> > > >    added, even if we may have to do it frequently.
> > > > > >>>> > > >
> > > > > >>>> > > > *Option 4:*
> > > > > >>>> > > >
> > > > > >>>> > > >    - Add a V2, V3... of the interface affected by the
> new
> > > > > >>>> obligatory
> > > > > >>>> > > >    feature.
> > > > > >>>> > > >    - In the next major versions, deprecate old versions
> of
> > > the
> > > > > >>>> > interfaces.
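The difference between Option 1 and Option 2 can be made concrete with a small Java sketch. The interface and class names below are hypothetical stand-ins, not Flink's actual API; the point is only to show where the failure surfaces for a legacy implementation in each case.

```java
import java.util.Collection;
import java.util.Collections;

// Option 1: the new method is added to the base interface with a default
// implementation that throws.
interface Option1Reader {
    default void pauseOrResumeSplits(Collection<String> toPause,
                                     Collection<String> toResume) {
        throw new UnsupportedOperationException(
                "pauseOrResumeSplits() is not implemented");
    }
}

// Option 2: the base interface stays unchanged and the capability lives in
// a separate decorative interface.
interface Option2Reader {}

interface WithSplitsAlignment {
    void pauseOrResumeSplits(Collection<String> toPause,
                             Collection<String> toResume);
}

// Legacy implementations written before the feature existed.
final class LegacyOption1Reader implements Option1Reader {}

final class LegacyOption2Reader implements Option2Reader {}

// An implementation that opted in under Option 2.
final class PausableOption2Reader implements Option2Reader, WithSplitsAlignment {
    int calls;

    @Override
    public void pauseOrResumeSplits(Collection<String> toPause,
                                    Collection<String> toResume) {
        calls++;
    }
}

final class InterfaceEvolutionSketch {

    // Option 1: the framework can always call the method; a legacy reader
    // fails inside the inherited default.
    static void alignOption1(Option1Reader reader, Collection<String> toPause) {
        reader.pauseOrResumeSplits(toPause, Collections.<String>emptyList());
    }

    // Option 2: the framework has to check for the capability first and
    // raise the error itself.
    static void alignOption2(Option2Reader reader, Collection<String> toPause) {
        if (!(reader instanceof WithSplitsAlignment)) {
            throw new UnsupportedOperationException(
                    reader.getClass().getSimpleName() + " does not support pausing splits");
        }
        ((WithSplitsAlignment) reader)
                .pauseOrResumeSplits(toPause, Collections.<String>emptyList());
    }
}
```

In both variants the legacy reader compiles unchanged and fails only when alignment is actually used. What differs is whether the check lives in the interface default (Option 1) or in framework code (Option 2).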
> > > > > >>>> > > >
> > > > > > >>>> > > > Both Piotr and I agreed that option 3 and option 4
> have a
> > > big
> > > > > >>>> side
> > > > > >>>> > effect
> > > > > > >>>> > > > and should be avoided. We have different preferences
> > between
> > > > > >>>> option 1
> > > > > >>>> > and
> > > > > >>>> > > > option 2.
> > > > > >>>> > > >
> > > > > >>>> > > > Personally I prefer option 1, the reasons are:
> > > > > >>>> > > >   a) simple and intuitive. Java 8 introduced the default
> > > impl
> > > > in
> > > > > >>>> > > > interfaces exactly for interface evolving, and this is a
> > > > common
> > > > > >>>> > pattern in
> > > > > >>>> > > > many projects.
> > > > > >>>> > > >   b) prominent to the developers that the feature is
> > > expected
> > > > to
> > > > > >>>> be
> > > > > >>>> > > > implemented, because it explicitly throws an exception
> in
> > > the
> > > > > >>>> default
> > > > > >>>> > impl.
> > > > > >>>> > > >   c) low maintenance overhead - the Flink framework can
> > > always
> > > > > >>>> assume
> > > > > >>>> > the
> > > > > >>>> > > > method exists, so no special handling logic is needed.
> > > > > >>>> > > >   d) communicate a clear semantic boundary between
> > optional
> > > > and
> > > > > >>>> > obligatory
> > > > > > >>>> > > > features in Flink to the developers.
> > > > > >>>> > > >       - Optional: Jobs still run without exception if
> > these
> > > > > >>>> methods are
> > > > > >>>> > > > not implemented. e.g. all the SupportsXXXPushDown
> > > interfaces.
> > > > > >>>> > > >       - Obligatory: Jobs may fail if these methods are
> not
> > > > > >>>> implemented
> > > > > > >>>> > > > properly. e.g. SourceReader#pauseOrResumeSplits(). This
> > is a
> > > > > >>>> common
> > > > > >>>> > pattern
> > > > > >>>> > > > in Java, e.g. Iterator.remove() by default throws
> > > > > >>>> > > > "UnsupportedOperationException", informing the
> > > implementation
> > > > > that
> > > > > >>>> > things
> > > > > >>>> > > > may go wrong if this method is not implemented.
> > > > > >>>> > > >
> > > > > > >>>> > > > As for option 2, although the API itself sounds clean,
> it
> > > > > misleads
> > > > > >>>> > people
> > > > > >>>> > > > to think of an obligatory feature to be optional - from
> > the
> > > > code
> > > > > >>>> the
> > > > > >>>> > > > feature is optional, but the documents say it is
> > obligatory.
> > > > We
> > > > > >>>> > probably
> > > > > >>>> > > > should avoid such code-doc inconsistency, as people will
> > be
> > > > > >>>> confused.
> > > > > >>>> > And I
> > > > > >>>> > > > would actually be bewildered that sometimes not
> > implementing
> > > > an
> > > > > >>>> > "optional"
> > > > > >>>> > > > feature is fine, but sometimes it causes the jobs to
> fail.
> > > > > >>>> > > >
> > > > > >>>> > > > In response to the argument that the method with a
> default
> > > > > >>>> > implementation
> > > > > >>>> > > > is always optional, if that is true, it actually means
> all
> > > the
> > > > > >>>> > interfaces
> > > > > >>>> > > > should be immutable once they are created. If we want to
> > > add a
> > > > > >>>> method
> > > > > >>>> > to an
> > > > > >>>> > > > existing interface, for backwards compatibility, we will
> > > have
> > > > to
> > > > > >>>> > provide a
> > > > > >>>> > > > default implementation. And the fact it has a default
> > > > > >>>> implementation
> > > > > >>>> > > > indicates the method is optional. If that method is
> > > optional,
> > > > it
> > > > > >>>> should
> > > > > >>>> > > > reside in a separate decorative interface, otherwise it
> > > clogs
> > > > > that
> > > > > >>>> > existing
> > > > > >>>> > > > interface. Therefore, people should never add a method
> to
> > an
> > > > > >>>> existing
> > > > > >>>> > > > interface. I find this conclusion a bit extreme.
> > > > > >>>> > > >
> > > > > >>>> > > > Piotr prefers option 2, his opinions are:
> > > > > >>>> > > >     a) Obligatory methods are the methods that fail the
> > code
> > > > > >>>> > compilation
> > > > > >>>> > > > if not implemented.
> > > > > >>>> > > >     b) All obligatory methods should reside in the base
> > > > > interface,
> > > > > >>>> > without
> > > > > >>>> > > > a default implementation. And all the optional methods
> > > should
> > > > be
> > > > > >>>> in
> > > > > >>>> > > > decorative interfaces. This is a clean API.
> > > > > >>>> > > >     c) due to b), there isn't a viable solution to add
> an
> > > > > >>>> obligatory
> > > > > >>>> > > > method to an existing interface in a backwards
> compatible
> > > way.
> > > > > >>>> Unless
> > > > > >>>> > we
> > > > > >>>> > > > are OK with breaking backwards compatibility, all the
> > > > interfaces
> > > > > >>>> > should be
> > > > > >>>> > > > treated as immutable. As a compromise, we might as well
> > just
> > > > > >>>> treat all
> > > > > >>>> > the
> > > > > >>>> > > > features added later as optional features. This way we
> > keep
> > > > the
> > > > > >>>> API
> > > > > >>>> > clean.
> > > > > >>>> > > >     d) based on b) and c), option 2 has a clean API,
> while
> > > > > option
> > > > > >>>> 1
> > > > > >>>> > does
> > > > > >>>> > > > not.
> > > > > >>>> > > >     e) It is OK that the code itself in option 2
> indicates to
> > > the
> > > > > >>>> > developers
> > > > > >>>> > > > that a feature is optional. We will rely on the
> > > documentation
> > > > to
> > > > > >>>> > correct
> > > > > >>>> > > > that and clarify that the feature is actually
> obligatory.
> > > > > >>>> > > >     f) Regarding the effectiveness of making people
> aware
> > > that
> > > > > the
> > > > > >>>> > feature
> > > > > >>>> > > > is obligatory, Option 1 and Option 2 are similar. For
> > people
> > > > > that
> > > > > >>>> do
> > > > > >>>> > not
> > > > > >>>> > > > read the release note / documentation, they will mistake
> > the
> > > > > >>>> feature
> > > > > >>>> > to be
> > > > > >>>> > > > optional anyways.
> > > > > >>>> > > >
> > > > > >>>> > > > As for option 1: For developers, the feature is still
> > > optional
> > > > > >>>> due to
> > > > > >>>> > the
> > > > > >>>> > > > default implementation in the interface, regardless of
> > what
> > > > the
> > > > > >>>> default
> > > > > >>>> > > > implementation does, because the code compiles without
> > > > > overriding
> > > > > >>>> these
> > > > > >>>> > > > methods. Also, another problem of this option is that
> for
> > > > users
> > > > > >>>> that
> > > > > >>>> > do not
> > > > > >>>> > > > know about the history of the interface, they may be
> > > confused
> > > > by
> > > > > >>>> the
> > > > > >>>> > > > default implementation throwing an exception.
> > > > > >>>> > > >
> > > > > >>>> > > >
> > > > > >>>> > > > 2. For this particular FLIP, should it be optional or
> not?
> > > > > >>>> > > >
> > > > > >>>> > > >
> > > > > >>>> >
> > > > > >>>>
> > > > >
> > > >
> > >
> >
> ***********************************************************************************
> > > > > >>>> > > > As mentioned in the previous email, I feel this FLIP
> > should
> > > be
> > > > > >>>> > obligatory,
> > > > > >>>> > > > for the following reasons:
> > > > > >>>> > > > 1. The Flink framework exposes the watermark alignment
> API
> > > to
> > > > > the
> > > > > >>>> end
> > > > > >>>> > > > users. From the end users' perspective, the feature
> should
> > > be
> > > > > >>>> available
> > > > > >>>> > > > regardless of the implementation details in the
> > pluggables.
> > > > This
> > > > > >>>> is
> > > > > >>>> > true
> > > > > >>>> > > > for any other methods exposed as the Flink API.
> > > > > >>>> > > > 2. If a Source is not pausable, the end user should
> > receive
> > > an
> > > > > >>>> > exception
> > > > > > >>>> > > > when enabling the watermark alignment (both Piotr and I
> > > agree
> > > > on
> > > > > >>>> > this). In
> > > > > >>>> > > > that case, it meets my criteria of obligatory feature
> > > because
> > > > > not
> > > > > > >>>> > > > implementing the feature causes a framework API to throw an
> > > > > > >>>> > > > exception and fails the job.
> > > > > >>>> > > >
> > > > > >>>> > > > On the other hand, Piotr does not have a strong opinion
> > > > > regarding
> > > > > >>>> > whether
> > > > > >>>> > > > this feature should be optional or not.
> > > > > >>>> > > >
> > > > > >>>> > > >
> > > > > >>>> > > > Thanks for reading through this long email. So basically
> > in
> > > > > order
> > > > > >>>> to
> > > > > >>>> > make
> > > > > > >>>> > > > progress on this FLIP, we want to see how people
> feel
> > > > about
> > > > > >>>> the
> > > > > >>>> > above
> > > > > >>>> > > > two topics.
> > > > > >>>> > > >
> > > > > >>>> > > > Thanks,
> > > > > >>>> > > >
> > > > > >>>> > > > Jiangjie (Becket) Qin
> > > > > >>>> > > >
> > > > > >>>> > > >
> > > > > >>>> > > > On Thu, May 26, 2022 at 3:06 PM Piotr Nowojski <
> > > > > >>>> pnowojski@apache.org>
> > > > > >>>> > > > wrote:
> > > > > >>>> > > >
> > > > > >>>> > > >> Hi Becket,
> > > > > >>>> > > >>
> > > > > >>>> > > >> I still sustain what I wrote before:
> > > > > >>>> > > >> > I think I would still vote soft -1 on this option,
> but
> > I
> > > > > >>>> wouldn't
> > > > > >>>> > block
> > > > > >>>> > > >> it in case I am out-voted.
> > > > > >>>> > > >>
> > > > > >>>> > > >> > I think it might be helpful to agree on the
> definition
> > of
> > > > > >>>> optional
> > > > > >>>> > in
> > > > > >>>> > > >> our
> > > > > >>>> > > >> case.
> > > > > >>>> > > >>
> > > > > >>>> > > >> For me it doesn't matter whether a default method
> > throwing
> > > an
> > > > > >>>> > exception we
> > > > > >>>> > > >> call optional or non-optional. As long as we keep it
> this
> > > > way,
> > > > > >>>> the
> > > > > >>>> > effect
> > > > > >>>> > > >> is the same. It's effectively a method that a user
> > doesn't
> > > > have
> > > > > >>>> to
> > > > > >>>> > > >> implement. If interface/system allows some methods to
> be
> > > not
> > > > > >>>> > implemented,
> > > > > >>>> > > >> some users will do just that, regardless if we call it
> > and
> > > > > >>>> document as
> > > > > >>>> > > >> non-optional. And at the same time it's clogging the
> base
> > > > > >>>> interface.
> > > > > >>>> > > >>
> > > > > > >>>> > > >> By the way, the mere need for Javadoc/documentation explaining the
> > > > > > >>>> > > >> existence of some construct is a bad smell (code should be
> > > > > > >>>> > > >> self-documenting, and a default method throwing an
> > > > > > >>>> > > >> UnsupportedOperationException is not).
> > > > > >>>> > > >>
> > > > > >>>> > > >> > Please note that so far we do not assume whether the
> > > > feature
> > > > > >>>> is in
> > > > > >>>> > > >> > the original API or it is added later. A newly added
> > > > feature
> > > > > >>>> can
> > > > > >>>> > also be
> > > > > >>>> > > >> > non-optional, although it might take some time for
> all
> > > the
> > > > > >>>> pluggable
> > > > > >>>> > > >> > developers to catch up, and they should still work if
> > the
> > > > new
> > > > > >>>> > feature is
> > > > > >>>> > > >> > not used until they catch up. In contrast, we may
> never
> > > > > expect
> > > > > >>>> an
> > > > > >>>> > > >> optional
> > > > > >>>> > > >> > feature to catch up, because leaving it unimplemented
> > is
> > > > also
> > > > > >>>> > blessed.
> > > > > >>>> > > >> >
> > > > > >>>> > > >> > Let's take the checkpointing as an example. Imagine
> > Flink
> > > > did
> > > > > >>>> not
> > > > > >>>> > > >> support
> > > > > >>>> > > >> > checkpointing before release 1.16. And now we are
> > trying
> > > to
> > > > > add
> > > > > >>>> > > >> > checkpointing to Flink. So we exposed the checkpoint
> > > > > >>>> configuration
> > > > > >>>> > to
> > > > > >>>> > > >> the
> > > > > > >>>> > > >> > end users. In the meantime, will we tell the pluggable (e.g. operators,
> > > > > > >>>> > > >> > connectors) developers that methods like "snapshotState()" are optional? If
> > > > > >>>> > > >> > we do that, the availability of checkpointing in
> Flink
> > > > would
> > > > > be
> > > > > >>>> > severely
> > > > > >>>> > > >> > weakened. But apparently we should still allow the
> > > existing
> > > > > >>>> > > >> implementations
> > > > > >>>> > > >> > to work without checkpointing. It looks to me that
> > adding
> > > > the
> > > > > >>>> > method to
> > > > > >>>> > > >> the
> > > > > >>>> > > >> > pluggable interfaces with a default implementation
> > > throwing
> > > > > >>>> > > >> > "UnsupportedOperationException" would be the solution
> > > here.
> > > > > >>>> Please
> > > > > >>>> > note
> > > > > >>>> > > >> > that in this case, having the default implementation
> > does
> > > > not
> > > > > >>>> mean
> > > > > >>>> > this
> > > > > >>>> > > >> is
> > > > > >>>> > > >> > optional. It is just the technique to support
> backwards
> > > > > >>>> > compatibility in
> > > > > >>>> > > >> > the feature evolution. The fact that this method is
> in
> > > the
> > > > > base
> > > > > >>>> > > >> interface
> > > > > >>>> > > >> > suggests it is not optional, so the developers SHOULD
> > > > > >>>> implement it.
> > > > > >>>> > > >>
> > > > > > >>>> > > >> I would soft vote -1 for having the default method throwing
> > > > > > >>>> > > >> UnsupportedOperationException as a one-off for the special circumstances
> > > > > > >>>> > > >> of this FLIP (FLIP-217).
> > > > > >>>> > > >>
> > > > > > >>>> > > >> At the moment, without thinking this over too much, I would vote a harder
> > > > > > >>>> > > >> -1 for having this as a general rule when adding new features. If we ever
> > > > > > >>>> > > >> end up with an API that is littered with default methods throwing
> > > > > > >>>> > > >> UnsupportedOperationException that are documented as "non-optional", it
> > > > > > >>>> > > >> would IMO be a big design failure. I would be ok-ish with that only if it
> > > > > > >>>> > > >> was a temporary thing and we had an aggressive plan to release new major
> > > > > > >>>> > > >> Flink versions (2.x.y, 3.x.y, ...) more often, breaking API compatibility
> > > > > > >>>> > > >> and getting rid of those default methods. Adding checkpointing and methods
> > > > > > >>>> > > >> like "snapshotState()" would IMO easily justify a new major Flink release.
> > > > > > >>>> > > >> In that case we could add those methods with a default implementation for
> > > > > > >>>> > > >> some transition period of one or two minor releases, followed by a
> > > > > > >>>> > > >> clean-up in a major release. However, I would still argue that it would be
> > > > > > >>>> > > >> cleaner/better to add a decorative interface like `CheckpointedOperator`
> > > > > > >>>> > > >> instead of adding those default methods to the base `Operator` interface.
> > > > > >>>> > > >>
> > > > > > >>>> > > >> I think I can sum up our disagreement like this: I would like to keep the
> > > > > > >>>> > > >> interfaces simpler, with only obligatory methods/features on one side and
> > > > > > >>>> > > >> clearly optional features on the other, while you would like to add an
> > > > > > >>>> > > >> extra third state in between those two?
> > > > > >>>> > > >>
> > > > > >>>> > > >> Best,
> > > > > >>>> > > >> Piotrek
> > > > > >>>> > > >>
> > > > > >>>> > > >>
> > > > > >>>> > > >>
> > > > > > >>>> > > >> On Thu, 12 May 2022 at 04:25, Becket Qin <becket.qin@gmail.com> wrote:
> > > > > >>>> > > >>
> > > > > >>>> > > >> > Thanks for the clarification, Piotr and Sebastian.
> > > > > >>>> > > >> >
> > > > > >>>> > > >> > It looks like the key problem is still whether the
> > > > > >>>> implementation of
> > > > > >>>> > > >> > pausable splits in the Sources should be optional or
> > not.
> > > > > >>>> > > >> >
> > > > > >>>> > > >> > I think it might be helpful to agree on the
> definition
> > of
> > > > > >>>> optional
> > > > > >>>> > in
> > > > > >>>> > > >> our
> > > > > >>>> > > >> > case. To me:
> > > > > >>>> > > >> > Optional = "You CAN leave the method unimplemented,
> and
> > > > that
> > > > > is
> > > > > >>>> > fine."
> > > > > >>>> > > >> > Non-Optional = "You CAN leave the method
> unimplemented,
> > > but
> > > > > you
> > > > > >>>> > SHOULD
> > > > > >>>> > > >> NOT,
> > > > > >>>> > > >> > because people assume this works."
> > > > > >>>> > > >> >
> > > > > >>>> > > >> > I think one sufficient condition of a Non-Optional
> > > feature
> > > > is
> > > > > >>>> that
> > > > > >>>> > if
> > > > > >>>> > > >> the
> > > > > >>>> > > >> > feature is exposed through the framework API, Flink
> > > should
> > > > > >>>> expect
> > > > > >>>> > the
> > > > > >>>> > > >> > pluggables to support this feature by default.
> > Otherwise
> > > > the
> > > > > >>>> > > >> availability
> > > > > >>>> > > >> > of that feature becomes undefined.
> > > > > >>>> > > >> >
> > > > > >>>> > > >> > Please note that so far we do not assume whether the
> > > > feature
> > > > > >>>> is in
> > > > > >>>> > > >> > the original API or it is added later. A newly added
> > > > feature
> > > > > >>>> can
> > > > > >>>> > also be
> > > > > >>>> > > >> > non-optional, although it might take some time for
> all
> > > the
> > > > > >>>> pluggable
> > > > > >>>> > > >> > developers to catch up, and they should still work if
> > the
> > > > new
> > > > > >>>> > feature is
> > > > > >>>> > > >> > not used until they catch up. In contrast, we may
> never
> > > > > expect
> > > > > >>>> an
> > > > > >>>> > > >> optional
> > > > > >>>> > > >> > feature to catch up, because leaving it unimplemented
> > is
> > > > also
> > > > > >>>> > blessed.
> > > > > >>>> > > >> >
> > > > > >>>> > > >> > Let's take the checkpointing as an example. Imagine
> > Flink
> > > > did
> > > > > >>>> not
> > > > > >>>> > > >> support
> > > > > >>>> > > >> > checkpointing before release 1.16. And now we are
> > trying
> > > to
> > > > > add
> > > > > >>>> > > >> > checkpointing to Flink. So we exposed the checkpoint
> > > > > >>>> configuration
> > > > > >>>> > to
> > > > > >>>> > > >> the
> > > > > > >>>> > > >> > end users. In the meantime, will we tell the pluggable (e.g. operators,
> > > > > > >>>> > > >> > connectors) developers that methods like "snapshotState()" are optional? If
> > > > > >>>> > > >> > we do that, the availability of checkpointing in
> Flink
> > > > would
> > > > > be
> > > > > >>>> > severely
> > > > > >>>> > > >> > weakened. But apparently we should still allow the
> > > existing
> > > > > >>>> > > >> implementations
> > > > > >>>> > > >> > to work without checkpointing. It looks to me that
> > adding
> > > > the
> > > > > >>>> > method to
> > > > > >>>> > > >> the
> > > > > >>>> > > >> > pluggable interfaces with a default implementation
> > > throwing
> > > > > >>>> > > >> > "UnsupportedOperationException" would be the solution
> > > here.
> > > > > >>>> Please
> > > > > >>>> > note
> > > > > >>>> > > >> > that in this case, having the default implementation
> > does
> > > > not
> > > > > >>>> mean
> > > > > >>>> > this
> > > > > >>>> > > >> is
> > > > > >>>> > > >> > optional. It is just the technique to support
> backwards
> > > > > >>>> > compatibility in
> > > > > >>>> > > >> > the feature evolution. The fact that this method is
> in
> > > the
> > > > > base
> > > > > >>>> > > >> interface
> > > > > >>>> > > >> > suggests it is not optional, so the developers SHOULD
> > > > > >>>> implement it.
> > > > > >>>> > > >> >
> > > > > >>>> > > >> > When it comes to this FLIP, I think it meets the
> > criteria
> > > > of
> > > > > >>>> > > >> non-optional
> > > > > >>>> > > >> > features, so we should just use the evolution path of
> > > > > >>>> non-optional
> > > > > >>>> > > >> > features.
> > > > > >>>> > > >> >
> > > > > >>>> > > >> > Thanks,
> > > > > >>>> > > >> >
> > > > > >>>> > > >> > Jiangjie (Becket) Qin
> > > > > >>>> > > >> >
> > > > > >>>> > > >> >
> > > > > >>>> > > >> >
> > > > > >>>> > > >> > On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski <
> > > > > >>>> > pnowojski@apache.org>
> > > > > >>>> > > >> > wrote:
> > > > > >>>> > > >> >
> > > > > >>>> > > >> > > Hi,
> > > > > >>>> > > >> > >
> > > > > >>>> > > >> > > Actually previously I thought about having a
> > decorative
> > > > > >>>> interface
> > > > > >>>> > and
> > > > > >>>> > > >> > > whenever watermark alignment is enabled, checking
> > that
> > > > the
> > > > > >>>> source
> > > > > >>>> > > >> > > implements the decorative interface. If not,
> throwing
> > > an
> > > > > >>>> > exception.
> > > > > >>>> > > >> > >
> > > > > >>>> > > >> > > The option with default methods in the source
> > > interfaces
> > > > > >>>> throwing
> > > > > >>>> > > >> > > `UnsupportedOperationException` I think still
> suffers
> > > > from
> > > > > >>>> the
> > > > > >>>> > same
> > > > > >>>> > > >> > > problems I mentioned before. It's still an optional
> > > > > >>>> implementation
> > > > > >>>> > > >> and at
> > > > > >>>> > > >> > > the same time it's clogging the base interface. I
> > > think I
> > > > > >>>> would
> > > > > >>>> > still
> > > > > >>>> > > >> > vote
> > > > > >>>> > > >> > > soft -1 on this option, but I wouldn't block it in
> > > case I
> > > > > am
> > > > > >>>> > > >> out-voted.
> > > > > >>>> > > >> > >
> > > > > >>>> > > >> > > Best,
> > > > > >>>> > > >> > > Piotrek
> > > > > >>>> > > >> > >
> > > > > > >>>> > > >> > > On Wed, 11 May 2022 at 14:22, Sebastian Mattheis <sebastian@ververica.com> wrote:
> > > > > >>>> > > >> > >
> > > > > >>>> > > >> > > > Hi Becket,
> > > > > >>>> > > >> > > >
> > > > > > >>>> > > >> > > > Thanks a lot for your fast and detailed response. For me, the discussion
> > > > > > >>>> > > >> > > > is converging, and dropping the supportsX method sounds very reasonable.
> > > > > > >>>> > > >> > > > (Side note: I think we misunderstood each other about "pausable splits"
> > > > > > >>>> > > >> > > > being enabled by "default". As you describe it now, I understand
> > > > > > >>>> > > >> > > > "default" to mean that it should be the new recommended way of
> > > > > > >>>> > > >> > > > implementation, and I think that is fully valid. Before, I understood
> > > > > > >>>> > > >> > > > "default" here as the default implementation, i.e., throwing
> > > > > > >>>> > > >> > > > UnsupportedOperationException, which is the exact opposite. :) )
> > > > > >>>> > > >> > > >
> > > > > >>>> > > >> > > > Nevertheless: As mentioned, an open question for
> me
> > > is
> > > > if
> > > > > >>>> > watermark
> > > > > >>>> > > >> > > > alignment should enforce pausable splits. For
> > > > > >>>> clarification, the
> > > > > >>>> > > >> > current
> > > > > >>>> > > >> > > > documentation [1] says:
> > > > > >>>> > > >> > > >
> > > > > >>>> > > >> > > > *Note:* As of 1.15, Flink supports aligning
> across
> > > > tasks
> > > > > >>>> of the
> > > > > >>>> > same
> > > > > >>>> > > >> > > >> source and/or different sources. It does not
> > support
> > > > > >>>> aligning
> > > > > >>>> > > >> > > >> splits/partitions/shards in the same task.
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >> In a case where there are e.g. two Kafka
> > partitions
> > > > that
> > > > > >>>> > produce
> > > > > >>>> > > >> > > >> watermarks at different pace, that get assigned
> to
> > > the
> > > > > >>>> same
> > > > > >>>> > task
> > > > > >>>> > > >> > > watermark
> > > > > >>>> > > >> > > >> might not behave as expected. Fortunately, worst
> > > case
> > > > it
> > > > > >>>> > should not
> > > > > >>>> > > >> > > perform
> > > > > >>>> > > >> > > >> worse than without alignment.
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >> Given the limitation above, we suggest applying
> > > > > watermark
> > > > > >>>> > > >> alignment in
> > > > > >>>> > > >> > > >> two situations:
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >>    1. You have two different sources (e.g. Kafka
> > and
> > > > > >>>> File) that
> > > > > >>>> > > >> > produce
> > > > > >>>> > > >> > > >>    watermarks at different speeds
> > > > > >>>> > > >> > > >>    2. You run your source with parallelism equal
> > to
> > > > the
> > > > > >>>> number
> > > > > >>>> > of
> > > > > >>>> > > >> > > >>    splits/shards/partitions, which results in
> > every
> > > > > >>>> subtask
> > > > > >>>> > being
> > > > > >>>> > > >> > > assigned a
> > > > > >>>> > > >> > > >>    single unit of work.
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >> I personally see no issue in implementing and I
> > see
> > > no
> > > > > >>>> reason
> > > > > >>>> > > >> against
> > > > > >>>> > > >> > > > implementing this dependency of watermark
> alignment
> > > and
> > > > > >>>> pausable
> > > > > >>>> > > >> > splits.
> > > > > >>>> > > >> > > (I
> > > > > >>>> > > >> > > > think this would even be a good path towards
> > shaping
> > > > > >>>> watermark
> > > > > >>>> > > >> > alignment
> > > > > >>>> > > >> > > in
> > > > > >>>> > > >> > > > 1.16.) However, "I don't see" means that I would
> be
> > > > happy
> > > > > >>>> to
> > > > > >>>> > hear
> > > > > >>>> > > >> > Dawid's
> > > > > >>>> > > >> > > > and Piotrek's opinions as they implemented
> > watermark
> > > > > >>>> alignment
> > > > > >>>> > > >> based on
> > > > > >>>> > > >> > > > FLIP-182 [2] and I don't want to miss relevant
> > > > > >>>> > rationale/background
> > > > > >>>> > > >> > info
> > > > > >>>> > > >> > > > from their side.
> > > > > >>>> > > >> > > >
> > > > > >>>> > > >> > > > *@Piotrek* *@Dawid *What do you think?
> > > > > >>>> > > >> > > >
> > > > > >>>> > > >> > > > Regards,
> > > > > >>>> > > >> > > > Sebastian
> > > > > >>>> > > >> > > >
> > > > > >>>> > > >> > > > [1]
> > > > > >>>> > > >> > > >
> > > > > >>>> > > >> > >
> > > > > >>>> > > >> >
> > > > > >>>> > > >>
> > > > > >>>> >
> > > > > >>>>
> > > > >
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
> > > > > >>>> > > >> > > > [2]
> > > > > >>>> > > >> > > >
> > > > > >>>> > > >> > >
> > > > > >>>> > > >> >
> > > > > >>>> > > >>
> > > > > >>>> >
> > > > > >>>>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
> > > > > >>>> > > >> > > >
> > > > > >>>> > > >> > > > On Wed, May 11, 2022 at 1:30 PM Becket Qin <
> > > > > >>>> > becket.qin@gmail.com>
> > > > > >>>> > > >> > wrote:
> > > > > >>>> > > >> > > >
> > > > > >>>> > > >> > > >> +dev
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >> Hi Sebastian,
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >> Thank you for the summary. Please see the
> detailed
> > > > > replies
> > > > > >>>> > inline.
> > > > > >>>> > > >> As
> > > > > >>>> > > >> > a
> > > > > >>>> > > >> > > >> recap of my suggestions.
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >> 1. Pausable splits API.
> > > > > >>>> > > >> > > >>   a) Add default implementations to methods
> > > > > >>>> > "pauseOrResumeSplits"
> > > > > >>>> > > >> in
> > > > > >>>> > > >> > > both
> > > > > >>>> > > >> > > >> SourceReader and SplitReader where both default
> > > > > >>>> implementations
> > > > > >>>> > > >> throw
> > > > > >>>> > > >> > > >>  UnsupportedOperationException.
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >> 2. User story.
> > > > > >>>> > > >> > > >>     a) We tell users to enable the watermark
> > > alignment
> > > > > as
> > > > > >>>> they
> > > > > >>>> > > >> like.
> > > > > >>>> > > >> > > This
> > > > > >>>> > > >> > > >> is exactly what the current Flink API is.
> > > > > >>>> > > >> > > >>     b) We tell the source developers, please
> > > implement
> > > > > >>>> pausable
> > > > > >>>> > > >> > splits,
> > > > > >>>> > > >> > > >> otherwise bad things may happen. Think of it
> like
> > > you
> > > > > are
> > > > > >>>> > expected
> > > > > >>>> > > >> to
> > > > > >>>> > > >> > > >> implement SourceReader#snapshotState() properly,
> > > > > otherwise
> > > > > >>>> > > >> exceptions
> > > > > >>>> > > >> > > will
> > > > > >>>> > > >> > > >> be thrown when users enable checkpointing.
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >> Thanks,
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >> Jiangjie (Becket) Qin
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >> On Wed, May 11, 2022 at 4:45 PM Sebastian
> > Mattheis <
> > > > > >>>> > > >> > > >> sebastian@ververica.com> wrote:
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >>> Hi Becket, Hi everybody,
> > > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >>> I'm sorry if I misread the messages but I could
> > not
> > > > > >>>> derive an
> > > > > >>>> > > >> > agreement
> > > > > >>>> > > >> > > >>> from the mailing list. Nevertheless, if I
> > > understand
> > > > > you
> > > > > >>>> > right the
> > > > > >>>> > > >> > > >>> suggestion is:
> > > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >>> * Add default implementations to methods
> > > > > >>>> > "pauseOrResumeSplits" in
> > > > > >>>> > > >> > both
> > > > > >>>> > > >> > > >>> SourceReader and SplitReader where both default
> > > > > >>>> > implementations
> > > > > >>>> > > >> throw
> > > > > >>>> > > >> > > >>> UnsupportedOperationException.
> > > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >> Yes.
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >> * Add "supportsPauseOrResumeSplits" to the
> Source
> > > > > >>>> interface.
> > > > > >>>> > (In
> > > > > >>>> > > >> the
> > > > > >>>> > > >> > > >>> following, I refer to supporting this as
> > "pausable
> > > > > >>>> splits".)
> > > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >> We may no longer need this if pausable splits
> are
> > > > > >>>> expected to
> > > > > >>>> > be
> > > > > >>>> > > >> > > >> implemented by the source developers, i.e.
> > > > non-optional.
> > > > > >>>> Having
> > > > > >>>> > > >> this
> > > > > >>>> > > >> > > method
> > > > > >>>> > > >> > > >> would then be somewhat misleading as it looks
> like
> > > the
> > > > > >>>> sources
> > > > > >>>> > > >> that do
> > > > > >>>> > > >> > > not
> > > > > >>>> > > >> > > >> support pausable splits are also acceptable in
> the
> > > > long
> > > > > >>>> term.
> > > > > >>>> > So
> > > > > >>>> > > >> API
> > > > > >>>> > > >> > > wise,
> > > > > >>>> > > >> > > >> I'd say maybe we should remove this for this
> FLIP,
> > > > > >>>> although I
> > > > > >>>> > > >> believe
> > > > > >>>> > > >> > > this
> > > > > >>>> > > >> > > >> supportXXX pattern itself is still attractive
> for
> > > > > optional
> > > > > >>>> > > >> features.
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >>> To make the conclusions explicit:
> > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >>> 1. The implementation of pauseOrResumeSplits in both interfaces,
> > > > > > >>>> > > >> > > >>> SourceReader and SplitReader, is optional, where the default is that
> > > > > > >>>> > > >> > > >>> pausing is not supported. (--> This means that the implementation is
> > > > > > >>>> > > >> > > >>> still optional for the source developer.)
> > > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >> It is optional for backwards compatibility with
> > > > existing
> > > > > >>>> > sources,
> > > > > >>>> > > >> as
> > > > > >>>> > > >> > > they
> > > > > >>>> > > >> > > >> may still compile without code change. But
> > starting
> > > > from
> > > > > >>>> this
> > > > > >>>> > FLIP,
> > > > > >>>> > > >> > > Flink
> > > > > >>>> > > >> > > >> will always optimistically assume that all the
> > > sources
> > > > > >>>> support
> > > > > >>>> > > >> > pausable
> > > > > >>>> > > >> > > >> splits. If a source does not support pausable
> > > splits,
> > > > it
> > > > > >>>> goes
> > > > > >>>> > to an
> > > > > >>>> > > >> > > error
> > > > > >>>> > > >> > > >> handling path when watermark alignment is
> enabled
> > on
> > > > it.
> > > > > >>>> This
> > > > > >>>> > is
> > > > > >>>> > > >> > > different
> > > > > >>>> > > >> > > >> from a usual optional feature, where no error is
> > > > > expected.
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >>> 2. If watermark alignment is enabled in the
> > > > application
> > > > > >>>> code
> > > > > >>>> > by
> > > > > >>>> > > >> > adding
> > > > > >>>> > > >> > > >>> withWatermarkAlignment to the WatermarkStrategy
> > > while
> > > > > >>>> > > >> SourceReader or
> > > > > >>>> > > >> > > >>> SplitReader do not support pausableSplits, we
> > throw
> > > > an
> > > > > >>>> > > >> > > >>> UnsupportedOperationException.
> > > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >> Yes.
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >>> 3. With regard to your statement:
> > > > > >>>> > > >> > > >>>
> > > > > > >>>> > > >> > > >>>> [...] basically means watermark alignment is a non-optional feature to
> > > > > > >>>> > > >> > > >>>> the end users.
> > > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >>> You actually mean that "pausable splits" are
> > > > > >>>> non-optional for
> > > > > >>>> > the
> > > > > >>>> > > >> app
> > > > > >>>> > > >> > > >>> developer if watermark alignment is enabled.
> > > However,
> > > > > >>>> > watermark
> > > > > >>>> > > >> > > alignment
> > > > > >>>> > > >> > > >>> is optional and can be enabled/disabled.
> > > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >> Yes, watermark alignment can be enabled/disabled
> > in
> > > > > >>>> individual
> > > > > >>>> > > >> sources
> > > > > >>>> > > >> > > in
> > > > > >>>> > > >> > > >> Flink jobs, which basically means the code
> > > supporting
> > > > > >>>> watermark
> > > > > >>>> > > >> > > alignment
> > > > > >>>> > > >> > > >> has to already be there. That again means the
> > Source
> > > > > >>>> > developers are
> > > > > >>>> > > >> > also
> > > > > >>>> > > >> > > >> expected to support pausable splits by default.
> So
> > > > this
> > > > > >>>> way we
> > > > > >>>> > > >> > > essentially
> > > > > >>>> > > >> > > >> tell the end users that you may enable / disable
> > > this
> > > > > >>>> feature
> > > > > >>>> > as
> > > > > >>>> > > >> you
> > > > > >>>> > > >> > > wish,
> > > > > >>>> > > >> > > >> and tell the source developers that you SHOULD
> > > > implement
> > > > > >>>> this
> > > > > >>>> > > >> because
> > > > > >>>> > > >> > > the
> > > > > >>>> > > >> > > >> end users may turn it on/off at will. And if the
> > > > source
> > > > > >>>> does
> > > > > >>>> > not
> > > > > >>>> > > >> > support
> > > > > >>>> > > >> > > >> pausable splits, that goes to an error handling
> > path
> > > > > when
> > > > > >>>> > watermark
> > > > > >>>> > > >> > > >> alignment is enabled on it. So users know they
> > have
> > > to
> > > > > >>>> > explicitly
> > > > > >>>> > > >> > > exclude
> > > > > >>>> > > >> > > >> this source.
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >>> So far it's totally clear to me and I hope this
> > is
> > > > what
> > > > > >>>> you
> > > > > >>>> > mean.
> > > > > >>>> > > >> I
> > > > > >>>> > > >> > > also
> > > > > >>>> > > >> > > >>> agree with both statements:
> > > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >>> So making that expectation aligned with the
> > source
> > > > > >>>> developers
> > > > > >>>> > > >> seems
> > > > > >>>> > > >> > > >>>> reasonable.
> > > > > >>>> > > >> > > >>>>
> > > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >>> I think this is a simple and clean solution
> from
> > > both
> > > > > >>>> the end
> > > > > >>>> > user
> > > > > >>>> > > >> > and
> > > > > >>>> > > >> > > >>>> source developers' standpoint.
> > > > > >>>> > > >> > > >>>>
> > > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >>> However, a last conclusion derives from 3. and
> is
> > > an
> > > > > open
> > > > > >>>> > question
> > > > > >>>> > > >> > for
> > > > > >>>> > > >> > > >>> me:
> > > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >>> 4. The feature of "pausable splits" is now
> > tightly
> > > > > bound
> > > > > >>>> to
> > > > > >>>> > > >> watermark
> > > > > >>>> > > >> > > >>> alignment, i.e., if sources do not support
> > > "pausable
> > > > > >>>> splits"
> > > > > >>>> > one
> > > > > >>>> > > >> can
> > > > > >>>> > > >> > > not
> > > > > >>>> > > >> > > >>> enable watermark alignment for these sources.
> > This
> > > > > >>>> dependency
> > > > > >>>> > is
> > > > > >>>> > > >> not
> > > > > >>>> > > >> > > the
> > > > > >>>> > > >> > > >>> current status of watermark alignment
> > > implementation
> > > > > >>>> because
> > > > > >>>> > it
> > > > > >>>> > > >> > is/was
> > > > > >>>> > > >> > > >>> implemented without pausable splits. Do we want
> > to
> > > > > >>>> introduce
> > > > > >>>> > this
> > > > > >>>> > > >> > > >>> dependency? (This is an open question. I cannot
> > > judge
> > > > > >>>> that.)
> > > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >> The watermark alignment basically relies on the
> > > > pausable
> > > > > >>>> > splits,
> > > > > >>>> > > >> > right?
> > > > > >>>> > > >> > > >> So personally I found it quite reasonable that
> if
> > > the
> > > > > >>>> source
> > > > > >>>> > does
> > > > > >>>> > > >> not
> > > > > >>>> > > >> > > >> support pausable splits, end users cannot enable
> > > > > watermark
> > > > > >>>> > > >> alignment
> > > > > >>>> > > >> > on
> > > > > >>>> > > >> > > it.
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >>
> > > > > >>>> > > >> > > >>> If something is wrong, please correct me.
> > > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >>> Regards,
> > > > > >>>> > > >> > > >>> Sebastian
> > > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >>> On Wed, May 11, 2022 at 9:05 AM Becket Qin <
> > > > > >>>> > becket.qin@gmail.com>
> > > > > >>>> > > >> > > wrote:
> > > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >>>> Hi Sebastian,
> > > > > >>>> > > >> > > >>>>
> > > > > >>>> > > >> > > >>>> Thanks for the reply and patient discussion. I
> > > agree
> > > > > >>>> this is
> > > > > >>>> > a
> > > > > >>>> > > >> > tricky
> > > > > >>>> > > >> > > >>>> decision.
> > > > > >>>> > > >> > > >>>>
> > > > > >>>> > > >> > > >>>>
> > > > > > >>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about Option c) which I see as
> > > > > > >>>> > > >> > > >>>>> follows:
> > > > > > >>>> > > >> > > >>>>> (1) An interface with a default NOOP implementation makes the
> > > > > > >>>> > > >> > > >>>>> implementation optional. And in my opinion, a default implementation is
> > > > > > >>>> > > >> > > >>>>> and will remain a way of making implementation optional, because even in
> > > > > > >>>> > > >> > > >>>>> the future a developer can decide to implement the "old flavor" without
> > > > > > >>>> > > >> > > >>>>> support for pausable splits.
> > > > > > >>>> > > >> > > >>>>> (2) It may not be too critical, but I also find it suboptimal that with
> > > > > > >>>> > > >> > > >>>>> a NOOP default implementation there is no way to check at runtime
> > > > > > >>>> > > >> > > >>>>> whether SourceReader or SplitReader actually supports pausing. (To do
> > > > > > >>>> > > >> > > >>>>> so, one would need a supportsX method, which makes it more complicated
> > > > > > >>>> > > >> > > >>>>> again.)
> > > > > >>>> > > >> > > >>>>
> > > > > >>>> > > >> > > >>>>
> > > > > >>>> > > >> > > >>>> Based on the last few messages in the mailing
> > > list.
> > > > > >>>> Piotr
> > > > > >>>> > and I
> > > > > >>>> > > >> > > agreed
> > > > > >>>> > > >> > > >>>> that the default implementation should just
> > throw
> > > an
> > > > > >>>> > > >> > > >>>> UnsupportedOperationException if the source is
> > > > > >>>> unpausable. So
> > > > > >>>> > > >> this
> > > > > >>>> > > >> > > >>>> basically tells the Source developers that
> this
> > > > > feature
> > > > > >>>> is
> > > > > >>>> > > >> expected
> > > > > >>>> > > >> > > to be
> > > > > >>>> > > >> > > >>>> supported. Because we cannot prevent end users
> > > from
> > > > > >>>> putting
> > > > > >>>> > an
> > > > > >>>> > > >> > > unpausable
> > > > > >>>> > > >> > > >>>> source into the watermark alignment group,
> that
> > > > > >>>> basically
> > > > > >>>> > means
> > > > > >>>> > > >> > > watermark
> > > > > >>>> > > >> > > >>>> alignment is an non-optional feature to the
> end
> > > > users.
> > > > > >>>> So
> > > > > >>>> > making
> > > > > >>>> > > >> > that
> > > > > >>>> > > >> > > >>>> expectation aligned with the source developers
> > > seems
> > > > > >>>> > reasonable.
> > > > > >>>> > > >> > And
> > > > > >>>> > > >> > > if a
> > > > > >>>> > > >> > > >>>> source does not support this feature, the end
> > > users
> > > > > >>>> should
> > > > > >>>> > > >> > explicitly
> > > > > >>>> > > >> > > >>>> remove that source from the watermark
> alignment
> > > > group.
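A minimal sketch of the behavior described above, using hypothetical stand-in types rather than Flink's actual interfaces: the default method exists only so that older readers keep compiling, and it fails loudly instead of silently doing nothing. The names `SketchSourceReader`, `LegacyReader`, `PausableReader` and `tryPause` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a reader interface; not Flink's actual SourceReader.
interface SketchSourceReader {
    String name();

    // Default exists only so older implementations keep compiling.
    // It throws instead of acting as a silent no-op.
    default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        throw new UnsupportedOperationException(
                name() + " cannot pause splits; remove it from the watermark alignment group");
    }
}

// An "old flavor" reader that never overrides the default.
class LegacyReader implements SketchSourceReader {
    public String name() { return "LegacyReader"; }
}

// A reader that supports pausing and tracks which split IDs are paused.
class PausableReader implements SketchSourceReader {
    final List<String> paused = new ArrayList<>();

    public String name() { return "PausableReader"; }

    @Override
    public void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        paused.removeAll(toResume);
        paused.addAll(toPause);
    }
}

class DefaultThrowsSketch {
    // Returns true if the reader accepted the pause request, false if it rejected it.
    static boolean tryPause(SketchSourceReader reader, String splitId) {
        try {
            reader.pauseOrResumeSplits(
                    Collections.singletonList(splitId), Collections.<String>emptyList());
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryPause(new PausableReader(), "split-0"));
        System.out.println(tryPause(new LegacyReader(), "split-0"));
    }
}
```

In this sketch the caller only learns that a reader is unpausable when the call fails, which is the point made above: the error tells the end user to take that source out of the alignment group.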
> > > > > >>>> > > >> > > >>>>
> > > > > >>>> > > >> > > >>>> Personally speaking I think this is a simple
> and
> > > > clean
> > > > > >>>> > solution
> > > > > >>>> > > >> from
> > > > > >>>> > > >> > > >>>> both the end user and source developers'
> > > standpoint.
> > > > > >>>> > > >> > > >>>>
> > > > > >>>> > > >> > > >>>> Does this address your concerns?
> > > > > >>>> > > >> > > >>>>
> > > > > >>>> > > >> > > >>>> Thanks,
> > > > > >>>> > > >> > > >>>>
> > > > > >>>> > > >> > > >>>> Jiangjie (Becket) Qin
> > > > > >>>> > > >> > > >>>>
> > > > > >>>> > > >> > > >>>> On Wed, May 11, 2022 at 2:52 PM Sebastian
> > > Mattheis <
> > > > > >>>> > > >> > > >>>> sebastian@ververica.com> wrote:
> > > > > >>>> > > >> > > >>>>
> > > > > >>>> > > >> > > >>>>> Hi Piotr, Hi Becket, Hi everybody,
> > > > > >>>> > > >> > > >>>>>
> > > > > >>>> > > >> > > >>>>> we, Dawid and I, discussed the various
> > > > > >>>> suggestions/options
> > > > > >>>> > and
> > > > > >>>> > > >> we
> > > > > >>>> > > >> > > >>>>> would be okay either way because we find
> > neither
> > > > > >>>> solution is
> > > > > >>>> > > >> > perfect
> > > > > >>>> > > >> > > just
> > > > > >>>> > > >> > > >>>>> because of the already present complexity.
> > > > > >>>> > > >> > > >>>>>
> > > > > >>>> > > >> > > >>>>> Option c) Adding methods to the interfaces of
> > > > > >>>> SourceReader
> > > > > >>>> > and
> > > > > >>>> > > >> > > >>>>> SplitReader
> > > > > >>>> > > >> > > >>>>> Option a) Adding decorative interfaces to be
> > used
> > > > by
> > > > > >>>> > > >> SourceReader
> > > > > >>>> > > >> > and
> > > > > >>>> > > >> > > >>>>> SplitReader
> > > > > >>>> > > >> > > >>>>>
> > > > > >>>> > > >> > > >>>>> As of the current status (v. 12) of the FLIP
> > [1],
> > > > it
> > > > > is
> > > > > >>>> > based on
> > > > > >>>> > > >> > > >>>>> Option c) which we find acceptable because
> the
> > > > > >>>> complexity
> > > > > >>>> > added
> > > > > >>>> > > >> is
> > > > > >>>> > > >> > > only a
> > > > > >>>> > > >> > > >>>>> single method.
> > > > > >>>> > > >> > > >>>>>
> > > > > >>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about
> > > Option
> > > > > c)
> > > > > >>>> > which I
> > > > > >>>> > > >> see
> > > > > >>>> > > >> > as
> > > > > >>>> > > >> > > >>>>> follows:
> > > > > >>>> > > >> > > >>>>> (1) An interface with default NOOP
> > implementation
> > > > > >>>> makes the
> > > > > >>>> > > >> > > >>>>> implementation optional. And in my opinion, a
> > > > default
> > > > > >>>> > > >> > implementation
> > > > > >>>> > > >> > > is and
> > > > > >>>> > > >> > > >>>>> will remain a way of making implementation
> > > optional
> > > > > >>>> because
> > > > > >>>> > > >> even in
> > > > > >>>> > > >> > > future
> > > > > >>>> > > >> > > >>>>> a developer can decide to implement the "old
> > > > flavor"
> > > > > >>>> without
> > > > > >>>> > > >> > support
> > > > > >>>> > > >> > > for
> > > > > >>>> > > >> > > >>>>> pausable splits.
> > > > > >>>> > > >> > > >>>>> (2) It may not be too critical but I also
> find
> > it
> > > > > >>>> suboptimal
> > > > > >>>> > > >> that
> > > > > >>>> > > >> > > with
> > > > > >>>> > > >> > > >>>>> a NOOP default implementation there is no way
> > to
> > > > > check
> > > > > >>>> at
> > > > > >>>> > > >> runtime
> > > > > >>>> > > >> > if
> > > > > >>>> > > >> > > >>>>> SourceReader or SplitReader actually support
> > > > pausing.
> > > > > >>>> (To
> > > > > >>>> > do so,
> > > > > >>>> > > >> > one
> > > > > >>>> > > >> > > would
> > > > > >>>> > > >> > > >>>>> need a supportsX method which makes it again
> > more
> > > > > >>>> > complicated.)
> > > > > >>>> > > >> > > >>>>>
> > > > > >>>> > > >> > > >>>>> However, we haven't changed it because Option
> > a)
> > > is
> > > > > >>>> also not
> > > > > >>>> > > >> > optimal
> > > > > >>>> > > >> > > >>>>> or straightforward:
> > > > > >>>> > > >> > > >>>>> (1) We need to add two distinct yet similar
> > > > > decorative
> > > > > >>>> > > >> interfaces
> > > > > >>>> > > >> > > >>>>> since, as mentioned, the signatures of the
> > > methods
> > > > > are
> > > > > >>>> > > >> different.
> > > > > >>>> > > >> > For
> > > > > >>>> > > >> > > >>>>> example, we would need decorative interfaces
> > like
> > > > > >>>> > > >> > > >>>>> `SplitReaderWithPausableSplits` and
> > > > > >>>> > > >> > `SourceReaderWithPausableSplits`.
> > > > > >>>> > > >> > > >>>>> (2) As a consequence, we would need to
> somehow
> > > > > document
> > > > > >>>> > > >> how/where
> > > > > >>>> > > >> > to
> > > > > >>>> > > >> > > >>>>> implement both interfaces and how this
> relates
> > to
> > > > > each
> > > > > >>>> > other.
> > > > > >>>> > > >> This
> > > > > >>>> > > >> > > we could
> > > > > >>>> > > >> > > >>>>> solve by adding a note in the interface of
> > > > > >>>> SourceReader and
> > > > > >>>> > > >> > > SplitReader and
> > > > > >>>> > > >> > > >>>>> reference to the decorative interfaces but it
> > > still
> > > > > >>>> > increases
> > > > > >>>> > > >> > > complexity
> > > > > >>>> > > >> > > >>>>> too.
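For comparison, a minimal sketch of Option a) with hypothetical names (`BaseReader`, `WithPausableSplits`, `DecorativeSketch` are assumptions, not the FLIP's types): the base interface stays free of pause-related methods, and the runtime check that Option c) would need a supportsX method for becomes a plain type test.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical base interface; deliberately has no pause-related method.
interface BaseReader {
    String pollNext();
}

// Hypothetical decorative interface; implementing it is the opt-in.
interface WithPausableSplits {
    void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume);
}

// A reader that does not opt in.
class PlainReader implements BaseReader {
    public String pollNext() { return "record"; }
}

// A reader that opts in by implementing the decorative interface as well.
class AlignedReader implements BaseReader, WithPausableSplits {
    final Set<String> paused = new HashSet<>();

    public String pollNext() { return "record"; }

    public void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        paused.removeAll(toResume);
        paused.addAll(toPause);
    }
}

class DecorativeSketch {
    // The capability check is a type test on the reader instance.
    static boolean supportsPausing(BaseReader reader) {
        return reader instanceof WithPausableSplits;
    }

    // Pauses the split only when the reader opted in; reports whether it did.
    static boolean pauseIfSupported(BaseReader reader, String splitId) {
        if (!supportsPausing(reader)) {
            return false;
        }
        ((WithPausableSplits) reader).pauseOrResumeSplits(
                Collections.singleton(splitId), Collections.<String>emptySet());
        return true;
    }

    public static void main(String[] args) {
        System.out.println(supportsPausing(new AlignedReader()));
        System.out.println(supportsPausing(new PlainReader()));
    }
}
```

The sketch covers only one level; as noted above, the real proposal would need a second, similar decorative interface for the split-level reader, which is where the extra documentation burden comes from.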
> > > > > >>>> > > >> > > >>>>>
> > > > > >>>> > > >> > > >>>>> In summary, we see both as acceptable and
> > > preferred
> > > > > >>>> over
> > > > > >>>> > other
> > > > > >>>> > > >> > > >>>>> options. The question is if we can find a
> > > solution
> > > > or
> > > > > >>>> > compromise
> > > > > >>>> > > >> > > that is
> > > > > >>>> > > >> > > >>>>> acceptable for everybody to reach consensus.
> > > > > >>>> > > >> > > >>>>>
> > > > > >>>> > > >> > > >>>>> Please let us know what you think because we
> > > would
> > > > be
> > > > > >>>> happy
> > > > > >>>> > if
> > > > > >>>> > > >> we
> > > > > >>>> > > >> > can
> > > > > >>>> > > >> > > >>>>> conclude the discussion to avoid dropping the
> > > > > >>>> initiative on
> > > > > >>>> > this
> > > > > >>>> > > >> > > FLIP.
> > > > > >>>> > > >> > > >>>>>
> > > > > >>>> > > >> > > >>>>> Regards,
> > > > > >>>> > > >> > > >>>>> Sebastian
> > > > > >>>> > > >> > > >>>>>
> > > > > >>>> > > >> > > >>>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438 (v. 12)
> > > > > >>>> > > >> > > >>>>>
> > > > > >>>> > > >> > > >>>>> On Thu, May 5, 2022 at 10:13 AM Piotr
> Nowojski
> > <
> > > > > >>>> > > >> > pnowojski@apache.org
> > > > > >>>> > > >> > > >
> > > > > >>>> > > >> > > >>>>> wrote:
> > > > > >>>> > > >> > > >>>>>
> > > > > >>>> > > >> > > >>>>>> Hi Guowei,
> > > > > >>>> > > >> > > >>>>>>
> > > > > >>>> > > >> > > >>>>>> as Dawid wrote a couple of messages back:
> > > > > >>>> > > >> > > >>>>>>
> > > > > >>>> > > >> > > >>>>>> > This is covered in the previous FLIP[1]
> > which
> > > > has
> > > > > >>>> been
> > > > > >>>> > > >> already
> > > > > >>>> > > >> > > >>>>>> implemented in 1.15. In short, it must be
> > > enabled
> > > > > >>>> with the
> > > > > >>>> > > >> > watermark
> > > > > >>>> > > >> > > >>>>>> strategy which also configures drift and
> > update
> > > > > >>>> interval
> > > > > >>>> > > >> > > >>>>>>
> > > > > >>>> > > >> > > >>>>>> So by default watermark alignment is
> disabled,
> > > > > >>>> regardless
> > > > > >>>> > if a
> > > > > >>>> > > >> > > source
> > > > > >>>> > > >> > > >>>>>> supports it or not.
> > > > > >>>> > > >> > > >>>>>>
> > > > > >>>> > > >> > > >>>>>> Best,
> > > > > >>>> > > >> > > >>>>>> Piotrek
> > > > > >>>> > > >> > > >>>>>>
> > > > > >>>> > > >> > > >>>>>> czw., 5 maj 2022 o 09:56 Guowei Ma <
> > > > > >>>> guowei.mgw@gmail.com>
> > > > > >>>> > > >> > > napisał(a):
> > > > > >>>> > > >> > > >>>>>>
> > > > > >>>> > > >> > > >>>>>>> Hi,
> > > > > >>>> > > >> > > >>>>>>>
> > > > > >>>> > > >> > > >>>>>>> We know that in the case of Bounded input
> > Flink
> > > > > >>>> supports
> > > > > >>>> > the
> > > > > >>>> > > >> > Batch
> > > > > >>>> > > >> > > >>>>>>> execution mode. Currently in Batch
> execution
> > > > mode,
> > > > > >>>> Flink
> > > > > >>>> > is
> > > > > >>>> > > >> > > executed
> > > > > >>>> > > >> > > >>>>>>> on a
> > > > > >>>> > > >> > > >>>>>>> stage-by-stage basis. In this way, perhaps
> > > > > watermark
> > > > > >>>> > alignment
> > > > > >>>> > > >> > > might
> > > > > >>>> > > >> > > >>>>>>> not
> > > > > >>>> > > >> > > >>>>>>> gain much.
> > > > > >>>> > > >> > > >>>>>>>
> > > > > >>>> > > >> > > >>>>>>> So my question is: Is watermark alignment
> the
> > > > > default
> > > > > >>>> > > >> > behavior(for
> > > > > >>>> > > >> > > >>>>>>> implemented source only)? If so, have you
> > > > > considered
> > > > > >>>> > > >> evaluating
> > > > > >>>> > > >> > the
> > > > > >>>> > > >> > > >>>>>>> impact
> > > > > >>>> > > >> > > >>>>>>> of this behavior on the Batch execution
> mode?
> > > Or
> > > > > >>>> do you think
> > > > > >>>> > it is
> > > > > >>>> > > >> not
> > > > > >>>> > > >> > > >>>>>>> necessary.
> > > > > >>>> > > >> > > >>>>>>>
> > > > > >>>> > > >> > > >>>>>>> Correct me if I miss something.
> > > > > >>>> > > >> > > >>>>>>>
> > > > > >>>> > > >> > > >>>>>>> Best,
> > > > > >>>> > > >> > > >>>>>>> Guowei
> > > > > >>>> > > >> > > >>>>>>>
> > > > > >>>> > > >> > > >>>>>>>
> > > > > >>>> > > >> > > >>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr
> > Nowojski <
> > > > > >>>> > > >> > > >>>>>>> piotr.nowojski@gmail.com>
> > > > > >>>> > > >> > > >>>>>>> wrote:
> > > > > >>>> > > >> > > >>>>>>>
> > > > > >>>> > > >> > > >>>>>>> > Hi Becket and Dawid,
> > > > > >>>> > > >> > > >>>>>>> >
> > > > > >>>> > > >> > > >>>>>>> > > I feel that no matter which option we
> > > choose
> > > > > >>>> this can
> > > > > >>>> > not
> > > > > >>>> > > >> be
> > > > > >>>> > > >> > > >>>>>>> solved
> > > > > >>>> > > >> > > >>>>>>> > entirely in either of the options,
> because
> > of
> > > > the
> > > > > >>>> point
> > > > > >>>> > > >> above
> > > > > >>>> > > >> > and
> > > > > >>>> > > >> > > >>>>>>> because
> > > > > >>>> > > >> > > >>>>>>> > the signature of
> > > > SplitReader#pauseOrResumeSplits
> > > > > >>>> and
> > > > > >>>> > > >> > > >>>>>>> > SourceReader#pauseOrResumeSplits are
> > slightly
> > > > > >>>> different
> > > > > >>>> > (one
> > > > > >>>> > > >> > > >>>>>>> identifies
> > > > > >>>> > > >> > > >>>>>>> > splits with splitId the other one passes
> > the
> > > > > splits
> > > > > >>>> > > >> directly).
> > > > > >>>> > > >> > > >>>>>>> >
> > > > > >>>> > > >> > > >>>>>>> > Yes, that's a good point in this case and
> > for
> > > > > >>>> features
> > > > > >>>> > that
> > > > > >>>> > > >> > need
> > > > > >>>> > > >> > > >>>>>>> to be
> > > > > >>>> > > >> > > >>>>>>> > implemented in more than one place.
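To illustrate why the feature has to be implemented in more than one place, here is a minimal sketch with hypothetical types. It assumes the source-level method receives split IDs while the split-level method receives the split objects; the thread only says that the two signatures differ in this way, not which is which, so the assignment here is an assumption.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical split type carrying an ID.
class SketchSplit {
    final String id;

    SketchSplit(String id) { this.id = id; }
}

// Hypothetical split-level reader: receives split objects directly.
class SplitLevelReader {
    final List<String> pausedIds = new ArrayList<>();

    void pauseOrResumeSplits(Collection<SketchSplit> toPause, Collection<SketchSplit> toResume) {
        for (SketchSplit s : toResume) {
            pausedIds.remove(s.id);
        }
        for (SketchSplit s : toPause) {
            pausedIds.add(s.id);
        }
    }
}

// Hypothetical source-level reader: receives split IDs and resolves them.
class IdLevelReader {
    private final Map<String, SketchSplit> assigned = new HashMap<>();
    final SplitLevelReader delegate = new SplitLevelReader();

    void addSplit(SketchSplit split) { assigned.put(split.id, split); }

    // Translates IDs into split objects, then delegates to the split-level reader.
    void pauseOrResumeSplits(Collection<String> idsToPause, Collection<String> idsToResume) {
        delegate.pauseOrResumeSplits(resolve(idsToPause), resolve(idsToResume));
    }

    // Maps IDs to the assigned split objects, skipping IDs that are unknown.
    private List<SketchSplit> resolve(Collection<String> ids) {
        List<SketchSplit> out = new ArrayList<>();
        for (String id : ids) {
            SketchSplit s = assigned.get(id);
            if (s != null) {
                out.add(s);
            }
        }
        return out;
    }
}
```

Both layers have to cooperate for a pause request to take effect, so neither a single default method nor a single decorative interface groups everything a connector author must write.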
> > > > > >>>> > > >> > > >>>>>>> >
> > > > > >>>> > > >> > > >>>>>>> > > Is there any reason for pausing reading
> > > from
> > > > a
> > > > > >>>> split
> > > > > >>>> > an
> > > > > >>>> > > >> > > optional
> > > > > >>>> > > >> > > >>>>>>> feature,
> > > > > >>>> > > >> > > >>>>>>> > > other than that this was not included
> in
> > > the
> > > > > >>>> original
> > > > > >>>> > > >> > > interface?
> > > > > >>>> > > >> > > >>>>>>> >
> > > > > >>>> > > >> > > >>>>>>> > An additional argument in favor of making
> > it
> > > > > >>>> optional
> > > > > >>>> > is to
> > > > > >>>> > > >> > > >>>>>>> simplify source
> > > > > >>>> > > >> > > >>>>>>> > implementation. But on its own I'm not
> sure
> > > if
> > > > > that
> > > > > >>>> > would be
> > > > > >>>> > > >> > > >>>>>>> enough to
> > > > > >>>> > > >> > > >>>>>>> > justify making this feature optional.
> > Maybe.
> > > > > >>>> > > >> > > >>>>>>> >
> > > > > >>>> > > >> > > >>>>>>> > > I think it would be way simpler and
> > clearer
> > > > to
> > > > > >>>> just
> > > > > >>>> > let
> > > > > >>>> > > >> end
> > > > > >>>> > > >> > > >>>>>>> users and
> > > > > >>>> > > >> > > >>>>>>> > Flink
> > > > > >>>> > > >> > > >>>>>>> > > assume all the connectors will
> implement
> > > this
> > > > > >>>> feature.
> > > > > >>>> > > >> > > >>>>>>> >
> > > > > >>>> > > >> > > >>>>>>> > As I wrote above that would be an
> > interesting
> > > > > >>>> choice to
> > > > > >>>> > make
> > > > > >>>> > > >> > > (ease
> > > > > >>>> > > >> > > >>>>>>> of
> > > > > >>>> > > >> > > >>>>>>> > implementation for new users, vs system
> > > > > >>>> consistency).
> > > > > >>>> > > >> > Regardless
> > > > > >>>> > > >> > > >>>>>>> of that,
> > > > > >>>> > > >> > > >>>>>>> > yes, for me the main argument is the API
> > > > backward
> > > > > >>>> > > >> > compatibility.
> > > > > >>>> > > >> > > >>>>>>> But let's
> > > > > >>>> > > >> > > >>>>>>> > clear a couple of points:
> > > > > >>>> > > >> > > >>>>>>> > - The current proposal adding methods to
> > the
> > > > base
> > > > > >>>> > interface
> > > > > >>>> > > >> > with
> > > > > >>>> > > >> > > >>>>>>> default
> > > > > >>>> > > >> > > >>>>>>> > implementations is an OPTIONAL feature.
> > Same
> > > as
> > > > > the
> > > > > >>>> > > >> decorative
> > > > > >>>> > > >> > > >>>>>>> version
> > > > > >>>> > > >> > > >>>>>>> > would be.
> > > > > >>>> > > >> > > >>>>>>> > - Decorative version could implement
> "throw
> > > > > >>>> > > >> > > >>>>>>> UnsupportedOperationException"
> > > > > >>>> > > >> > > >>>>>>> > if user enabled watermark alignment just
> as
> > > > well
> > > > > >>>> and I
> > > > > >>>> > agree
> > > > > >>>> > > >> > > >>>>>>> that's a
> > > > > >>>> > > >> > > >>>>>>> > better option compared to logging a
> > warning.
> > > > > >>>> > > >> > > >>>>>>> >
> > > > > >>>> > > >> > > >>>>>>> > Best,
> > > > > >>>> > > >> > > >>>>>>> > Piotrek
> > > > > >>>> > > >> > > >>>>>>> >
> > > > > >>>> > > >> > > >>>>>>> >
> > > > > >>>> > > >> > > >>>>>>> > śr., 4 maj 2022 o 15:40 Becket Qin <
> > > > > >>>> > becket.qin@gmail.com>
> > > > > >>>> > > >> > > >>>>>>> napisał(a):
> > > > > >>>> > > >> > > >>>>>>> >
> > > > > >>>> > > >> > > >>>>>>> > > Thanks for the reply and patient
> > > discussion,
> > > > > >>>> Piotr and
> > > > > >>>> > > >> Dawid.
> > > > > >>>> > > >> > > >>>>>>> > >
> > > > > >>>> > > >> > > >>>>>>> > > Is there any reason for pausing reading
> > > from
> > > > a
> > > > > >>>> split
> > > > > >>>> > an
> > > > > >>>> > > >> > > optional
> > > > > >>>> > > >> > > >>>>>>> feature,
> > > > > >>>> > > >> > > >>>>>>> > > other than that this was not included
> in
> > > the
> > > > > >>>> original
> > > > > >>>> > > >> > > interface?
> > > > > >>>> > > >> > > >>>>>>> > >
> > > > > >>>> > > >> > > >>>>>>> > > To be honest I am really worried about
> > the
> > > > > >>>> complexity
> > > > > >>>> > of
> > > > > >>>> > > >> the
> > > > > >>>> > > >> > > >>>>>>> user story
> > > > > >>>> > > >> > > >>>>>>> > > here. Optional features like this have
> a
> > > high
> > > > > >>>> > overhead.
> > > > > >>>> > > >> > Imagine
> > > > > >>>> > > >> > > >>>>>>> this
> > > > > >>>> > > >> > > >>>>>>> > > feature is optional, now a user enabled
> > > > > watermark
> > > > > >>>> > > >> alignment
> > > > > >>>> > > >> > and
> > > > > >>>> > > >> > > >>>>>>> defined a
> > > > > >>>> > > >> > > >>>>>>> > > few watermark groups. Would it work?
> Hmm,
> > > > that
> > > > > >>>> > depends on
> > > > > >>>> > > >> > > >>>>>>> whether the
> > > > > >>>> > > >> > > >>>>>>> > > involved Source has implemented this
> > > > feature.
> > > > > >>>> If the
> > > > > >>>> > > >> Sources
> > > > > >>>> > > >> > > >>>>>>> are well
> > > > > >>>> > > >> > > >>>>>>> > > documented, good luck. Otherwise end
> > users
> > > > may
> > > > > >>>> have to
> > > > > >>>> > > >> look
> > > > > >>>> > > >> > > into
> > > > > >>>> > > >> > > >>>>>>> the code
> > > > > >>>> > > >> > > >>>>>>> > > of the Source to see whether the
> feature
> > is
> > > > > >>>> supported.
> > > > > >>>> > > >> Which
> > > > > >>>> > > >> > is
> > > > > >>>> > > >> > > >>>>>>> something
> > > > > >>>> > > >> > > >>>>>>> > > they shouldn't have to do.
> > > > > >>>> > > >> > > >>>>>>> > >
> > > > > >>>> > > >> > > >>>>>>> > > I think it would be way simpler and
> > clearer
> > > > to
> > > > > >>>> just
> > > > > >>>> > let
> > > > > >>>> > > >> end
> > > > > >>>> > > >> > > >>>>>>> users and
> > > > > >>>> > > >> > > >>>>>>> > Flink
> > > > > >>>> > > >> > > >>>>>>> > > assume all the connectors will
> implement
> > > this
> > > > > >>>> feature.
> > > > > >>>> > > >> After
> > > > > >>>> > > >> > > all
> > > > > >>>> > > >> > > >>>>>>> the
> > > > > >>>> > > >> > > >>>>>>> > > watermark group is not optional to the
> > end
> > > > > >>>> users. If
> > > > > >>>> > in
> > > > > >>>> > > >> some
> > > > > >>>> > > >> > > >>>>>>> rare cases,
> > > > > >>>> > > >> > > >>>>>>> > > the feature cannot be supported, a
> clear
> > > > > >>>> > > >> > > >>>>>>> UnsupportedOperationException
> > > > > >>>> > > >> > > >>>>>>> > will
> > > > > >>>> > > >> > > >>>>>>> > > be thrown to tell users to explicitly
> > > remove
> > > > > this
> > > > > >>>> > Source
> > > > > >>>> > > >> from
> > > > > >>>> > > >> > > the
> > > > > >>>> > > >> > > >>>>>>> > watermark
> > > > > >>>> > > >> > > >>>>>>> > > group. I don't think we should have a
> > > warning
> > > > > >>>> message
> > > > > >>>> > > >> here,
> > > > > >>>> > > >> > as
> > > > > >>>> > > >> > > >>>>>>> they tend
> > > > > >>>> > > >> > > >>>>>>> > to
> > > > > >>>> > > >> > > >>>>>>> > > be ignored in many cases. If we do
> this,
> > we
> > > > > >>>> don't even
> > > > > >>>> > > >> need
> > > > > >>>> > > >> > the
> > > > > >>>> > > >> > > >>>>>>> > supportXXX
> > > > > >>>> > > >> > > >>>>>>> > > method in the Source for this feature.
> In
> > > > fact
> > > > > >>>> this is
> > > > > >>>> > > >> > exactly
> > > > > >>>> > > >> > > >>>>>>> how many
> > > > > >>>> > > >> > > >>>>>>> > > interfaces work today. For example,
> > > > > >>>> > > >> > > >>>>>>> SplitEnumerator#addSplitsBack() is
> > > > > >>>> > > >> > > >>>>>>> > not
> > > > > >>>> > > >> > > >>>>>>> > > supported by Pravega source because it
> > does
> > > > not
> > > > > >>>> > support
> > > > > >>>> > > >> > partial
> > > > > >>>> > > >> > > >>>>>>> failover.
> > > > > >>>> > > >> > > >>>>>>> > > In that case, it simply throws an
> > exception
> > > > to
> > > > > >>>> > trigger a
> > > > > >>>> > > >> > global
> > > > > >>>> > > >> > > >>>>>>> recovery.
> > > > > >>>> > > >> > > >>>>>>> > >
> > > > > >>>> > > >> > > >>>>>>> > > The reason we add a default
> > implementation
> > > in
> > > > > >>>> this
> > > > > >>>> > case
> > > > > >>>> > > >> would
> > > > > >>>> > > >> > > >>>>>>> just be for
> > > > > >>>> > > >> > > >>>>>>> > the
> > > > > >>>> > > >> > > >>>>>>> > > sake of backwards compatibility so the
> > old
> > > > > >>>> source can
> > > > > >>>> > > >> still
> > > > > >>>> > > >> > > >>>>>>> compile.
> > > > > >>>> > > >> > > >>>>>>> > Sure,
> > > > > >>>> > > >> > > >>>>>>> > > in short term, this feature might not
> be
> > > > > >>>> supported by
> > > > > >>>> > many
> > > > > >>>> > > >> > > >>>>>>> existing
> > > > > >>>> > > >> > > >>>>>>> > > sources. That is OK, and it is quite
> > > visible
> > > > to
> > > > > >>>> the
> > > > > >>>> > source
> > > > > >>>> > > >> > > >>>>>>> developers
> > > > > >>>> > > >> > > >>>>>>> > that
> > > > > >>>> > > >> > > >>>>>>> > > they did not override the default impl
> > > which
> > > > > >>>> throws an
> > > > > >>>> > > >> > > >>>>>>> > > UnsupportedOperationException.
> > > > > >>>> > > >> > > >>>>>>> > >
> > > > > >>>> > > >> > > >>>>>>> > > @Dawid,
> > > > > >>>> > > >> > > >>>>>>> > >
> > > > > >>>> > > >> > > >>>>>>> > > the Java doc of the SupportXXX() method
> > in
> > > > the
> > > > > >>>> Source
> > > > > >>>> > > >> would
> > > > > >>>> > > >> > be
> > > > > >>>> > > >> > > >>>>>>> the single
> > > > > >>>> > > >> > > >>>>>>> > > >> source of truth regarding how to
> > > implement
> > > > > >>>> this
> > > > > >>>> > > >> feature.
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > >
> > > > > >>>> > > >> > > >>>>>>> > > I also don't find it entirely true.
> Half
> > of
> > > > the
> > > > > >>>> > classes
> > > > > >>>> > > >> are
> > > > > >>>> > > >> > > >>>>>>> theoretically
> > > > > >>>> > > >> > > >>>>>>> > > > optional and are utility classes from
> > the
> > > > > >>>> point of
> > > > > >>>> > view
> > > > > >>>> > > >> how
> > > > > >>>> > > >> > > the
> > > > > >>>> > > >> > > >>>>>>> > > interfaces
> > > > > >>>> > > >> > > >>>>>>> > > > are organized. Theoretically users do
> > not
> > > > > need
> > > > > >>>> to
> > > > > >>>> > use
> > > > > >>>> > > >> any
> > > > > >>>> > > >> > of
> > > > > >>>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would
> > be
> > > > > weird
> > > > > >>>> to
> > > > > >>>> > list
> > > > > >>>> > > >> > their
> > > > > >>>> > > >> > > >>>>>>> methods in
> > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > >>>> > > >> > > >>>>>>> > > > Source interface.
> > > > > >>>> > > >> > > >>>>>>> > >
> > > > > >>>> > > >> > > >>>>>>> > > I think the ultimate goal of java docs
> is
> > > to
> > > > > >>>> guide
> > > > > >>>> > users
> > > > > >>>> > > >> to
> > > > > >>>> > > >> > > >>>>>>> implement the
> > > > > >>>> > > >> > > >>>>>>> > > Source. If SourceReaderBase is the
> > > preferred
> > > > > way
> > > > > >>>> to
> > > > > >>>> > > >> > implement a
> > > > > >>>> > > >> > > >>>>>>> > > SourceReader, it seems worth mentioning
> > > that.
> > > > > >>>> Even the
> > > > > >>>> > > >> Java
> > > > > >>>> > > >> > > >>>>>>> language
> > > > > >>>> > > >> > > >>>>>>> > > documentation of interfaces lists the
> known
> > > > > >>>> > implementations
> > > > > >>>> > > >> [1]
> > > > > >>>> > > >> > so
> > > > > >>>> > > >> > > >>>>>>> people
> > > > > >>>> > > >> > > >>>>>>> > can
> > > > > >>>> > > >> > > >>>>>>> > > leverage them. But for this particular
> > > case,
> > > > if
> > > > > >>>> we
> > > > > >>>> > make
> > > > > >>>> > > >> the
> > > > > >>>> > > >> > > >>>>>>> feature
> > > > > >>>> > > >> > > >>>>>>> > > non-optional, we don't even need the
> > > > > supportXXX()
> > > > > >>>> > method
> > > > > >>>> > > >> for
> > > > > >>>> > > >> > > now.
> > > > > >>>> > > >> > > >>>>>>> > >
> > > > > >>>> > > >> > > >>>>>>> > > Thanks,
> > > > > >>>> > > >> > > >>>>>>> > >
> > > > > >>>> > > >> > > >>>>>>> > > Jiangjie (Becket) Qin
> > > > > >>>> > > >> > > >>>>>>> > >
> > > > > >>>> > > >> > > >>>>>>> > >
> > > > > >>>> > > >> > > >>>>>>> > >
> > > > > >>>> > > >> > > >>>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid
> > > > > Wysakowicz <
> > > > > >>>> > > >> > > >>>>>>> dwysakowicz@apache.org>
> > > > > >>>> > > >> > > >>>>>>> > > wrote:
> > > > > >>>> > > >> > > >>>>>>> > >
> > > > > >>>> > > >> > > >>>>>>> > > > Hey Piotr and Becket,
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > First of all, let me say I am happy
> > with
> > > > > >>>> whichever
> > > > > >>>> > > >> option
> > > > > >>>> > > >> > is
> > > > > >>>> > > >> > > >>>>>>> agreed in
> > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > >>>> > > >> > > >>>>>>> > > > discussion.
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > I wanted to clarify a few points from
> > the
> > > > > >>>> discussion
> > > > > >>>> > > >> > though:
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > @Becket:
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > The main argument for adding the
> > methods
> > > to
> > > > > the
> > > > > >>>> > > >> > SourceReader
> > > > > >>>> > > >> > > >>>>>>> is that
> > > > > >>>> > > >> > > >>>>>>> > > these
> > > > > >>>> > > >> > > >>>>>>> > > > methods are effectively NON-OPTIONAL
> to
> > > > > >>>> SourceReader
> > > > > >>>> > > >> impl,
> > > > > >>>> > > >> > > i.e.
> > > > > >>>> > > >> > > >>>>>>> > starting
> > > > > >>>> > > >> > > >>>>>>> > > > from this FLIP, all the SourceReaders
> > > impl
> > > > > are
> > > > > >>>> > expected
> > > > > >>>> > > >> to
> > > > > >>>> > > >> > > >>>>>>> support this
> > > > > >>>> > > >> > > >>>>>>> > > > method, although some old
> > implementations
> > > > may
> > > > > >>>> not
> > > > > >>>> > have
> > > > > >>>> > > >> > > >>>>>>> implemented this
> > > > > >>>> > > >> > > >>>>>>> > > > feature. I think we should
> distinguish
> > > the
> > > > > new
> > > > > >>>> > features
> > > > > >>>> > > >> > from
> > > > > >>>> > > >> > > >>>>>>> the
> > > > > >>>> > > >> > > >>>>>>> > optional
> > > > > >>>> > > >> > > >>>>>>> > > > features. While the public decorative
> > > > > >>>> interface is a
> > > > > >>>> > > >> > solution
> > > > > >>>> > > >> > > >>>>>>> to the
> > > > > >>>> > > >> > > >>>>>>> > > > optional features, we should not use
> it
> > > for
> > > > > the
> > > > > >>>> > features
> > > > > >>>> > > >> > that
> > > > > >>>> > > >> > > >>>>>>> are
> > > > > >>>> > > >> > > >>>>>>> > > > non-optional.
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > I don't think that this feature is
> > > > > >>>> NON-OPTIONAL.
> > > > > >>>> > Even
> > > > > >>>> > > >> > though
> > > > > >>>> > > >> > > >>>>>>> > preferred, I
> > > > > >>>> > > >> > > >>>>>>> > > > still think it can be simply
> optional.
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > the Java doc of the SupportXXX()
> method
> > > in
> > > > > the
> > > > > >>>> > Source
> > > > > >>>> > > >> would
> > > > > >>>> > > >> > > be
> > > > > >>>> > > >> > > >>>>>>> the
> > > > > >>>> > > >> > > >>>>>>> > single
> > > > > >>>> > > >> > > >>>>>>> > > > source of truth regarding how to
> > > implement
> > > > > this
> > > > > >>>> > feature.
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > I also don't find it entirely true.
> > Half
> > > of
> > > > > the
> > > > > >>>> > classes
> > > > > >>>> > > >> are
> > > > > >>>> > > >> > > >>>>>>> > theoretically
> > > > > >>>> > > >> > > >>>>>>> > > > optional and are utility classes from
> > the
> > > > > >>>> point of
> > > > > >>>> > view
> > > > > >>>> > > >> how
> > > > > >>>> > > >> > > the
> > > > > >>>> > > >> > > >>>>>>> > > interfaces
> > > > > >>>> > > >> > > >>>>>>> > > > are organized. Theoretically users do
> > not
> > > > > need
> > > > > >>>> to
> > > > > >>>> > use
> > > > > >>>> > > >> any
> > > > > >>>> > > >> > of
> > > > > >>>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would
> > be
> > > > > weird
> > > > > >>>> to
> > > > > >>>> > list
> > > > > >>>> > > >> > their
> > > > > >>>> > > >> > > >>>>>>> methods in
> > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > >>>> > > >> > > >>>>>>> > > > Source interface.
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > @Piotr
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > If we have all of the methods with
> > > default
> > > > > >>>> > > >> implementation
> > > > > >>>> > > >> > in
> > > > > >>>> > > >> > > >>>>>>> the base
> > > > > >>>> > > >> > > >>>>>>> > > > interface, the API doesn't give any
> > clue
> > > to
> > > > > >>>> the user
> > > > > >>>> > > >> which
> > > > > >>>> > > >> > > set
> > > > > >>>> > > >> > > >>>>>>> of
> > > > > >>>> > > >> > > >>>>>>> > methods
> > > > > >>>> > > >> > > >>>>>>> > > > are required to be implemented at the
> > > same
> > > > > >>>> time.
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > I feel that no matter which option we
> > > > choose
> > > > > >>>> this
> > > > > >>>> > can
> > > > > >>>> > > >> not
> > > > > >>>> > > >> > be
> > > > > >>>> > > >> > > >>>>>>> solved
> > > > > >>>> > > >> > > >>>>>>> > > > entirely in either of the options,
> > > because
> > > > of
> > > > > >>>> the
> > > > > >>>> > point
> > > > > >>>> > > >> > above
> > > > > >>>> > > >> > > >>>>>>> and
> > > > > >>>> > > >> > > >>>>>>> > because
> > > > > >>>> > > >> > > >>>>>>> > > > the signature of
> > > > > >>>> SplitReader#pauseOrResumeSplits and
> > > > > >>>> > > >> > > >>>>>>> > > > SourceReader#pauseOrResumeSplits are
> > > > slightly
> > > > > >>>> > different
> > > > > >>>> > > >> > (one
> > > > > >>>> > > >> > > >>>>>>> identifies
> > > > > >>>> > > >> > > >>>>>>> > > > splits with splitId the other one
> > passes
> > > > the
> > > > > >>>> splits
> > > > > >>>> > > >> > > directly).
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > Best,
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > Dawid
> > > > > >>>> > > >> > > >>>>>>> > > > On 03/05/2022 14:30, Becket Qin
> wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > Hi Piotr,
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > Thanks for the comment.
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > Just to clarify, I am not against the
> > > > > >>>> decorative
> > > > > >>>> > > >> > interfaces,
> > > > > >>>> > > >> > > >>>>>>> but I do
> > > > > >>>> > > >> > > >>>>>>> > > > think we should use them with
> caution.
> > > The
> > > > > main
> > > > > >>>> > argument
> > > > > >>>> > > >> > for
> > > > > >>>> > > >> > > >>>>>>> adding the
> > > > > >>>> > > >> > > >>>>>>> > > > methods to the SourceReader is that
> > these
> > > > > >>>> methods
> > > > > >>>> > are
> > > > > >>>> > > >> > > >>>>>>> > > > effectively NON-OPTIONAL to
> > SourceReader
> > > > > impl,
> > > > > >>>> i.e.
> > > > > >>>> > > >> > starting
> > > > > >>>> > > >> > > >>>>>>> from this
> > > > > >>>> > > >> > > >>>>>>> > > > FLIP, all the SourceReaders impl are
> > > > expected
> > > > > >>>> to
> > > > > >>>> > support
> > > > > >>>> > > >> > this
> > > > > >>>> > > >> > > >>>>>>> > > > method, although some old
> > implementations
> > > > may
> > > > > >>>> not
> > > > > >>>> > have
> > > > > >>>> > > >> > > >>>>>>> implemented this
> > > > > >>>> > > >> > > >>>>>>> > > > feature. I think we should
> distinguish
> > > the
> > > > > new
> > > > > >>>> > features
> > > > > >>>> > > >> > from
> > > > > >>>> > > >> > > >>>>>>> the
> > > > > >>>> > > >> > > >>>>>>> > optional
> > > > > >>>> > > >> > > >>>>>>> > > > features. While the public decorative
> > > > > >>>> interface is a
> > > > > >>>> > > >> > solution
> > > > > >>>> > > >> > > >>>>>>> to the
> > > > > >>>> > > >> > > >>>>>>> > > > optional features, we should not use
> it
> > > for
> > > > > the
> > > > > >>>> > features
> > > > > >>>> > > >> > that
> > > > > >>>> > > >> > > >>>>>>> are
> > > > > >>>> > > >> > > >>>>>>> > > > non-optional.
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > That said, this feature is optional
> for
> > > > > >>>> > SplitReaders.
> > > > > >>>> > > >> > > Arguably
> > > > > >>>> > > >> > > >>>>>>> we can
> > > > > >>>> > > >> > > >>>>>>> > > have
> > > > > >>>> > > >> > > >>>>>>> > > > a decorative interface for that, but
> > for
> > > > > >>>> simplicity
> > > > > >>>> > and
> > > > > >>>> > > >> > > >>>>>>> symmetry of the
> > > > > >>>> > > >> > > >>>>>>> > > > interface, personally I prefer just
> > > adding
> > > > a
> > > > > >>>> new
> > > > > >>>> > method.
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > Regarding the advantages you
> mentioned
> > > > about
> > > > > >>>> the
> > > > > >>>> > > >> decorative
> > > > > >>>> > > >> > > >>>>>>> interfaces,
> > > > > >>>> > > >> > > >>>>>>> > > > they would make sense if:
> > > > > >>>> > > >> > > >>>>>>> > > > 1. The feature is optional.
> > > > > >>>> > > >> > > >>>>>>> > > > 2. There is only one decorative
> > interface
> > > > > >>>> involved
> > > > > >>>> > for a
> > > > > >>>> > > >> > > >>>>>>> feature.
> > > > > >>>> > > >> > > >>>>>>> > > > Otherwise the argument that all the
> > > methods
> > > > > are
> > > > > >>>> > grouped
> > > > > >>>> > > >> > > >>>>>>> together will
> > > > > >>>> > > >> > > >>>>>>> > not
> > > > > >>>> > > >> > > >>>>>>> > > > stand.
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > Compared with that, I think the
> current
> > > > > >>>> solution
> > > > > >>>> > works
> > > > > >>>> > > >> fine
> > > > > >>>> > > >> > > in
> > > > > >>>> > > >> > > >>>>>>> all
> > > > > >>>> > > >> > > >>>>>>> > cases,
> > > > > >>>> > > >> > > >>>>>>> > > > i.e. "having a supportsXXX() method in
> > > Source,
> > > > > and
> > > > > >>>> > default
> > > > > >>>> > > >> > > >>>>>>> methods /
> > > > > >>>> > > >> > > >>>>>>> > > > decorative interfaces in base
> > > interfaces.".
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > The advantages are:
> > > > > >>>> > > >> > > >>>>>>> > > >> - clean and easy to implement base
> > > > interface
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > In the current approach, the Java doc
> > of
> > > > the
> > > > > >>>> > > >> supportsXXX()
> > > > > >>>> > > >> > > >>>>>>> method in the
> > > > > >>>> > > >> > > >>>>>>> > > > Source would be the single source of
> > > truth
> > > > > >>>> regarding
> > > > > >>>> > > >> how to
> > > > > >>>> > > >> > > >>>>>>> implement
> > > > > >>>> > > >> > > >>>>>>> > > this
> > > > > >>>> > > >> > > >>>>>>> > > > feature. It lists the method that has
> > to
> > > be
> > > > > >>>> > implemented
> > > > > >>>> > > >> to
> > > > > >>>> > > >> > > >>>>>>> support this
> > > > > >>>> > > >> > > >>>>>>> > > > feature, regardless of how many
> > classes /
> > > > > >>>> > interfaces are
> > > > > >>>> > > >> > > >>>>>>> involved.
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > When implementing the base interface,
> > > users
> > > > > do
> > > > > >>>> not
> > > > > >>>> > need
> > > > > >>>> > > >> to
> > > > > >>>> > > >> > > >>>>>>> implement a
> > > > > >>>> > > >> > > >>>>>>> > > > method with default implementation.
> If
> > > they
> > > > > are
> > > > > >>>> > curious
> > > > > >>>> > > >> > what
> > > > > >>>> > > >> > > >>>>>>> the method
> > > > > >>>> > > >> > > >>>>>>> > > is
> > > > > >>>> > > >> > > >>>>>>> > > > for, the java doc of that method
> simply
> > > > > points
> > > > > >>>> > users to
> > > > > >>>> > > >> the
> > > > > >>>> > > >> > > >>>>>>> > supportsXXX()
> > > > > >>>> > > >> > > >>>>>>> > > > method in the Source. It does not seem
> > to add
> > > > > work
> > > > > >>>> for
> > > > > >>>> > the
> > > > > >>>> > > >> users
> > > > > >>>> > > >> > > >>>>>>> compared
> > > > > >>>> > > >> > > >>>>>>> > with
> > > > > >>>> > > >> > > >>>>>>> > > > decorative interfaces, but gives much
> > > > better
> > > > > >>>> > > >> > discoverability.
> > > > > >>>> > > >> > > >>>>>>> > > >
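The approach described in the quoted paragraphs above (a supportsXXX() method in the Source plus default methods in the base interfaces) can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not the actual Flink API: SketchSource, SketchSourceReader, PlainSource and PausingSource are hypothetical stand-ins, and the method names supportsPausingSplits and pauseOrResumeSplits are borrowed from names floated in this thread.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified stand-ins; these are NOT the real Flink interfaces.
interface SketchSourceReader {
    // Default method: readers without pausing support do not have to override it.
    // Its Javadoc would point to SketchSource#supportsPausingSplits().
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException("Pausing splits is not supported.");
    }
}

interface SketchSource {
    // Single source of truth for the feature; false unless the implementer opts in.
    default boolean supportsPausingSplits() {
        return false;
    }

    SketchSourceReader createReader();
}

// A source that ignores the feature compiles without touching the new methods.
class PlainSource implements SketchSource {
    @Override
    public SketchSourceReader createReader() {
        return new SketchSourceReader() {};
    }
}

// A source that opts in overrides the flag and the reader method.
class PausingSource implements SketchSource {
    final List<String> pausedSplits = new ArrayList<>();

    @Override
    public boolean supportsPausingSplits() {
        return true;
    }

    @Override
    public SketchSourceReader createReader() {
        return new SketchSourceReader() {
            @Override
            public void pauseOrResumeSplits(
                    Collection<String> splitsToPause, Collection<String> splitsToResume) {
                pausedSplits.removeAll(splitsToResume);
                pausedSplits.addAll(splitsToPause);
            }
        };
    }
}
```

In this sketch nothing forces PausingSource to keep the flag and the reader override consistent; only the documentation of supportsPausingSplits() ties them together, which is the trade-off being debated here.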
> > > > > >>>> > > >> > > >>>>>>> > > > - all of the methods from a single
> > > feature
> > > > > are
> > > > > >>>> > grouped
> > > > > >>>> > > >> in a
> > > > > >>>> > > >> > > >>>>>>> single
> > > > > >>>> > > >> > > >>>>>>> > > >> decorator interface, together with
> > their
> > > > > >>>> dedicated
> > > > > >>>> > java
> > > > > >>>> > > >> > doc.
> > > > > >>>> > > >> > > >>>>>>> It's also
> > > > > >>>> > > >> > > >>>>>>> > > >> easier to google search for help
> using
> > > the
> > > > > >>>> > decorator
> > > > > >>>> > > >> name
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > - if an optional feature requires two
> > > > methods
> > > > > >>>> to be
> > > > > >>>> > > >> > > >>>>>>> implemented at
> > > > > >>>> > > >> > > >>>>>>> > once,
> > > > > >>>> > > >> > > >>>>>>> > > >> decorator can guarantee that
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > These two points are not true when
> > > multiple
> > > > > >>>> > components
> > > > > >>>> > > >> and
> > > > > >>>> > > >> > > >>>>>>> classes are
> > > > > >>>> > > >> > > >>>>>>> > > > involved collaboratively to provide a
> > > > > feature.
> > > > > >>>> In
> > > > > >>>> > our
> > > > > >>>> > > >> case,
> > > > > >>>> > > >> > > we
> > > > > >>>> > > >> > > >>>>>>> have
> > > > > >>>> > > >> > > >>>>>>> > both
> > > > > >>>> > > >> > > >>>>>>> > > > SourceReader and SplitReader
> involved.
> > > And
> > > > > >>>> there
> > > > > >>>> > might
> > > > > >>>> > > >> be
> > > > > >>>> > > >> > > other
> > > > > >>>> > > >> > > >>>>>>> > > interfaces
> > > > > >>>> > > >> > > >>>>>>> > > > on the JM side involved for some
> future
> > > > > >>>> features.
> > > > > >>>> > So the
> > > > > >>>> > > >> > > >>>>>>> relevant
> > > > > >>>> > > >> > > >>>>>>> > methods
> > > > > >>>> > > >> > > >>>>>>> > > > can actually be scattered all over the
> > > > place.
> > > > > That
> > > > > >>>> > said, we
> > > > > >>>> > > >> > may
> > > > > >>>> > > >> > > >>>>>>> still use
> > > > > >>>> > > >> > > >>>>>>> > > > decorative interfaces for each
> > component,
> > > > if
> > > > > >>>> the
> > > > > >>>> > > >> feature is
> > > > > >>>> > > >> > > >>>>>>> optional,
> > > > > >>>> > > >> > > >>>>>>> > > given
> > > > > >>>> > > >> > > >>>>>>> > > > there is a single source of truth for
> > the
> > > > > >>>> feature.
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > Here I would strongly lean towards
> > making
> > > > > life
> > > > > >>>> > easier
> > > > > >>>> > > >> for
> > > > > >>>> > > >> > new
> > > > > >>>> > > >> > > >>>>>>> users,
> > > > > >>>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the
> > (imo)
> > > > > >>>> slight
> > > > > >>>> > expense
> > > > > >>>> > > >> > for
> > > > > >>>> > > >> > > >>>>>>> the power
> > > > > >>>> > > >> > > >>>>>>> > > >> users.
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > I actually think the current approach
> > is
> > > > > >>>> simpler,
> > > > > >>>> > more
> > > > > >>>> > > >> > > >>>>>>> extensible and
> > > > > >>>> > > >> > > >>>>>>> > > more
> > > > > >>>> > > >> > > >>>>>>> > > > general for all the users. Can you
> > > > articulate
> > > > > >>>> a bit
> > > > > >>>> > > >> more on
> > > > > >>>> > > >> > > >>>>>>> which part
> > > > > >>>> > > >> > > >>>>>>> > > you
> > > > > >>>> > > >> > > >>>>>>> > > > think makes it harder for users to
> understand?
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > There is another benefit of the
> > > decorative
> > > > > >>>> > interfaces
> > > > > >>>> > > >> which
> > > > > >>>> > > >> > > is
> > > > > >>>> > > >> > > >>>>>>> not
> > > > > >>>> > > >> > > >>>>>>> > > > mentioned, but might be worth
> > considering
> > > > > here.
> > > > > >>>> > Usually
> > > > > >>>> > > >> the
> > > > > >>>> > > >> > > >>>>>>> decorative
> > > > > >>>> > > >> > > >>>>>>> > > > interfaces give slightly better
> > backwards
> > > > > >>>> > compatibility
> > > > > >>>> > > >> > than
> > > > > >>>> > > >> > > >>>>>>> the new
> > > > > >>>> > > >> > > >>>>>>> > > > default method in the interfaces.
> That
> > is
> > > > > when
> > > > > >>>> > users are
> > > > > >>>> > > >> > > using
> > > > > >>>> > > >> > > >>>>>>> a jar
> > > > > >>>> > > >> > > >>>>>>> > that
> > > > > >>>> > > >> > > >>>>>>> > > > was compiled with an older version of
> > > Flink
> > > > > >>>> which
> > > > > >>>> > does
> > > > > >>>> > > >> not
> > > > > >>>> > > >> > > >>>>>>> have the
> > > > > >>>> > > >> > > >>>>>>> > > default
> > > > > >>>> > > >> > > >>>>>>> > > > method in the interfaces in
> question. A
> > > > > >>>> decorative
> > > > > >>>> > > >> > interface
> > > > > >>>> > > >> > > >>>>>>> may still
> > > > > >>>> > > >> > > >>>>>>> > > > provide backwards compatibility in
> that
> > > > case,
> > > > > >>>> while
> > > > > >>>> > > >> default
> > > > > >>>> > > >> > > >>>>>>> method impl
> > > > > >>>> > > >> > > >>>>>>> > > > cannot.
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > I think in Flink we in general do not
> > > > > guarantee
> > > > > >>>> > custom
> > > > > >>>> > > >> > > >>>>>>> components
> > > > > >>>> > > >> > > >>>>>>> > > compiled
> > > > > >>>> > > >> > > >>>>>>> > > > with an older version can run with a
> > > newer
> > > > > >>>> version
> > > > > >>>> > of
> > > > > >>>> > > >> > Flink.
> > > > > >>>> > > >> > > A
> > > > > >>>> > > >> > > >>>>>>> > recompile
> > > > > >>>> > > >> > > >>>>>>> > > > with a newer version would be
> required.
> > > > That
> > > > > >>>> said,
> > > > > >>>> > if
> > > > > >>>> > > >> we do
> > > > > >>>> > > >> > > >>>>>>> care about
> > > > > >>>> > > >> > > >>>>>>> > > > this, we can just change the
> > > "supportsXXX()"
> > > > > >>>> method
> > > > > >>>> > in
> > > > > >>>> > > >> the
> > > > > >>>> > > >> > > >>>>>>> Source
> > > > > >>>> > > >> > > >>>>>>> > > interface
> > > > > >>>> > > >> > > >>>>>>> > > > to use decorative interfaces, and
> leave
> > > the
> > > > > >>>> other
> > > > > >>>> > parts
> > > > > >>>> > > >> > > >>>>>>> unchanged.
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > Thanks,
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > Jiangjie (Becket) Qin
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr
> > > > > Nowojski <
> > > > > >>>> > > >> > > >>>>>>> pnowojski@apache.org>
> > > > > >>>> > > >> > > >>>>>>> > > > wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >
> > > > > >>>> > > >> > > >>>>>>> > > >> Hi,
> > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> Sorry for chipping in so late, but I
> > was
> > > > OoO
> > > > > >>>> for
> > > > > >>>> > the
> > > > > >>>> > > >> last
> > > > > >>>> > > >> > > two
> > > > > >>>> > > >> > > >>>>>>> weeks.
> > > > > >>>> > > >> > > >>>>>>> > > >> Regarding the interfaces, I would be
> > > > > actually
> > > > > >>>> > against
> > > > > >>>> > > >> > adding
> > > > > >>>> > > >> > > >>>>>>> those
> > > > > >>>> > > >> > > >>>>>>> > > methods
> > > > > >>>> > > >> > > >>>>>>> > > >> to the base interfaces for the
> reasons
> > > > > >>>> mentioned
> > > > > >>>> > above.
> > > > > >>>> > > >> > > >>>>>>> Clogging the
> > > > > >>>> > > >> > > >>>>>>> > > base
> > > > > >>>> > > >> > > >>>>>>> > > >> interface for new users with tons of
> > > > methods
> > > > > >>>> that
> > > > > >>>> > they
> > > > > >>>> > > >> do
> > > > > >>>> > > >> > > not
> > > > > >>>> > > >> > > >>>>>>> need, do
> > > > > >>>> > > >> > > >>>>>>> > > not
> > > > > >>>> > > >> > > >>>>>>> > > >> understand and do not know what to
> do
> > > with
> > > > > >>>> them.
> > > > > >>>> > > >> Moreover,
> > > > > >>>> > > >> > > >>>>>>> such
> > > > > >>>> > > >> > > >>>>>>> > > decorative
> > > > > >>>> > > >> > > >>>>>>> > > >> interfaces are solving a problem if
> a
> > > > > feature
> > > > > >>>> > requires
> > > > > >>>> > > >> two
> > > > > >>>> > > >> > > or
> > > > > >>>> > > >> > > >>>>>>> more
> > > > > >>>> > > >> > > >>>>>>> > > methods
> > > > > >>>> > > >> > > >>>>>>> > > >> to be implemented at the same time.
> If
> > > we
> > > > > >>>> have all
> > > > > >>>> > of
> > > > > >>>> > > >> the
> > > > > >>>> > > >> > > >>>>>>> methods with
> > > > > >>>> > > >> > > >>>>>>> > > >> default implementation in the base
> > > > > interface,
> > > > > >>>> the
> > > > > >>>> > API
> > > > > >>>> > > >> > > doesn't
> > > > > >>>> > > >> > > >>>>>>> give any
> > > > > >>>> > > >> > > >>>>>>> > > >> clue
> > > > > >>>> > > >> > > >>>>>>> > > >> to the user which set of methods are
> > > > > required
> > > > > >>>> to be
> > > > > >>>> > > >> > > >>>>>>> implemented at the
> > > > > >>>> > > >> > > >>>>>>> > > >> same
> > > > > >>>> > > >> > > >>>>>>> > > >> time.
> > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > a) I feel the biggest drawback of
> > > > > decorative
> > > > > >>>> > > >> interfaces
> > > > > >>>> > > >> > is
> > > > > >>>> > > >> > > >>>>>>> which
> > > > > >>>> > > >> > > >>>>>>> > > >> interface
> > > > > >>>> > > >> > > >>>>>>> > > >> > they can decorate and which
> > > combinations
> > > > > of
> > > > > >>>> > multiple
> > > > > >>>> > > >> > > >>>>>>> decorative
> > > > > >>>> > > >> > > >>>>>>> > > >> interfaces
> > > > > >>>> > > >> > > >>>>>>> > > >> > are valid. (...)
> > > > > >>>> > > >> > > >>>>>>> > > >> > In the future, if there is a new
> > > feature
> > > > > >>>> added
> > > > > >>>> > > >> > > >>>>>>> > > >> > (e.g. sorted or pre-partitioned
> data
> > > > > >>>> aware), are
> > > > > >>>> > we
> > > > > >>>> > > >> > going
> > > > > >>>> > > >> > > >>>>>>> to create
> > > > > >>>> > > >> > > >>>>>>> > > >> another
> > > > > >>>> > > >> > > >>>>>>> > > >> > interface of SplitReader such as
> > > > > >>>> > SortedSplitReader or
> > > > > >>>> > > >> > > >>>>>>> > > >> PrePartitionedAware?
> > > > > >>>> > > >> > > >>>>>>> > > >> > Can they be combined? So I think
> the
> > > > > >>>> additional
> > > > > >>>> > > >> > decorative
> > > > > >>>> > > >> > > >>>>>>> interface
> > > > > >>>> > > >> > > >>>>>>> > > >> like
> > > > > >>>> > > >> > > >>>>>>> > > >> > withSplitsAlignment actually
> > increases
> > > > the
> > > > > >>>> > > >> understanding
> > > > > >>>> > > >> > > >>>>>>> cost of
> > > > > >>>> > > >> > > >>>>>>> > users
> > > > > >>>> > > >> > > >>>>>>> > > >> > because they have to know what
> > > > decorative
> > > > > >>>> > interfaces
> > > > > >>>> > > >> are
> > > > > >>>> > > >> > > >>>>>>> there,
> > > > > >>>> > > >> > > >>>>>>> > which
> > > > > >>>> > > >> > > >>>>>>> > > >> > interface they can decorate and
> > which
> > > > > >>>> > combinations of
> > > > > >>>> > > >> > the
> > > > > >>>> > > >> > > >>>>>>> decorative
> > > > > >>>> > > >> > > >>>>>>> > > >> > interfaces are valid and which are
> > > not.
> > > > > >>>> Ideally
> > > > > >>>> > we
> > > > > >>>> > > >> want
> > > > > >>>> > > >> > to
> > > > > >>>> > > >> > > >>>>>>> avoid
> > > > > >>>> > > >> > > >>>>>>> > that.
> > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> I'm not sure if I understand how
> > > embedding
> > > > > >>>> default
> > > > > >>>> > > >> methods
> > > > > >>>> > > >> > > in
> > > > > >>>> > > >> > > >>>>>>> the base
> > > > > >>>> > > >> > > >>>>>>> > > >> interface is solving the problem:
> what
> > > can
> > > > > be
> > > > > >>>> > combined
> > > > > >>>> > > >> or
> > > > > >>>> > > >> > > >>>>>>> not? If
> > > > > >>>> > > >> > > >>>>>>> > there
> > > > > >>>> > > >> > > >>>>>>> > > >> are
> > > > > >>>> > > >> > > >>>>>>> > > >> two conflicting features, having
> > > > decorative
> > > > > >>>> > interfaces
> > > > > >>>> > > >> > that
> > > > > >>>> > > >> > > >>>>>>> can not be
> > > > > >>>> > > >> > > >>>>>>> > > >> mixed together actually makes much
> > more
> > > > > sense
> > > > > >>>> to me
> > > > > >>>> > > >> rather
> > > > > >>>> > > >> > > >>>>>>> than having
> > > > > >>>> > > >> > > >>>>>>> > > >> them
> > > > > >>>> > > >> > > >>>>>>> > > >> all in one base class. How would you
> > > allow
> > > > > >>>> users to
> > > > > >>>> > > >> > > implement
> > > > > >>>> > > >> > > >>>>>>> only one
> > > > > >>>> > > >> > > >>>>>>> > > of
> > > > > >>>> > > >> > > >>>>>>> > > >> those two features?
> > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> To reiterate on the issue. Yes,
> there
> > > are
> > > > > >>>> > drawbacks:
> > > > > >>>> > > >> > > >>>>>>> > > >> - how can a user discover what
> > > decorators
> > > > > are
> > > > > >>>> > there?
> > > > > >>>> > > >> > > >>>>>>> > > >> - how can a user know where the
> > > decorator
> > > > > can
> > > > > >>>> be
> > > > > >>>> > > >> applied
> > > > > >>>> > > >> > to?
> > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> However those are drawbacks for more
> > > power
> > > > > >>>> users,
> > > > > >>>> > that
> > > > > >>>> > > >> can
> > > > > >>>> > > >> > > be
> > > > > >>>> > > >> > > >>>>>>> > mitigated
> > > > > >>>> > > >> > > >>>>>>> > > by
> > > > > >>>> > > >> > > >>>>>>> > > >> the documentation. For example
> listing
> > > all
> > > > > of
> > > > > >>>> the
> > > > > >>>> > > >> > decorators
> > > > > >>>> > > >> > > >>>>>>> with
> > > > > >>>> > > >> > > >>>>>>> > > >> detailed explanation both in the
> docs
> > > and
> > > > in
> > > > > >>>> the
> > > > > >>>> > java
> > > > > >>>> > > >> > docs.
> > > > > >>>> > > >> > > >>>>>>> More
> > > > > >>>> > > >> > > >>>>>>> > > >> experienced users will be able to
> deal
> > > > with
> > > > > >>>> those
> > > > > >>>> > > >> issues
> > > > > >>>> > > >> > > >>>>>>> more easily, as
> > > > > >>>> > > >> > > >>>>>>> > they
> > > > > >>>> > > >> > > >>>>>>> > > >> will already have some basic
> > > understanding
> > > > > of
> > > > > >>>> > Flink.
> > > > > >>>> > > >> Also
> > > > > >>>> > > >> > if
> > > > > >>>> > > >> > > >>>>>>> a user has
> > > > > >>>> > > >> > > >>>>>>> > a
> > > > > >>>> > > >> > > >>>>>>> > > >> problem that he wants to solve, he
> > will
> > > > > google
> > > > > >>>> > search a
> > > > > >>>> > > >> > > >>>>>>> potential
> > > > > >>>> > > >> > > >>>>>>> > > solution
> > > > > >>>> > > >> > > >>>>>>> > > >> to his problem anyway, and while
> doing
> > > > that
> > > > > >>>> he is
> > > > > >>>> > very
> > > > > >>>> > > >> > > likely
> > > > > >>>> > > >> > > >>>>>>> to
> > > > > >>>> > > >> > > >>>>>>> > > discover
> > > > > >>>> > > >> > > >>>>>>> > > >> the decorator that he needs anyway
> in
> > > the
> > > > > >>>> docs.
> > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> The advantages are:
> > > > > >>>> > > >> > > >>>>>>> > > >> - clean and easy to implement base
> > > > interface
> > > > > >>>> > > >> > > >>>>>>> > > >> - all of the methods from a single
> > > feature
> > > > > are
> > > > > >>>> > grouped
> > > > > >>>> > > >> in
> > > > > >>>> > > >> > a
> > > > > >>>> > > >> > > >>>>>>> single
> > > > > >>>> > > >> > > >>>>>>> > > >> decorator interface, together with
> > their
> > > > > >>>> dedicated
> > > > > >>>> > java
> > > > > >>>> > > >> > doc.
> > > > > >>>> > > >> > > >>>>>>> It's also
> > > > > >>>> > > >> > > >>>>>>> > > >> easier to google search for help
> using
> > > the
> > > > > >>>> > decorator
> > > > > >>>> > > >> name
> > > > > >>>> > > >> > > >>>>>>> > > >> - if an optional feature requires
> two
> > > > > methods
> > > > > >>>> to be
> > > > > >>>> > > >> > > >>>>>>> implemented at
> > > > > >>>> > > >> > > >>>>>>> > once,
> > > > > >>>> > > >> > > >>>>>>> > > >> decorator can guarantee that
> > > > > >>>> > > >> > > >>>>>>> > > >>
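The decorator advantages listed above can be sketched in Java as follows. This is a hedged illustration, not the real Flink API: SketchSplitReader, BasicSplitReader, PausableSplitReader and AlignmentSupport are hypothetical names, WithSplitsAlignment borrows the name mentioned earlier in this thread, and the pauseSplits/resumeSplits pair is an assumed shape for a feature that needs two methods at once.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-ins; these are NOT the real Flink interfaces.
interface SketchSplitReader {
    String fetch();
}

// Decorative interface: both methods of the optional feature live together,
// so the compiler rejects an implementation that provides only one of them.
interface WithSplitsAlignment {
    void pauseSplits(Collection<String> splitIds);

    void resumeSplits(Collection<String> splitIds);
}

// New users implement only the small base interface.
class BasicSplitReader implements SketchSplitReader {
    @Override
    public String fetch() {
        return "record";
    }
}

// Power users opt in by additionally implementing the decorator.
class PausableSplitReader implements SketchSplitReader, WithSplitsAlignment {
    final Set<String> paused = new HashSet<>();

    @Override
    public String fetch() {
        return "record";
    }

    @Override
    public void pauseSplits(Collection<String> splitIds) {
        paused.addAll(splitIds);
    }

    @Override
    public void resumeSplits(Collection<String> splitIds) {
        paused.removeAll(splitIds);
    }
}

class AlignmentSupport {
    // The framework detects the optional capability with an instanceof check.
    static boolean tryPause(SketchSplitReader reader, Collection<String> splitIds) {
        if (reader instanceof WithSplitsAlignment) {
            ((WithSplitsAlignment) reader).pauseSplits(splitIds);
            return true;
        }
        return false;
    }
}
```

The base interface stays at a single method here, and the grouping guarantee comes from the compiler rather than from documentation; the cost is that users have to discover that WithSplitsAlignment exists and where it applies.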
> > > > > >>>> > > >> > > >>>>>>> > > >> Here I would strongly lean towards
> > > making
> > > > > life
> > > > > >>>> > easier
> > > > > >>>> > > >> for
> > > > > >>>> > > >> > > new
> > > > > >>>> > > >> > > >>>>>>> users,
> > > > > >>>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the
> > (imo)
> > > > > >>>> slight
> > > > > >>>> > expense
> > > > > >>>> > > >> > for
> > > > > >>>> > > >> > > >>>>>>> the power
> > > > > >>>> > > >> > > >>>>>>> > > >> users.
> > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> Best,
> > > > > >>>> > > >> > > >>>>>>> > > >> Piotrek
> > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> On Tue, Apr 26, 2022 at 15:32 Becket Qin
> <
> > > > > >>>> > > >> becket.qin@gmail.com
> > > > > >>>> > > >> > >
> > > > > >>>> > > >> > > >>>>>>> > wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > Thanks for the reply Sebastian and
> > > > Dawid,
> > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > >>>> > > >> > > >>>>>>> > > >> > I think Sebastian has a good
> > summary.
> > > > This
> > > > > >>>> is a
> > > > > >>>> > > >> really
> > > > > >>>> > > >> > > >>>>>>> helpful
> > > > > >>>> > > >> > > >>>>>>> > > >> discussion.
> > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > >>>> > > >> > > >>>>>>> > > >> > Thinking a bit more, I feel that
> it
> > > > might
> > > > > >>>> still
> > > > > >>>> > be
> > > > > >>>> > > >> > better
> > > > > >>>> > > >> > > >>>>>>> to add the
> > > > > >>>> > > >> > > >>>>>>> > > >> > supportsXXX() method in the Source
> > > > rather
> > > > > >>>> than
> > > > > >>>> > > >> > > SourceReader.
> > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > >>>> > > >> > > >>>>>>> > > >> > Generally speaking, what we are
> > trying
> > > > to
> > > > > do
> > > > > >>>> > here is
> > > > > >>>> > > >> to
> > > > > >>>> > > >> > > let
> > > > > >>>> > > >> > > >>>>>>> the
> > > > > >>>> > > >> > > >>>>>>> > Flink
> > > > > >>>> > > >> > > >>>>>>> > > >> > framework know what the Source is
> > > > capable
> > > > > >>>> of. In
> > > > > >>>> > this
> > > > > >>>> > > >> > > FLIP,
> > > > > >>>> > > >> > > >>>>>>> it
> > > > > >>>> > > >> > > >>>>>>> > happens
> > > > > >>>> > > >> > > >>>>>>> > > >> to
> > > > > >>>> > > >> > > >>>>>>> > > >> > be the capability that only
> involves
> > > > > >>>> > SourceReader.
> > > > > >>>> > > >> But
> > > > > >>>> > > >> > in
> > > > > >>>> > > >> > > >>>>>>> the
> > > > > >>>> > > >> > > >>>>>>> > future,
> > > > > >>>> > > >> > > >>>>>>> > > >> it is
> > > > > >>>> > > >> > > >>>>>>> > > >> > possible that another
> functionality
> > > > > involves
> > > > > >>>> > both the
> > > > > >>>> > > >> > > >>>>>>> > SplitEnumerator
> > > > > >>>> > > >> > > >>>>>>> > > >> and
> > > > > >>>> > > >> > > >>>>>>> > > >> > SourceReader. In that case,
> > following
> > > > the
> > > > > >>>> current
> > > > > >>>> > > >> > > approach,
> > > > > >>>> > > >> > > >>>>>>> we
> > > > > >>>> > > >> > > >>>>>>> > should
> > > > > >>>> > > >> > > >>>>>>> > > >> put
> > > > > >>>> > > >> > > >>>>>>> > > >> > the "supportsXXX()" method in both
> > > > > >>>> > SplitEnumerator
> > > > > >>>> > > >> and
> > > > > >>>> > > >> > > >>>>>>> SourceReader.
> > > > > >>>> > > >> > > >>>>>>> > > >> > Because if we only put this in the
> > > > > >>>> SourceReader,
> > > > > >>>> > then
> > > > > >>>> > > >> > the
> > > > > >>>> > > >> > > >>>>>>> JM would
> > > > > >>>> > > >> > > >>>>>>> > > have
> > > > > >>>> > > >> > > >>>>>>> > > >> to
> > > > > >>>> > > >> > > >>>>>>> > > >> > create a SourceReader in order to
> > know
> > > > > >>>> whether
> > > > > >>>> > this
> > > > > >>>> > > >> > > feature
> > > > > >>>> > > >> > > >>>>>>> is
> > > > > >>>> > > >> > > >>>>>>> > > >> supported,
> > > > > >>>> > > >> > > >>>>>>> > > >> > which is a little ugly. But if we
> > put
> > > > the
> > > > > >>>> > > >> > "supportsXXX()"
> > > > > >>>> > > >> > > >>>>>>> method in
> > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > >>>> > > >> > > >>>>>>> > > >> > Source, we will break the
> > "symmetric"
> > > > > design
> > > > > >>>> > because
> > > > > >>>> > > >> > this
> > > > > >>>> > > >> > > >>>>>>> FLIP
> > > > > >>>> > > >> > > >>>>>>> > chose a
> > > > > >>>> > > >> > > >>>>>>> > > >> > different way.
> > > > > >>>> > > >> > > >>>>>>> > > >> >
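The point in the quoted paragraph above, that a capability flag on the Source lets the JM decide without creating a SourceReader, can be sketched as follows. This is a minimal illustration under assumptions, not Flink code: CapableSource, CapableReader, CountingSource and JobManagerSketch are hypothetical names, and the counter exists only to show that no reader is instantiated.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-ins; these are NOT the real Flink interfaces.
interface CapableReader {}

interface CapableSource {
    // Capability flag that can be queried without instantiating a reader.
    boolean supportsPausingSplits();

    CapableReader createReader();
}

class CountingSource implements CapableSource {
    // Counts reader instantiations so the sketch can show that none happen.
    final AtomicInteger readersCreated = new AtomicInteger();

    @Override
    public boolean supportsPausingSplits() {
        return true;
    }

    @Override
    public CapableReader createReader() {
        readersCreated.incrementAndGet();
        return new CapableReader() {};
    }
}

class JobManagerSketch {
    // Decides from the Source alone; createReader() is never called here.
    static boolean shouldEnableAlignment(CapableSource source, boolean alignmentConfigured) {
        return alignmentConfigured && source.supportsPausingSplits();
    }
}
```

Had the flag lived on the reader instead, shouldEnableAlignment would first have to call createReader(), which is the awkwardness the paragraph describes.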
> > > > > >>>> > > >> > > >>>>>>> > > >> > This is also why I think
> > supportsXXX()
> > > > > >>>> method
> > > > > >>>> > seems a
> > > > > >>>> > > >> > good
> > > > > >>>> > > >> > > >>>>>>> thing to
> > > > > >>>> > > >> > > >>>>>>> > > >> have,
> > > > > >>>> > > >> > > >>>>>>> > > >> > because when there are a few
> > > interfaces
> > > > /
> > > > > >>>> methods
> > > > > >>>> > > >> that
> > > > > >>>> > > >> > are
> > > > > >>>> > > >> > > >>>>>>> expected
> > > > > >>>> > > >> > > >>>>>>> > to
> > > > > >>>> > > >> > > >>>>>>> > > >> be
> > > > > >>>> > > >> > > >>>>>>> > > >> > implemented at the same time in
> > order
> > > to
> > > > > >>>> deliver
> > > > > >>>> > a
> > > > > >>>> > > >> > > feature,
> > > > > >>>> > > >> > > >>>>>>> it is
> > > > > >>>> > > >> > > >>>>>>> > > always
> > > > > >>>> > > >> > > >>>>>>> > > >> > good to have a single source of
> > truth
> > > to
> > > > > >>>> tell the
> > > > > >>>> > > >> > > framework
> > > > > >>>> > > >> > > >>>>>>> what to
> > > > > >>>> > > >> > > >>>>>>> > > do,
> > > > > >>>> > > >> > > >>>>>>> > > >> so
> > > > > >>>> > > >> > > >>>>>>> > > >> > the framework can do consistent
> > things
> > > > in
> > > > > >>>> > different
> > > > > >>>> > > >> > parts.
> > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > >>>> > > >> > > >>>>>>> > > >> > @Sebastian Mattheis <
> > > > > >>>> sebastian@ververica.com>
> > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > >>>> > > >> > > >>>>>>> > > >> > Regarding interface flavor b),
> i.e.
> > > > > >>>> > > >> AlignedSourceReader
> > > > > >>>> > > >> > +
> > > > > >>>> > > >> > > >>>>>>> > > >> > AlignedSplitReader, what I feel
> > > awkward
> > > > > >>>> about is
> > > > > >>>> > > >> that we
> > > > > >>>> > > >> > > are
> > > > > >>>> > > >> > > >>>>>>> > > essentially
> > > > > >>>> > > >> > > >>>>>>> > > >> > expecting almost all the
> > SourceReader
> > > > > >>>> > > >> implementations to
> > > > > >>>> > > >> > > >>>>>>> extend
> > > > > >>>> > > >> > > >>>>>>> > > >> > SourceReaderBase, which
> effectively
> > > > makes
> > > > > >>>> the
> > > > > >>>> > > >> > SourceReader
> > > > > >>>> > > >> > > >>>>>>> interface
> > > > > >>>> > > >> > > >>>>>>> > > >> > without the pausing support
> useless.
> > > So
> > > > > this
> > > > > >>>> > > >> indicates
> > > > > >>>> > > >> > > that
> > > > > >>>> > > >> > > >>>>>>> public
> > > > > >>>> > > >> > > >>>>>>> > > >> > decorative interfaces (or
> > > sub-interfaces
> > > > > >>>> for the
> > > > > >>>> > same
> > > > > >>>> > > >> > > >>>>>>> purpose) only
> > > > > >>>> > > >> > > >>>>>>> > > >> > make sense if the original
> interface
> > > is
> > > > > also
> > > > > >>>> > > >> expected to
> > > > > >>>> > > >> > > be
> > > > > >>>> > > >> > > >>>>>>> used.
> > > > > >>>> > > >> > > >>>>>>> > > >> > Otherwise, it seems to make more
> sense
> > > to
> > > > > add
> > > > > >>>> the
> > > > > >>>> > > >> method
> > > > > >>>> > > >> > to
> > > > > >>>> > > >> > > >>>>>>> the
> > > > > >>>> > > >> > > >>>>>>> > original
> > > > > >>>> > > >> > > >>>>>>> > > >> > interface itself.
> > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > >>>> > > >> > > >>>>>>> > > >> > Cheers,
> > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > >>>> > > >> > > >>>>>>> > > >> > Jiangjie (Becket) Qin
> > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > >>>> > > >> > > >>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM
> > Dawid
> > > > > >>>> Wysakowicz
> > > > > >>>> > <
> > > > > >>>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org>
> > > > > >>>> > > >> > > >>>>>>> > > >> > wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > > >>>> > > >> > > >>>>>>> > > >> > > Thanks @Sebastian for the nice
> > > > summary.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > >>>> > > >> > > >>>>>>> > > >> > > I think most of your points
> > aligned
> > > > with
> > > > > >>>> the
> > > > > >>>> > > >> > suggestions
> > > > > >>>> > > >> > > >>>>>>> I made to
> > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > >>>> > > >> > > >>>>>>> > > >> > > FLIP, while you were writing
> your
> > > > reply
> > > > > (I
> > > > > >>>> > believe
> > > > > >>>> > > >> we
> > > > > >>>> > > >> > > hit
> > > > > >>>> > > >> > > >>>>>>> enter
> > > > > >>>> > > >> > > >>>>>>> > > >> nearly at
> > > > > >>>> > > >> > > >>>>>>> > > >> > > the same time ;) )
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > >>>> > > >> > > >>>>>>> > > >> > > Two points after we synced
> offline
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > >>>> > > >> > > >>>>>>> > > >> > > 1. I also changed the
> > > > > >>>> > > >> supportsWatermarksSplitAlignment
> > > > > >>>> > > >> > > to
> > > > > >>>> > > >> > > >>>>>>> > > >> > > supportsPausingSplits to express
> > the
> > > > > >>>> general
> > > > > >>>> > > >> > capability
> > > > > >>>> > > >> > > of
> > > > > >>>> > > >> > > >>>>>>> > pausing.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > >>>> > > >> > > >>>>>>> > > >> > > 2. As for whether we should add
> > > > > >>>> > > >> > > >>>>>>> PausingSourceReader/PausingSplitReader
> > > > > >>>> > > >> > > >>>>>>> > > (option
> > > > > >>>> > > >> > > >>>>>>> > > >> b)
> > > > > >>>> > > >> > > >>>>>>> > > >> > > or if we should just add the
> > methods
> > > > > >>>> (option
> > > > > >>>> > c), I
> > > > > >>>> > > >> > > >>>>>>> suggest to
> > > > > >>>> > > >> > > >>>>>>> > simply
> > > > > >>>> > > >> > > >>>>>>> > > >> add
> > > > > >>>> > > >> > > >>>>>>> > > >> > > the two methods as I felt this
> is
> > > much the
> > > > > >>>> > preferred
> > > > > >>>> > > >> > > approach of
> > > > > >>>> > > >> > > >>>>>>> Becket,
> > > > > >>>> > > >> > > >>>>>>> > > >> to which
> > > > > >>>> > > >> > > >>>>>>> > > >> > > others do not object. Unless
> there
> > > is
> > > > an
> > > > > >>>> > opposition
> > > > > >>>> > > >> > > let's
> > > > > >>>> > > >> > > >>>>>>> go with
> > > > > >>>> > > >> > > >>>>>>> > > this
> > > > > >>>> > > >> > > >>>>>>> > > >> > > option c.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > >>>> > > >> > > >>>>>>> > > >> > > Best,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > >>>> > > >> > > >>>>>>> > > >> > > Dawid
> > > > > >>>> > > >> > > >>>>>>> > > >> > > On 26/04/2022 10:06, Sebastian
> > > > Mattheis
> > > > > >>>> wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > >>>> > > >> > > >>>>>>> > > >> > > Hi folks,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > >>>> > > >> > > >>>>>>> > > >> > > Sorry for being a bit silent.
> Many
> > > > > thanks
> > > > > >>>> for
> > > > > >>>> > all
> > > > > >>>> > > >> the
> > > > > >>>> > > >> > > >>>>>>> input and
> > > > > >>>> > > >> > > >>>>>>> > > >> > > suggestions. As I'm a bit new, I
> > > > needed
> > > > > >>>> some
> > > > > >>>> > time
> > > > > >>>> > > >> to
> > > > > >>>> > > >> > > >>>>>>> catch up and
> > > > > >>>> > > >> > > >>>>>>> > > >> > structure
> > > > > >>>> > > >> > > >>>>>>> > > >> > > (for myself) the discussion and
> I
> > > > wanted
> > > > > >>>> to
> > > > > >>>> > find a
> > > > > >>>> > > >> way
> > > > > >>>> > > >> > > to
> > > > > >>>> > > >> > > >>>>>>> > structure
> > > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > > >>>> > > >> > > >>>>>>> > > >> > > conclusions. (Also because I had
> > the
> > > > > >>>> feeling
> > > > > >>>> > that
> > > > > >>>> > > >> some
> > > > > >>>> > > >> > > >>>>>>> concerns
> > > > > >>>> > > >> > > >>>>>>> > got
> > > > > >>>> > > >> > > >>>>>>> > > >> lost
> > > > > >>>> > > >> > > >>>>>>> > > >> > in
> > > > > >>>> > > >> > > >>>>>>> > > >> > > the discussion.) This is my
> > attempt
> > > > and
> > > > > >>>> please
> > > > > >>>> > > >> correct
> > > > > >>>> > > >> > > me
> > > > > >>>> > > >> > > >>>>>>> if
> > > > > >>>> > > >> > > >>>>>>> > > >> something is
> > > > > >>>> > > >> > > >>>>>>> > > >> > > wrong or misunderstood. I tried
> to
> > > > > >>>> collect and
> > > > > >>>> > > >> > assemble
> > > > > >>>> > > >> > > >>>>>>> the
> > > > > >>>> > > >> > > >>>>>>> > > opinions,
> > > > > >>>> > > >> > > >>>>>>> > > >> > > suggestions, and conclusions (to
> > the
> > > > > best
> > > > > >>>> of my
> > > > > >>>> > > >> > > >>>>>>> knowledge):
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Top A: Should split alignment (pause/resume behavior) be a
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > general capability?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > I personally don't see any reason not to have it as a general
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > capability because the alignSplits method is actually independent
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > of the watermarks. If we agree here to have it as a general
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > capability, we should also agree on the right wording. Does
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > "alignSplits(splitsToResume, splitsToPause)" refer to what is then
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > actually meant? (I see it as okay. I don't have any better idea,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > whilst Arvid suggested "pauseOrResumeSplits".)
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Top B: Should it be possible to enable/disable split alignment?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > I would personally not disable the split alignment on the source
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > reader side because if split alignment is used for some other use
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > case (see A) it could have nasty side effects on other/future use
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > cases. Instead, I would disable "watermark split alignment" where
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > I think it should disable the watermark-dependent trigger for
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > split alignment.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Top C: Should we add a supportsX method?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > I find it difficult to define the scope of a supportsX method
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > w.r.t. the following questions: a) Where is it used? and b) What
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > is the expected output? To b), it's not straightforward to provide
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > a meaningful output, e.g., if SourceReader supports split
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > alignment but SplitReader does not. This is because with the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > current implementation, we can determine whether split alignment
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > is fully supported only during runtime, and specifically only when
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > calling alignSplits down the call hierarchy up to the actual
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > SplitReaders.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Therefore, I would suggest to either raise an error or warning if
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > the alignment is called but not supported at some point. I know we
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > should carefully think about when this could be the case because
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > we don't want to flood anybody with such warnings. However,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > warnings could be an indicator for the user that, for the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > watermark split alignment use case, split reading is imbalanced,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > with the conclusion to either disable the trigger for watermark
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > split alignment (see Top B) or to use/implement a source and
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > reader that fully support split alignment.
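
A minimal sketch of the "raise an error or a warning" idea from Top C,
assuming simplified stand-in types. ReaderSketch, AlignmentCallerSketch
and the failOnUnsupported flag are hypothetical names used only for
illustration; they are not Flink interfaces. The sketch shows that missing
support is only discovered when alignSplits is actually called, and that a
"warn once" guard avoids flooding the user with warnings.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a reader type; NOT Flink's actual interface.
interface ReaderSketch {
    // Assumption: alignment is unsupported unless a reader overrides this.
    default void alignSplits(Collection<String> splitsToResume,
                             Collection<String> splitsToPause) {
        throw new UnsupportedOperationException("split alignment is not supported");
    }
}

// Hypothetical caller that only learns about missing support at call time.
class AlignmentCallerSketch {
    private final ReaderSketch reader;
    private final boolean failOnUnsupported;
    private final List<String> warnings = new ArrayList<>();
    private boolean alreadyWarned = false;

    AlignmentCallerSketch(ReaderSketch reader, boolean failOnUnsupported) {
        this.reader = reader;
        this.failOnUnsupported = failOnUnsupported;
    }

    void align(Collection<String> splitsToResume, Collection<String> splitsToPause) {
        try {
            reader.alignSplits(splitsToResume, splitsToPause);
        } catch (UnsupportedOperationException e) {
            if (failOnUnsupported) {
                throw e; // "raise an error" variant
            }
            if (!alreadyWarned) { // "warning" variant: record it only once
                alreadyWarned = true;
                warnings.add("Split alignment requested but not supported: " + e.getMessage());
            }
        }
    }

    // Warnings collected so far (a real implementation would log instead).
    List<String> warnings() {
        return Collections.unmodifiableList(warnings);
    }
}
```

With failOnUnsupported set to false, repeated align calls against a reader
that does not override alignSplits record a single warning; with it set to
true, the first call fails fast.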
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > # Top D: How to design interfaces?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Thanks for structuring the discussion with the various
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > possibilities (a-d). From the discussion and emails, I would like
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > to summarize the following requirements:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > - Interfaces should be consistent ("symmetric"), i.e., similar
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > semantics should have similar interfaces with similar usage.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > - Make explicit which implementations implement interfaces/support
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > behavior.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > - Make clear what are default implementations and how to implement
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > interfaces with desired behavior.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > This is a simplified view of the relations between relevant
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > classes of the PoC implementation:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > SourceReader (Public) <|-- SourceReaderBase (Internal) <|-- .. <|-- MySourceReader
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > MySourceReader <>-- SplitFetcherManager (Internal) <>-- SplitFetcher (Internal) <>-- SplitReader (Public) <|-- MySplitReader
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > (A <|-- B: B inherits from A; A <>-- B: A "has a" B)
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Note that SourceReaderBase and SplitFetcherManager implement most
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > of the "logic" for split alignment just because we wanted to
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > implement split alignment and wanted it to be available as kind of
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > a default. As a consequence, we have a "default implementation"
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > for SourceReader that implements the actual logic for split
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > alignment. For that reason, I find it very confusing to have a
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > NOOP default implementation in the interface for the SourceReader.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > As a consequence, interface strategy c) is difficult because this
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > would require NOOP default implementations in the public
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > interfaces of SourceReader and SplitReader. This is the same for
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > strategy d) because it would require a NOOP default implementation
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > in the SourceReader. Further, as Dawid described, the method
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > signatures of alignSplit for SourceReader and SplitReader differ
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > and it would be extremely difficult to make the signatures the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > same (with even potential performance impact because of additional
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > look-ups of split ids). Therefore, having a symmetric decorative
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > interface as of strategy a) is actually not possible and having
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > two decorative interfaces with different method signatures is
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > confusing. My conclusion is that we are best with strategy b),
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > which means to have specializing sub-interfaces that inherit from
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > the parent interface:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > SourceReader <|-- AlignedSourceReader, SplitReader <|-- AlignedSplitReader
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > With this option, I'm not 100% sure what the implications are and
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > if this could get nasty. I would suggest that Dawid and I just try
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > to implement and see if we like it. :)
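
A minimal sketch of what strategy b) could look like for the split reader
side, assuming heavily simplified stand-in types. SketchSplitReader,
SketchAlignedSplitReader, InMemoryAlignedSplitReader and StrategyBDemo are
hypothetical names, and the String split ids and the fetch() signature are
assumptions made for brevity; none of this is the FLIP's or Flink's actual
API. The point it illustrates is that support is visible in the type, so the
base interface needs no NOOP default.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified base interface; NOT Flink's SplitReader.
interface SketchSplitReader {
    // Returns the ids of the splits that would currently be read.
    List<String> fetch();
}

// Strategy b): a specializing sub-interface adds the alignment method.
interface SketchAlignedSplitReader extends SketchSplitReader {
    void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause);
}

// Toy implementation that tracks paused split ids in memory.
class InMemoryAlignedSplitReader implements SketchAlignedSplitReader {
    private final List<String> assigned;
    private final Set<String> paused = new HashSet<>();

    InMemoryAlignedSplitReader(List<String> assigned) {
        this.assigned = new ArrayList<>(assigned);
    }

    @Override
    public void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause) {
        paused.removeAll(splitsToResume);
        paused.addAll(splitsToPause);
    }

    @Override
    public List<String> fetch() {
        List<String> active = new ArrayList<>();
        for (String splitId : assigned) {
            if (!paused.contains(splitId)) {
                active.add(splitId);
            }
        }
        return active;
    }
}

class StrategyBDemo {
    // Support is detected via the type; returns false if the reader cannot align.
    static boolean tryAlign(SketchSplitReader reader,
                            Collection<String> splitsToResume,
                            Collection<String> splitsToPause) {
        if (reader instanceof SketchAlignedSplitReader) {
            ((SketchAlignedSplitReader) reader).alignSplits(splitsToResume, splitsToPause);
            return true;
        }
        return false;
    }
}
```

A caller holding a plain SketchSplitReader gets false back from tryAlign and
can decide whether to warn or fail, which ties in with Top C.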
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > >>>> > > >> > > >>>>>>> > > >> > > # Summary
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > In conclusion, please let me know your perspectives. Please correct
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > me, if something is wrong or if I misunderstood something. My
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > perspective would be:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Top A: Yes
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > Top B: Yes (but disable watermark trigger for split alignment)
> > > > > >>>> > > >> > > >>>>>>> > > >> > > Top C: No
> > > > > >>>> > > >> > > >>>>>>> > > >> > > Top D: b)
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > >>>> > > >> > > >>>>>>> > > >> > > Best,
> > > > > >>>> > > >> > > >>>>>>> > > >> > > Sebastian
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > > >>>> > > >> > > >>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Arvid:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's capability approach, I fear that it
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> doesn't work for this particular use case: Sources can always be
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> aligned cross-task and this is just about intra-task alignment. So
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> it's plausible to put sources into an alignment group even though
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> they do not use any of the presented API of FLIP-217. They should
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> just issue a warning, if they handle multiple splits (see
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> motivation section).
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Yes, but the "supportXXX" method would be for telling if it
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> supports that intra-task alignment. Cross-task alignment would
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> always be supported.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I updated interfaces to what I believe to be closest to a
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> consensus between all participants. Do you mind taking a look?
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Sebastian Do you mind addressing the nits?
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for pushing this effort.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I'd actually be in favor of 1b). I fully agree that decorator
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interfaces should be avoided but I'm also not a big fan of
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> overloading the base interfaces (they are hard to implement as
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> is). The usual feedback to Source-related interfaces is always
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> that they are overwhelming and too hard to implement. However, I'd
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> also not oppose 1c) as scattered interfaces also have drawbacks.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I'd just dislike 1a) and 1d).
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's capability approach, I fear that it
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> doesn't work for this particular use case: Sources can always be
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> aligned cross-task and this is just about intra-task alignment. So
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> it's plausible to put sources into an alignment group even though
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> they do not use any of the presented API of FLIP-217. They should
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> just issue a warning, if they handle multiple splits (see
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> motivation section).
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think renaming alignSplits to facilitate future use cases makes
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> sense but then all interfaces (if 1c) is chosen) should be
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> adjusted accordingly. AlignedSourceReader could be
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> PausingSourceReader and I'd go for pauseOrResumeSplits (Becket's
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> proposal afaik). We could also split it into pauseSplit and
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> resumeSplit. While pauseOrResumeSplits may allow Sources to just
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> use 1 instead of 2 library calls (as written in the Javadoc), both
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Kafka and Pulsar can't use it and I'm not sure if there is a
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> system that can.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Some nits for the FLIP:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - Please replace "stop" with "pause".
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - Not sure if it's worth it in the capability section: Sources
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> that adopt this interface cannot be used in earlier versions. So
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> it feels like we are only forward compatible (old sources can be
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> used after the change); but I guess this holds for any API
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> addition.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - You might want to add what happens when all splits are paused.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - You may want to describe how the 3 flavors of SourceReaderBase
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> interact with the interface.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> - I'm not sure if it makes sense to include Kafka and Pulsar in
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> the FLIP. For me, this is rather immediate follow-up work. (could
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> be in the same umbrella ticket)
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Arvid
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader, WithSplitsAlignment",
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> along with "MySplitReader implements SplitReader,
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> WithSplitsAlignment", or
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> "MySplitReader implements AlignedSplitReader", or
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and "MySplitReader
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements SplitReader".
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according to Dawid would be:
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and "MySplitReader
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader".
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although personally speaking I think it
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> is simpler to just add a new method to the split reader with
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> default impl.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think that is a good idea to have it aligned as much as
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> possible. I'd be +1 for your option c). We can merge
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader with SplitReader. We will update the FLIP
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> shortly.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the comment, Jark.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align other things in the future? For
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> example, to align read speed: I have seen users requesting global
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> rate limits. This feature may also need an interface like this.
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to support aligning
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> other things, I suggest explicitly declaring the purpose of the
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> methods, such as `alignWatermarksForSplits` instead of
> > > > > > >>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> This is a good point. Naming wise, it would usually be more
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> extensible to just describe what the method actually does, instead
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> of assuming the purpose of doing this. For example, in this case,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits() would be more extensible because this can be
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> used for any kind of flow control, be it watermark alignment or
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> simple rate limiting.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
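
To make the naming argument concrete, here is a minimal sketch of why a method named after what it does (pause or resume) can serve more than one kind of flow control. All types below (PausableReader, RecordingReader, FlowControl) are hypothetical stand-ins invented for illustration, not the interfaces proposed in the FLIP.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical reader contract: the name says what happens, not why.
interface PausableReader {
    void pauseOrResumeSplits(Collection<String> splitsToPause,
                             Collection<String> splitsToResume);
}

// Toy reader that only tracks which split IDs are currently paused.
class RecordingReader implements PausableReader {
    final Set<String> paused = new HashSet<>();

    @Override
    public void pauseOrResumeSplits(Collection<String> splitsToPause,
                                    Collection<String> splitsToResume) {
        paused.addAll(splitsToPause);
        paused.removeAll(splitsToResume);
    }
}

class FlowControl {
    // Watermark alignment: pause splits whose watermark is past the limit.
    static void alignWatermarks(PausableReader reader,
                                Map<String, Long> watermarks,
                                long maxAllowedWatermark) {
        List<String> pause = new ArrayList<>();
        List<String> resume = new ArrayList<>();
        for (Map.Entry<String, Long> e : watermarks.entrySet()) {
            (e.getValue() > maxAllowedWatermark ? pause : resume).add(e.getKey());
        }
        reader.pauseOrResumeSplits(pause, resume);
    }

    // Rate limiting: pause splits that have used up their record budget.
    static void limitRate(PausableReader reader,
                          Map<String, Integer> recordsRead,
                          int budgetPerSplit) {
        List<String> pause = new ArrayList<>();
        List<String> resume = new ArrayList<>();
        for (Map.Entry<String, Integer> e : recordsRead.entrySet()) {
            (e.getValue() > budgetPerSplit ? pause : resume).add(e.getKey());
        }
        reader.pauseOrResumeSplits(pause, resume);
    }
}
```

Both callers drive the same reader method; only the decision of which splits to pause differs.
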
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think they have their own
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> advantages.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending abilities
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink, and I prefer Interfaces
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> rather than methods in this case. When you have a bunch of
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> abilities and each ability has more than one method, Interfaces can
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> help to organize them and make users clear which methods need to
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> implement when you want to have an ability.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I am OK with decorative interfaces if this is a general design
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> pattern in the other components in Flink. But it looks like the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> current API proposal is not symmetric.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> The current proposal is essentially "MySourceReader implements
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader, WithSplitsAlignment", along with "MySplitReader
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitsReader".
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Should we make the API symmetric? I'd consider any one of the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> following as symmetric.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader, WithSplitsAlignment",
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> along with "MySplitReader implements SplitReader,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> WithSplitsAlignment", or
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> "MySplitReader implements AlignedSplitReader", or
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and "MySplitReader
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements SplitReader".
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according to Dawid would be:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and "MySplitReader
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader".
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although personally speaking I think it is
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> simpler to just add a new method to the split reader with default
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> impl.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
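
As a rough illustration of the difference between option c) and option d) on the split reader side, the sketch below contrasts a default method on the base interface with a separate decorative sub-interface. The type names carry a C or D suffix to mark them as hypothetical; the method names, signatures, and the no-op default body are assumptions made for this example, not the FLIP's final API.

```java
import java.util.Collection;

// Option c): the base interface carries the method with a default body.
// Existing implementations compile unchanged and inherit the no-op.
interface SplitReaderC<SplitT> {
    void fetch();

    default void pauseOrResumeSplits(Collection<SplitT> splitsToPause,
                                     Collection<SplitT> splitsToResume) {
        // Assumed no-op default: the reader silently ignores the request.
    }
}

// Option d): the base interface stays minimal ...
interface SplitReaderD<SplitT> {
    void fetch();
}

// ... and the capability lives in a decorative sub-interface.
interface AlignedSplitReaderD<SplitT> extends SplitReaderD<SplitT> {
    void alignSplits(Collection<SplitT> splitsToPause,
                     Collection<SplitT> splitsToResume);
}

class Capabilities {
    // With option d) the framework can detect support from the type alone.
    static boolean canAlign(SplitReaderD<?> reader) {
        return reader instanceof AlignedSplitReaderD;
    }
}
```

With option c) and a no-op default there is no type-level signal, which is one reason a separate "supportsXXX" style check comes up in this thread.
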
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Dawid Wysakowicz <dwysakowicz@apache.org>, thanks for the reply.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a preference and I agree most of
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> the sources will support the alignment I am fine following your
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> suggestion to have the SourceReader extending from
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> WithWatermarksSplitsAlignment, but would put the "supportsXXX"
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> there, not in the Source to keep the two methods together.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> One benefit of having the "supportsXXX" in Source is that this
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> allows some compile time check. For example, if a user enabled
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> watermark alignment while it is not supported by the Source, an
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> exception can be thrown at compile time. It seems in general
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> useful. That said, I agree that API cleanliness wise it is better
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> to put the two methods together.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <imjark@gmail.com> wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thank Dawid for the reminder on FLIP-182. Sorry I did miss it.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't have other concerns then.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jark
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Jark:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with watermarks when the source
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface?
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment even if Kafka
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface, and this will affect the throughput
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> somehow. I agree with Becket we may need a
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to configure the source
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> to enable/disable the alignment.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly affect
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> throughput if the reader is constantly switching between pause and
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> resume. Can users configure the alignment offset?
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> This is covered in the previous FLIP[1] which has been already
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> implemented in 1.15. In short, it must be enabled with the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> watermark strategy which also configures drift and update interval.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
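
For readers who have not looked at the earlier FLIP, the following is a rough sketch of how a drift setting could translate into a pause decision. It assumes the maximum allowed watermark is the slowest watermark in the alignment group plus the configured drift, recomputed once per update interval; that rule is my reading of the design, not something stated in this thread, and the class and method names are hypothetical.

```java
import java.time.Duration;
import java.util.Collection;

class AlignmentSketch {
    // Assumed rule: slowest watermark in the group plus the allowed drift.
    static long maxAllowedWatermark(Collection<Long> groupWatermarks,
                                    Duration maxAllowedDrift) {
        if (groupWatermarks.isEmpty()) {
            return Long.MAX_VALUE; // nothing to align against, never pause
        }
        long min = Long.MAX_VALUE;
        for (long watermark : groupWatermarks) {
            min = Math.min(min, watermark);
        }
        return min + maxAllowedDrift.toMillis();
    }

    // A reader (or split) pauses only while it is ahead of the limit.
    static boolean shouldPause(long ownWatermark, long maxAllowedWatermark) {
        return ownWatermark > maxAllowedWatermark;
    }
}
```

Under this assumption a larger drift or a longer update interval means fewer pause/resume switches, which is the throughput concern raised in question 2.
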
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to support align other
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> things, I suggest explicitly declaring the purpose of the methods,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> such as `alignWatermarksForSplits` instead of `alignSplits`.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Sure let's rename it.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Becket:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I understand your point. On the other hand putting all methods,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> even with "supportsXXX" methods for enabling certain features,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> makes the entry threshold for writing a new source higher. Instead
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> of focusing on the basic and required properties of the Source, the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> person implementing a source must bother with and need to figure
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> out what all of the extra features are about and how to deal with
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> them. It makes it also harder to organize methods in coupled groups
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> as Jark said.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a preference and I agree most of
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> the sources will support the alignment I am fine following your
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> suggestion to have the SourceReader extending from
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> WithWatermarksSplitsAlignment, but would put the "supportsXXX"
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> there, not in the Source to keep the two methods together.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Lastly, I agree it is really unfortunate the "alignSplits" methods
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> differ slightly for SourceReader and SplitReader. The reason for
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> that is SourceReaderBase deals only with SplitIds, whereas
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReader needs the actual splits to pause them. I found the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> discrepancy acceptable for the sake of simplifying changes
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> significantly, especially as they would highly likely impact
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> performance as we would have to perform additional lookups.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Moreover the SplitReader is a secondary interface.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
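
The ID-versus-split discrepancy can be pictured with a small sketch: something has to translate split IDs into split objects before a split-level reader can act on them, and that translation is the extra lookup mentioned above. SplitLevelReader and IdToSplitBridge are hypothetical names used only for this illustration; where the lookup really happens in Flink is not shown here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical split-level reader: it needs the actual split objects.
interface SplitLevelReader<SplitT> {
    void pauseOrResumeSplits(Collection<SplitT> splitsToPause,
                             Collection<SplitT> splitsToResume);
}

// Hypothetical bridge: accepts split IDs and resolves them to splits.
class IdToSplitBridge<SplitT> {
    private final Map<String, SplitT> assignedSplits = new HashMap<>();
    private final SplitLevelReader<SplitT> reader;

    IdToSplitBridge(SplitLevelReader<SplitT> reader) {
        this.reader = reader;
    }

    void addSplit(String splitId, SplitT split) {
        assignedSplits.put(splitId, split);
    }

    // ID-based entry point; each call pays one map lookup per split ID.
    void pauseOrResumeSplits(Collection<String> idsToPause,
                             Collection<String> idsToResume) {
        reader.pauseOrResumeSplits(resolve(idsToPause), resolve(idsToResume));
    }

    // Unknown IDs are skipped in this sketch; a real reader might fail instead.
    private List<SplitT> resolve(Collection<String> splitIds) {
        List<SplitT> splits = new ArrayList<>();
        for (String id : splitIds) {
            SplitT split = assignedSplits.get(id);
            if (split != null) {
                splits.add(split);
            }
        }
        return splits;
    }
}
```
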
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the effort, Dawid and Sebastian!
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I just have some minor questions (maybe I missed something).
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with watermarks when the source
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface?
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment even if Kafka
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface, and this will affect the throughput
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> somehow. I agree with Becket we may need a
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to configure the source
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> to enable/disable the alignment.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly affect
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> throughput if the reader is constantly switching between pause and
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> resume. Can users configure the alignment offset?
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align other things in the future? For
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> example, align read speed, I have seen users requesting global rate
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> limits. This feature may also need an interface like this.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to support align other
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> things, I suggest explicitly declaring the purpose of the methods,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> such as `alignWatermarksForSplits` instead of `alignSplits`.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think they have their own
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> advantages.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending abilities
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink, and I prefer Interfaces
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> rather than methods in this case. When you have a bunch of
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> abilities and each ability has more than one method, Interfaces can
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> help to organize them and make users clear which methods need to
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> implement when you want to have an ability.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jark
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <becket.qin@gmail.com>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Hi Dawid,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the explanation. Apologies that I somehow misread a
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> bunch of "align" and thought they were "assign".
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Regarding 1, by default implementation, I was thinking of the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> default no-op implementation. I am a little worried about the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> proliferation of decorative interfaces. I think the most important
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> thing about interfaces is that they are easy to understand. In this
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> case, I prefer adding new method to the existing interface for the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> following reasons:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> a) I feel the biggest drawback of decorative interfaces is which
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface they can decorate and which combinations of multiple
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> decorative interfaces are valid. In the current FLIP, the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> withSplitsAlignment interface is only applicable to the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader which means it can't decorate any other interface.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> From an interface design perspective, a natural question is why not
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> let "AlignedSplitReader" extend "withSplitsAlignment"? And it is
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> also natural to assume that a split reader implementing both
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReader and WithSplitAlignment would work, because a source
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> reader implementing SourceReader and withSplitsAlignment works. So
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> why isn't there an interface of AlignedSourceReader? In the future,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> if there is a new feature added (e.g. sorted or pre-partitioned
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> data aware), are we going to create another interface of
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReader such as SortedSplitReader or PrePartitionedAware?
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Can they be combined? So I think the additional decorative
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface like withSplitsAlignment actually increases the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> understanding cost of users because they have to know what
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> decorative interfaces are there, which interface they can decorate
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> and which combinations of the decorative
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> interfaces are valid and which
> > are
> > > > not.
> > > > > >>>> > Ideally we
> > > > > >>>> > > >> > want
> > > > > >>>> > > >> > > >>>>>>> to avoid
> > > > > >>>> > > >> > > >>>>>>> > > >> that.
> > > > > >>>> > > >> > > >>>>>>> > > >> > To
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> be clear, I am not opposing
> > having
> > > an
> > > > > >>>> > interface of
> > > > > >>>> > > >> > > >>>>>>> > > >> withSplitsAlignment,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> it
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> is completely OK to have it as
> an
> > > > > >>>> internal
> > > > > >>>> > > >> interface
> > > > > >>>> > > >> > > and
> > > > > >>>> > > >> > > >>>>>>> let
> > > > > >>>> > > >> > > >>>>>>> > > >> > SourceReader
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> and SplitReader both extend it.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> b) Adding a new method to the
> > > > > >>>> SourceReader
> > > > > >>>> > with a
> > > > > >>>> > > >> > > default
> > > > > >>>> > > >> > > >>>>>>> > > >> implementation
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> of
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> no-op would help avoid logic
> > > > branching
> > > > > >>>> in the
> > > > > >>>> > > >> source
> > > > > >>>> > > >> > > >>>>>>> logic,
> > > > > >>>> > > >> > > >>>>>>> > > >> especially
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> given that we agree that the
> vast
> > > > > >>>> majority of
> > > > > >>>> > the
> > > > > >>>> > > >> > > >>>>>>> SourceReader
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> implementations, if not all,
> > would
> > > > just
> > > > > >>>> extend
> > > > > >>>> > > >> from
> > > > > >>>> > > >> > the
> > > > > >>>> > > >> > > >>>>>>> > > >> > SourceReaderBase.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> That means adding a new method
> to
> > > the
> > > > > >>>> > interface
> > > > > >>>> > > >> would
> > > > > >>>> > > >> > > >>>>>>> effectively
> > > > > >>>> > > >> > > >>>>>>> > > >> give
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> same user experience, but
> > simpler.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> c) A related design principle
> > that
> > > > may
> > > > > be
> > > > > >>>> > worth
> > > > > >>>> > > >> > > >>>>>>> discussing is how
> > > > > >>>> > > >> > > >>>>>>> > > do
> > > > > >>>> > > >> > > >>>>>>> > > >> we
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> let
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> the Source implementations tell
> > > Flink
> > > > > >>>> what
> > > > > >>>> > > >> capability
> > > > > >>>> > > >> > > is
> > > > > >>>> > > >> > > >>>>>>> > supported
> > > > > >>>> > > >> > > >>>>>>> > > >> and
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> what
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> is not. Personally speaking I
> > feel
> > > > the
> > > > > >>>> most
> > > > > >>>> > > >> intuitive
> > > > > >>>> > > >> > > >>>>>>> place to me
> > > > > >>>> > > >> > > >>>>>>> > > is
> > > > > >>>> > > >> > > >>>>>>> > > >> in
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Source itself, because that is
> > the
> > > > > >>>> entrance
> > > > > >>>> > of the
> > > > > >>>> > > >> > > >>>>>>> entire Source
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> connector
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> logic.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Based on the above thoughts, I
> am
> > > > > >>>> wondering
> > > > > >>>> > if the
> > > > > >>>> > > >> > > >>>>>>> following
> > > > > >>>> > > >> > > >>>>>>> > > >> interface
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> would be easier to understand
> by
> > > the
> > > > > >>>> users.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> - Change "withSplitsAlignment" to internal interface, let both
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>   SourceReader and SplitReader extend from it, with a default
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>   no-op implementation.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> - Add a new method "boolean supportSplitsAlignment()" to the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>   Source interface, with a default implementation returning
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>   false. Sources that have implemented the alignment logic can
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>   change this to return true, and override the alignSplits()
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>   methods in the SourceReader / SplitReader if needed.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> - In the future, if a new optional feature is going to be added
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>   to the Source, and that feature requires the awareness from
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>   Flink, we can add more such methods to the Source.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
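For readers following the thread, the capability-flag proposal in the three bullets above could look roughly like the sketch below. This is only an illustration under stated assumptions: the interfaces are simplified stand-ins rather than the real Flink types, splits are identified by plain strings, and the class names PlainSource, AligningSource and AligningReader are hypothetical.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified stand-ins; these are NOT the real Flink interfaces.
final class CapabilityFlagSketch {

    /** Internal alignment hook with a default no-op, as in the first bullet. */
    interface WithSplitsAlignment {
        default void alignSplits(Collection<String> splitsToPause,
                                 Collection<String> splitsToResume) {
            // No-op: readers that do not care about alignment inherit this.
        }
    }

    interface SourceReader extends WithSplitsAlignment { }

    /** The Source declares the capability; the default is "not supported". */
    interface Source {
        default boolean supportSplitsAlignment() {
            return false;
        }

        SourceReader createReader();
    }

    /** A source that ignores alignment and overrides nothing. */
    static final class PlainSource implements Source {
        @Override
        public SourceReader createReader() {
            return new SourceReader() { };
        }
    }

    /** A reader that reacts to alignment requests (here it only counts them). */
    static final class AligningReader implements SourceReader {
        int pausedSplits;

        @Override
        public void alignSplits(Collection<String> splitsToPause,
                                Collection<String> splitsToResume) {
            pausedSplits = splitsToPause.size();
        }
    }

    /** A source that opts in by returning true and providing an aligning reader. */
    static final class AligningSource implements Source {
        @Override
        public boolean supportSplitsAlignment() {
            return true;
        }

        @Override
        public SourceReader createReader() {
            return new AligningReader();
        }
    }

    public static void main(String[] args) {
        List<Source> sources = List.of(new PlainSource(), new AligningSource());
        for (Source source : sources) {
            SourceReader reader = source.createReader();
            // Safe to call unconditionally: the default implementation is a no-op.
            reader.alignSplits(List.of("split-0"), List.of());
            System.out.println(source.getClass().getSimpleName()
                    + " supports alignment: " + source.supportSplitsAlignment());
        }
    }
}
```

The point of this shape is that the caller never branches on the reader type; the flag on the Source is the only place where the capability is declared.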
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> What do you think?
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Konstantin:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the
> > > > > >>>> `AlignedSplitReader`
> > > > > >>>> > > >> > > interface
> > > > > >>>> > > >> > > >>>>>>> (aka the
> > > > > >>>> > > >> > > >>>>>>> > > >> stop &
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> resume behavior) will be
> > > implemented
> > > > > for
> > > > > >>>> > Kafka and
> > > > > >>>> > > >> > > >>>>>>> Pulsar only,
> > > > > >>>> > > >> > > >>>>>>> > > >> correct?
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Correct, as far as I know
> though,
> > > > those
> > > > > >>>> are
> > > > > >>>> > the
> > > > > >>>> > > >> only
> > > > > >>>> > > >> > > >>>>>>> sources
> > > > > >>>> > > >> > > >>>>>>> > which
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> consume
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> concurrently from multiple
> splits
> > > and
> > > > > >>>> thus
> > > > > >>>> > > >> alignment
> > > > > >>>> > > >> > > >>>>>>> applies.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Thomas:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split
> > > > > alignment
> > > > > >>>> in
> > > > > >>>> > > >> > > >>>>>>> SourceReaderBase and
> > > > > >>>> > > >> > > >>>>>>> > > then
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> doing
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> nothing if the split reader
> does
> > > not
> > > > > >>>> implement
> > > > > >>>> > > >> > > >>>>>>> AlignedSplitReader
> > > > > >>>> > > >> > > >>>>>>> > > >> could
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> be
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> misleading? Perhaps
> > > > WithSplitsAlignment
> > > > > >>>> can
> > > > > >>>> > > >> instead
> > > > > >>>> > > >> > be
> > > > > >>>> > > >> > > >>>>>>> added to
> > > > > >>>> > > >> > > >>>>>>> > the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> specific source reader (i.e.
> > > > > >>>> > KafkaSourceReader) to
> > > > > >>>> > > >> > make
> > > > > >>>> > > >> > > >>>>>>> it
> > > > > >>>> > > >> > > >>>>>>> > explicit
> > > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> the source actually supports
> it.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I understand your concern. Hmm, I think we could actually do
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> that. Given the actual implementation of the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase#alignSplits is rather short (just a forward to
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> the corresponding method of SplitFetcher), we could reimplement
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> it in the actual source implementations. This solution has a
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> downside though. Authors of new sources would have to do two
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> things: extend from AlignedSplitReader and implement
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> WithSplitsAlignment, instead of just extending
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader. I would be fine with such a tradeoff though.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> What do others think?
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
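The opt-in alternative discussed in the paragraph above, where only concrete readers implement the decorative interface, might be sketched as follows. This is a rough illustration, not the FLIP's code: the interfaces are simplified stand-ins, splits are plain strings, and requestAlignment, PlainReader and AlignedReader are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified stand-ins; these are NOT the real Flink interfaces.
final class OptInAlignmentSketch {

    /** Base interface stays free of alignment methods. */
    interface SourceReader { }

    /** Decorative interface: implementing it is the explicit opt-in. */
    interface WithSplitsAlignment {
        void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
    }

    /** A reader that does not support alignment. */
    static final class PlainReader implements SourceReader { }

    /** A reader that opts in and tracks which splits are currently paused. */
    static final class AlignedReader implements SourceReader, WithSplitsAlignment {
        final List<String> paused = new ArrayList<>();

        @Override
        public void alignSplits(Collection<String> splitsToPause,
                                Collection<String> splitsToResume) {
            paused.removeAll(splitsToResume);
            paused.addAll(splitsToPause);
        }
    }

    /** Forwards the request only if the reader opted in; returns whether it did. */
    static boolean requestAlignment(SourceReader reader,
                                    Collection<String> splitsToPause,
                                    Collection<String> splitsToResume) {
        if (reader instanceof WithSplitsAlignment) {
            ((WithSplitsAlignment) reader).alignSplits(splitsToPause, splitsToResume);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(requestAlignment(new PlainReader(), List.of("s0"), List.of()));
        System.out.println(requestAlignment(new AlignedReader(), List.of("s0"), List.of()));
    }
}
```

The cost mentioned in the thread shows up here as the type check in the caller: support is visible in the reader's type, but the framework has to branch on it.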
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Steven:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> For this part from the
> motivation
> > > > > >>>> section, is
> > > > > >>>> > it
> > > > > >>>> > > >> > > >>>>>>> accurate? Let's
> > > > > >>>> > > >> > > >>>>>>> > > >> assume
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> one
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> source task consumes from 3
> > > > partitions
> > > > > >>>> and
> > > > > >>>> > one of
> > > > > >>>> > > >> the
> > > > > >>>> > > >> > > >>>>>>> partitions
> > > > > >>>> > > >> > > >>>>>>> > is
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> significantly slower. In this
> > > > > situation,
> > > > > >>>> > watermark
> > > > > >>>> > > >> > for
> > > > > >>>> > > >> > > >>>>>>> this
> > > > > >>>> > > >> > > >>>>>>> > source
> > > > > >>>> > > >> > > >>>>>>> > > >> task
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> won't hold back as it is
> reading
> > > > recent
> > > > > >>>> data
> > > > > >>>> > from
> > > > > >>>> > > >> > other
> > > > > >>>> > > >> > > >>>>>>> two Kafka
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> partitions. As a result, it
> won't
> > > > hold
> > > > > >>>> back
> > > > > >>>> > the
> > > > > >>>> > > >> > overall
> > > > > >>>> > > >> > > >>>>>>> > watermark.
> > > > > >>>> > > >> > > >>>>>>> > > I
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> thought the problem is that we
> > may
> > > > have
> > > > > >>>> late
> > > > > >>>> > data
> > > > > >>>> > > >> for
> > > > > >>>> > > >> > > >>>>>>> this slow
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> partition.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> It will hold back the watermark. Watermark of an operator is the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> minimum of watermarks of all splits[1]
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
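The reply above can be made concrete with a small sketch of the three-partition scenario from the question. It is only an illustration: the class and method names are hypothetical, the timestamps are made up, and returning Long.MIN_VALUE when there are no splits is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating "operator watermark = minimum over split watermarks".
final class SplitWatermarkSketch {

    /** Combines per-split watermarks by taking their minimum. */
    static long combinedWatermark(Map<String, Long> watermarkPerSplit) {
        return watermarkPerSplit.values().stream()
                .mapToLong(Long::longValue)
                .min()
                // Assumption of this sketch: no splits means "no watermark yet".
                .orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        Map<String, Long> watermarks = new LinkedHashMap<>();
        watermarks.put("partition-0", 10_000L); // fast partition
        watermarks.put("partition-1", 9_900L);  // fast partition
        watermarks.put("partition-2", 1_000L);  // significantly slower partition

        // The slow partition determines the result, so it does hold back the operator.
        System.out.println("operator watermark = " + combinedWatermark(watermarks));
    }
}
```

Because the minimum is taken, the two fast partitions cannot advance the operator watermark past the slow one; their records are read ahead of it, which is the buffering problem the motivation section describes.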
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I have another question about
> the
> > > > > >>>> restart. Say
> > > > > >>>> > > >> split
> > > > > >>>> > > >> > > >>>>>>> alignment is
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> triggered. checkpoint is
> > completed.
> > > > job
> > > > > >>>> > failed and
> > > > > >>>> > > >> > > >>>>>>> restored from
> > > > > >>>> > > >> > > >>>>>>> > > the
> > > > > >>>> > > >> > > >>>>>>> > > >> > last
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> checkpoint. because alignment
> > > > decision
> > > > > >>>> is not
> > > > > >>>> > > >> > > >>>>>>> checkpointed,
> > > > > >>>> > > >> > > >>>>>>> > > initially
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> alignment won't be enforced
> until
> > > we
> > > > > get
> > > > > >>>> a
> > > > > >>>> > cycle
> > > > > >>>> > > >> of
> > > > > >>>> > > >> > > >>>>>>> watermark
> > > > > >>>> > > >> > > >>>>>>> > > >> > aggregation
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> and propagation, right? Not
> > saying
> > > > this
> > > > > >>>> > corner case is
> > > > > >>>> > > >> a
> > > > > >>>> > > >> > > >>>>>>> problem. Just
> > > > > >>>> > > >> > > >>>>>>> > > >> want
> > > > > >>>> > > >> > > >>>>>>> > > >> > to
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> understand it more.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Your understanding is correct.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> @Becket:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment
> is
> > > > sort
> > > > > >>>> of a
> > > > > >>>> > > >> general
> > > > > >>>> > > >> > use
> > > > > >>>> > > >> > > >>>>>>> case, so
> > > > > >>>> > > >> > > >>>>>>> > > >> should
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> we
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> just add the related methods to
> > > > > >>>> SourceReader
> > > > > >>>> > > >> directly
> > > > > >>>> > > >> > > >>>>>>> instead of
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> introducing the new interface
> of
> > > > > >>>> > > >> WithSplitAssignment?
> > > > > >>>> > > >> > > We
> > > > > >>>> > > >> > > >>>>>>> can
> > > > > >>>> > > >> > > >>>>>>> > > provide
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> default implementations, so
> > > backwards
> > > > > >>>> > > >> compatibility
> > > > > >>>> > > >> > > >>>>>>> won't be an
> > > > > >>>> > > >> > > >>>>>>> > > >> issue.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't think we can provide a
> > > > default
> > > > > >>>> > > >> > implementation.
> > > > > >>>> > > >> > > >>>>>>> How would
> > > > > >>>> > > >> > > >>>>>>> > we
> > > > > >>>> > > >> > > >>>>>>> > > >> do
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> that? Would it be just a no-op?
> > Is
> > > it
> > > > > >>>> better
> > > > > >>>> > than
> > > > > >>>> > > >> > > having
> > > > > >>>> > > >> > > >>>>>>> an
> > > > > >>>> > > >> > > >>>>>>> > opt-in
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface? The default
> > > implementation
> > > > > >>>> would
> > > > > >>>> > have
> > > > > >>>> > > >> to
> > > > > >>>> > > >> > be
> > > > > >>>> > > >> > > >>>>>>> added
> > > > > >>>> > > >> > > >>>>>>> > > >> exclusively
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> in
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> a *Public* SourceReader
> > interface.
> > > By
> > > > > >>>> the way
> > > > > >>>> > > >> notice
> > > > > >>>> > > >> > > >>>>>>> > > SourceReaderBase
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> does extend from
> > > WithSplitsAlignment,
> > > > > so
> > > > > >>>> > > >> effectively
> > > > > >>>> > > >> > > all
> > > > > >>>> > > >> > > >>>>>>> > > >> implementations
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> do
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> handle the alignment case. To
> be
> > > > > honest I
> > > > > >>>> > think
> > > > > >>>> > > >> it is
> > > > > >>>> > > >> > > >>>>>>> impossible
> > > > > >>>> > > >> > > >>>>>>> > to
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> implement the SourceReader
> > > interface
> > > > > >>>> directly
> > > > > >>>> > by
> > > > > >>>> > > >> end
> > > > > >>>> > > >> > > >>>>>>> users.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the
> > > SplitReader
> > > > > >>>> interface
> > > > > >>>> > > >> > probably
> > > > > >>>> > > >> > > >>>>>>> also
> > > > > >>>> > > >> > > >>>>>>> > needs
> > > > > >>>> > > >> > > >>>>>>> > > >> some
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> change to support throttling at
> > the
> > > > > split
> > > > > >>>> > > >> > granularity.
> > > > > >>>> > > >> > > >>>>>>> Can you
> > > > > >>>> > > >> > > >>>>>>> > add
> > > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface change into the
> public
> > > > > >>>> interface
> > > > > >>>> > > >> section as
> > > > > >>>> > > >> > > >>>>>>> well?
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> It has been added from the
> > > beginning.
> > > > > See
> > > > > >>>> > > >> > > >>>>>>> *AlignedSplitReader.*
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the
> > > method
> > > > > >>>> name
> > > > > >>>> > > >> > assignSplits
> > > > > >>>> > > >> > > >>>>>>> here,
> > > > > >>>> > > >> > > >>>>>>> > given
> > > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> it
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> is not actually changing the
> > split
> > > > > >>>> > assignments? It
> > > > > >>>> > > >> > > seems
> > > > > >>>> > > >> > > >>>>>>> > something
> > > > > >>>> > > >> > > >>>>>>> > > >> like
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or
> > > > > >>>> > > >> adjustSplitsThrottling() is
> > > > > >>>> > > >> > > >>>>>>> more
> > > > > >>>> > > >> > > >>>>>>> > > accurate.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> The method's called
> > *alignSplits*,
> > > > not
> > > > > >>>> > assign. Do
> > > > > >>>> > > >> you
> > > > > >>>> > > >> > > >>>>>>> still
> > > > > >>>> > > >> > > >>>>>>> > prefer
> > > > > >>>> > > >> > > >>>>>>> > > a
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> different name for that?
> > > Personally,
> > > > I
> > > > > am
> > > > > >>>> > open for
> > > > > >>>> > > >> > > >>>>>>> suggestions
> > > > > >>>> > > >> > > >>>>>>> > > here.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> [1]
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin
> > > > wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for driving the effort,
> > > > > >>>> Sebastian. I
> > > > > >>>> > think
> > > > > >>>> > > >> the
> > > > > >>>> > > >> > > >>>>>>> motivation
> > > > > >>>> > > >> > > >>>>>>> > > >> makes a
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> lot of sense. Just a few
> > > suggestions
> > > > /
> > > > > >>>> > questions.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment
> is
> > > > sort
> > > > > >>>> of a
> > > > > >>>> > > >> general
> > > > > >>>> > > >> > use
> > > > > >>>> > > >> > > >>>>>>> case, so
> > > > > >>>> > > >> > > >>>>>>> > > >> should
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> we
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> just add the related methods to
> > > > > >>>> SourceReader
> > > > > >>>> > > >> directly
> > > > > >>>> > > >> > > >>>>>>> instead of
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> introducing the new interface
> of
> > > > > >>>> > > >> WithSplitAssignment?
> > > > > >>>> > > >> > > We
> > > > > >>>> > > >> > > >>>>>>> can
> > > > > >>>> > > >> > > >>>>>>> > > provide
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> default implementations, so
> > > backwards
> > > > > >>>> > > >> compatibility
> > > > > >>>> > > >> > > >>>>>>> won't be an
> > > > > >>>> > > >> > > >>>>>>> > > >> issue.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the
> > > SplitReader
> > > > > >>>> interface
> > > > > >>>> > > >> > probably
> > > > > >>>> > > >> > > >>>>>>> also
> > > > > >>>> > > >> > > >>>>>>> > needs
> > > > > >>>> > > >> > > >>>>>>> > > >> some
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> change to support throttling at
> > the
> > > > > split
> > > > > >>>> > > >> > granularity.
> > > > > >>>> > > >> > > >>>>>>> Can you
> > > > > >>>> > > >> > > >>>>>>> > add
> > > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> interface change into the
> public
> > > > > >>>> interface
> > > > > >>>> > > >> section as
> > > > > >>>> > > >> > > >>>>>>> well?
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the
> > > method
> > > > > >>>> name
> > > > > >>>> > > >> > assignSplits
> > > > > >>>> > > >> > > >>>>>>> here,
> > > > > >>>> > > >> > > >>>>>>> > given
> > > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> it
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> is not actually changing the
> > split
> > > > > >>>> > assignments? It
> > > > > >>>> > > >> > > seems
> > > > > >>>> > > >> > > >>>>>>> > something
> > > > > >>>> > > >> > > >>>>>>> > > >> like
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or
> > > > > >>>> > > >> adjustSplitsThrottling() is
> > > > > >>>> > > >> > > >>>>>>> more
> > > > > >>>> > > >> > > >>>>>>> > > accurate.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> <stevenz3wu@gmail.com> wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> However, a single source
> operator
> > > may
> > > > > >>>> read
> > > > > >>>> > data
> > > > > >>>> > > >> from
> > > > > >>>> > > >> > > >>>>>>> multiple
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> splits/partitions, e.g.,
> multiple
> > > > Kafka
> > > > > >>>> > > >> partitions,
> > > > > >>>> > > >> > > such
> > > > > >>>> > > >> > > >>>>>>> that
> > > > > >>>> > > >> > > >>>>>>> > even
> > > > > >>>> > > >> > > >>>>>>> > > >> with
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> watermark alignment the source
> > > > operator
> > > > > >>>> may
> > > > > >>>> > need
> > > > > >>>> > > >> to
> > > > > >>>> > > >> > > >>>>>>> buffer
> > > > > >>>> > > >> > > >>>>>>> > > excessive
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> amount
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> of data if one split emits data
> > > > faster
> > > > > >>>> than
> > > > > >>>> > > >> another.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> For this part from the
> motivation
> > > > > >>>> section, is
> > > > > >>>> > it
> > > > > >>>> > > >> > > >>>>>>> accurate? Let's
> > > > > >>>> > > >> > > >>>>>>> > > >> assume
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> one
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> source task consumes from 3
> > > > partitions
> > > > > >>>> and
> > > > > >>>> > one of
> > > > > >>>> > > >> the
> > > > > >>>> > > >> > > >>>>>>> partitions
> > > > > >>>> > > >> > > >>>>>>> > is
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> significantly slower. In this
> > > > > situation,
> > > > > >>>> > watermark
> > > > > >>>> > > >> > for
> > > > > >>>> > > >> > > >>>>>>> this
> > > > > >>>> > > >> > > >>>>>>> > source
> > > > > >>>> > > >> > > >>>>>>> > > >> task
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> won't hold back as it is
> reading
> > > > recent
> > > > > >>>> data
> > > > > >>>> > from
> > > > > >>>> > > >> > other
> > > > > >>>> > > >> > > >>>>>>> two Kafka
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> partitions. As a result, it
> won't
> > > > hold
> > > > > >>>> back
> > > > > >>>> > the
> > > > > >>>> > > >> > overall
> > > > > >>>> > > >> > > >>>>>>> > watermark.
> > > > > >>>> > > >> > > >>>>>>> > > I
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> thought the problem is that we
> > may
> > > > have
> > > > > >>>> late
> > > > > >>>> > data
> > > > > >>>> > > >> for
> > > > > >>>> > > >> > > >>>>>>> this slow
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> partition.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I have another question about the restart. Say split alignment is
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> triggered, a checkpoint is completed, and the job fails and is restored
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> from the last checkpoint. Because the alignment decision is not
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> checkpointed, alignment initially won't be enforced until we get a cycle of
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> watermark aggregation and propagation, right? I'm not saying this corner
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> case is a problem; I just want to understand it better.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <thw@apache.org> wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for working on this!
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase and then doing
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> nothing if the split reader does not implement AlignedSplitReader could be
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead be added to the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to make it explicit that
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> the source actually supports it.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Thomas
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <knaufk@apache.org> wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> +1 in general. I believe it is valuable to complete the watermark aligned
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> story with this FLIP.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Cheers,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Konstantin
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> To be explicit, having worked on it, I support it ;) I think we can
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> start a vote thread soonish, as there are no concerns so far.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Dear Flink developers,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> I would like to open a discussion on FLIP 217 [1] for an extension of
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Watermark Alignment to perform alignment also in SplitReaders. To do so,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReaders must be able to suspend and resume reading from split sources,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> where the SourceOperator coordinates and controls suspend and resume. To
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> gather information about current watermarks of the SplitReaders, we extend
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> the internal WatermarkOutputMultiplexer and report watermarks to the
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceOperator.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised and
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> reworked by Dawid Wysakowicz (he did most of the work) and me. The changes
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> are backwards compatible in the sense that if affected components do not
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> support split alignment, the behavior is as before.
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Sebastian
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> --
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> Konstantin Knauf
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> https://twitter.com/snntrable
> > > > > >>>> > > >> > > >>>>>>> > > >> > >> https://github.com/knaufk
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > > >>>> > > >> > > >>>>>>> > > >> > >>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Becket Qin <be...@gmail.com>.
Hi Piotr,

Just to make sure we are on the same page. There are two cases for the
existing FLIP-182 users:

Case 1: Each source reader only has one split assigned. This is the
targeted case for FLIP-182.
Case 2: Each source reader has multiple splits assigned. This is the flaky
case that may or may not work.

With solution 1, the users of case 1 won't be impacted. The users in case 2
will receive an exception which they won't get at the moment.

Do you mean we should not throw an exception in case 2? Personally I feel
throwing is OK, and it could have been done in FLIP-182 itself, because case 2
is not a designed use case. As a user in that case, I may see large variations
in job state size from time to time, so I cannot rely on this feature to plan
my resources and uphold the SLA.
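
To make the concern concrete, here is a minimal sketch (not Flink code) using
the numbers from the two-split example quoted further down in this thread. The
class and method names are made up for illustration, and it assumes the
reader-level watermark is simply the minimum of its split watermarks.

```java
import java.time.LocalTime;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Flink.
final class CoarseAlignmentSketch {

    // Assumption: a reader's watermark is the minimum over its splits.
    static LocalTime readerWatermark(List<LocalTime> splitWatermarks) {
        return splitWatermarks.stream().min(LocalTime::compareTo).orElseThrow();
    }

    // Coarse-grained (reader-level) alignment: the whole reader is paused
    // only when its own watermark is ahead of maxDesiredWatermark.
    static boolean readerPaused(List<LocalTime> splitWatermarks, LocalTime maxDesiredWatermark) {
        return readerWatermark(splitWatermarks).isAfter(maxDesiredWatermark);
    }

    // Split-level alignment: each split is checked on its own.
    static boolean splitShouldPause(LocalTime splitWatermark, LocalTime maxDesiredWatermark) {
        return splitWatermark.isAfter(maxDesiredWatermark);
    }

    public static void main(String[] args) {
        List<LocalTime> splits = List.of(LocalTime.of(10, 0), LocalTime.of(11, 0));
        LocalTime maxDesired = LocalTime.of(10, 30);

        // The reader as a whole keeps reading, because its watermark is 10:00.
        System.out.println("reader paused: " + readerPaused(splits, maxDesired));
        // Split 2 (11:00) is already ahead of 10:30 and would be paused
        // if the reader could pause individual splits.
        System.out.println("split 2 should pause: " + splitShouldPause(splits.get(1), maxDesired));
    }
}
```

With a single split per reader the two checks coincide, which is why case 1
is unaffected.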

That said, if you have a strong opinion on this, I am fine with having a
configuration such as "allow.coarse-grained.watermark.alignment" with the
default value set to false, given that a configuration is much easier to
deprecate than a method.
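
A rough sketch of how such an option could gate the behavior, purely as an
illustration: the enum, the method and its parameters are hypothetical, and
only the option name comes from this thread.

```java
// Hypothetical decision logic; not part of any Flink API.
final class PauseDecisionSketch {

    enum Action {
        PAUSE_SPLITS,             // split reader can pause individual splits
        STOP_POLLING_READER,      // single split: pause the whole reader (case 1)
        COARSE_GRAINED_ALIGNMENT, // multiple splits, user opted in to the old behavior
        FAIL                      // multiple splits, no pausing support, no opt-in
    }

    // allowCoarseGrained stands for the value of
    // "allow.coarse-grained.watermark.alignment" (default false).
    static Action decide(boolean splitReaderSupportsPausing,
                         int assignedSplits,
                         boolean allowCoarseGrained) {
        if (splitReaderSupportsPausing) {
            return Action.PAUSE_SPLITS;
        }
        if (assignedSplits <= 1) {
            return Action.STOP_POLLING_READER;
        }
        return allowCoarseGrained ? Action.COARSE_GRAINED_ALIGNMENT : Action.FAIL;
    }

    public static void main(String[] args) {
        // Case 2 with the default configuration ends in an exception.
        System.out.println(decide(false, 3, false));
        // Case 2 with the option enabled keeps the current FLIP-182 behavior.
        System.out.println(decide(false, 3, true));
    }
}
```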

Thanks,

Jiangjie (Becket) Qin


On Wed, Jun 29, 2022 at 8:02 PM Piotr Nowojski <pn...@apache.org> wrote:

> Thanks for the explanation.
>
> > 2. It is fully compatible with FLIP-182, if we consider it as the right
> > thing to throw an exception for readers reading from multiple splits
> > without supporting split pausing.
>
> I think that's fine. But the question is should we provide a workaround for
> existing users? IMO if it's easy to do, we should.
>
> > I actually think neither solution 1 or 2 breaks FLIP-182 users,
>
> They do. A user currently has a working Flink 1.15 deployment, where
> watermark alignment may not be behaving ideally, but it's working to some
> extent, and you are proposing to throw an exception at them after upgrading
> Flink, without any workaround (short of implementing a feature, which is a
> very problematic requirement). Given that costly upgrades are one of the
> major complaints, I would definitely be in favor of option 2. Given that the
> affected user base is most likely small, I would be +1 for solution 2
> with throwing an exception by default.
>
> Best,
> Piotrek
>
>
> On Wed, Jun 29, 2022 at 12:55 PM Becket Qin <be...@gmail.com> wrote:
>
> > Hi Piotr,
> >
> > Please see the reply inline below:
> >
> > On Wed, Jun 29, 2022 at 5:11 PM Piotr Nowojski <pn...@apache.org>
> > wrote:
> >
> > > Hi Becket,
> > >
> > > > My main concern of having a supportsPausingSplits() knob
> > >
> > > What is the problem with `supportsPausingSplits()` that you see?  Do
> you
> > > want to remove it?
> > >
> > Just to make sure we are on the same page, I assume we are talking about
> > this supportsPausingSplits() method in the Source interface. If we go
> > with the obligatory features addition path, having this method seems
> > misleading. And also, later on at some point when we see all the sources
> > have implemented this feature, we will have to worry about deprecating
> this
> > method, which is backwards incompatible.
> >
> >
> > > Also I don't understand your proposal for Solution 1. How do you want
> to
> > > decide whether to throw an exception? For that we would need to have
> > > `supportsPausingSplits()`, right?
> > >
> >
> > What I am thinking is the following:
> >
> > 1. The Flink framework always assumes split pausing is supported and just
> > naively invokes SourceReader#pauseOrResumeSplits().
> > 2. The SourceReaderBase will basically again try to ask the SplitReader
> to
> > pause the splits.
> > 3. Because the default implementation throws an
> > UnsupportedOperationException, if the source developer did not override
> it,
> > this exception will be thrown and bubbled up.
> > 4. After catching this exception, the SourceReaderBase will just check if
> > there is only one split that is currently assigned to the split reader.
> If
> > so, it swallows the exception, stops polling the split reader and returns
> > NOTHING_AVAILABLE. This is the same as the current logic in the
> > SourceOperator. If we are not comfortable with moving this logic to the
> > SourceReaderBase, we can also just keep the logic there and simply let
> > SourceOperator remember whether more than one split is assigned to the
> > source reader, when SourceOperator.handleAddSplitsEvent() is invoked.
> >
> > This way the existing FLIP-182 users won't be impacted by this FLIP. For
> > those source readers that only have one split assigned, it works fine
> > without any change. For those source readers with multiple splits
> assigned,
> > they are already in a limp state with unpredictable side effects. We
> might
> > as well let them know this instead of pretending the
> > coarse-grained watermark alignment works fine for them.
> >
> > The advantage of this solution is that we don't have to do anything after
> > this. That would work fine as the final state, as in:
> > 1. We have already done the best we can do for the Sources that do not
> > support split pausing.
> > 2. It is fully compatible with FLIP-182, if we consider it as the right
> > thing to throw an exception for readers reading from multiple splits
> > without supporting split pausing.
> > 3. There is nothing to deprecate in the future.
> >
> >
> > >
> > > If so, I would prefer solution 2, to provide a graceful migration path
> > for
> > > any users that are already using FLIP-182 with multiple splits per
> > > operator. I don't think there are many of those, but such a flag seems
> > easy
> > > to implement while making migration easier. Having said that,
> technically
> > > we could change the behaviour and start throwing an exception always in
> > > such a case, as this feature is marked as Experimental.
> > >
> >
> > I actually think neither solution 1 nor 2 breaks FLIP-182 users, but
> > solution 2 needs a deprecation process for the option in the future.
> >
> >
> > >
> > > Best,
> > > Piotrek
> > >
> > > On Wed, Jun 29, 2022 at 2:54 AM Becket Qin <be...@gmail.com> wrote:
> > >
> > > > Hi Sebastian,
> > > >
> > > > Regarding the question,
> > > >
> > > > >
> > > > > @Becket: I'm not sure about the intention of solution 1. Can you
> > > explain
> > > > > that a bit more? In particular, I don't understand: "The reasoning
> > > behind
> > > > > this solution is that existing users should only use the
> > > > > coarse watermark alignment when a source reader only reads from a
> > > single
> > > > > split." Why should a user not use coarse watermark alignment when
> > > source
> > > > > reader reads from multiple splits? The split alignment uses the
> > "coarse
> > > > > watermark", i.e., maxDesiredWatermark, as described in the FLIP for
> > > > > alignment.
> > > >
> > > >
> > > > Imagine you have a source reader reading from two splits, and the
> > > > watermarks look like the following:
> > > > 1. Watermark of Split 1: 10:00 AM Jun 29
> > > > 2. Watermark of Split 2: 11:00 AM Jun 29
> > > > 3. maxDesiredWatermark: 10:30 AM Jun 29
> > > >
> > > > At this point, the source reader's watermark is 10:00 AM which is
> lower
> > > > than the maxDesiredWatermark, so the source reader won't be paused
> from
> > > > reading. However, because the source reader cannot specify which
> split
> > to
> > > > read from, if it continues to read, the watermark gap between the two
> > > > splits may become even bigger. This essentially fails the main
> purpose
> > of
> > > > watermark alignment - to reduce the number of records buffered in the
> > > > state. This does not necessarily happen, but this is not what
> FLIP-182
> > > was
> > > > designed for to begin with. So I'd rather avoid extending the feature
> > > > to that case.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Tue, Jun 28, 2022 at 6:53 PM Sebastian Mattheis <
> > > > sebastian@ververica.com>
> > > > wrote:
> > > >
> > > > > (Sorry I didn't send to the list but only to Becket. My bad and
> > thanks
> > > > > Piotr. Next attempt:)
> > > > >
> > > > > Hi all,
> > > > >
> > > > > Thanks for pushing the FLIP. I would drive it and would be happy to
> > get
> > > > > back to you, @Thomas, for reviews. (Sorry for low responsiveness,
> > there
> > > > > were several efforts with high priority on my side ...) As next
> > step, I
> > > > > would revise the FLIP to get the discussion concluded.
> > > > >
> > > > > However, as Becket mentioned I feel that some things are still not
> > > clear
> > > > > yet:
> > > > >
> > > > > Re: Thomas
> > > > >>
> > > > >> However, from a user perspective, should the split level alignment
> > be
> > > > >>> an opt-in feature, at least for a few releases? If yes, then we
> > would
> > > > >>> require a knob similar to supportsPausingSplits(), which I
> > understand
> > > > >>> won't be part of the revised FLIP. Such control may be
> beneficial:
> > > > >>
> > > > >>
> > > > >>> * Compare runtime behavior with split level alignment on/off
> > > > >>> * Allow use of sources that don't implement pausing splits yet
> > > > >>
> > > > >>
> > > > >>> The second point would, from the user's perspective, be necessary
> > for
> > > > >>> backward compatibility? While the interface aspect and source
> > > > >>> compatibility has been discussed in great detail, I don't think
> it
> > > > >>> would be desirable if an application that already uses alignment
> > > fails
> > > > >>> after upgrading to the new Flink version, forcing users to lock
> > step
> > > > >>> modify sources for the new non-optional split level alignment.
> > > > >>
> > > > >>
> > > > >>> So I think clarification of the compatibility aspect on the FLIP
> > page
> > > > >>> would be necessary.
> > > > >>
> > > > >>
> > > > >> That is a good point. Currently FLIP-182 is marked as
> experimental.
> > So
> > > > >> technically speaking it could change. That said, I agree that it
> > would
> > > > be
> > > > >> good to not break the existing sources.
> > > > >>
> > > > >> My main concern of having a supportsPausingSplits() knob is that
> > this
> > > > > >> might be something requiring code change on future deprecation. I
> am
> > > > >> thinking of two potential solutions:
> > > > >>
> > > > >> Solution 1:
> > > > >> In the SourceReaderBase, when pauseOrResumeSplits() is invoked, if
> > the
> > > > >> source reader only has one split assigned, the source reader
> simply
> > > > stops
> > > > >> polling but just returns NOTHING_AVAILABLE. If there are more than
> > one
> > > > >> splits assigned, it throws an exception with a message such as
> "The
> > > > >> unpausable SplitReader CLASS_NAME only works with watermark
> > alignment
> > > > >> when assigned a single split. There are more than one split
> assigned
> > > to
> > > > the
> > > > >> SplitReader".
> > > > >> The reasoning behind this solution is that existing users should
> > only
> > > > use
> > > > >> the coarse watermark alignment when a source reader only reads
> from
> > a
> > > > >> single split. Reading from more than one split might have unwanted
> > > side
> > > > >> effects, so we might as well throw an exception in this case.
> > > > >>
> > > > >> Solution 2:
> > > > >> Having a configuration
> "enable.coarse-grained.watermark.alignment",
> > > the
> > > > >> default value is false. Once it is set to true, we will allow
> > > > >> coarse-grained watermark alignment if a SplitReader is pausable.
> > > > >> This solution allows users to keep the current FLIP-182 behavior,
> > with
> > > > >> the risk of side effects.
> > > > >>
> > > > >> Personally speaking, I feel solution 1 seems better because
> > > > >> coarse-grained watermark alignment could be frustrating to the
> users
> > > > >> when more than one split is assigned. So we might as well not
> > support
> > > > it at
> > > > >> all. And also there is nothing to deprecate in the future with
> this
> > > > >> solution.
> > > > >>
> > > > >> What do you think?
> > > > >>
> > > > >
> > > > > @Thomas: My understanding is that you intend a simple switch/knob
> to
> > > test
> > > > > w/ and w/o (split) watermark alignment, right? Isn't the
> > coarse-grained
> > > > w/
> > > > > vs w/o watermark alignment sufficient for that? Or do you think
> that
> > > > > switching watermark alignment explicitly on split level is required?
> > > > >
> > > > > @Becket: I'm not sure about the intention of solution 1. Can you
> > > explain
> > > > > that a bit more? In particular, I don't understand: "The reasoning
> > > behind
> > > > > this solution is that existing users should only use the coarse
> > > watermark
> > > > > alignment when a source reader only reads from a single split." Why
> > > > > should a user not use coarse watermark alignment when source reader
> > > reads
> > > > > from multiple splits? The split alignment uses the "coarse
> > watermark",
> > > > > i.e., maxDesiredWatermark, as described in the FLIP for alignment.
> > > > >
> > > > > Could you please clarify?
> > > > >
> > > > > Regards,
> > > > > Sebastian
> > > > >
> > > > > On Wed, Jun 22, 2022 at 4:23 AM Becket Qin <be...@gmail.com>
> > > wrote:
> > > > >
> > > > >> Thanks for the feedback, Thomas and Steve. And thanks Piotr for
> the
> > > > >> patient and detailed discussion.
> > > > >>
> > > > >> Let's move forward with option 1 then.
> > > > >>
> > > > >> Re: Thomas
> > > > >>
> > > > >> However, from a user perspective, should the split level alignment
> > be
> > > > >>> an opt-in feature, at least for a few releases? If yes, then we
> > would
> > > > >>> require a knob similar to supportsPausingSplits(), which I
> > understand
> > > > >>> won't be part of the revised FLIP. Such control may be
> beneficial:
> > > > >>
> > > > >>
> > > > >>> * Compare runtime behavior with split level alignment on/off
> > > > >>> * Allow use of sources that don't implement pausing splits yet
> > > > >>
> > > > >>
> > > > >>> The second point would, from the user's perspective, be necessary
> > for
> > > > >>> backward compatibility? While the interface aspect and source
> > > > >>> compatibility has been discussed in great detail, I don't think
> it
> > > > >>> would be desirable if an application that already uses alignment
> > > fails
> > > > >>> after upgrading to the new Flink version, forcing users to lock
> > step
> > > > >>> modify sources for the new non-optional split level alignment.
> > > > >>
> > > > >>
> > > > >>> So I think clarification of the compatibility aspect on the FLIP
> > page
> > > > >>> would be necessary.
> > > > >>
> > > > >>
> > > > >> That is a good point. Currently FLIP-182 is marked as
> experimental.
> > So
> > > > >> technically speaking it could change. That said, I agree that it
> > would
> > > > be
> > > > >> good to not break the existing sources.
> > > > >>
> > > > >> My main concern of having a supportsPausingSplits() knob is that
> > this
> > > > > >> might be something requiring code change on future deprecation. I
> am
> > > > >> thinking of two potential solutions:
> > > > >>
> > > > >> Solution 1:
> > > > >> In the SourceReaderBase, when pauseOrResumeSplits() is invoked, if
> > the
> > > > >> source reader only has one split assigned, the source reader
> simply
> > > > stops
> > > > >> polling but just returns NOTHING_AVAILABLE. If there are more than
> > one
> > > > >> splits assigned, it throws an exception with a message such as
> "The
> > > > >> unpausable SplitReader CLASS_NAME only works with watermark
> > alignment
> > > > when
> > > > >> assigned a single split. There are more than one split assigned to
> > the
> > > > >> SplitReader".
> > > > >> The reasoning behind this solution is that existing users should
> > only
> > > > use
> > > > >> the coarse watermark alignment when a source reader only reads
> from
> > a
> > > > >> single split. Reading from more than one split might have unwanted
> > > side
> > > > >> effects, so we might as well throw an exception in this case.
> > > > >>
> > > > >> Solution 2:
> > > > >> Having a configuration
> "enable.coarse-grained.watermark.alignment",
> > > the
> > > > >> default value is false. Once it is set to true, we will allow
> > > > >> coarse-grained watermark alignment if a SplitReader is pausable.
> > > > >> This solution allows users to keep the current FLIP-182 behavior,
> > with
> > > > >> the risk of side effects.
> > > > >>
> > > > >> Personally speaking, I feel solution 1 seems better because
> > > > >> coarse-grained watermark alignment could be frustrating to the
> users
> > > > when
> > > > >> more than one split is assigned. So we might as well not support
> it
> > at
> > > > all.
> > > > >> And also there is nothing to deprecate in the future with this
> > > solution.
> > > > >>
> > > > >> What do you think?
> > > > >>
> > > > > >> Thanks,
> > > > >>
> > > > >> Jiangjie (Becket) Qin
> > > > >>
> > > > >>
> > > > >> On Tue, Jun 21, 2022 at 8:48 PM Piotr Nowojski <
> > pnowojski@apache.org>
> > > > >> wrote:
> > > > >>
> > > > >>> Hi,
> > > > >>>
> > > > >>> It looks like option 1 wins overall? So let's go with that.
> > > > >>>
> > > > >>> Best,
> > > > >>> Piotrek
> > > > >>>
> > > > > >>> On Wed, Jun 15, 2022 at 4:13 AM Steven Wu <st...@gmail.com>
> > > wrote:
> > > > >>>
> > > > >>>> Both option 1 (default impl in base interface) and option 2
> > > > (decorative
> > > > >>>> interface) are pretty common patterns. I would also be fine with
> > > > either.
> > > > >>>> The important thing is that an exception is thrown if a source
> > > doesn't
> > > > >>>> support the alignment capability.
> > > > >>>>
> > > > >>>> The other point is that we can validate the source capability if
> > > > >>>> alignment
> > > > >>>> is enabled in WatermarkStrategy. I believe either option can
> > achieve
> > > > >>>> this
> > > > >>>> goal too.
> > > > >>>> public interface WatermarkStrategy<T> {
> > > > >>>>     WatermarkStrategy<T> withWatermarkAlignment(String
> > > watermarkGroup,
> > > > >>>> Duration maxAllowedWatermarkDrift);
> > > > >>>> }
> > > > >>>>
> > > > >>>> If I have to pick one, I am slightly favoring option 1 (base
> > > > >>>> interface). As
> > > > >>>> watermark is already an essential concept of source, maybe
> > watermark
> > > > >>>> alignment capability can also be a property of the base
> > > source/reader
> > > > >>>> interface.
> > > > >>>>
> > > > >>>> On Tue, Jun 14, 2022 at 3:44 PM Thomas Weise <th...@apache.org>
> > > wrote:
> > > > >>>>
> > > > >>>> > Hi everyone,
> > > > >>>> >
> > > > >>>> > Thank you for all the effort that went into this discussion.
> The
> > > > split
> > > > >>>> > level watermark alignment will be an important feature for
> Flink
> > > > that
> > > > >>>> > will address operational problems for various use cases. From
> > > > reading
> > > > >>>> > through this thread it appears that not too much remains to
> > bring
> > > > this
> > > > >>>> > FLIP to acceptance and allow development to move forward. I
> > would
> > > > like
> > > > >>>> > to contribute if possible.
> > > > >>>> >
> > > > >>>> > Regarding option 1 vs. option 2: I don't have a strong
> > preference,
> > > > >>>> > perhaps slightly leaning towards option 1.
> > > > >>>> >
> > > > >>>> > However, from a user perspective, should the split level
> > alignment
> > > > be
> > > > >>>> > an opt-in feature, at least for a few releases? If yes, then
> we
> > > > would
> > > > >>>> > require a knob similar to supportsPausingSplits(), which I
> > > > understand
> > > > >>>> > won't be part of the revised FLIP. Such control may be
> > beneficial:
> > > > >>>> >
> > > > >>>> > * Compare runtime behavior with split level alignment on/off
> > > > >>>> > * Allow use of sources that don't implement pausing splits yet
> > > > >>>> >
> > > > >>>> > The second point would, from the user's perspective, be
> > necessary
> > > > for
> > > > >>>> > backward compatibility? While the interface aspect and source
> > > > > >>>> > compatibility have been discussed in great detail, I don't
> think
> > it
> > > > >>>> > would be desirable if an application that already uses
> alignment
> > > > fails
> > > > > >>>> > after upgrading to the new Flink version, forcing users to modify
> > > > > >>>> > sources in lockstep for the new non-optional split level alignment.
> > > > >>>> >
> > > > >>>> > So I think clarification of the compatibility aspect on the
> FLIP
> > > > page
> > > > >>>> > would be necessary.
> > > > >>>> >
> > > > >>>> > Thanks,
> > > > >>>> > Thomas
> > > > >>>> >
> > > > >>>> > On Mon, May 30, 2022 at 3:29 AM Piotr Nowojski <
> > > > >>>> piotr.nowojski@gmail.com>
> > > > >>>> > wrote:
> > > > >>>> > >
> > > > >>>> > > Hi Becket,
> > > > >>>> > >
> > > > >>>> > > Thanks for summing this up. Just one correction:
> > > > >>>> > >
> > > > >>>> > > > Piotr prefers option 2, his opinions are:
> > > > > >>>> > > >   e) It is OK that the code itself in option 2 indicates to
> the
> > > > >>>> developers
> > > > >>>> > > that a feature is optional. We will rely on the
> documentation
> > to
> > > > >>>> correct
> > > > >>>> > > that and clarify that the feature is actually obligatory.
> > > > >>>> > >
> > > > >>>> > > I would say based on a) and b) that feature would be still
> > > > >>>> optional. So
> > > > >>>> > > both the implementation and the documentation would be
> saying
> > > > that.
> > > > >>>> We
> > > > >>>> > > could add a mention to the docs and release notes, that this
> > > > >>>> feature will
> > > > >>>> > > be obligatory in the next major release and plan such a
> > release
> > > > >>>> > accordingly.
> > > > >>>> > >
> > > > >>>> > > Re the option 1., as you mentioned:
> > > > >>>> > > > As for option 1: For developers, the feature is still
> > optional
> > > > >>>> due to
> > > > >>>> > the
> > > > >>>> > > default implementation in the interface, regardless of what
> > the
> > > > >>>> default
> > > > >>>> > > implementation does, because the code compiles without
> > > overriding
> > > > >>>> these
> > > > >>>> > > methods
> > > > >>>> > >
> > > > >>>> > > Also importantly, the code will work in most cases.
> > > > >>>> > >
> > > > >>>> > > > Obligatory: Jobs may fail if these methods are not
> > implemented
> > > > >>>> > properly.
> > > > > >>>> > > e.g. SourceReader#pauseOrResumeSplits(). This is a common
> > > pattern
> > > > in
> > > > >>>> > Java,
> > > > >>>> > > e.g. Iterator.remove() by default throws
> > > > >>>> "UnsupportedOperationException",
> > > > >>>> > > informing the implementation that things may go wrong if
> this
> > > > >>>> method is
> > > > >>>> > not
> > > > >>>> > > implemented.
> > > > >>>> > >
> > > > >>>> > > For me `Iterator#remove()` is an optional feature.
> > Personally, I
> > > > >>>> don't
> > > > >>>> > > remember if I have ever implemented it.
> > > > >>>> > >
> > > > >>>> > > Best,
> > > > >>>> > > Piotrek
> > > > >>>> > >
> > > > > >>>> > > On Fri, May 27, 2022 at 10:14, Becket Qin <be...@gmail.com> wrote:
> > > > >>>> > >
> > > > >>>> > > > I had an offline discussion with Piotr and here is the
> > > summary.
> > > > >>>> Please
> > > > >>>> > > > correct me if I miss something, Piotr.
> > > > >>>> > > >
> > > > >>>> > > > There are two things we would like to seek more opinions
> > from
> > > > the
> > > > >>>> > > > community, so we can make progress on this FLIP.
> > > > >>>> > > >
> > > > >>>> > > > 1. The General pattern to add obligatory features to
> > existing
> > > > >>>> > interfaces.
> > > > >>>> > > >
> > > > >>>> > > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> ***********************************************************************************
> > > > >>>> > > > For interfaces exposed to the developers for
> implementation,
> > > > they
> > > > >>>> are
> > > > > >>>> > > > either intended to be *optional* or *obligatory*. While it
> > is
> > > > >>>> quite
> > > > >>>> > clear
> > > > >>>> > > > about how to convey that intention when creating the
> > > interfaces,
> > > > >>>> it is
> > > > >>>> > not
> > > > >>>> > > > as commonly agreed when we are adding new features to an
> > > > existing
> > > > >>>> > > > interface. In general, Flink uses decorative interfaces
> when
> > > > >>>> adding
> > > > >>>> > > > optional features to existing interfaces. Both Piotr and I
> > > agree
> > > > >>>> that
> > > > >>>> > looks
> > > > >>>> > > > good.
> > > > >>>> > > >
> > > > >>>> > > > Different opinions are mainly about how to add obligatory
> > > > >>>> features to
> > > > >>>> > the
> > > > >>>> > > > existing interfaces, probably due to different
> > understandings
> > > of
> > > > >>>> > > > "obligatory".
> > > > >>>> > > >
> > > > > >>>> > > > We have discussed four options:
> > > > >>>> > > >
> > > > >>>> > > > *Option 1:*
> > > > >>>> > > >
> > > > >>>> > > >    - Just add a new method to the existing interface.
> > > > >>>> > > >    - For backwards compatibility, the method would have a
> > > > default
> > > > >>>> > > >    implementation throwing
> "UnsupportedOperationException".
> > > > >>>> > > >    - In the next major version, remove the default
> > > > implementation.
> > > > >>>> > > >    - For the developers, any method with a default
> > > > implementation
> > > > >>>> > > >    throwing an "UnsupportedOperationException" should be
> > taken
> > > > as
> > > > >>>> > obligatory.
> > > > >>>> > > >
> > > > >>>> > > > *Option 2:*
> > > > >>>> > > >
> > > > >>>> > > >    - Always make the features optional by adding a
> > decorative
> > > > >>>> > interface,
> > > > >>>> > > >    just like ordinary optional features.
> > > > >>>> > > >    - Inform the developers via documentation that this
> > feature
> > > > is
> > > > > >>>> > > >    obligatory, although it looks optional from the
> > code.
> > > > >>>> > > >    - In case the developers did not implement the
> decorative
> > > > >>>> interface,
> > > > >>>> > > >    throw an exception
> > > > >>>> > > >    - In the next major version, move the methods in the
> > > > decorative
> > > > >>>> > > >    interface to the base interface, and deprecate the
> > > decorative
> > > > >>>> > interface.
> > > > >>>> > > >
> > > > >>>> > > > *Option 3:*
> > > > >>>> > > >
> > > > >>>> > > >    - Always bump the major version when a new obligatory
> > > feature
> > > > >>>> is
> > > > >>>> > > >    added, even if we may have to do it frequently.
> > > > >>>> > > >
> > > > >>>> > > > *Option 4:*
> > > > >>>> > > >
> > > > >>>> > > >    - Add a V2, V3... of the interface affected by the new
> > > > >>>> obligatory
> > > > >>>> > > >    feature.
> > > > >>>> > > >    - In the next major versions, deprecate old versions of
> > the
> > > > >>>> > interfaces.
> > > > >>>> > > >
> > > > > >>>> > > > Both Piotr and I agreed that option 3 and option 4 have a
> > big
> > > > >>>> side
> > > > >>>> > effect
> > > > > >>>> > > > and should be avoided. We have different preferences
> between
> > > > >>>> option 1
> > > > >>>> > and
> > > > >>>> > > > option 2.
> > > > >>>> > > >
> > > > >>>> > > > Personally I prefer option 1, the reasons are:
> > > > >>>> > > >   a) simple and intuitive. Java 8 introduced the default
> > impl
> > > in
> > > > >>>> > > > interfaces exactly for interface evolving, and this is a
> > > common
> > > > >>>> > pattern in
> > > > >>>> > > > many projects.
> > > > >>>> > > >   b) prominent to the developers that the feature is
> > expected
> > > to
> > > > >>>> be
> > > > >>>> > > > implemented, because it explicitly throws an exception in
> > the
> > > > >>>> default
> > > > >>>> > impl.
> > > > >>>> > > >   c) low maintenance overhead - the Flink framework can
> > always
> > > > >>>> assume
> > > > >>>> > the
> > > > >>>> > > > method exists, so no special handling logic is needed.
> > > > >>>> > > >   d) communicate a clear semantic boundary between
> optional
> > > and
> > > > >>>> > obligatory
> > > > > >>>> > > > features in Flink to the developers.
> > > > >>>> > > >       - Optional: Jobs still run without exception if
> these
> > > > >>>> methods are
> > > > >>>> > > > not implemented. e.g. all the SupportsXXXPushDown
> > interfaces.
> > > > >>>> > > >       - Obligatory: Jobs may fail if these methods are not
> > > > >>>> implemented
> > > > > >>>> > > > properly. e.g. SourceReader#pauseOrResumeSplits(). This
> is a
> > > > >>>> common
> > > > >>>> > pattern
> > > > >>>> > > > in Java, e.g. Iterator.remove() by default throws
> > > > >>>> > > > "UnsupportedOperationException", informing the
> > implementation
> > > > that
> > > > >>>> > things
> > > > >>>> > > > may go wrong if this method is not implemented.
> > > > >>>> > > >
> > > > > >>>> > > > As for option 2, although the API itself sounds clean, it
> > > > misleads
> > > > >>>> > people
> > > > > >>>> > > > to think of an obligatory feature as optional - from
> the
> > > code
> > > > >>>> the
> > > > >>>> > > > feature is optional, but the documents say it is
> obligatory.
> > > We
> > > > >>>> > probably
> > > > >>>> > > > should avoid such code-doc inconsistency, as people will
> be
> > > > >>>> confused.
> > > > >>>> > And I
> > > > >>>> > > > would actually be bewildered that sometimes not
> implementing
> > > an
> > > > >>>> > "optional"
> > > > >>>> > > > feature is fine, but sometimes it causes the jobs to fail.
> > > > >>>> > > >
> > > > >>>> > > > In response to the argument that the method with a default
> > > > >>>> > implementation
> > > > >>>> > > > is always optional, if that is true, it actually means all
> > the
> > > > >>>> > interfaces
> > > > >>>> > > > should be immutable once they are created. If we want to
> > add a
> > > > >>>> method
> > > > >>>> > to an
> > > > >>>> > > > existing interface, for backwards compatibility, we will
> > have
> > > to
> > > > >>>> > provide a
> > > > >>>> > > > default implementation. And the fact it has a default
> > > > >>>> implementation
> > > > >>>> > > > indicates the method is optional. If that method is
> > optional,
> > > it
> > > > >>>> should
> > > > >>>> > > > reside in a separate decorative interface, otherwise it
> > clogs
> > > > that
> > > > >>>> > existing
> > > > >>>> > > > interface. Therefore, people should never add a method to
> an
> > > > >>>> existing
> > > > >>>> > > > interface. I find this conclusion a bit extreme.
> > > > >>>> > > >
> > > > >>>> > > > Piotr prefers option 2, his opinions are:
> > > > >>>> > > >     a) Obligatory methods are the methods that fail the
> code
> > > > >>>> > compilation
> > > > >>>> > > > if not implemented.
> > > > >>>> > > >     b) All obligatory methods should reside in the base
> > > > interface,
> > > > >>>> > without
> > > > >>>> > > > a default implementation. And all the optional methods
> > should
> > > be
> > > > >>>> in
> > > > >>>> > > > decorative interfaces. This is a clean API.
> > > > >>>> > > >     c) due to b), there isn't a viable solution to add an
> > > > >>>> obligatory
> > > > >>>> > > > method to an existing interface in a backwards compatible
> > way.
> > > > >>>> Unless
> > > > >>>> > we
> > > > >>>> > > > are OK with breaking backwards compatibility, all the
> > > interfaces
> > > > >>>> > should be
> > > > >>>> > > > treated as immutable. As a compromise, we might as well
> just
> > > > >>>> treat all
> > > > >>>> > the
> > > > >>>> > > > features added later as optional features. This way we
> keep
> > > the
> > > > >>>> API
> > > > >>>> > clean.
> > > > >>>> > > >     d) based on b) and c), option 2 has a clean API, while
> > > > option
> > > > >>>> 1
> > > > >>>> > does
> > > > >>>> > > > not.
> > > > > >>>> > > >     e) It is OK that the code itself in option 2 indicates to
> > the
> > > > >>>> > developers
> > > > >>>> > > > that a feature is optional. We will rely on the
> > documentation
> > > to
> > > > >>>> > correct
> > > > >>>> > > > that and clarify that the feature is actually obligatory.
> > > > >>>> > > >     f) Regarding the effectiveness of making people aware
> > that
> > > > the
> > > > >>>> > feature
> > > > >>>> > > > is obligatory, Option 1 and Option 2 are similar. For
> people
> > > > that
> > > > >>>> do
> > > > >>>> > not
> > > > > >>>> > > > read the release notes / documentation, they will mistake
> the
> > > > >>>> feature
> > > > >>>> > to be
> > > > >>>> > > > optional anyways.
> > > > >>>> > > >
> > > > >>>> > > > As for option 1: For developers, the feature is still
> > optional
> > > > >>>> due to
> > > > >>>> > the
> > > > >>>> > > > default implementation in the interface, regardless of
> what
> > > the
> > > > >>>> default
> > > > >>>> > > > implementation does, because the code compiles without
> > > > overriding
> > > > >>>> these
> > > > >>>> > > > methods. Also, another problem of this option is that for
> > > users
> > > > >>>> that
> > > > >>>> > do not
> > > > >>>> > > > know about the history of the interface, they may be
> > confused
> > > by
> > > > >>>> the
> > > > >>>> > > > default implementation throwing an exception.
> > > > >>>> > > >
> > > > >>>> > > >
> > > > >>>> > > > 2. For this particular FLIP, should it be optional or not?
> > > > >>>> > > >
> > > > >>>> > > >
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> ***********************************************************************************
> > > > >>>> > > > As mentioned in the previous email, I feel this FLIP
> should
> > be
> > > > >>>> > obligatory,
> > > > >>>> > > > for the following reasons:
> > > > >>>> > > > 1. The Flink framework exposes the watermark alignment API
> > to
> > > > the
> > > > >>>> end
> > > > >>>> > > > users. From the end users' perspective, the feature should
> > be
> > > > >>>> available
> > > > >>>> > > > regardless of the implementation details in the
> pluggables.
> > > This
> > > > >>>> is
> > > > >>>> > true
> > > > >>>> > > > for any other methods exposed as the Flink API.
> > > > >>>> > > > 2. If a Source is not pausable, the end user should
> receive
> > an
> > > > >>>> > exception
> > > > > >>>> > > > when enabling watermark alignment (both Piotr and I
> > agree
> > > on
> > > > >>>> > this). In
> > > > >>>> > > > that case, it meets my criteria of obligatory feature
> > because
> > > > not
> > > > >>>> > > > implementing the feature causes a framework API to throw
> > > > >>>> exception and
> > > > >>>> > > > fails the job.
> > > > >>>> > > >
> > > > >>>> > > > On the other hand, Piotr does not have a strong opinion
> > > > regarding
> > > > >>>> > whether
> > > > >>>> > > > this feature should be optional or not.
> > > > >>>> > > >
> > > > >>>> > > >
> > > > >>>> > > > Thanks for reading through this long email. So basically
> in
> > > > order
> > > > >>>> to
> > > > >>>> > make
> > > > > >>>> > > > progress on this FLIP, we want to see how people feel
> > > about
> > > > >>>> the
> > > > >>>> > above
> > > > >>>> > > > two topics.
> > > > >>>> > > >
> > > > >>>> > > > Thanks,
> > > > >>>> > > >
> > > > >>>> > > > Jiangjie (Becket) Qin
> > > > >>>> > > >
> > > > >>>> > > >
> > > > >>>> > > > On Thu, May 26, 2022 at 3:06 PM Piotr Nowojski <
> > > > >>>> pnowojski@apache.org>
> > > > >>>> > > > wrote:
> > > > >>>> > > >
> > > > >>>> > > >> Hi Becket,
> > > > >>>> > > >>
> > > > > >>>> > > >> I still stand by what I wrote before:
> > > > >>>> > > >> > I think I would still vote soft -1 on this option, but
> I
> > > > >>>> wouldn't
> > > > >>>> > block
> > > > >>>> > > >> it in case I am out-voted.
> > > > >>>> > > >>
> > > > >>>> > > >> > I think it might be helpful to agree on the definition
> of
> > > > >>>> optional
> > > > >>>> > in
> > > > >>>> > > >> our
> > > > >>>> > > >> case.
> > > > >>>> > > >>
> > > > >>>> > > >> For me it doesn't matter whether a default method
> throwing
> > an
> > > > >>>> > exception we
> > > > >>>> > > >> call optional or non-optional. As long as we keep it this
> > > way,
> > > > >>>> the
> > > > >>>> > effect
> > > > >>>> > > >> is the same. It's effectively a method that a user
> doesn't
> > > have
> > > > >>>> to
> > > > >>>> > > >> implement. If interface/system allows some methods to be
> > not
> > > > >>>> > implemented,
> > > > > >>>> > > >> some users will do just that, regardless of whether we call it
> and
> > > > >>>> document as
> > > > >>>> > > >> non-optional. And at the same time it's clogging the base
> > > > >>>> interface.
> > > > >>>> > > >>
> > > > >>>> > > >> By the way, just the need for a java-doc/documentation
> > > > >>>> explaining the
> > > > >>>> > > >> existence of some construct is a bad smell (code should
> be
> > > > >>>> > > >> self-documenting
> > > > >>>> > > >> and default method throwing an
> > UnsupportedOperationException
> > > is
> > > > >>>> not).
> > > > >>>> > > >>
> > > > >>>> > > >> > Please note that so far we do not assume whether the
> > > feature
> > > > >>>> is in
> > > > >>>> > > >> > the original API or it is added later. A newly added
> > > feature
> > > > >>>> can
> > > > >>>> > also be
> > > > >>>> > > >> > non-optional, although it might take some time for all
> > the
> > > > >>>> pluggable
> > > > >>>> > > >> > developers to catch up, and they should still work if
> the
> > > new
> > > > >>>> > feature is
> > > > >>>> > > >> > not used until they catch up. In contrast, we may never
> > > > expect
> > > > >>>> an
> > > > >>>> > > >> optional
> > > > >>>> > > >> > feature to catch up, because leaving it unimplemented
> is
> > > also
> > > > >>>> > blessed.
> > > > >>>> > > >> >
> > > > >>>> > > >> > Let's take the checkpointing as an example. Imagine
> Flink
> > > did
> > > > >>>> not
> > > > >>>> > > >> support
> > > > >>>> > > >> > checkpointing before release 1.16. And now we are
> trying
> > to
> > > > add
> > > > >>>> > > >> > checkpointing to Flink. So we exposed the checkpoint
> > > > >>>> configuration
> > > > >>>> > to
> > > > >>>> > > >> the
> > > > >>>> > > >> > end users. In the meantime, will we tell the pluggable
> > > (e.g.
> > > > >>>> > operators,
> > > > >>>> > > >> > connectors) developers that methods like
> > "snapshotState()"
> > > are
> > > > >>>> > optional?
> > > > >>>> > > >> If
> > > > >>>> > > >> > we do that, the availability of checkpointing in Flink
> > > would
> > > > be
> > > > >>>> > severely
> > > > >>>> > > >> > weakened. But apparently we should still allow the
> > existing
> > > > >>>> > > >> implementations
> > > > >>>> > > >> > to work without checkpointing. It looks to me that
> adding
> > > the
> > > > >>>> > method to
> > > > >>>> > > >> the
> > > > >>>> > > >> > pluggable interfaces with a default implementation
> > throwing
> > > > >>>> > > >> > "UnsupportedOperationException" would be the solution
> > here.
> > > > >>>> Please
> > > > >>>> > note
> > > > >>>> > > >> > that in this case, having the default implementation
> does
> > > not
> > > > >>>> mean
> > > > >>>> > this
> > > > >>>> > > >> is
> > > > >>>> > > >> > optional. It is just the technique to support backwards
> > > > >>>> > compatibility in
> > > > >>>> > > >> > the feature evolution. The fact that this method is in
> > the
> > > > base
> > > > >>>> > > >> interface
> > > > >>>> > > >> > suggests it is not optional, so the developers SHOULD
> > > > >>>> implement it.
> > > > >>>> > > >>
> > > > >>>> > > >> I would soft vote -1 for having the default method
> throwing
> > > > > >>>> > > >> UnsupportedOperationException as a one-off thing for these
> > > > (FLIP-217)
> > > > >>>> > special
> > > > >>>> > > >> circumstances.
> > > > >>>> > > >>
> > > > >>>> > > >> At the moment, without thinking this over too much, I
> would
> > > > vote
> > > > >>>> > harder -1
> > > > >>>> > > >> for having this as a general rule when adding new
> features.
> > > If
> > > > we
> > > > >>>> > ever end
> > > > >>>> > > >> up with an API that is littered with default methods
> > throwing
> > > > >>>> > > >> UnsupportedOperationException that are documented as "non
> > > > >>>> optional" it
> > > > >>>> > > >> would be IMO a big design failure. I would be ok-ish with
> > > that,
> > > > >>>> only
> > > > >>>> > if
> > > > >>>> > > >> that was a temporary thing and we had an aggressive plan
> to
> > > > >>>> release
> > > > >>>> > more
> > > > >>>> > > >> often new major Flink versions (2.x.y, 3.x.y, ...)
> breaking
> > > API
> > > > >>>> > > >> compatibility, that would get rid of those default
> methods.
> > > > >>>> Adding
> > > > >>>> > > >> checkpointing and methods like "snapshotState()" would
> IMO
> > > > easily
> > > > >>>> > justify
> > > > >>>> > > >> a
> > > > >>>> > > >> new major Flink release. In that case we could add those
> > > > methods
> > > > >>>> with
> > > > > >>>> > > >> default implementation for some transition period, one
> or
> > > two
> > > > >>>> minor
> > > > >>>> > > >> releases, followed by a clean up in a major release.
> > However
> > > I
> > > > >>>> would
> > > > >>>> > still
> > > > >>>> > > >> argue that it would be cleaner/better to add a decorative
> > > > >>>> interface
> > > > >>>> > like
> > > > >>>> > > >> `CheckpointedOperator` interface instead of adding those
> > > > default
> > > > >>>> > methods
> > > > >>>> > > >> to
> > > > >>>> > > >> the base `Operator` interface.
> > > > >>>> > > >>
> > > > > >>>> > > >> I think our disagreement here comes down to this: I
> would
> > > like
> > > > >>>> to
> > > > >>>> > keep
> > > > >>>> > > >> the
> > > > >>>> > > >> interfaces simpler, with only obligatory methods/features
> > on
> > > > one
> > > > >>>> side
> > > > >>>> > and
> > > > >>>> > > >> clearly optional features on the other. While you would
> > like
> > > to
> > > > >>>> add an
> > > > >>>> > > >> extra third state in between those two?
> > > > >>>> > > >>
> > > > >>>> > > >> Best,
> > > > >>>> > > >> Piotrek
> > > > >>>> > > >>
> > > > >>>> > > >>
> > > > >>>> > > >>
> > > > > >>>> > > >> On Thu, May 12, 2022 at 04:25, Becket Qin <becket.qin@gmail.com> wrote:
> > > > >>>> > > >>
> > > > >>>> > > >> > Thanks for the clarification, Piotr and Sebastian.
> > > > >>>> > > >> >
> > > > >>>> > > >> > It looks like the key problem is still whether the
> > > > >>>> implementation of
> > > > >>>> > > >> > pausable splits in the Sources should be optional or
> not.
> > > > >>>> > > >> >
> > > > >>>> > > >> > I think it might be helpful to agree on the definition
> of
> > > > >>>> optional
> > > > >>>> > in
> > > > >>>> > > >> our
> > > > >>>> > > >> > case. To me:
> > > > >>>> > > >> > Optional = "You CAN leave the method unimplemented, and
> > > that
> > > > is
> > > > >>>> > fine."
> > > > >>>> > > >> > Non-Optional = "You CAN leave the method unimplemented,
> > but
> > > > you
> > > > >>>> > SHOULD
> > > > >>>> > > >> NOT,
> > > > >>>> > > >> > because people assume this works."
> > > > >>>> > > >> >
> > > > >>>> > > >> > I think one sufficient condition of a Non-Optional
> > feature
> > > is
> > > > >>>> that
> > > > >>>> > if
> > > > >>>> > > >> the
> > > > >>>> > > >> > feature is exposed through the framework API, Flink
> > should
> > > > >>>> expect
> > > > >>>> > the
> > > > >>>> > > >> > pluggables to support this feature by default.
> Otherwise
> > > the
> > > > >>>> > > >> availability
> > > > >>>> > > >> > of that feature becomes undefined.
> > > > >>>> > > >> >
> > > > >>>> > > >> > Please note that so far we do not assume whether the
> > > feature
> > > > >>>> is in
> > > > >>>> > > >> > the original API or it is added later. A newly added
> > > feature
> > > > >>>> can
> > > > >>>> > also be
> > > > >>>> > > >> > non-optional, although it might take some time for all
> > the
> > > > >>>> pluggable
> > > > >>>> > > >> > developers to catch up, and they should still work if
> the
> > > new
> > > > >>>> > feature is
> > > > >>>> > > >> > not used until they catch up. In contrast, we may never
> > > > expect
> > > > >>>> an
> > > > >>>> > > >> optional
> > > > >>>> > > >> > feature to catch up, because leaving it unimplemented
> is
> > > also
> > > > >>>> > blessed.
> > > > >>>> > > >> >
> > > > >>>> > > >> > Let's take the checkpointing as an example. Imagine
> Flink
> > > did
> > > > >>>> not
> > > > >>>> > > >> support
> > > > >>>> > > >> > checkpointing before release 1.16. And now we are
> trying
> > to
> > > > add
> > > > >>>> > > >> > checkpointing to Flink. So we exposed the checkpoint
> > > > >>>> configuration
> > > > >>>> > to
> > > > >>>> > > >> the
> > > > >>>> > > >> > end users. In the meantime, will we tell the pluggable
> > > (e.g.
> > > > >>>> > operators,
> > > > >>>> > > >> > connectors) developers that methods like
> > "snapshotState()"
> > > are
> > > > >>>> > optional?
> > > > >>>> > > >> If
> > > > >>>> > > >> > we do that, the availability of checkpointing in Flink
> > > would
> > > > be
> > > > >>>> > severely
> > > > >>>> > > >> > weakened. But apparently we should still allow the
> > existing
> > > > >>>> > > >> implementations
> > > > >>>> > > >> > to work without checkpointing. It looks to me that
> adding
> > > the
> > > > >>>> > method to
> > > > >>>> > > >> the
> > > > >>>> > > >> > pluggable interfaces with a default implementation
> > throwing
> > > > >>>> > > >> > "UnsupportedOperationException" would be the solution
> > here.
> > > > >>>> Please
> > > > >>>> > note
> > > > >>>> > > >> > that in this case, having the default implementation
> does
> > > not
> > > > >>>> mean
> > > > >>>> > this
> > > > >>>> > > >> is
> > > > >>>> > > >> > optional. It is just the technique to support backwards
> > > > >>>> > compatibility in
> > > > >>>> > > >> > the feature evolution. The fact that this method is in
> > the
> > > > base
> > > > >>>> > > >> interface
> > > > >>>> > > >> > suggests it is not optional, so the developers SHOULD
> > > > >>>> implement it.
> > > > >>>> > > >> >
> > > > >>>> > > >> > When it comes to this FLIP, I think it meets the
> criteria
> > > of
> > > > >>>> > > >> non-optional
> > > > >>>> > > >> > features, so we should just use the evolution path of
> > > > >>>> non-optional
> > > > >>>> > > >> > features.
> > > > >>>> > > >> >
> > > > >>>> > > >> > Thanks,
> > > > >>>> > > >> >
> > > > >>>> > > >> > Jiangjie (Becket) Qin
> > > > >>>> > > >> >
> > > > >>>> > > >> >
> > > > >>>> > > >> >
> > > > >>>> > > >> > On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski <
> > > > >>>> > pnowojski@apache.org>
> > > > >>>> > > >> > wrote:
> > > > >>>> > > >> >
> > > > >>>> > > >> > > Hi,
> > > > >>>> > > >> > >
> > > > >>>> > > >> > > Actually previously I thought about having a
> decorative
> > > > >>>> interface
> > > > >>>> > and
> > > > >>>> > > >> > > whenever watermark alignment is enabled, checking
> that
> > > the
> > > > >>>> source
> > > > >>>> > > >> > > implements the decorative interface. If not, throwing
> > an
> > > > >>>> > exception.
> > > > >>>> > > >> > >
> > > > >>>> > > >> > > The option with default methods in the source
> > interfaces
> > > > >>>> throwing
> > > > >>>> > > >> > > `UnsupportedOperationException` I think still suffers
> > > from
> > > > >>>> the
> > > > >>>> > same
> > > > >>>> > > >> > > problems I mentioned before. It's still an optional
> > > > >>>> implementation
> > > > >>>> > > >> and at
> > > > >>>> > > >> > > the same time it's clogging the base interface. I
> > think I
> > > > >>>> would
> > > > >>>> > still
> > > > >>>> > > >> > vote
> > > > >>>> > > >> > > soft -1 on this option, but I wouldn't block it in
> > case I
> > > > am
> > > > >>>> > > >> out-voted.
> > > > >>>> > > >> > >
> > > > >>>> > > >> > > Best,
> > > > >>>> > > >> > > Piotrek
> > > > >>>> > > >> > >
> > > > > >>>> > > >> > > On Wed, May 11, 2022 at 14:22, Sebastian Mattheis <sebastian@ververica.com> wrote:
> > > > >>>> > > >> > >
> > > > >>>> > > >> > > > Hi Becket,
> > > > >>>> > > >> > > >
> > > > >>>> > > >> > > > Thanks a lot for your fast and detailed response.
> For
> > > me,
> > > > >>>> it
> > > > >>>> > > >> converges
> > > > >>>> > > >> > > and
> > > > >>>> > > >> > > > dropping the supportsX method sounds very
> reasonable
> > to
> > > > me.
> > > > >>>> > (Side
> > > > >>>> > > >> note:
> > > > >>>> > > >> > > > With "pausable splits" enabled as "default" I think
> > we
> > > > >>>> > > >> misunderstood.
> > > > >>>> > > >> > As
> > > > >>>> > > >> > > > you described now "default" I understand as that it
> > > > should
> > > > >>>> be
> > > > >>>> > the
> > > > >>>> > > >> new
> > > > >>>> > > >> > > > recommended way of implementation, and I think that
> > is
> > > > >>>> fully
> > > > >>>> > valid.
> > > > >>>> > > >> > > Before,
> > > > >>>> > > >> > > > I understood "default" here as the default
> > > > implementation,
> > > > >>>> i.e.,
> > > > >>>> > > >> > throwing
> > > > >>>> > > >> > > > UnsupportedOperationException, which is the exact
> > > > >>>> opposite. :) )
> > > > >>>> > > >> > > >
> > > > >>>> > > >> > > > Nevertheless: As mentioned, an open question for me
> > is
> > > if
> > > > >>>> > watermark
> > > > >>>> > > >> > > > alignment should enforce pausable splits. For
> > > > >>>> clarification, the
> > > > >>>> > > >> > current
> > > > >>>> > > >> > > > documentation [1] says:
> > > > >>>> > > >> > > >
> > > > >>>> > > >> > > > *Note:* As of 1.15, Flink supports aligning across
> > > tasks
> > > > >>>> of the
> > > > >>>> > same
> > > > >>>> > > >> > > >> source and/or different sources. It does not
> support
> > > > >>>> aligning
> > > > >>>> > > >> > > >> splits/partitions/shards in the same task.
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >> In a case where there are e.g. two Kafka
> partitions
> > > that
> > > > >>>> > produce
> > > > >>>> > > >> > > >> watermarks at different pace, that get assigned to
> > the
> > > > >>>> same
> > > > >>>> > task
> > > > >>>> > > >> > > watermark
> > > > >>>> > > >> > > >> might not behave as expected. Fortunately, worst
> > case
> > > it
> > > > >>>> > should not
> > > > >>>> > > >> > > perform
> > > > >>>> > > >> > > >> worse than without alignment.
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >> Given the limitation above, we suggest applying
> > > > watermark
> > > > >>>> > > >> alignment in
> > > > >>>> > > >> > > >> two situations:
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >>    1. You have two different sources (e.g. Kafka
> and
> > > > >>>> File) that
> > > > >>>> > > >> > produce
> > > > >>>> > > >> > > >>    watermarks at different speeds
> > > > >>>> > > >> > > >>    2. You run your source with parallelism equal
> to
> > > the
> > > > >>>> number
> > > > >>>> > of
> > > > >>>> > > >> > > >>    splits/shards/partitions, which results in
> every
> > > > >>>> subtask
> > > > >>>> > being
> > > > >>>> > > >> > > assigned a
> > > > >>>> > > >> > > >>    single unit of work.
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >> I personally see no issue in implementing and I
> see
> > no
> > > > >>>> reason
> > > > >>>> > > >> against
> > > > >>>> > > >> > > > implementing this dependency of watermark alignment
> > and
> > > > >>>> pausable
> > > > >>>> > > >> > splits.
> > > > >>>> > > >> > > (I
> > > > >>>> > > >> > > > think this would even be a good path towards
> shaping
> > > > >>>> watermark
> > > > >>>> > > >> > alignment
> > > > >>>> > > >> > > in
> > > > >>>> > > >> > > > 1.16.) However, "I don't see" means that I would be
> > > happy
> > > > >>>> to
> > > > >>>> > hear
> > > > >>>> > > >> > Dawid's
> > > > >>>> > > >> > > > and Piotrek's opinions as they implemented
> watermark
> > > > >>>> alignment
> > > > >>>> > > >> based on
> > > > >>>> > > >> > > > FLIP-182 [2] and I don't want to miss relevant
> > > > >>>> > rationale/background
> > > > >>>> > > >> > info
> > > > >>>> > > >> > > > from their side.
> > > > >>>> > > >> > > >
> > > > >>>> > > >> > > > *@Piotrek* *@Dawid *What do you think?
> > > > >>>> > > >> > > >
> > > > >>>> > > >> > > > Regards,
> > > > >>>> > > >> > > > Sebastian
> > > > >>>> > > >> > > >
> > > > >>>> > > >> > > > [1]
> > > > >>>> > > >> > > >
> > > > >>>> > > >> > >
> > > > >>>> > > >> >
> > > > >>>> > > >>
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
> > > > >>>> > > >> > > > [2]
> > > > >>>> > > >> > > >
> > > > >>>> > > >> > >
> > > > >>>> > > >> >
> > > > >>>> > > >>
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
> > > > >>>> > > >> > > >
> > > > >>>> > > >> > > > On Wed, May 11, 2022 at 1:30 PM Becket Qin <
> > > > >>>> > becket.qin@gmail.com>
> > > > >>>> > > >> > wrote:
> > > > >>>> > > >> > > >
> > > > >>>> > > >> > > >> +dev
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >> Hi Sebastian,
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >> Thank you for the summary. Please see the detailed
> > > > replies
> > > > >>>> > inline.
> > > > >>>> > > >> As
> > > > >>>> > > >> > a
> > > > >>>> > > >> > > >> recap of my suggestions.
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >> 1. Pausable splits API.
> > > > >>>> > > >> > > >>   a) Add default implementations to methods
> > > > >>>> > "pauseOrResumeSplits"
> > > > >>>> > > >> in
> > > > >>>> > > >> > > both
> > > > >>>> > > >> > > >> SourceReader and SplitReader where both default
> > > > >>>> implementations
> > > > >>>> > > >> throw
> > > > >>>> > > >> > > >>  UnsupportedOperationException.
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >> 2. User story.
> > > > >>>> > > >> > > >>     a) We tell users to enable the watermark
> > alignment
> > > > as
> > > > >>>> they
> > > > >>>> > > >> like.
> > > > >>>> > > >> > > This
> > > > >>>> > > >> > > >> is exactly what the current Flink API is.
> > > > >>>> > > >> > > >>     b) We tell the source developers, please
> > implement
> > > > >>>> pausable
> > > > >>>> > > >> > splits,
> > > > >>>> > > >> > > >> otherwise bad things may happen. Think of it like
> > you
> > > > are
> > > > >>>> > expected
> > > > >>>> > > >> to
> > > > >>>> > > >> > > >> implement SourceReader#snapshotState() properly,
> > > > otherwise
> > > > >>>> > > >> exceptions
> > > > >>>> > > >> > > will
> > > > >>>> > > >> > > >> be thrown when users enable checkpointing.
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >> Thanks,
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >> Jiangjie (Becket) Qin
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >> On Wed, May 11, 2022 at 4:45 PM Sebastian
> Mattheis <
> > > > >>>> > > >> > > >> sebastian@ververica.com> wrote:
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >>> Hi Becket, Hi everybody,
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >>> I'm sorry if I misread the messages but I could
> not
> > > > >>>> derive an
> > > > >>>> > > >> > agreement
> > > > >>>> > > >> > > >>> from the mailing list. Nevertheless, if I
> > understand
> > > > you
> > > > >>>> > right the
> > > > >>>> > > >> > > >>> suggestion is:
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >>> * Add default implementations to methods
> > > > >>>> > "pauseOrResumeSplits" in
> > > > >>>> > > >> > both
> > > > >>>> > > >> > > >>> SourceReader and SplitReader where both default
> > > > >>>> > implementations
> > > > >>>> > > >> throw
> > > > >>>> > > >> > > >>> UnsupportedOperationException.
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >> Yes.
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >> * Add "supportsPauseOrResumeSplits" to the Source
> > > > >>>> interface.
> > > > >>>> > (In
> > > > >>>> > > >> the
> > > > >>>> > > >> > > >>> following, I refer to supporting this as
> "pausable
> > > > >>>> splits".)
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >> We may no longer need this if pausable splits are
> > > > >>>> expected to
> > > > >>>> > be
> > > > >>>> > > >> > > >> implemented by the source developers, i.e.
> > > non-optional.
> > > > >>>> Having
> > > > >>>> > > >> this
> > > > >>>> > > >> > > method
> > > > >>>> > > >> > > >> would then be somewhat misleading as it looks like
> > the
> > > > >>>> sources
> > > > >>>> > > >> that do
> > > > >>>> > > >> > > not
> > > > >>>> > > >> > > >> support pausable splits are also acceptable in the
> > > long
> > > > >>>> term.
> > > > >>>> > So
> > > > >>>> > > >> API
> > > > >>>> > > >> > > wise,
> > > > >>>> > > >> > > >> I'd say maybe we should remove this for this FLIP,
> > > > >>>> although I
> > > > >>>> > > >> believe
> > > > >>>> > > >> > > this
> > > > >>>> > > >> > > >> supportXXX pattern itself is still attractive for
> > > > optional
> > > > >>>> > > >> features.
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >>> To make the conclusions explicit:
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >>> 1. The implementation of pauseOrResumeSplits in
> > both
> > > > >>>> > interfaces
> > > > > >>>> > > >> > > >>> SourceReader and SplitReader is optional, where
> the
> > > > >>>> default is
> > > > >>>> > > >> that
> > > > >>>> > > >> > it
> > > > >>>> > > >> > > >>> doesn't support it. (--> This means that the
> > > > >>>> implementation is
> > > > >>>> > > >> still
> > > > >>>> > > >> > > >>> optional for the source developer.)
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >> It is optional for backwards compatibility with
> > > existing
> > > > >>>> > sources,
> > > > >>>> > > >> as
> > > > >>>> > > >> > > they
> > > > >>>> > > >> > > >> may still compile without code change. But
> starting
> > > from
> > > > >>>> this
> > > > >>>> > FLIP,
> > > > >>>> > > >> > > Flink
> > > > >>>> > > >> > > >> will always optimistically assume that all the
> > sources
> > > > >>>> support
> > > > >>>> > > >> > pausable
> > > > >>>> > > >> > > >> splits. If a source does not support pausable
> > splits,
> > > it
> > > > >>>> goes
> > > > >>>> > to an
> > > > >>>> > > >> > > error
> > > > >>>> > > >> > > >> handling path when watermark alignment is enabled
> on
> > > it.
> > > > >>>> This
> > > > >>>> > is
> > > > >>>> > > >> > > different
> > > > >>>> > > >> > > >> from a usual optional feature, where no error is
> > > > expected.
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >>> 2. If watermark alignment is enabled in the
> > > application
> > > > >>>> code
> > > > >>>> > by
> > > > >>>> > > >> > adding
> > > > >>>> > > >> > > >>> withWatermarkAlignment to the WatermarkStrategy
> > while
> > > > >>>> > > >> SourceReader or
> > > > >>>> > > >> > > >>> SplitReader do not support pausableSplits, we
> throw
> > > an
> > > > >>>> > > >> > > >>> UnsupportedOperationException.
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >> Yes.
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >>> 3. With regard to your statement:
> > > > >>>> > > >> > > >>>
> > > > > >>>> > > >> > > >>>> [...] basically means watermark alignment is a
> > > > >>>> non-optional
> > > > >>>> > > >> feature
> > > > >>>> > > >> > > to
> > > > >>>> > > >> > > >>>> the end users.
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >>> You actually mean that "pausable splits" are
> > > > >>>> non-optional for
> > > > >>>> > the
> > > > >>>> > > >> app
> > > > >>>> > > >> > > >>> developer if watermark alignment is enabled.
> > However,
> > > > >>>> > watermark
> > > > >>>> > > >> > > alignment
> > > > >>>> > > >> > > >>> is optional and can be enabled/disabled.
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >> Yes, watermark alignment can be enabled/disabled
> in
> > > > >>>> individual
> > > > >>>> > > >> sources
> > > > >>>> > > >> > > in
> > > > >>>> > > >> > > >> Flink jobs, which basically means the code
> > supporting
> > > > >>>> watermark
> > > > >>>> > > >> > > alignment
> > > > >>>> > > >> > > >> has to already be there. That again means the
> Source
> > > > >>>> > developers are
> > > > >>>> > > >> > also
> > > > >>>> > > >> > > >> expected to support pausable splits by default. So
> > > this
> > > > >>>> way we
> > > > >>>> > > >> > > essentially
> > > > >>>> > > >> > > >> tell the end users that you may enable / disable
> > this
> > > > >>>> feature
> > > > >>>> > as
> > > > >>>> > > >> you
> > > > >>>> > > >> > > wish,
> > > > >>>> > > >> > > >> and tell the source developers that you SHOULD
> > > implement
> > > > >>>> this
> > > > >>>> > > >> because
> > > > >>>> > > >> > > the
> > > > >>>> > > >> > > >> end users may turn it on/off at will. And if the
> > > source
> > > > >>>> does
> > > > >>>> > not
> > > > >>>> > > >> > support
> > > > >>>> > > >> > > >> pausable splits, that goes to an error handling
> path
> > > > when
> > > > >>>> > watermark
> > > > >>>> > > >> > > >> alignment is enabled on it. So users know they
> have
> > to
> > > > >>>> > explicitly
> > > > >>>> > > >> > > exclude
> > > > >>>> > > >> > > >> this source.
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >>> So far it's totally clear to me and I hope this
> is
> > > what
> > > > >>>> you
> > > > >>>> > mean.
> > > > >>>> > > >> I
> > > > >>>> > > >> > > also
> > > > >>>> > > >> > > >>> agree with both statements:
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >>> So making that expectation aligned with the
> source
> > > > >>>> developers
> > > > >>>> > > >> seems
> > > > >>>> > > >> > > >>>> reasonable.
> > > > >>>> > > >> > > >>>>
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >>> I think this is a simple and clean solution from
> > both
> > > > >>>> the end
> > > > >>>> > user
> > > > >>>> > > >> > and
> > > > >>>> > > >> > > >>>> source developers' standpoint.
> > > > >>>> > > >> > > >>>>
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >>> However, a last conclusion derives from 3. and is
> > an
> > > > open
> > > > >>>> > question
> > > > >>>> > > >> > for
> > > > >>>> > > >> > > >>> me:
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >>> 4. The feature of "pausable splits" is now
> tightly
> > > > bound
> > > > >>>> to
> > > > >>>> > > >> watermark
> > > > >>>> > > >> > > >>> alignment, i.e., if sources do not support
> > "pausable
> > > > >>>> splits"
> > > > >>>> > one
> > > > >>>> > > >> can
> > > > >>>> > > >> > > not
> > > > >>>> > > >> > > >>> enable watermark alignment for these sources.
> This
> > > > >>>> dependency
> > > > >>>> > is
> > > > >>>> > > >> not
> > > > >>>> > > >> > > the
> > > > >>>> > > >> > > >>> current status of watermark alignment
> > implementation
> > > > >>>> because
> > > > >>>> > it
> > > > >>>> > > >> > is/was
> > > > >>>> > > >> > > >>> implemented without pausable splits. Do we want
> to
> > > > >>>> introduce
> > > > >>>> > this
> > > > >>>> > > >> > > >>> dependency? (This is an open question. I cannot
> > judge
> > > > >>>> that.)
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >> The watermark alignment basically relies on the
> > > pausable
> > > > >>>> > splits,
> > > > >>>> > > >> > right?
> > > > > >>>> > > >> > > >> So personally I find it quite reasonable that if
> > the
> > > > >>>> source
> > > > >>>> > does
> > > > >>>> > > >> not
> > > > >>>> > > >> > > >> support pausable splits, end users cannot enable
> > > > watermark
> > > > >>>> > > >> alignment
> > > > >>>> > > >> > on
> > > > >>>> > > >> > > it.
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >>
> > > > >>>> > > >> > > >>> If something is wrong, please correct me.
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >>> Regards,
> > > > >>>> > > >> > > >>> Sebastian
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >>> On Wed, May 11, 2022 at 9:05 AM Becket Qin <
> > > > >>>> > becket.qin@gmail.com>
> > > > >>>> > > >> > > wrote:
> > > > >>>> > > >> > > >>>
> > > > >>>> > > >> > > >>>> Hi Sebastian,
> > > > >>>> > > >> > > >>>>
> > > > >>>> > > >> > > >>>> Thanks for the reply and patient discussion. I
> > agree
> > > > >>>> this is
> > > > >>>> > a
> > > > >>>> > > >> > tricky
> > > > >>>> > > >> > > >>>> decision.
> > > > >>>> > > >> > > >>>>
> > > > >>>> > > >> > > >>>>
> > > > >>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about
> > Option
> > > > c)
> > > > >>>> > which I
> > > > >>>> > > >> see
> > > > >>>> > > >> > as
> > > > >>>> > > >> > > >>>>> follows:
> > > > >>>> > > >> > > >>>>> (1) An interface with default NOOP
> implementation
> > > > >>>> makes the
> > > > >>>> > > >> > > >>>>> implementation optional. And in my opinion, a
> > > default
> > > > >>>> > > >> > implementation
> > > > >>>> > > >> > > is and
> > > > >>>> > > >> > > >>>>> will remain a way of making implementation
> > optional
> > > > >>>> because
> > > > >>>> > > >> even in
> > > > >>>> > > >> > > future
> > > > >>>> > > >> > > >>>>> a developer can decide to implement the "old
> > > flavor"
> > > > >>>> without
> > > > >>>> > > >> > support
> > > > >>>> > > >> > > for
> > > > >>>> > > >> > > >>>>> pausable splits.
> > > > >>>> > > >> > > >>>>> (2) It may not be too critical but I also find
> it
> > > > >>>> suboptimal
> > > > >>>> > > >> that
> > > > >>>> > > >> > > with
> > > > >>>> > > >> > > >>>>> a NOOP default implementation there is no way
> to
> > > > check
> > > > >>>> at
> > > > >>>> > > >> runtime
> > > > >>>> > > >> > if
> > > > >>>> > > >> > > >>>>> SourceReader or SplitReader actually support
> > > pausing.
> > > > >>>> (To
> > > > >>>> > do so,
> > > > >>>> > > >> > one
> > > > >>>> > > >> > > would
> > > > >>>> > > >> > > >>>>> need a supportsX method which makes it again
> more
> > > > > >>>> > complicated.)
> > > > >>>> > > >> > > >>>>
> > > > >>>> > > >> > > >>>>
> > > > >>>> > > >> > > >>>> Based on the last few messages in the mailing
> > list,
> > > > >>>> Piotr
> > > > >>>> > and I
> > > > >>>> > > >> > > agreed
> > > > >>>> > > >> > > >>>> that the default implementation should just
> throw
> > an
> > > > >>>> > > >> > > >>>> UnsupportedOperationException if the source is
> > > > >>>> unpausable. So
> > > > >>>> > > >> this
> > > > >>>> > > >> > > >>>> basically tells the Source developers that this
> > > > feature
> > > > >>>> is
> > > > >>>> > > >> expected
> > > > >>>> > > >> > > to be
> > > > >>>> > > >> > > >>>> supported. Because we cannot prevent end users
> > from
> > > > >>>> putting
> > > > >>>> > an
> > > > >>>> > > >> > > unpausable
> > > > >>>> > > >> > > >>>> source into the watermark alignment group, that
> > > > >>>> basically
> > > > >>>> > means
> > > > >>>> > > >> > > watermark
> > > > > >>>> > > >> > > >>>> alignment is a non-optional feature to the end
> > > users.
> > > > >>>> So
> > > > >>>> > making
> > > > >>>> > > >> > that
> > > > >>>> > > >> > > >>>> expectation aligned with the source developers
> > seems
> > > > >>>> > reasonable.
> > > > >>>> > > >> > And
> > > > >>>> > > >> > > if a
> > > > >>>> > > >> > > >>>> source does not support this feature, the end
> > users
> > > > >>>> should
> > > > >>>> > > >> > explicitly
> > > > >>>> > > >> > > >>>> remove that source from the watermark alignment
> > > group.
> > > > >>>> > > >> > > >>>>
> > > > >>>> > > >> > > >>>> Personally speaking I think this is a simple and
> > > clean
> > > > >>>> > solution
> > > > >>>> > > >> from
> > > > >>>> > > >> > > >>>> both the end user and source developers'
> > standpoint.
> > > > >>>> > > >> > > >>>>
> > > > >>>> > > >> > > >>>> Does this address your concerns?
> > > > >>>> > > >> > > >>>>
> > > > >>>> > > >> > > >>>> Thanks,
> > > > >>>> > > >> > > >>>>
> > > > >>>> > > >> > > >>>> Jiangjie (Becket) Qin
> > > > >>>> > > >> > > >>>>
> > > > >>>> > > >> > > >>>> On Wed, May 11, 2022 at 2:52 PM Sebastian
> > Mattheis <
> > > > >>>> > > >> > > >>>> sebastian@ververica.com> wrote:
> > > > >>>> > > >> > > >>>>
> > > > >>>> > > >> > > >>>>> Hi Piotr, Hi Becket, Hi everybody,
> > > > >>>> > > >> > > >>>>>
> > > > >>>> > > >> > > >>>>> we, Dawid and I, discussed the various
> > > > >>>> suggestions/options
> > > > >>>> > and
> > > > >>>> > > >> we
> > > > >>>> > > >> > > >>>>> would be okay either way because we find
> neither
> > > > >>>> solution is
> > > > >>>> > > >> > perfect
> > > > >>>> > > >> > > just
> > > > >>>> > > >> > > >>>>> because of the already present complexity.
> > > > >>>> > > >> > > >>>>>
> > > > >>>> > > >> > > >>>>> Option c) Adding methods to the interfaces of
> > > > >>>> SourceReader
> > > > >>>> > and
> > > > >>>> > > >> > > >>>>> SplitReader
> > > > >>>> > > >> > > >>>>> Option a) Adding decorative interfaces to be
> used
> > > by
> > > > >>>> > > >> SourceReader
> > > > >>>> > > >> > and
> > > > >>>> > > >> > > >>>>> SplitReader
> > > > >>>> > > >> > > >>>>>
> > > > >>>> > > >> > > >>>>> As of the current status (v. 12) of the FLIP
> [1],
> > > it
> > > > is
> > > > >>>> > based on
> > > > >>>> > > >> > > >>>>> Option c) which we find acceptable because the
> > > > >>>> complexity
> > > > >>>> > added
> > > > >>>> > > >> is
> > > > >>>> > > >> > > only a
> > > > >>>> > > >> > > >>>>> single method.
> > > > >>>> > > >> > > >>>>>
> > > > >>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about
> > Option
> > > > c)
> > > > >>>> > which I
> > > > >>>> > > >> see
> > > > >>>> > > >> > as
> > > > >>>> > > >> > > >>>>> follows:
> > > > >>>> > > >> > > >>>>> (1) An interface with default NOOP
> implementation
> > > > >>>> makes the
> > > > >>>> > > >> > > >>>>> implementation optional. And in my opinion, a
> > > default
> > > > >>>> > > >> > implementation
> > > > >>>> > > >> > > is and
> > > > >>>> > > >> > > >>>>> will remain a way of making implementation
> > optional
> > > > >>>> because
> > > > >>>> > > >> even in
> > > > >>>> > > >> > > future
> > > > >>>> > > >> > > >>>>> a developer can decide to implement the "old
> > > flavor"
> > > > >>>> without
> > > > >>>> > > >> > support
> > > > >>>> > > >> > > for
> > > > >>>> > > >> > > >>>>> pausable splits.
> > > > >>>> > > >> > > >>>>> (2) It may not be too critical but I also find
> it
> > > > >>>> suboptimal
> > > > >>>> > > >> that
> > > > >>>> > > >> > > with
> > > > >>>> > > >> > > >>>>> a NOOP default implementation there is no way
> to
> > > > check
> > > > >>>> at
> > > > >>>> > > >> runtime
> > > > >>>> > > >> > if
> > > > >>>> > > >> > > >>>>> SourceReader or SplitReader actually support
> > > pausing.
> > > > >>>> (To
> > > > >>>> > do so,
> > > > >>>> > > >> > one
> > > > >>>> > > >> > > would
> > > > >>>> > > >> > > >>>>> need a supportsX method which makes it again
> more
> > > > >>>> > complicated.)
> > > > >>>> > > >> > > >>>>>
> > > > >>>> > > >> > > >>>>> However, we haven't changed it because Option
> a)
> > is
> > > > >>>> also not
> > > > >>>> > > >> > optimal
> > > > > >>>> > > >> > > >>>>> or straightforward:
> > > > >>>> > > >> > > >>>>> (1) We need to add two distinct yet similar
> > > > decorative
> > > > >>>> > > >> interfaces
> > > > >>>> > > >> > > >>>>> since, as mentioned, the signatures of the
> > methods
> > > > are
> > > > >>>> > > >> different.
> > > > >>>> > > >> > For
> > > > >>>> > > >> > > >>>>> example, we would need decorative interfaces
> like
> > > > >>>> > > >> > > >>>>> `SplitReaderWithPausableSplits` and
> > > > >>>> > > >> > `SourceReaderWithPausableSplits`.
> > > > >>>> > > >> > > >>>>> (2) As a consequence, we would need to somehow
> > > > document
> > > > >>>> > > >> how/where
> > > > >>>> > > >> > to
> > > > >>>> > > >> > > >>>>> implement both interfaces and how this relates
> to
> > > > each
> > > > >>>> > other.
> > > > >>>> > > >> This
> > > > >>>> > > >> > > we could
> > > > >>>> > > >> > > >>>>> solve by adding a note in the interface of
> > > > >>>> SourceReader and
> > > > >>>> > > >> > > SplitReader and
> > > > >>>> > > >> > > >>>>> reference to the decorative interfaces but it
> > still
> > > > >>>> > increases
> > > > >>>> > > >> > > complexity
> > > > >>>> > > >> > > >>>>> too.
> > > > >>>> > > >> > > >>>>>
> > > > >>>> > > >> > > >>>>> In summary, we see both as acceptable and
> > preferred
> > > > >>>> over
> > > > >>>> > other
> > > > >>>> > > >> > > >>>>> options. The question is if we can find a
> > solution
> > > or
> > > > >>>> > compromise
> > > > >>>> > > >> > > that is
> > > > >>>> > > >> > > >>>>> acceptable for everybody to reach consensus.
> > > > >>>> > > >> > > >>>>>
> > > > >>>> > > >> > > >>>>> Please let us know what you think because we
> > would
> > > be
> > > > >>>> happy
> > > > >>>> > if
> > > > >>>> > > >> we
> > > > >>>> > > >> > can
> > > > >>>> > > >> > > >>>>> conclude the discussion to avoid dropping the
> > > > >>>> initiative on
> > > > >>>> > this
> > > > >>>> > > >> > > FLIP.
> > > > >>>> > > >> > > >>>>>
> > > > >>>> > > >> > > >>>>> Regards,
> > > > >>>> > > >> > > >>>>> Sebastian
> > > > >>>> > > >> > > >>>>>
> > > > >>>> > > >> > > >>>>> [1]
> > > > >>>> > > >> > > >>>>>
> > > > >>>> > > >> > >
> > > > >>>> > > >> >
> > > > >>>> > > >>
> > > > >>>> >
> > > > >>>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
> > > > >>>> > > >> > > >>>>> (v. 12)
> > > > >>>> > > >> > > >>>>>
> > > > >>>> > > >> > > >>>>> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski
> <
> > > > >>>> > > >> > pnowojski@apache.org
> > > > >>>> > > >> > > >
> > > > >>>> > > >> > > >>>>> wrote:
> > > > >>>> > > >> > > >>>>>
> > > > >>>> > > >> > > >>>>>> Hi Guowei,
> > > > >>>> > > >> > > >>>>>>
> > > > >>>> > > >> > > >>>>>> as Dawid wrote a couple of messages back:
> > > > >>>> > > >> > > >>>>>>
> > > > >>>> > > >> > > >>>>>> > This is covered in the previous FLIP[1]
> which
> > > has
> > > > >>>> been
> > > > >>>> > > >> already
> > > > >>>> > > >> > > >>>>>> implemented in 1.15. In short, it must be
> > enabled
> > > > >>>> with the
> > > > >>>> > > >> > watermark
> > > > >>>> > > >> > > >>>>>> strategy which also configures drift and
> update
> > > > >>>> interval
> > > > >>>> > > >> > > >>>>>>
> > > > >>>> > > >> > > >>>>>> So by default watermark alignment is disabled,
> > > > >>>> regardless
> > > > >>>> > if a
> > > > >>>> > > >> > > source
> > > > >>>> > > >> > > >>>>>> supports it or not.
> > > > >>>> > > >> > > >>>>>>
> > > > >>>> > > >> > > >>>>>> Best,
> > > > >>>> > > >> > > >>>>>> Piotrek
> > > > >>>> > > >> > > >>>>>>
> > > > >>>> > > >> > > >>>>>> czw., 5 maj 2022 o 09:56 Guowei Ma <
> > > > >>>> guowei.mgw@gmail.com>
> > > > >>>> > > >> > > napisał(a):
> > > > >>>> > > >> > > >>>>>>
> > > > >>>> > > >> > > >>>>>>> Hi,
> > > > >>>> > > >> > > >>>>>>>
> > > > >>>> > > >> > > >>>>>>> We know that in the case of Bounded input
> Flink
> > > > >>>> supports
> > > > >>>> > the
> > > > >>>> > > >> > Batch
> > > > >>>> > > >> > > >>>>>>> execution mode. Currently in Batch execution
> > > mode,
> > > > >>>> Flink
> > > > >>>> > is
> > > > >>>> > > >> > > executed
> > > > >>>> > > >> > > >>>>>>> on a
> > > > >>>> > > >> > > >>>>>>> stage-by-stage basis. In this way, perhaps
> > > > watermark
> > > > >>>> > alignment
> > > > >>>> > > >> > > might
> > > > >>>> > > >> > > >>>>>>> not
> > > > >>>> > > >> > > >>>>>>> gain much.
> > > > >>>> > > >> > > >>>>>>>
> > > > >>>> > > >> > > >>>>>>> So my question is: Is watermark alignment the
> > > > default
> > > > > >>>> > > >> > behavior (for
> > > > > >>>> > > >> > > >>>>>>> implemented sources only)? If so, have you
> > > > considered
> > > > >>>> > > >> evaluating
> > > > >>>> > > >> > the
> > > > >>>> > > >> > > >>>>>>> impact
> > > > >>>> > > >> > > >>>>>>> of this behavior on the Batch execution mode?
> > Or
> > > > >>>> do you think
> > > > > >>>> > it is
> > > > > >>>> > > >> not
> > > > > >>>> > > >> > > >>>>>>> necessary?
> > > > >>>> > > >> > > >>>>>>>
> > > > >>>> > > >> > > >>>>>>> Correct me if I miss something.
> > > > >>>> > > >> > > >>>>>>>
> > > > >>>> > > >> > > >>>>>>> Best,
> > > > >>>> > > >> > > >>>>>>> Guowei
> > > > >>>> > > >> > > >>>>>>>
> > > > >>>> > > >> > > >>>>>>>
> > > > >>>> > > >> > > >>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr
> Nowojski <
> > > > >>>> > > >> > > >>>>>>> piotr.nowojski@gmail.com>
> > > > >>>> > > >> > > >>>>>>> wrote:
> > > > >>>> > > >> > > >>>>>>>
> > > > >>>> > > >> > > >>>>>>> > Hi Becket and Dawid,
> > > > >>>> > > >> > > >>>>>>> >
> > > > >>>> > > >> > > >>>>>>> > > I feel that no matter which option we
> > choose
> > > > >>>> this can
> > > > >>>> > not
> > > > >>>> > > >> be
> > > > >>>> > > >> > > >>>>>>> solved
> > > > >>>> > > >> > > >>>>>>> > entirely in either of the options, because
> of
> > > the
> > > > >>>> point
> > > > >>>> > > >> above
> > > > >>>> > > >> > and
> > > > >>>> > > >> > > >>>>>>> because
> > > > >>>> > > >> > > >>>>>>> > the signature of
> > > SplitReader#pauseOrResumeSplits
> > > > >>>> and
> > > > >>>> > > >> > > >>>>>>> > SourceReader#pauseOrResumeSplits are
> slightly
> > > > >>>> different
> > > > >>>> > (one
> > > > >>>> > > >> > > >>>>>>> identifies
> > > > >>>> > > >> > > >>>>>>> > splits with splitId the other one passes
> the
> > > > splits
> > > > >>>> > > >> directly).
> > > > >>>> > > >> > > >>>>>>> >
> > > > >>>> > > >> > > >>>>>>> > Yes, that's a good point in this case and
> for
> > > > >>>> features
> > > > >>>> > that
> > > > >>>> > > >> > need
> > > > >>>> > > >> > > >>>>>>> to be
> > > > >>>> > > >> > > >>>>>>> > implemented in more than one place.
> > > > >>>> > > >> > > >>>>>>> >
> > > > >>>> > > >> > > >>>>>>> > > Is there any reason for pausing reading
> > from
> > > a
> > > > >>>> split
> > > > > >>>> > to be an
> > > > >>>> > > >> > > optional
> > > > >>>> > > >> > > >>>>>>> feature,
> > > > >>>> > > >> > > >>>>>>> > > other than that this was not included in
> > the
> > > > >>>> original
> > > > >>>> > > >> > > interface?
> > > > >>>> > > >> > > >>>>>>> >
> > > > >>>> > > >> > > >>>>>>> > An additional argument in favor of making
> it
> > > > >>>> optional
> > > > >>>> > is to
> > > > >>>> > > >> > > >>>>>>> simplify source
> > > > >>>> > > >> > > >>>>>>> > implementation. But on its own I'm not sure
> > if
> > > > that
> > > > >>>> > would be
> > > > >>>> > > >> > > >>>>>>> enough to
> > > > >>>> > > >> > > >>>>>>> > justify making this feature optional.
> Maybe.
> > > > >>>> > > >> > > >>>>>>> >
> > > > >>>> > > >> > > >>>>>>> > > I think it would be way simpler and
> clearer
> > > to
> > > > >>>> just
> > > > >>>> > let
> > > > >>>> > > >> end
> > > > >>>> > > >> > > >>>>>>> users and
> > > > >>>> > > >> > > >>>>>>> > Flink
> > > > >>>> > > >> > > >>>>>>> > > assume all the connectors will implement
> > this
> > > > >>>> feature.
> > > > >>>> > > >> > > >>>>>>> >
> > > > >>>> > > >> > > >>>>>>> > As I wrote above that would be an
> interesting
> > > > >>>> choice to
> > > > >>>> > make
> > > > >>>> > > >> > > (ease
> > > > >>>> > > >> > > >>>>>>> of
> > > > >>>> > > >> > > >>>>>>> > implementation for new users, vs system
> > > > >>>> consistency).
> > > > >>>> > > >> > Regardless
> > > > >>>> > > >> > > >>>>>>> of that,
> > > > >>>> > > >> > > >>>>>>> > yes, for me the main argument is the API
> > > backward
> > > > >>>> > > >> > compatibility.
> > > > >>>> > > >> > > >>>>>>> But let's
> > > > >>>> > > >> > > >>>>>>> > clear a couple of points:
> > > > >>>> > > >> > > >>>>>>> > - The current proposal adding methods to
> the
> > > base
> > > > >>>> > interface
> > > > >>>> > > >> > with
> > > > >>>> > > >> > > >>>>>>> default
> > > > >>>> > > >> > > >>>>>>> > implementations is an OPTIONAL feature.
> Same
> > as
> > > > the
> > > > >>>> > > >> decorative
> > > > >>>> > > >> > > >>>>>>> version
> > > > >>>> > > >> > > >>>>>>> > would be.
> > > > >>>> > > >> > > >>>>>>> > - Decorative version could implement "throw
> > > > >>>> > > >> > > >>>>>>> UnsupportedOperationException"
> > > > >>>> > > >> > > >>>>>>> > if user enabled watermark alignment just as
> > > well
> > > > >>>> and I
> > > > >>>> > agree
> > > > >>>> > > >> > > >>>>>>> that's a
> > > > >>>> > > >> > > >>>>>>> > better option compared to logging a
> warning.
> > > > >>>> > > >> > > >>>>>>> >
> > > > >>>> > > >> > > >>>>>>> > Best,
> > > > >>>> > > >> > > >>>>>>> > Piotrek
> > > > >>>> > > >> > > >>>>>>> >
> > > > >>>> > > >> > > >>>>>>> >
> > > > >>>> > > >> > > >>>>>>> > śr., 4 maj 2022 o 15:40 Becket Qin <
> > > > >>>> > becket.qin@gmail.com>
> > > > >>>> > > >> > > >>>>>>> napisał(a):
> > > > >>>> > > >> > > >>>>>>> >
> > > > >>>> > > >> > > >>>>>>> > > Thanks for the reply and patient
> > discussion,
> > > > >>>> Piotr and
> > > > >>>> > > >> Dawid.
> > > > >>>> > > >> > > >>>>>>> > >
> > > > >>>> > > >> > > >>>>>>> > > Is there any reason for pausing reading
> > from
> > > a
> > > > >>>> split
> > > > > >>>> > to be an
> > > > >>>> > > >> > > optional
> > > > >>>> > > >> > > >>>>>>> feature,
> > > > >>>> > > >> > > >>>>>>> > > other than that this was not included in
> > the
> > > > >>>> original
> > > > >>>> > > >> > > interface?
> > > > >>>> > > >> > > >>>>>>> > >
> > > > >>>> > > >> > > >>>>>>> > > To be honest I am really worried about
> the
> > > > >>>> complexity
> > > > >>>> > of
> > > > >>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> user story
> > > > >>>> > > >> > > >>>>>>> > > here. Optional features like this have a
> > high
> > > > >>>> > overhead.
> > > > >>>> > > >> > Imagine
> > > > >>>> > > >> > > >>>>>>> this
> > > > >>>> > > >> > > >>>>>>> > > feature is optional, now a user enabled
> > > > watermark
> > > > >>>> > > >> alignment
> > > > >>>> > > >> > and
> > > > >>>> > > >> > > >>>>>>> defined a
> > > > >>>> > > >> > > >>>>>>> > > few watermark groups. Would it work? Hmm,
> > > that
> > > > >>>> > depends on
> > > > >>>> > > >> > > >>>>>>> whether the
> > > > > >>>> > > >> > > >>>>>>> > > involved Source has implemented this
> > > feature.
> > > > >>>> If the
> > > > >>>> > > >> Sources
> > > > >>>> > > >> > > >>>>>>> are well
> > > > >>>> > > >> > > >>>>>>> > > documented, good luck. Otherwise end
> users
> > > may
> > > > >>>> have to
> > > > >>>> > > >> look
> > > > >>>> > > >> > > into
> > > > >>>> > > >> > > >>>>>>> the code
> > > > >>>> > > >> > > >>>>>>> > > of the Source to see whether the feature
> is
> > > > >>>> supported.
> > > > >>>> > > >> Which
> > > > >>>> > > >> > is
> > > > >>>> > > >> > > >>>>>>> something
> > > > >>>> > > >> > > >>>>>>> > > they shouldn't have to do.
> > > > >>>> > > >> > > >>>>>>> > >
> > > > >>>> > > >> > > >>>>>>> > > I think it would be way simpler and
> clearer
> > > to
> > > > >>>> just
> > > > >>>> > let
> > > > >>>> > > >> end
> > > > >>>> > > >> > > >>>>>>> users and
> > > > >>>> > > >> > > >>>>>>> > Flink
> > > > >>>> > > >> > > >>>>>>> > > assume all the connectors will implement
> > this
> > > > >>>> feature.
> > > > >>>> > > >> After
> > > > >>>> > > >> > > all
> > > > >>>> > > >> > > >>>>>>> the
> > > > > >>>> > > >> > > >>>>>>> > > watermark group is not optional to the
> end
> > > > >>>> users. If
> > > > >>>> > in
> > > > >>>> > > >> some
> > > > >>>> > > >> > > >>>>>>> rare cases,
> > > > >>>> > > >> > > >>>>>>> > > the feature cannot be supported, a clear
> > > > >>>> > > >> > > >>>>>>> UnsupportedOperationException
> > > > >>>> > > >> > > >>>>>>> > will
> > > > >>>> > > >> > > >>>>>>> > > be thrown to tell users to explicitly
> > remove
> > > > this
> > > > >>>> > Source
> > > > >>>> > > >> from
> > > > >>>> > > >> > > the
> > > > >>>> > > >> > > >>>>>>> > watermark
> > > > >>>> > > >> > > >>>>>>> > > group. I don't think we should have a
> > warning
> > > > >>>> message
> > > > >>>> > > >> here,
> > > > >>>> > > >> > as
> > > > >>>> > > >> > > >>>>>>> they tend
> > > > >>>> > > >> > > >>>>>>> > to
> > > > >>>> > > >> > > >>>>>>> > > be ignored in many cases. If we do this,
> we
> > > > >>>> don't even
> > > > >>>> > > >> need
> > > > >>>> > > >> > the
> > > > >>>> > > >> > > >>>>>>> > supportXXX
> > > > >>>> > > >> > > >>>>>>> > > method in the Source for this feature. In
> > > fact
> > > > >>>> this is
> > > > >>>> > > >> > exactly
> > > > >>>> > > >> > > >>>>>>> how many
> > > > >>>> > > >> > > >>>>>>> > > interfaces work today. For example,
> > > > >>>> > > >> > > >>>>>>> SplitEnumerator#addSplitsBack() is
> > > > >>>> > > >> > > >>>>>>> > not
> > > > >>>> > > >> > > >>>>>>> > > supported by Pravega source because it
> does
> > > not
> > > > >>>> > support
> > > > >>>> > > >> > partial
> > > > >>>> > > >> > > >>>>>>> failover.
> > > > >>>> > > >> > > >>>>>>> > > In that case, it simply throws an
> exception
> > > to
> > > > >>>> > trigger a
> > > > >>>> > > >> > global
> > > > >>>> > > >> > > >>>>>>> recovery.
> > > > >>>> > > >> > > >>>>>>> > >
> > > > >>>> > > >> > > >>>>>>> > > The reason we add a default
> implementation
> > in
> > > > >>>> this
> > > > >>>> > case
> > > > >>>> > > >> would
> > > > >>>> > > >> > > >>>>>>> be just for
> > > > >>>> > > >> > > >>>>>>> > the
> > > > >>>> > > >> > > >>>>>>> > > sake of backwards compatibility so the
> old
> > > > >>>> source can
> > > > >>>> > > >> still
> > > > >>>> > > >> > > >>>>>>> compile.
> > > > >>>> > > >> > > >>>>>>> > Sure,
> > > > >>>> > > >> > > >>>>>>> > > in short term, this feature might not be
> > > > >>>> supported by
> > > > >>>> > many
> > > > >>>> > > >> > > >>>>>>> existing
> > > > >>>> > > >> > > >>>>>>> > > sources. That is OK, and it is quite
> > visible
> > > to
> > > > >>>> the
> > > > >>>> > source
> > > > >>>> > > >> > > >>>>>>> developers
> > > > >>>> > > >> > > >>>>>>> > that
> > > > >>>> > > >> > > >>>>>>> > > they did not override the default impl
> > which
> > > > >>>> throws an
> > > > >>>> > > >> > > >>>>>>> > > UnsupportedOperationException.
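For illustration, a minimal Java sketch of the default-method approach described in the quoted paragraph above. The names SketchReader, LegacyReader and AligningReader are hypothetical and are not Flink's actual interfaces; the method name pauseOrResumeSplits is borrowed from the discussion, and its signature here is an assumption.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a reader interface; not Flink's actual API. */
interface SketchReader {
    String poll();

    /** Added later with a default body, so older implementations still compile. */
    default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        throw new UnsupportedOperationException(
                "Pausing splits is not supported; remove this source from the watermark group.");
    }
}

/** Written before the new method existed; inherits the throwing default. */
class LegacyReader implements SketchReader {
    @Override
    public String poll() {
        return "record";
    }
}

/** Overrides the default and therefore supports the feature. */
class AligningReader implements SketchReader {
    final Set<String> paused = new HashSet<>();

    @Override
    public String poll() {
        return "record";
    }

    @Override
    public void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        paused.addAll(toPause);
        paused.removeAll(toResume);
    }
}
```

In this sketch the legacy reader keeps compiling unchanged and only fails when the feature is actually exercised, which is the trade-off being argued for in the quoted text.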
> > > > >>>> > > >> > > >>>>>>> > >
> > > > >>>> > > >> > > >>>>>>> > > @Dawid,
> > > > >>>> > > >> > > >>>>>>> > >
> > > > >>>> > > >> > > >>>>>>> > > the Java doc of the SupportXXX() method
> in
> > > the
> > > > >>>> Source
> > > > >>>> > > >> would
> > > > >>>> > > >> > be
> > > > >>>> > > >> > > >>>>>>> the single
> > > > >>>> > > >> > > >>>>>>> > > >> source of truth regarding how to
> > implement
> > > > >>>> this
> > > > >>>> > > >> feature.
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > >
> > > > >>>> > > >> > > >>>>>>> > > I also don't find it entirely true. Half
> of
> > > the
> > > > >>>> > classes
> > > > >>>> > > >> are
> > > > >>>> > > >> > > >>>>>>> theoretically
> > > > >>>> > > >> > > >>>>>>> > > > optional and are utility classes from
> the
> > > > >>>> point of
> > > > >>>> > view
> > > > >>>> > > >> how
> > > > >>>> > > >> > > the
> > > > >>>> > > >> > > >>>>>>> > > interfaces
> > > > >>>> > > >> > > >>>>>>> > > > are organized. Theoretically users do
> not
> > > > need
> > > > >>>> to
> > > > >>>> > use
> > > > >>>> > > >> any
> > > > >>>> > > >> > of
> > > > >>>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would
> be
> > > > weird
> > > > >>>> to
> > > > >>>> > list
> > > > >>>> > > >> > their
> > > > >>>> > > >> > > >>>>>>> methods in
> > > > >>>> > > >> > > >>>>>>> > > the
> > > > >>>> > > >> > > >>>>>>> > > > Source interface.
> > > > >>>> > > >> > > >>>>>>> > >
> > > > >>>> > > >> > > >>>>>>> > > I think the ultimate goal of java docs is
> > to
> > > > >>>> guide
> > > > >>>> > users
> > > > >>>> > > >> to
> > > > >>>> > > >> > > >>>>>>> implement the
> > > > >>>> > > >> > > >>>>>>> > > Source. If SourceReaderBase is the
> > preferred
> > > > way
> > > > >>>> to
> > > > >>>> > > >> > implement a
> > > > >>>> > > >> > > >>>>>>> > > SourceReader, it seems worth mentioning
> > that.
> > > > >>>> Even the
> > > > >>>> > > >> Java
> > > > >>>> > > >> > > >>>>>>> language
> > > > >>>> > > >> > > >>>>>>> > > documentation of interfaces lists the known
> > > > >>>> > implementations
> > > > >>>> > > >> [1]
> > > > >>>> > > >> > so
> > > > >>>> > > >> > > >>>>>>> people
> > > > >>>> > > >> > > >>>>>>> > can
> > > > >>>> > > >> > > >>>>>>> > > leverage them. But for this particular
> > case,
> > > if
> > > > >>>> we
> > > > >>>> > make
> > > > >>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> feature
> > > > >>>> > > >> > > >>>>>>> > > non-optional, we don't even need the
> > > > supportXXX()
> > > > >>>> > method
> > > > >>>> > > >> for
> > > > >>>> > > >> > > now.
> > > > >>>> > > >> > > >>>>>>> > >
> > > > >>>> > > >> > > >>>>>>> > > Thanks,
> > > > >>>> > > >> > > >>>>>>> > >
> > > > >>>> > > >> > > >>>>>>> > > Jiangjie (Becket) Qin
> > > > >>>> > > >> > > >>>>>>> > >
> > > > >>>> > > >> > > >>>>>>> > >
> > > > >>>> > > >> > > >>>>>>> > >
> > > > >>>> > > >> > > >>>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid
> > > > Wysakowicz <
> > > > >>>> > > >> > > >>>>>>> dwysakowicz@apache.org>
> > > > >>>> > > >> > > >>>>>>> > > wrote:
> > > > >>>> > > >> > > >>>>>>> > >
> > > > >>>> > > >> > > >>>>>>> > > > Hey Piotr and Becket,
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > First of all, let me say I am happy
> with
> > > > >>>> whichever
> > > > >>>> > > >> option
> > > > >>>> > > >> > is
> > > > >>>> > > >> > > >>>>>>> agreed in
> > > > >>>> > > >> > > >>>>>>> > > the
> > > > >>>> > > >> > > >>>>>>> > > > discussion.
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > I wanted to clarify a few points from
> the
> > > > >>>> discussion
> > > > >>>> > > >> > though:
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > @Becket:
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > The main argument for adding the
> methods
> > to
> > > > the
> > > > >>>> > > >> > SourceReader
> > > > >>>> > > >> > > >>>>>>> is that
> > > > >>>> > > >> > > >>>>>>> > > these
> > > > >>>> > > >> > > >>>>>>> > > > methods are effectively NON-OPTIONAL to
> > > > >>>> SourceReader
> > > > >>>> > > >> impl,
> > > > >>>> > > >> > > i.e.
> > > > >>>> > > >> > > >>>>>>> > starting
> > > > >>>> > > >> > > >>>>>>> > > > from this FLIP, all the SourceReaders
> > impl
> > > > are
> > > > >>>> > expected
> > > > >>>> > > >> to
> > > > >>>> > > >> > > >>>>>>> support this
> > > > >>>> > > >> > > >>>>>>> > > > method, although some old
> implementations
> > > may
> > > > >>>> not
> > > > >>>> > have
> > > > >>>> > > >> > > >>>>>>> implemented this
> > > > >>>> > > >> > > >>>>>>> > > > feature. I think we should distinguish
> > the
> > > > new
> > > > >>>> > features
> > > > >>>> > > >> > from
> > > > >>>> > > >> > > >>>>>>> the
> > > > >>>> > > >> > > >>>>>>> > optional
> > > > >>>> > > >> > > >>>>>>> > > > features. While the public decorative
> > > > >>>> interface is a
> > > > >>>> > > >> > solution
> > > > >>>> > > >> > > >>>>>>> to the
> > > > >>>> > > >> > > >>>>>>> > > > optional features. We should not use it
> > for
> > > > the
> > > > >>>> > features
> > > > >>>> > > >> > that
> > > > >>>> > > >> > > >>>>>>> are
> > > > >>>> > > >> > > >>>>>>> > > > non-optional.
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > I don't think that this feature is
> > > > >>>> NON-OPTIONAL.
> > > > >>>> > Even
> > > > >>>> > > >> > though
> > > > >>>> > > >> > > >>>>>>> > preferred, I
> > > > >>>> > > >> > > >>>>>>> > > > still think it can be simply optional.
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > the Java doc of the SupportXXX() method
> > in
> > > > the
> > > > >>>> > Source
> > > > >>>> > > >> would
> > > > >>>> > > >> > > be
> > > > >>>> > > >> > > >>>>>>> the
> > > > >>>> > > >> > > >>>>>>> > single
> > > > >>>> > > >> > > >>>>>>> > > > source of truth regarding how to
> > implement
> > > > this
> > > > >>>> > feature.
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > I also don't find it entirely true.
> Half
> > of
> > > > the
> > > > >>>> > classes
> > > > >>>> > > >> are
> > > > >>>> > > >> > > >>>>>>> > theoretically
> > > > >>>> > > >> > > >>>>>>> > > > optional and are utility classes from
> the
> > > > >>>> point of
> > > > >>>> > view
> > > > >>>> > > >> how
> > > > >>>> > > >> > > the
> > > > >>>> > > >> > > >>>>>>> > > interfaces
> > > > >>>> > > >> > > >>>>>>> > > > are organized. Theoretically users do
> not
> > > > need
> > > > >>>> to
> > > > >>>> > use
> > > > >>>> > > >> any
> > > > >>>> > > >> > of
> > > > >>>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would
> be
> > > > weird
> > > > >>>> to
> > > > >>>> > list
> > > > >>>> > > >> > their
> > > > >>>> > > >> > > >>>>>>> methods in
> > > > >>>> > > >> > > >>>>>>> > > the
> > > > >>>> > > >> > > >>>>>>> > > > Source interface.
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > @Piotr
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > If we have all of the methods with
> > default
> > > > >>>> > > >> implementation
> > > > >>>> > > >> > in
> > > > >>>> > > >> > > >>>>>>> the base
> > > > >>>> > > >> > > >>>>>>> > > > interface, the API doesn't give any
> clue
> > to
> > > > >>>> the user
> > > > >>>> > > >> which
> > > > >>>> > > >> > > set
> > > > >>>> > > >> > > >>>>>>> of
> > > > >>>> > > >> > > >>>>>>> > methods
> > > > >>>> > > >> > > >>>>>>> > > > are required to be implemented at the
> > same
> > > > >>>> time.
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > I feel that no matter which option we
> > > choose
> > > > >>>> this
> > > > >>>> > can
> > > > >>>> > > >> not
> > > > >>>> > > >> > be
> > > > >>>> > > >> > > >>>>>>> solved
> > > > >>>> > > >> > > >>>>>>> > > > entirely in either of the options,
> > because
> > > of
> > > > >>>> the
> > > > >>>> > point
> > > > >>>> > > >> > above
> > > > >>>> > > >> > > >>>>>>> and
> > > > >>>> > > >> > > >>>>>>> > because
> > > > >>>> > > >> > > >>>>>>> > > > the signature of
> > > > >>>> SplitReader#pauseOrResumeSplits and
> > > > >>>> > > >> > > >>>>>>> > > > SourceReader#pauseOrResumeSplits are
> > > slightly
> > > > >>>> > different
> > > > >>>> > > >> > (one
> > > > >>>> > > >> > > >>>>>>> identifies
> > > > >>>> > > >> > > >>>>>>> > > > splits with splitId the other one
> passes
> > > the
> > > > >>>> splits
> > > > >>>> > > >> > > directly).
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > Best,
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > Dawid
> > > > >>>> > > >> > > >>>>>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > Hi Piotr,
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > Thanks for the comment.
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > Just to clarify, I am not against the
> > > > >>>> decorative
> > > > >>>> > > >> > interfaces,
> > > > >>>> > > >> > > >>>>>>> but I do
> > > > >>>> > > >> > > >>>>>>> > > > think we should use them with caution.
> > The
> > > > main
> > > > >>>> > argument
> > > > >>>> > > >> > for
> > > > >>>> > > >> > > >>>>>>> adding the
> > > > >>>> > > >> > > >>>>>>> > > > methods to the SourceReader is that
> these
> > > > >>>> methods
> > > > >>>> > are
> > > > >>>> > > >> > > >>>>>>> > > > effectively NON-OPTIONAL to
> SourceReader
> > > > impl,
> > > > >>>> i.e.
> > > > >>>> > > >> > starting
> > > > >>>> > > >> > > >>>>>>> from this
> > > > >>>> > > >> > > >>>>>>> > > > FLIP, all the SourceReaders impl are
> > > expected
> > > > >>>> to
> > > > >>>> > support
> > > > >>>> > > >> > this
> > > > >>>> > > >> > > >>>>>>> > > > method, although some old
> implementations
> > > may
> > > > >>>> not
> > > > >>>> > have
> > > > >>>> > > >> > > >>>>>>> implemented this
> > > > >>>> > > >> > > >>>>>>> > > > feature. I think we should distinguish
> > the
> > > > new
> > > > >>>> > features
> > > > >>>> > > >> > from
> > > > >>>> > > >> > > >>>>>>> the
> > > > >>>> > > >> > > >>>>>>> > optional
> > > > >>>> > > >> > > >>>>>>> > > > features. While the public decorative
> > > > >>>> interface is a
> > > > >>>> > > >> > solution
> > > > >>>> > > >> > > >>>>>>> to the
> > > > >>>> > > >> > > >>>>>>> > > > optional features. We should not use it
> > for
> > > > the
> > > > >>>> > features
> > > > >>>> > > >> > that
> > > > >>>> > > >> > > >>>>>>> are
> > > > >>>> > > >> > > >>>>>>> > > > non-optional.
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > That said, this feature is optional for
> > > > >>>> > SplitReaders.
> > > > >>>> > > >> > > Arguably
> > > > >>>> > > >> > > >>>>>>> we can
> > > > >>>> > > >> > > >>>>>>> > > have
> > > > >>>> > > >> > > >>>>>>> > > > a decorative interface for that, but
> for
> > > > >>>> simplicity
> > > > >>>> > and
> > > > >>>> > > >> > > >>>>>>> symmetry of the
> > > > >>>> > > >> > > >>>>>>> > > > interface, personally I prefer just
> > adding
> > > a
> > > > >>>> new
> > > > >>>> > method.
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > Regarding the advantages you mentioned
> > > about
> > > > >>>> the
> > > > >>>> > > >> decorative
> > > > >>>> > > >> > > >>>>>>> interfaces,
> > > > >>>> > > >> > > >>>>>>> > > > they would make sense if:
> > > > >>>> > > >> > > >>>>>>> > > > 1. The feature is optional.
> > > > >>>> > > >> > > >>>>>>> > > > 2. There is only one decorative
> interface
> > > > >>>> involved
> > > > >>>> > for a
> > > > >>>> > > >> > > >>>>>>> feature.
> > > > >>>> > > >> > > >>>>>>> > > > Otherwise the argument that all the
> > methods
> > > > are
> > > > >>>> > grouped
> > > > >>>> > > >> > > >>>>>>> together will
> > > > >>>> > > >> > > >>>>>>> > not
> > > > >>>> > > >> > > >>>>>>> > > > stand.
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > Compared with that, I think the current
> > > > >>>> solution
> > > > >>>> > works
> > > > >>>> > > >> fine
> > > > >>>> > > >> > > in
> > > > >>>> > > >> > > >>>>>>> all
> > > > >>>> > > >> > > >>>>>>> > cases,
> > > > >>>> > > >> > > >>>>>>> > > > i.e. "having supportXXX() method in
> > Source,
> > > > and
> > > > >>>> > default
> > > > >>>> > > >> > > >>>>>>> methods /
> > > > >>>> > > >> > > >>>>>>> > > > decorative interfaces in base
> > interfaces.".
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > The advantages are:
> > > > >>>> > > >> > > >>>>>>> > > >> - clean and easy to implement base
> > > interface
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > In the current approach, the Java doc
> of
> > > the
> > > > >>>> > > >> SupportXXX()
> > > > >>>> > > >> > > >>>>>>> method in the
> > > > >>>> > > >> > > >>>>>>> > > > Source would be the single source of
> > truth
> > > > >>>> regarding
> > > > >>>> > > >> how to
> > > > >>>> > > >> > > >>>>>>> implement
> > > > >>>> > > >> > > >>>>>>> > > this
> > > > >>>> > > >> > > >>>>>>> > > > feature. It lists the method that has
> to
> > be
> > > > >>>> > implemented
> > > > >>>> > > >> to
> > > > >>>> > > >> > > >>>>>>> support this
> > > > >>>> > > >> > > >>>>>>> > > > feature, regardless of how many
> classes /
> > > > >>>> > interfaces are
> > > > >>>> > > >> > > >>>>>>> involved.
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > When implementing the base interface,
> > users
> > > > do
> > > > >>>> not
> > > > >>>> > need
> > > > >>>> > > >> to
> > > > >>>> > > >> > > >>>>>>> implement a
> > > > >>>> > > >> > > >>>>>>> > > > method with default implementation. If
> > they
> > > > are
> > > > >>>> > curious
> > > > >>>> > > >> > what
> > > > >>>> > > >> > > >>>>>>> the method
> > > > >>>> > > >> > > >>>>>>> > > is
> > > > >>>> > > >> > > >>>>>>> > > > for, the java doc of that method simply
> > > > points
> > > > >>>> > users to
> > > > >>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> > SupportXXX()
> > > > >>>> > > >> > > >>>>>>> > > > method in the Source. It seems not
> adding
> > > > work
> > > > >>>> to
> > > > >>>> > the
> > > > >>>> > > >> users
> > > > >>>> > > >> > > >>>>>>> compared
> > > > >>>> > > >> > > >>>>>>> > with
> > > > >>>> > > >> > > >>>>>>> > > > decorative interfaces, but gives much
> > > better
> > > > >>>> > > >> > discoverability.
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > - all of the methods from a single
> > feature
> > > > are
> > > > >>>> > grouped
> > > > >>>> > > >> in a
> > > > >>>> > > >> > > >>>>>>> single
> > > > >>>> > > >> > > >>>>>>> > > >> decorator interface, together with
> their
> > > > >>>> dedicated
> > > > >>>> > java
> > > > >>>> > > >> > doc.
> > > > >>>> > > >> > > >>>>>>> It's also
> > > > >>>> > > >> > > >>>>>>> > > >> easier to google search for help using
> > the
> > > > >>>> > decorator
> > > > >>>> > > >> name
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > - if an optional feature requires two
> > > methods
> > > > >>>> to be
> > > > >>>> > > >> > > >>>>>>> implemented at
> > > > >>>> > > >> > > >>>>>>> > once,
> > > > >>>> > > >> > > >>>>>>> > > >> decorator can guarantee that
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > These two points are not true when
> > multiple
> > > > >>>> > components
> > > > >>>> > > >> and
> > > > >>>> > > >> > > >>>>>>> classes are
> > > > >>>> > > >> > > >>>>>>> > > > involved collaboratively to provide a
> > > > feature.
> > > > >>>> In
> > > > >>>> > our
> > > > >>>> > > >> case,
> > > > >>>> > > >> > > we
> > > > >>>> > > >> > > >>>>>>> have
> > > > >>>> > > >> > > >>>>>>> > both
> > > > >>>> > > >> > > >>>>>>> > > > SourceReader and SplitReader involved.
> > And
> > > > >>>> there
> > > > >>>> > might
> > > > >>>> > > >> be
> > > > >>>> > > >> > > other
> > > > >>>> > > >> > > >>>>>>> > > interfaces
> > > > >>>> > > >> > > >>>>>>> > > > on the JM side involved for some future
> > > > >>>> features.
> > > > >>>> > So the
> > > > >>>> > > >> > > >>>>>>> relevant
> > > > >>>> > > >> > > >>>>>>> > methods
> > > > >>>> > > >> > > >>>>>>> > > > can actually be scattered over the
> > places.
> > > > That
> > > > >>>> > said, we
> > > > >>>> > > >> > may
> > > > >>>> > > >> > > >>>>>>> still use
> > > > >>>> > > >> > > >>>>>>> > > > decorative interfaces for each
> component,
> > > if
> > > > >>>> the
> > > > >>>> > > >> feature is
> > > > >>>> > > >> > > >>>>>>> optional,
> > > > >>>> > > >> > > >>>>>>> > > given
> > > > >>>> > > >> > > >>>>>>> > > > there is a single source of truth for
> the
> > > > >>>> feature.
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > Here I would strongly lean towards
> making
> > > > life
> > > > >>>> > easier
> > > > >>>> > > >> for
> > > > >>>> > > >> > new
> > > > >>>> > > >> > > >>>>>>> users,
> > > > >>>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the
> (imo)
> > > > >>>> slight
> > > > >>>> > expense
> > > > >>>> > > >> > for
> > > > >>>> > > >> > > >>>>>>> the power
> > > > >>>> > > >> > > >>>>>>> > > >> users.
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > I actually think the current approach
> is
> > > > >>>> simpler,
> > > > >>>> > more
> > > > >>>> > > >> > > >>>>>>> extensible and
> > > > >>>> > > >> > > >>>>>>> > > more
> > > > >>>> > > >> > > >>>>>>> > > > general for all the users. Can you
> > > articulate
> > > > >>>> a bit
> > > > >>>> > > >> more on
> > > > >>>> > > >> > > >>>>>>> which part
> > > > >>>> > > >> > > >>>>>>> > > you
> > > > >>>> > > >> > > >>>>>>> > > > think makes it harder for users to understand?
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > There is another benefit of the
> > decorative
> > > > >>>> > interfaces
> > > > >>>> > > >> which
> > > > >>>> > > >> > > is
> > > > >>>> > > >> > > >>>>>>> not
> > > > >>>> > > >> > > >>>>>>> > > > mentioned, but might be worth
> considering
> > > > here.
> > > > >>>> > Usually
> > > > >>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> decorative
> > > > >>>> > > >> > > >>>>>>> > > > interfaces give slightly better
> backwards
> > > > >>>> > compatibility
> > > > >>>> > > >> > than
> > > > >>>> > > >> > > >>>>>>> the new
> > > > >>>> > > >> > > >>>>>>> > > > default method in the interfaces. That
> is
> > > > when
> > > > >>>> > users are
> > > > >>>> > > >> > > using
> > > > >>>> > > >> > > >>>>>>> a jar
> > > > >>>> > > >> > > >>>>>>> > that
> > > > >>>> > > >> > > >>>>>>> > > > was compiled with an older version of
> > Flink
> > > > >>>> which
> > > > >>>> > does
> > > > >>>> > > >> not
> > > > >>>> > > >> > > >>>>>>> have the
> > > > >>>> > > >> > > >>>>>>> > > default
> > > > >>>> > > >> > > >>>>>>> > > > method in the interfaces in question. A
> > > > >>>> decorative
> > > > >>>> > > >> > interface
> > > > >>>> > > >> > > >>>>>>> may still
> > > > >>>> > > >> > > >>>>>>> > > > provide backwards compatibility in that
> > > case,
> > > > >>>> while
> > > > >>>> > > >> default
> > > > >>>> > > >> > > >>>>>>> method impl
> > > > >>>> > > >> > > >>>>>>> > > > cannot.
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > I think in Flink we in general do not
> > > > guarantee
> > > > >>>> > custom
> > > > >>>> > > >> > > >>>>>>> components
> > > > >>>> > > >> > > >>>>>>> > > compiled
> > > > >>>> > > >> > > >>>>>>> > > > with an older version can run with a
> > newer
> > > > >>>> version
> > > > >>>> > of
> > > > >>>> > > >> > Flink.
> > > > >>>> > > >> > > A
> > > > >>>> > > >> > > >>>>>>> > recompile
> > > > >>>> > > >> > > >>>>>>> > > > with a newer version would be required.
> > > That
> > > > >>>> said,
> > > > >>>> > if
> > > > >>>> > > >> we do
> > > > >>>> > > >> > > >>>>>>> care about
> > > > >>>> > > >> > > >>>>>>> > > > this, we can just change the
> > "supportXXX()"
> > > > >>>> method
> > > > >>>> > in
> > > > >>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> Source
> > > > >>>> > > >> > > >>>>>>> > > interface
> > > > >>>> > > >> > > >>>>>>> > > > to use decorative interfaces, and leave
> > the
> > > > >>>> other
> > > > >>>> > parts
> > > > >>>> > > >> > > >>>>>>> unchanged.
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > Thanks,
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > Jiangjie (Becket) Qin
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr
> > > > Nowojski <
> > > > >>>> > > >> > > >>>>>>> pnowojski@apache.org>
> > > > >>>> > > >> > > >>>>>>> > > > wrote:
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > >> Hi,
> > > > >>>> > > >> > > >>>>>>> > > >>
> > > > >>>> > > >> > > >>>>>>> > > >> Sorry for chipping in so late, but I
> was
> > > OoO
> > > > >>>> for
> > > > >>>> > the
> > > > >>>> > > >> last
> > > > >>>> > > >> > > two
> > > > >>>> > > >> > > >>>>>>> weeks.
> > > > >>>> > > >> > > >>>>>>> > > >> Regarding the interfaces, I would be
> > > > actually
> > > > >>>> > against
> > > > >>>> > > >> > adding
> > > > >>>> > > >> > > >>>>>>> those
> > > > >>>> > > >> > > >>>>>>> > > methods
> > > > >>>> > > >> > > >>>>>>> > > >> to the base interfaces for the reasons
> > > > >>>> mentioned
> > > > >>>> > above.
> > > > >>>> > > >> > > >>>>>>> Clogging the
> > > > >>>> > > >> > > >>>>>>> > > base
> > > > >>>> > > >> > > >>>>>>> > > >> interface for new users with tons of
> > > methods
> > > > >>>> that
> > > > >>>> > they
> > > > >>>> > > >> do
> > > > >>>> > > >> > > not
> > > > >>>> > > >> > > >>>>>>> need, do
> > > > >>>> > > >> > > >>>>>>> > > not
> > > > >>>> > > >> > > >>>>>>> > > >> understand and do not know what to do
> > with
> > > > >>>> them.
> > > > >>>> > > >> Moreover,
> > > > >>>> > > >> > > >>>>>>> such
> > > > >>>> > > >> > > >>>>>>> > > decorative
> > > > >>>> > > >> > > >>>>>>> > > >> interfaces are solving a problem if a
> > > > feature
> > > > >>>> > requires
> > > > >>>> > > >> two
> > > > >>>> > > >> > > or
> > > > >>>> > > >> > > >>>>>>> more
> > > > >>>> > > >> > > >>>>>>> > > methods
> > > > >>>> > > >> > > >>>>>>> > > >> to be implemented at the same time. If
> > we
> > > > >>>> have all
> > > > >>>> > of
> > > > >>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> methods with
> > > > >>>> > > >> > > >>>>>>> > > >> default implementation in the base
> > > > interface,
> > > > >>>> the
> > > > >>>> > API
> > > > >>>> > > >> > > doesn't
> > > > >>>> > > >> > > >>>>>>> give any
> > > > >>>> > > >> > > >>>>>>> > > >> clue
> > > > >>>> > > >> > > >>>>>>> > > >> to the user which set of methods are
> > > > required
> > > > >>>> to be
> > > > >>>> > > >> > > >>>>>>> implemented at the
> > > > >>>> > > >> > > >>>>>>> > > >> same
> > > > >>>> > > >> > > >>>>>>> > > >> time.
> > > > >>>> > > >> > > >>>>>>> > > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > a) I feel the biggest drawback of
> > > > decorative
> > > > >>>> > > >> interfaces
> > > > >>>> > > >> > is
> > > > >>>> > > >> > > >>>>>>> which
> > > > >>>> > > >> > > >>>>>>> > > >> interface
> > > > >>>> > > >> > > >>>>>>> > > >> > they can decorate and which
> > combinations
> > > > of
> > > > >>>> > multiple
> > > > >>>> > > >> > > >>>>>>> decorative
> > > > >>>> > > >> > > >>>>>>> > > >> interfaces
> > > > >>>> > > >> > > >>>>>>> > > >> > are valid. (...)
> > > > >>>> > > >> > > >>>>>>> > > >> > In the future, if there is a new
> > feature
> > > > >>>> added
> > > > >>>> > > >> > > >>>>>>> > > >> > (e.g. sorted or pre-partitioned data
> > > > >>>> aware), are
> > > > >>>> > we
> > > > >>>> > > >> > going
> > > > >>>> > > >> > > >>>>>>> to create
> > > > >>>> > > >> > > >>>>>>> > > >> another
> > > > >>>> > > >> > > >>>>>>> > > >> > interface of SplitReader such as
> > > > >>>> > SortedSplitReader or
> > > > >>>> > > >> > > >>>>>>> > > >> PrePartitionedAware?
> > > > >>>> > > >> > > >>>>>>> > > >> > Can they be combined? So I think the
> > > > >>>> additional
> > > > >>>> > > >> > decorative
> > > > >>>> > > >> > > >>>>>>> interface
> > > > >>>> > > >> > > >>>>>>> > > >> like
> > > > >>>> > > >> > > >>>>>>> > > >> > withSplitsAlignment actually
> increases
> > > the
> > > > >>>> > > >> understanding
> > > > >>>> > > >> > > >>>>>>> cost of
> > > > >>>> > > >> > > >>>>>>> > users
> > > > >>>> > > >> > > >>>>>>> > > >> > because they have to know what
> > > decorative
> > > > >>>> > interfaces
> > > > >>>> > > >> are
> > > > >>>> > > >> > > >>>>>>> there,
> > > > >>>> > > >> > > >>>>>>> > which
> > > > >>>> > > >> > > >>>>>>> > > >> > interface they can decorate and
> which
> > > > >>>> > combinations of
> > > > >>>> > > >> > the
> > > > >>>> > > >> > > >>>>>>> decorative
> > > > >>>> > > >> > > >>>>>>> > > >> > interfaces are valid and which are
> > not.
> > > > >>>> Ideally
> > > > >>>> > we
> > > > >>>> > > >> want
> > > > >>>> > > >> > to
> > > > >>>> > > >> > > >>>>>>> avoid
> > > > >>>> > > >> > > >>>>>>> > that.
> > > > >>>> > > >> > > >>>>>>> > > >>
> > > > >>>> > > >> > > >>>>>>> > > >> I'm not sure if I understand how
> > embedding
> > > > >>>> default
> > > > >>>> > > >> methods
> > > > >>>> > > >> > > in
> > > > >>>> > > >> > > >>>>>>> the base
> > > > >>>> > > >> > > >>>>>>> > > >> interface is solving the problem: what
> > can
> > > > be
> > > > >>>> > combined
> > > > >>>> > > >> or
> > > > >>>> > > >> > > >>>>>>> not? If
> > > > >>>> > > >> > > >>>>>>> > there
> > > > >>>> > > >> > > >>>>>>> > > >> are
> > > > >>>> > > >> > > >>>>>>> > > >> two conflicting features, having
> > > decorative
> > > > >>>> > interfaces
> > > > >>>> > > >> > that
> > > > >>>> > > >> > > >>>>>>> can not be
> > > > >>>> > > >> > > >>>>>>> > > >> mixed together actually makes much
> more
> > > > sense
> > > > >>>> to me
> > > > >>>> > > >> rather
> > > > >>>> > > >> > > >>>>>>> than having
> > > > >>>> > > >> > > >>>>>>> > > >> them
> > > > >>>> > > >> > > >>>>>>> > > >> all in one base class. How would you
> > allow
> > > > >>>> users to
> > > > >>>> > > >> > > implement
> > > > >>>> > > >> > > >>>>>>> only one
> > > > >>>> > > >> > > >>>>>>> > > of
> > > > >>>> > > >> > > >>>>>>> > > >> those two features?
> > > > >>>> > > >> > > >>>>>>> > > >>
> > > > >>>> > > >> > > >>>>>>> > > >> To reiterate on the issue. Yes, there
> > are
> > > > >>>> > drawbacks:
> > > > >>>> > > >> > > >>>>>>> > > >> - how can a user discover what
> > decorators
> > > > are
> > > > >>>> > there?
> > > > >>>> > > >> > > >>>>>>> > > >> - how can a user know where the
> > decorator
> > > > can
> > > > >>>> be
> > > > >>>> > > >> applied
> > > > >>>> > > >> > to?
> > > > >>>> > > >> > > >>>>>>> > > >>
> > > > >>>> > > >> > > >>>>>>> > > >> However those are drawbacks for more
> > power
> > > > >>>> users,
> > > > >>>> > that
> > > > >>>> > > >> can
> > > > >>>> > > >> > > be
> > > > >>>> > > >> > > >>>>>>> > mitigated
> > > > >>>> > > >> > > >>>>>>> > > by
> > > > >>>> > > >> > > >>>>>>> > > >> the documentation. For example listing
> > all
> > > > of
> > > > >>>> the
> > > > >>>> > > >> > decorators
> > > > >>>> > > >> > > >>>>>>> with
> > > > >>>> > > >> > > >>>>>>> > > >> detailed explanation both in the docs
> > and
> > > in
> > > > >>>> the
> > > > >>>> > java
> > > > >>>> > > >> > docs.
> > > > >>>> > > >> > > >>>>>>> More
> > > > >>>> > > >> > > >>>>>>> > > >> experienced users will be able to deal
> > > with
> > > > >>>> those
> > > > >>>> > > >> issues
> > > > >>>> > > >> > > >>>>>>> easier, as
> > > > >>>> > > >> > > >>>>>>> > they
> > > > >>>> > > >> > > >>>>>>> > > >> will already have some basic
> > understanding
> > > > of
> > > > >>>> > Flink.
> > > > >>>> > > >> Also
> > > > >>>> > > >> > if
> > > > >>>> > > >> > > >>>>>>> user has
> > > > >>>> > > >> > > >>>>>>> > a
> > > > >>>> > > >> > > >>>>>>> > > >> problem that he wants to solve, he
> will
> > > > google
> > > > >>>> > search a
> > > > >>>> > > >> > > >>>>>>> potential
> > > > >>>> > > >> > > >>>>>>> > > solution
> > > > >>>> > > >> > > >>>>>>> > > >> to his problem anyway, and while doing
> > > that
> > > > >>>> he is
> > > > >>>> > very
> > > > >>>> > > >> > > likely
> > > > >>>> > > >> > > >>>>>>> to
> > > > >>>> > > >> > > >>>>>>> > > discover
> > > > >>>> > > >> > > >>>>>>> > > >> the decorator that he needs anyway in
> > the
> > > > >>>> docs.
> > > > >>>> > > >> > > >>>>>>> > > >>
> > > > >>>> > > >> > > >>>>>>> > > >> The advantages are:
> > > > >>>> > > >> > > >>>>>>> > > >> - clean and easy to implement base
> > > interface
> > > > >>>> > > >> > > >>>>>>> > > >> - all of the methods from a single
> > feature
> > > > are
> > > > >>>> > grouped
> > > > >>>> > > >> in
> > > > >>>> > > >> > a
> > > > >>>> > > >> > > >>>>>>> single
> > > > >>>> > > >> > > >>>>>>> > > >> decorator interface, together with
> their
> > > > >>>> dedicated
> > > > >>>> > java
> > > > >>>> > > >> > doc.
> > > > >>>> > > >> > > >>>>>>> It's also
> > > > >>>> > > >> > > >>>>>>> > > >> easier to google search for help using
> > the
> > > > >>>> > decorator
> > > > >>>> > > >> name
> > > > >>>> > > >> > > >>>>>>> > > >> - if an optional feature requires two
> > > > methods
> > > > >>>> to be
> > > > >>>> > > >> > > >>>>>>> implemented at
> > > > >>>> > > >> > > >>>>>>> > once,
> > > > >>>> > > >> > > >>>>>>> > > >> decorator can guarantee that
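For illustration, a minimal Java sketch of the last advantage listed above: a decorative interface whose methods are all abstract, so the compiler forces an implementation to provide them together. All names here (BaseReader, SupportsPausing, PlainReader, PausableReader, ReaderFramework) are hypothetical and are not part of Flink.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical minimal base interface that every reader implements. */
interface BaseReader {
    String poll();
}

/** Hypothetical decorator: both methods are abstract, so they must be implemented together. */
interface SupportsPausing {
    void pause(String splitId);

    void resume(String splitId);
}

/** Implements only the base interface and never sees the optional methods. */
class PlainReader implements BaseReader {
    @Override
    public String poll() {
        return "record";
    }
}

/** Opts in to the optional feature by also implementing the decorator. */
class PausableReader implements BaseReader, SupportsPausing {
    private final Set<String> paused = new HashSet<>();

    @Override
    public String poll() {
        return "record";
    }

    @Override
    public void pause(String splitId) {
        paused.add(splitId);
    }

    @Override
    public void resume(String splitId) {
        paused.remove(splitId);
    }

    boolean isPaused(String splitId) {
        return paused.contains(splitId);
    }
}

class ReaderFramework {
    /** Returns true only if the reader opted in and the split was paused. */
    static boolean tryPause(BaseReader reader, String splitId) {
        if (reader instanceof SupportsPausing) {
            ((SupportsPausing) reader).pause(splitId);
            return true;
        }
        return false;
    }
}
```

The framework side discovers the capability with an instanceof check, so a reader that does not opt in is simply skipped instead of throwing at runtime.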
> > > > >>>> > > >> > > >>>>>>> > > >>
> > > > >>>> > > >> > > >>>>>>> > > >> Here I would strongly lean towards
> > making
> > > > life
> > > > >>>> > easier
> > > > >>>> > > >> for
> > > > >>>> > > >> > > new
> > > > >>>> > > >> > > >>>>>>> users,
> > > > >>>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the
> (imo)
> > > > >>>> slight
> > > > >>>> > expense
> > > > >>>> > > >> > for
> > > > >>>> > > >> > > >>>>>>> the power
> > > > >>>> > > >> > > >>>>>>> > > >> users.
> > > > >>>> > > >> > > >>>>>>> > > >>
> > > > >>>> > > >> > > >>>>>>> > > >> Best,
> > > > >>>> > > >> > > >>>>>>> > > >> Piotrek
> > > > >>>> > > >> > > >>>>>>> > > >>
> > > > >>>> > > >> > > >>>>>>> > > >>
> > > > >>>> > > >> > > >>>>>>> > > >> wt., 26 kwi 2022 o 15:32 Becket Qin <
> > > > >>>> > > >> becket.qin@gmail.com
> > > > >>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > napisał(a):
> > > > >>>> > > >> > > >>>>>>> > > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > Thanks for the reply Sebastian and
> > > Dawid,
> > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > >>>> > > >> > > >>>>>>> > > >> > I think Sebastian has a good
> summary.
> > > This
> > > > >>>> is a
> > > > >>>> > > >> really
> > > > >>>> > > >> > > >>>>>>> helpful
> > > > >>>> > > >> > > >>>>>>> > > >> discussion.
> > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > >>>> > > >> > > >>>>>>> > > >> > Thinking a bit more, I feel that it
> > > might
> > > > >>>> still
> > > > >>>> > be
> > > > >>>> > > >> > better
> > > > >>>> > > >> > > >>>>>>> to add the
> > > > >>>> > > >> > > >>>>>>> > > >> > supportsXXX() method in the Source
> > > rather
> > > > >>>> than
> > > > >>>> > > >> > > SourceReader.
> > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > >>>> > > >> > > >>>>>>> > > >> > Generally speaking, what we are
> trying
> > > to
> > > > do
> > > > >>>> > here is
> > > > >>>> > > >> to
> > > > >>>> > > >> > > let
> > > > >>>> > > >> > > >>>>>>> the
> > > > >>>> > > >> > > >>>>>>> > Flink
> > > > >>>> > > >> > > >>>>>>> > > >> > framework know what the Source is
> > > capable
> > > > >>>> of. In
> > > > >>>> > this
> > > > >>>> > > >> > > FLIP,
> > > > >>>> > > >> > > >>>>>>> it
> > > > >>>> > > >> > > >>>>>>> > happens
> > > > >>>> > > >> > > >>>>>>> > > >> to
> > > > >>>> > > >> > > >>>>>>> > > >> > be the capability that only involves
> > > > >>>> > SourceReader.
> > > > >>>> > > >> But
> > > > >>>> > > >> > in
> > > > >>>> > > >> > > >>>>>>> the
> > > > >>>> > > >> > > >>>>>>> > future,
> > > > >>>> > > >> > > >>>>>>> > > >> it is
> > > > >>>> > > >> > > >>>>>>> > > >> > possible that another functionality
> > > > involves
> > > > >>>> > both the
> > > > >>>> > > >> > > >>>>>>> > SplitEnumerator
> > > > >>>> > > >> > > >>>>>>> > > >> and
> > > > >>>> > > >> > > >>>>>>> > > >> > SourceReader. In that case,
> following
> > > the
> > > > >>>> current
> > > > >>>> > > >> > > approach,
> > > > >>>> > > >> > > >>>>>>> we
> > > > >>>> > > >> > > >>>>>>> > should
> > > > >>>> > > >> > > >>>>>>> > > >> put
> > > > >>>> > > >> > > >>>>>>> > > >> > the "supportsXXX()" method in both
> > > > >>>> > SplitEnumerator
> > > > >>>> > > >> and
> > > > >>>> > > >> > > >>>>>>> SourceReader.
> > > > >>>> > > >> > > >>>>>>> > > >> > Because if we only put this in the
> > > > >>>> SourceReader,
> > > > >>>> > then
> > > > >>>> > > >> > the
> > > > >>>> > > >> > > >>>>>>> JM would
> > > > >>>> > > >> > > >>>>>>> > > have
> > > > >>>> > > >> > > >>>>>>> > > >> to
> > > > >>>> > > >> > > >>>>>>> > > >> > create a SourceReader in order to
> know
> > > > >>>> whether
> > > > >>>> > this
> > > > >>>> > > >> > > feature
> > > > >>>> > > >> > > >>>>>>> is
> > > > >>>> > > >> > > >>>>>>> > > >> supported,
> > > > >>>> > > >> > > >>>>>>> > > >> > which is a little ugly. But if we
> put
> > > the
> > > > >>>> > > >> > "supportsXXX()"
> > > > >>>> > > >> > > >>>>>>> method in
> > > > >>>> > > >> > > >>>>>>> > > the
> > > > >>>> > > >> > > >>>>>>> > > >> > Source, we will break the
> "symmetric"
> > > > design
> > > > >>>> > because
> > > > >>>> > > >> > this
> > > > >>>> > > >> > > >>>>>>> FLIP
> > > > >>>> > > >> > > >>>>>>> > chose a
> > > > >>>> > > >> > > >>>>>>> > > >> > different way.
> > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > >>>> > > >> > > >>>>>>> > > >> > This is also why I think
> supportsXXX()
> > > > >>>> method
> > > > >>>> > seems a
> > > > >>>> > > >> > good
> > > > >>>> > > >> > > >>>>>>> thing to
> > > > >>>> > > >> > > >>>>>>> > > >> have,
> > > > >>>> > > >> > > >>>>>>> > > >> > because when there are a few
> > interfaces
> > > /
> > > > >>>> methods
> > > > >>>> > > >> that
> > > > >>>> > > >> > are
> > > > >>>> > > >> > > >>>>>>> expected
> > > > >>>> > > >> > > >>>>>>> > to
> > > > >>>> > > >> > > >>>>>>> > > >> be
> > > > >>>> > > >> > > >>>>>>> > > >> > implemented at the same time in
> order
> > to
> > > > >>>> deliver
> > > > >>>> > a
> > > > >>>> > > >> > > feature,
> > > > >>>> > > >> > > >>>>>>> it is
> > > > >>>> > > >> > > >>>>>>> > > always
> > > > >>>> > > >> > > >>>>>>> > > >> > good to have a single source of
> truth
> > to
> > > > >>>> tell the
> > > > >>>> > > >> > > framework
> > > > >>>> > > >> > > >>>>>>> what to
> > > > >>>> > > >> > > >>>>>>> > > do,
> > > > >>>> > > >> > > >>>>>>> > > >> so
> > > > >>>> > > >> > > >>>>>>> > > >> > the framework can do consistent
> things
> > > in
> > > > >>>> > different
> > > > >>>> > > >> > parts.
> > > > >>>> > > >> > > >>>>>>> > > >> >
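The argument above for placing supportsXXX() on the Source can be sketched in Java as follows. This is an illustration only: SketchSource, SketchReader, CountingSource and CapabilitySketch are hypothetical stand-ins, not the real Flink types, and the method name supportsPausingSplits is borrowed from the renaming mentioned later in the thread. The point shown is that a coordinator-side component can read the capability without ever creating a reader.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins; these are not the real Flink Source/SourceReader interfaces.
interface SketchReader {
    String poll();
}

interface SketchSource {
    SketchReader createReader();

    // Capability flag lives on the source, so it can be read before any reader exists.
    default boolean supportsPausingSplits() {
        return false;
    }
}

final class CountingSource implements SketchSource {
    // Counts reader creations so the sketch can show that none is needed.
    final AtomicInteger readersCreated = new AtomicInteger();

    @Override
    public SketchReader createReader() {
        readersCreated.incrementAndGet();
        return () -> "record";
    }

    @Override
    public boolean supportsPausingSplits() {
        return true;
    }
}

final class CapabilitySketch {
    // What a coordinator-side component could do: decide from the source alone.
    static String plan(SketchSource source) {
        return source.supportsPausingSplits() ? "align-splits" : "no-split-alignment";
    }
}
```

Calling CapabilitySketch.plan(new CountingSource()) leaves readersCreated at zero, which is the "single source of truth" property argued for above. Whether the flag and the reader implementation stay consistent is still up to the source author in this sketch.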
> > > > >>>> > > >> > > >>>>>>> > > >> > @Sebastian Mattheis <
> > > > >>>> sebastian@ververica.com>
> > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > >>>> > > >> > > >>>>>>> > > >> > Regarding interface flavor b), i.e.
> > > > >>>> > > >> AlignedSourceReader
> > > > >>>> > > >> > +
> > > > >>>> > > >> > > >>>>>>> > > >> > AlignedSplitReader, what I feel
> > awkward
> > > > >>>> about is
> > > > >>>> > > >> that we
> > > > >>>> > > >> > > are
> > > > >>>> > > >> > > >>>>>>> > > essentially
> > > > >>>> > > >> > > >>>>>>> > > >> > expecting almost all the
> SourceReader
> > > > >>>> > > >> implementations to
> > > > >>>> > > >> > > >>>>>>> extend
> > > > >>>> > > >> > > >>>>>>> > > >> > SourceReaderBase, which effectively
> > > makes
> > > > >>>> the
> > > > >>>> > > >> > SourceReader
> > > > >>>> > > >> > > >>>>>>> interface
> > > > >>>> > > >> > > >>>>>>> > > >> > without the pausing support useless.
> > So
> > > > this
> > > > >>>> > > >> indicates
> > > > >>>> > > >> > > that
> > > > >>>> > > >> > > >>>>>>> public
> > > > >>>> > > >> > > >>>>>>> > > >> > decorative interfaces (or
> > sub-interfaces
> > > > >>>> for the
> > > > >>>> > same
> > > > >>>> > > >> > > >>>>>>> purpose) only
> > > > >>>> > > >> > > >>>>>>> > > >> > make sense if the original interface
> > is
> > > > also
> > > > >>>> > > >> expected to
> > > > >>>> > > >> > > be
> > > > >>>> > > >> > > >>>>>>> used.
> > > > >>>> > > >> > > >>>>>>> > > >> > Otherwise, it seems to make more sense
> > to
> > > > add
> > > > >>>> the
> > > > >>>> > > >> method
> > > > >>>> > > >> > to
> > > > >>>> > > >> > > >>>>>>> the
> > > > >>>> > > >> > > >>>>>>> > original
> > > > >>>> > > >> > > >>>>>>> > > >> > interface itself.
> > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > >>>> > > >> > > >>>>>>> > > >> > Cheers,
> > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > >>>> > > >> > > >>>>>>> > > >> > Jiangjie (Becket) Qin
> > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > >>>> > > >> > > >>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM
> Dawid
> > > > >>>> Wysakowicz
> > > > >>>> > <
> > > > >>>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org>
> > > > >>>> > > >> > > >>>>>>> > > >> > wrote:
> > > > >>>> > > >> > > >>>>>>> > > >> >
> > > > >>>> > > >> > > >>>>>>> > > >> > > Thanks @Sebastian for the nice
> > > summary.
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > I think most of your points
> aligned
> > > with
> > > > >>>> the
> > > > >>>> > > >> > suggestions
> > > > >>>> > > >> > > >>>>>>> I made to
> > > > >>>> > > >> > > >>>>>>> > > the
> > > > >>>> > > >> > > >>>>>>> > > >> > > FLIP, while you were writing your
> > > reply
> > > > (I
> > > > >>>> > believe
> > > > >>>> > > >> we
> > > > >>>> > > >> > > hit
> > > > >>>> > > >> > > >>>>>>> enter
> > > > >>>> > > >> > > >>>>>>> > > >> nearly at
> > > > >>>> > > >> > > >>>>>>> > > >> > > the same time ;) )
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > Two points after we synced offline
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > 1. I also changed the
> > > > >>>> > > >> supportsWatermarksSplitAlignment
> > > > >>>> > > >> > > to
> > > > >>>> > > >> > > >>>>>>> > > >> > > supportsPausingSplits to express
> the
> > > > >>>> general
> > > > >>>> > > >> > capability
> > > > >>>> > > >> > > of
> > > > >>>> > > >> > > >>>>>>> > pausing.
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > 2. As for if we should add
> > > > >>>> > > >> > > >>>>>>> PausingSourceReader/PausingSplitReader
> > > > >>>> > > >> > > >>>>>>> > > (option
> > > > >>>> > > >> > > >>>>>>> > > >> b)
> > > > >>>> > > >> > > >>>>>>> > > >> > > or if we should just add the
> methods
> > > > >>>> (option
> > > > >>>> > c), I
> > > > >>>> > > >> > > >>>>>>> suggest to
> > > > >>>> > > >> > > >>>>>>> > simply
> > > > >>>> > > >> > > >>>>>>> > > >> add
> > > > >>>> > > >> > > >>>>>>> > > >> > > the two methods as I felt this is the
> > much
> > > > >>>> > preferred
> > > > >>>> > > >> > > approach of
> > > > >>>> > > >> > > >>>>>>> Becket,
> > > > >>>> > > >> > > >>>>>>> > > >> to which
> > > > >>>> > > >> > > >>>>>>> > > >> > > others do not object. Unless there
> > is
> > > an
> > > > >>>> > opposition
> > > > >>>> > > >> > > let's
> > > > >>>> > > >> > > >>>>>>> go with
> > > > >>>> > > >> > > >>>>>>> > > this
> > > > >>>> > > >> > > >>>>>>> > > >> > > option c.
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > Best,
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > Dawid
> > > > >>>> > > >> > > >>>>>>> > > >> > > On 26/04/2022 10:06, Sebastian
> > > Mattheis
> > > > >>>> wrote:
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > Hi folks,
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > Sorry for being a bit silent. Many
> > > > thanks
> > > > >>>> for
> > > > >>>> > all
> > > > >>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> input and
> > > > >>>> > > >> > > >>>>>>> > > >> > > suggestions. As I'm a bit new, I
> > > needed
> > > > >>>> some
> > > > >>>> > time
> > > > >>>> > > >> to
> > > > >>>> > > >> > > >>>>>>> catch up and
> > > > >>>> > > >> > > >>>>>>> > > >> > structure
> > > > >>>> > > >> > > >>>>>>> > > >> > > (for myself) the discussion and I
> > > wanted
> > > > >>>> to
> > > > >>>> > find a
> > > > >>>> > > >> way
> > > > >>>> > > >> > > to
> > > > >>>> > > >> > > >>>>>>> > structure
> > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> > > >> > > conclusions. (Also because I had
> the
> > > > >>>> feeling
> > > > >>>> > that
> > > > >>>> > > >> some
> > > > >>>> > > >> > > >>>>>>> concerns
> > > > >>>> > > >> > > >>>>>>> > got
> > > > >>>> > > >> > > >>>>>>> > > >> lost
> > > > >>>> > > >> > > >>>>>>> > > >> > in
> > > > >>>> > > >> > > >>>>>>> > > >> > > the discussion.) This is my
> attempt
> > > and
> > > > >>>> please
> > > > >>>> > > >> correct
> > > > >>>> > > >> > > me
> > > > >>>> > > >> > > >>>>>>> if
> > > > >>>> > > >> > > >>>>>>> > > >> something is
> > > > >>>> > > >> > > >>>>>>> > > >> > > wrong or misunderstood. I tried to
> > > > >>>> collect and
> > > > >>>> > > >> > assemble
> > > > >>>> > > >> > > >>>>>>> the
> > > > >>>> > > >> > > >>>>>>> > > opinions,
> > > > >>>> > > >> > > >>>>>>> > > >> > > suggestions, and conclusions (to
> the
> > > > best
> > > > >>>> of my
> > > > >>>> > > >> > > >>>>>>> knowledge):
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > # Top A: Should split alignment
> > > > >>>> (pause/resume
> > > > >>>> > > >> > behavior)
> > > > >>>> > > >> > > >>>>>>> be a
> > > > >>>> > > >> > > >>>>>>> > general
> > > > >>>> > > >> > > >>>>>>> > > >> > > capability?
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > I personally don't see any reason
> not
> > > to
> > > > >>>> have
> > > > >>>> > it a
> > > > >>>> > > >> > > general
> > > > >>>> > > >> > > >>>>>>> > capability
> > > > >>>> > > >> > > >>>>>>> > > >> > > because for the alignSplit method
> it
> > > is
> > > > >>>> > actually
> > > > >>>> > > >> > > >>>>>>> independent of
> > > > >>>> > > >> > > >>>>>>> > the
> > > > >>>> > > >> > > >>>>>>> > > >> > > watermarks. If we agree here to
> have
> > > it
> > > > a
> > > > >>>> > general
> > > > >>>> > > >> > > >>>>>>> capability, we
> > > > >>>> > > >> > > >>>>>>> > > >> should
> > > > >>>> > > >> > > >>>>>>> > > >> > > also agree on the right wording.
> > Does
> > > > >>>> > > >> > > >>>>>>> "alignSplits(splitsToResume,
> > > > >>>> > > >> > > >>>>>>> > > >> > > splitsToPause)" refer to what is
> > then
> > > > >>>> actually
> > > > >>>> > > >> meant?
> > > > >>>> > > >> > (I
> > > > >>>> > > >> > > >>>>>>> see it as
> > > > >>>> > > >> > > >>>>>>> > > >> okay.
> > > > >>>> > > >> > > >>>>>>> > > >> > I
> > > > >>>> > > >> > > >>>>>>> > > >> > > don't have any better idea whilst
> > > Arvid
> > > > >>>> > suggested
> > > > >>>> > > >> > > >>>>>>> > > >> "pauseOrResumeSplits".)
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > # Top B: Should it be possible to
> > > > >>>> > enable/disable
> > > > >>>> > > >> split
> > > > >>>> > > >> > > >>>>>>> alignment?
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > I would personally not disable the
> > > split
> > > > >>>> > alignment
> > > > >>>> > > >> on
> > > > >>>> > > >> > > the
> > > > >>>> > > >> > > >>>>>>> source
> > > > >>>> > > >> > > >>>>>>> > > >> reader
> > > > >>>> > > >> > > >>>>>>> > > >> > > side because if split alignment is
> > > used
> > > > >>>> for
> > > > >>>> > some
> > > > >>>> > > >> other
> > > > >>>> > > >> > > >>>>>>> use case
> > > > >>>> > > >> > > >>>>>>> > (see
> > > > >>>> > > >> > > >>>>>>> > > >> A)
> > > > >>>> > > >> > > >>>>>>> > > >> > it
> > > > >>>> > > >> > > >>>>>>> > > >> > > could have nasty side effects on
> > > > >>>> other/future
> > > > >>>> > use
> > > > >>>> > > >> > cases.
> > > > >>>> > > >> > > >>>>>>> Instead,
> > > > >>>> > > >> > > >>>>>>> > I
> > > > >>>> > > >> > > >>>>>>> > > >> would
> > > > >>>> > > >> > > >>>>>>> > > >> > > disable "watermark split
> alignment"
> > > > where
> > > > >>>> I
> > > > >>>> > think
> > > > >>>> > > >> it
> > > > >>>> > > >> > > >>>>>>> should
> > > > >>>> > > >> > > >>>>>>> > disable
> > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> > > >> > > watermark-dependent trigger for
> > split
> > > > >>>> > alignment.
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > # Top C: Should we add a supportsX
> > > > method?
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > I find it difficult to define the
> > > scope
> > > > >>>> of a
> > > > >>>> > > >> supportsX
> > > > >>>> > > >> > > >>>>>>> method
> > > > >>>> > > >> > > >>>>>>> > w.r.t.
> > > > >>>> > > >> > > >>>>>>> > > >> to
> > > > >>>> > > >> > > >>>>>>> > > >> > > the following questions: a) Where
> is
> > > it
> > > > >>>> used?
> > > > >>>> > and
> > > > >>>> > > >> b)
> > > > >>>> > > >> > > What
> > > > >>>> > > >> > > >>>>>>> is the
> > > > >>>> > > >> > > >>>>>>> > > >> expected
> > > > >>>> > > >> > > >>>>>>> > > >> > > output? To b), it's not
> > > straight-forward
> > > > >>>> to
> > > > >>>> > > >> provide a
> > > > >>>> > > >> > > >>>>>>> meaningful
> > > > >>>> > > >> > > >>>>>>> > > >> output,
> > > > >>>> > > >> > > >>>>>>> > > >> > > e.g., if SourceReader supports
> split
> > > > >>>> alignment
> > > > >>>> > but
> > > > >>>> > > >> > > >>>>>>> SplitReader
> > > > >>>> > > >> > > >>>>>>> > does not.
> > > > >>>> > > >> > > >>>>>>> > > >> This
> > > > >>>> > > >> > > >>>>>>> > > >> > is
> > > > >>>> > > >> > > >>>>>>> > > >> > > because with the current
> > > implementation,
> > > > >>>> we can
> > > > >>>> > > >> > > determine
> > > > >>>> > > >> > > >>>>>>> whether
> > > > >>>> > > >> > > >>>>>>> > > >> split
> > > > >>>> > > >> > > >>>>>>> > > >> > > alignment is fully supported only
> > > during
> > > > >>>> > runtime
> > > > >>>> > > >> and
> > > > >>>> > > >> > > >>>>>>> specifically
> > > > >>>> > > >> > > >>>>>>> > > >> > actually
> > > > >>>> > > >> > > >>>>>>> > > >> > > only when calling alignSplits down
> > the
> > > > >>>> call
> > > > >>>> > > >> hierarchy
> > > > >>>> > > >> > up
> > > > >>>> > > >> > > >>>>>>> to the
> > > > >>>> > > >> > > >>>>>>> > > actual
> > > > >>>> > > >> > > >>>>>>> > > >> > > SplitReaders.
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > Therefore, I would suggest to
> either
> > > > >>>> raise an
> > > > >>>> > > >> error or
> > > > >>>> > > >> > > >>>>>>> warning if
> > > > >>>> > > >> > > >>>>>>> > > the
> > > > >>>> > > >> > > >>>>>>> > > >> > > alignment is called but not
> > supported
> > > at
> > > > >>>> some
> > > > >>>> > > >> point. I
> > > > >>>> > > >> > > >>>>>>> know we
> > > > >>>> > > >> > > >>>>>>> > > should
> > > > >>>> > > >> > > >>>>>>> > > >> > > carefully think about when this
> > could
> > > be
> > > > >>>> the
> > > > >>>> > case
> > > > >>>> > > >> > > because
> > > > >>>> > > >> > > >>>>>>> we don't
> > > > >>>> > > >> > > >>>>>>> > > >> want
> > > > >>>> > > >> > > >>>>>>> > > >> > to
> > > > >>>> > > >> > > >>>>>>> > > >> > > flood anybody with such warnings.
> > > > However,
> > > > >>>> > warnings
> > > > >>>> > > >> > > could
> > > > >>>> > > >> > > >>>>>>> be an
> > > > >>>> > > >> > > >>>>>>> > > >> indicator
> > > > >>>> > > >> > > >>>>>>> > > >> > > for the user that for watermark
> > split
> > > > >>>> > alignment use
> > > > >>>> > > >> > case
> > > > >>>> > > >> > > >>>>>>> split
> > > > >>>> > > >> > > >>>>>>> > > >> reading is
> > > > >>>> > > >> > > >>>>>>> > > >> > > imbalanced with the conclusion to
> > > either
> > > > >>>> > disable
> > > > >>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> trigger for
> > > > >>>> > > >> > > >>>>>>> > > >> > watermark
> > > > >>>> > > >> > > >>>>>>> > > >> > > split alignment (see Top B) or to
> > > > >>>> > use/implement a
> > > > >>>> > > >> > source
> > > > >>>> > > >> > > >>>>>>> and
> > > > >>>> > > >> > > >>>>>>> > reader
> > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > >>>> > > >> > > >>>>>>> > > >> > > fully supports split alignment.
> > > > >>>> > > >> > > >>>>>>> > > >> > >
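The "raise an error or warning" suggestion under Top C can be sketched in Java as below. Everything here is an assumption made for illustration: WarnSplitReader and AlignmentCaller are hypothetical, the method name pauseOrResumeSplits is taken from the naming discussion under Top A, and the thread does not say that this is how the PoC behaves. The sketch shows one way a caller could surface missing support once, without flooding the log on every alignment attempt.

```java
import java.util.Collection;

// Hypothetical reader interface, not the real Flink SplitReader.
interface WarnSplitReader {
    String fetch();

    // Assumed default: readers without pause support fail loudly
    // instead of silently ignoring the call.
    default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        throw new UnsupportedOperationException("pausing splits is not supported by this reader");
    }
}

final class AlignmentCaller {
    private boolean warned = false;
    // Number of warnings actually emitted; stays at most 1 in this sketch.
    int warnings = 0;

    // Returns true if the reader handled the request, false if it lacks support.
    boolean tryAlign(WarnSplitReader reader, Collection<String> toPause, Collection<String> toResume) {
        try {
            reader.pauseOrResumeSplits(toPause, toResume);
            return true;
        } catch (UnsupportedOperationException e) {
            if (!warned) {
                // Warn only on the first failure so repeated alignment attempts stay quiet.
                warned = true;
                warnings++;
                System.err.println("WARN: " + e.getMessage());
            }
            return false;
        }
    }
}
```

A reader that overrides pauseOrResumeSplits makes tryAlign return true. One that keeps the default triggers a single warning, which matches the concern above about not flooding anybody with such warnings.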
> > > > >>>> > > >> > > >>>>>>> > > >> > > # Top D: How to design interfaces?
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > Thanks for structuring the
> > discussion
> > > > >>>> with the
> > > > >>>> > > >> > > various
> > > > >>>> > > >> > > >>>>>>> > > >> possibilities
> > > > >>>> > > >> > > >>>>>>> > > >> > > (a-d). From the discussion and
> > > emails, I
> > > > >>>> would
> > > > >>>> > > >> like to
> > > > >>>> > > >> > > >>>>>>> summarize
> > > > >>>> > > >> > > >>>>>>> > the
> > > > >>>> > > >> > > >>>>>>> > > >> > > following requirements:
> > > > >>>> > > >> > > >>>>>>> > > >> > > - Interfaces should be consistent
> > > > >>>> > ("symmetric"),
> > > > >>>> > > >> i.e.,
> > > > >>>> > > >> > > >>>>>>> similar
> > > > >>>> > > >> > > >>>>>>> > > >> semantics
> > > > >>>> > > >> > > >>>>>>> > > >> > > should have similar interfaces
> with
> > > > >>>> similar
> > > > >>>> > usage.
> > > > >>>> > > >> > > >>>>>>> > > >> > > - Make explicit which
> > implementations
> > > > >>>> implement
> > > > >>>> > > >> > > >>>>>>> interfaces/support
> > > > >>>> > > >> > > >>>>>>> > > >> > > behavior.
> > > > >>>> > > >> > > >>>>>>> > > >> > > - Make clear what are default
> > > > >>>> implementations
> > > > >>>> > and
> > > > >>>> > > >> how
> > > > >>>> > > >> > to
> > > > >>>> > > >> > > >>>>>>> implement
> > > > >>>> > > >> > > >>>>>>> > > >> > > interfaces with desired behavior.
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > This is a simplified view of the
> > > > relations
> > > > >>>> > between
> > > > >>>> > > >> > > >>>>>>> relevant
> > > > >>>> > > >> > > >>>>>>> > classes
> > > > >>>> > > >> > > >>>>>>> > > of
> > > > >>>> > > >> > > >>>>>>> > > >> > the
> > > > >>>> > > >> > > >>>>>>> > > >> > > PoC implementation:
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > SourceReader (Public) <|--
> > > > >>>> SourceReaderBase
> > > > >>>> > > >> (Internal)
> > > > >>>> > > >> > > >>>>>>> <|-- ..
> > > > >>>> > > >> > > >>>>>>> > <|--
> > > > >>>> > > >> > > >>>>>>> > > >> > > MySourceReader
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > MySourceReader <>--
> > > SplitFetcherManager
> > > > >>>> > (Internal)
> > > > >>>> > > >> > <>--
> > > > >>>> > > >> > > >>>>>>> > SplitFetcher
> > > > >>>> > > >> > > >>>>>>> > > >> > > (Internal) <>-- SplitReader
> (Public)
> > > > <|--
> > > > >>>> > > >> > MySplitReader
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > (A <|-- B: B inherits from A; A
> <>--
> > > B:
> > > > A
> > > > >>>> "has
> > > > >>>> > a"
> > > > >>>> > > >> B)
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > Note that SourceReaderBase and
> > > > >>>> > SplitFetcherManager
> > > > >>>> > > >> > > >>>>>>> implement most
> > > > >>>> > > >> > > >>>>>>> > of
> > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> > > >> > > "logic" for split alignment just
> > > because
> > > > >>>> we
> > > > >>>> > wanted
> > > > >>>> > > >> to
> > > > >>>> > > >> > > >>>>>>> implement
> > > > >>>> > > >> > > >>>>>>> > > split
> > > > >>>> > > >> > > >>>>>>> > > >> > > alignment and wanted it to be
> > > available
> > > > as
> > > > >>>> > kind of
> > > > >>>> > > >> a
> > > > >>>> > > >> > > >>>>>>> default. As a
> > > > >>>> > > >> > > >>>>>>> > > >> > > consequence, we have a "default
> > > > >>>> > implementation" for
> > > > >>>> > > >> > > >>>>>>> SourceReader
> > > > >>>> > > >> > > >>>>>>> > > that
> > > > >>>> > > >> > > >>>>>>> > > >> > > implements the actual logic for
> > split
> > > > >>>> > alignment.
> > > > >>>> > > >> For
> > > > >>>> > > >> > > that
> > > > >>>> > > >> > > >>>>>>> reason,
> > > > >>>> > > >> > > >>>>>>> > I
> > > > >>>> > > >> > > >>>>>>> > > >> find
> > > > >>>> > > >> > > >>>>>>> > > >> > it
> > > > >>>> > > >> > > >>>>>>> > > >> > > very confusing to have a NOOP
> > default
> > > > >>>> > > >> implementation
> > > > >>>> > > >> > in
> > > > >>>> > > >> > > >>>>>>> the
> > > > >>>> > > >> > > >>>>>>> > > interface
> > > > >>>> > > >> > > >>>>>>> > > >> for
> > > > >>>> > > >> > > >>>>>>> > > >> > > the SourceReader. As a
> consequence,
> > > > >>>> interface
> > > > >>>> > > >> strategy
> > > > >>>> > > >> > > c)
> > > > >>>> > > >> > > >>>>>>> is
> > > > >>>> > > >> > > >>>>>>> > > difficult
> > > > >>>> > > >> > > >>>>>>> > > >> > > because this would require NOOP
> > > default
> > > > >>>> > > >> > implementations
> > > > >>>> > > >> > > >>>>>>> in the
> > > > >>>> > > >> > > >>>>>>> > > public
> > > > >>>> > > >> > > >>>>>>> > > >> > > interfaces of SourceReader and
> > > > >>>> SplitReader.
> > > > >>>> > This is
> > > > >>>> > > >> > the
> > > > >>>> > > >> > > >>>>>>> same for
> > > > >>>> > > >> > > >>>>>>> > > >> strategy
> > > > >>>> > > >> > > >>>>>>> > > >> > > d) because it would require NOOP
> > > default
> > > > >>>> > > >> > implementation
> > > > >>>> > > >> > > >>>>>>> in the
> > > > >>>> > > >> > > >>>>>>> > > >> > > SourceReader. Further, as Dawid
> > > > described
> > > > >>>> > method
> > > > >>>> > > >> > > >>>>>>> signatures of
> > > > >>>> > > >> > > >>>>>>> > > >> alignSplit
> > > > >>>> > > >> > > >>>>>>> > > >> > > for SourceReader and SplitReader
> > > differ
> > > > >>>> and it
> > > > >>>> > > >> would
> > > > >>>> > > >> > be
> > > > >>>> > > >> > > >>>>>>> extremely
> > > > >>>> > > >> > > >>>>>>> > > >> > difficult
> > > > >>>> > > >> > > >>>>>>> > > >> > > to make the signatures the same
> > (with
> > > > even
> > > > >>>> > > >> potential
> > > > >>>> > > >> > > >>>>>>> performance
> > > > >>>> > > >> > > >>>>>>> > > >> impact
> > > > >>>> > > >> > > >>>>>>> > > >> > > because of additional look-ups of
> > > split
> > > > >>>> ids).
> > > > >>>> > > >> > Therefore,
> > > > >>>> > > >> > > >>>>>>> having a
> > > > >>>> > > >> > > >>>>>>> > > >> > symmetric
> > > > >>>> > > >> > > >>>>>>> > > >> > > decorative interface as of
> strategy
> > a)
> > > > is
> > > > >>>> > actually
> > > > >>>> > > >> not
> > > > >>>> > > >> > > >>>>>>> possible
> > > > >>>> > > >> > > >>>>>>> > and
> > > > >>>> > > >> > > >>>>>>> > > >> > having
> > > > >>>> > > >> > > >>>>>>> > > >> > > two decorative interfaces with
> > > different
> > > > >>>> method
> > > > >>>> > > >> > > >>>>>>> signatures is
> > > > >>>> > > >> > > >>>>>>> > > >> confusing.
> > > > >>>> > > >> > > >>>>>>> > > >> > My
> > > > >>>> > > >> > > >>>>>>> > > >> > > conclusion is that we are best off
> with
> > > > >>>> strategy b)
> > > > >>>> > > >> which
> > > > >>>> > > >> > > >>>>>>> means to
> > > > >>>> > > >> > > >>>>>>> > have
> > > > >>>> > > >> > > >>>>>>> > > >> > > specializing sub-interfaces that
> > > inherit
> > > > >>>> from
> > > > >>>> > the
> > > > >>>> > > >> > parent
> > > > >>>> > > >> > > >>>>>>> > interface:
> > > > >>>> > > >> > > >>>>>>> > > >> > > SourceReader <|--
> > AlignedSourceReader,
> > > > >>>> > SplitReader
> > > > >>>> > > >> > <|--
> > > > >>>> > > >> > > >>>>>>> > > >> > AlignedSplitReader
> > > > >>>> > > >> > > >>>>>>> > > >> > > With this option, I'm not 100%
> sure
> > > what
> > > > >>>> the
> > > > >>>> > > >> > > implications
> > > > >>>> > > >> > > >>>>>>> are and
> > > > >>>> > > >> > > >>>>>>> > if
> > > > >>>> > > >> > > >>>>>>> > > >> this
> > > > >>>> > > >> > > >>>>>>> > > >> > > could get nasty. I would suggest
> > that
> > > > >>>> Dawid
> > > > >>>> > and I
> > > > >>>> > > >> just
> > > > >>>> > > >> > > >>>>>>> try to
> > > > >>>> > > >> > > >>>>>>> > > >> implement
> > > > >>>> > > >> > > >>>>>>> > > >> > and
> > > > >>>> > > >> > > >>>>>>> > > >> > > see if we like it. :)
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > # Summary
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > In conclusion, please let me know
> > your
> > > > >>>> > > >> perspectives.
> > > > >>>> > > >> > > >>>>>>> Please
> > > > >>>> > > >> > > >>>>>>> > correct
> > > > >>>> > > >> > > >>>>>>> > > >> me,
> > > > >>>> > > >> > > >>>>>>> > > >> > if
> > > > >>>> > > >> > > >>>>>>> > > >> > > something is wrong or if I
> > > misunderstood
> > > > >>>> > > >> something. My
> > > > >>>> > > >> > > >>>>>>> perspective
> > > > >>>> > > >> > > >>>>>>> > > >> would
> > > > >>>> > > >> > > >>>>>>> > > >> > be:
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > Top A: Yes
> > > > >>>> > > >> > > >>>>>>> > > >> > > Top B: Yes (but disable watermark
> > > > trigger
> > > > >>>> for
> > > > >>>> > split
> > > > >>>> > > >> > > >>>>>>> alignment)
> > > > >>>> > > >> > > >>>>>>> > > >> > > Top C: No
> > > > >>>> > > >> > > >>>>>>> > > >> > > Top D: b)
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > Best,
> > > > >>>> > > >> > > >>>>>>> > > >> > > Sebastian
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM
> > Dawid
> > > > >>>> > Wysakowicz <
> > > > >>>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > > wrote:
> > > > >>>> > > >> > > >>>>>>> > > >> > >
> > > > >>>> > > >> > > >>>>>>> > > >> > >> @Arvid:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's
> > capability
> > > > >>>> > approach, I
> > > > >>>> > > >> > fear
> > > > >>>> > > >> > > >>>>>>> that it
> > > > >>>> > > >> > > >>>>>>> > > >> doesn't
> > > > >>>> > > >> > > >>>>>>> > > >> > work
> > > > >>>> > > >> > > >>>>>>> > > >> > >> for this particular use case:
> > Sources
> > > > can
> > > > >>>> > always
> > > > >>>> > > >> be
> > > > >>>> > > >> > > >>>>>>> aligned
> > > > >>>> > > >> > > >>>>>>> > > >> cross-task
> > > > >>>> > > >> > > >>>>>>> > > >> > and
> > > > >>>> > > >> > > >>>>>>> > > >> > >> this is just about intra-task
> > > > alignment.
> > > > >>>> So
> > > > >>>> > it's
> > > > >>>> > > >> > > >>>>>>> plausible to put
> > > > >>>> > > >> > > >>>>>>> > > >> > sources
> > > > >>>> > > >> > > >>>>>>> > > >> > >> into an alignment group even
> though
> > > > they
> > > > >>>> do
> > > > >>>> > not
> > > > >>>> > > >> use
> > > > >>>> > > >> > any
> > > > >>>> > > >> > > >>>>>>> of the
> > > > >>>> > > >> > > >>>>>>> > > >> presented
> > > > >>>> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They should just
> > > > issue a
> > > > >>>> > > >> warning, if
> > > > >>>> > > >> > > >>>>>>> they handle
> > > > >>>> > > >> > > >>>>>>> > > >> > multiple
> > > > >>>> > > >> > > >>>>>>> > > >> > >> splits (see motivation section).
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Yes, but the "supportXXX" method
> > > would
> > > > >>>> be for
> > > > >>>> > > >> telling
> > > > >>>> > > >> > > if
> > > > >>>> > > >> > > >>>>>>> it
> > > > >>>> > > >> > > >>>>>>> > > supports
> > > > >>>> > > >> > > >>>>>>> > > >> > that
> > > > >>>> > > >> > > >>>>>>> > > >> > >> intra-task alignment. Cross-task
> > > > >>>> alignment
> > > > >>>> > would
> > > > >>>> > > >> > always
> > > > >>>> > > >> > > >>>>>>> be
> > > > >>>> > > >> > > >>>>>>> > > supported.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I updated interfaces to what I
> > > believe
> > > > >>>> to be
> > > > >>>> > > >> closest
> > > > >>>> > > >> > > to a
> > > > >>>> > > >> > > >>>>>>> > consensus
> > > > >>>> > > >> > > >>>>>>> > > >> > >> between all participants. Do you
> > mind
> > > > >>>> taking a
> > > > >>>> > > >> look?
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> @Sebastian Do you mind addressing
> > the
> > > > >>>> nits?
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise
> > > wrote:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for pushing this effort.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I'd actually be in favor of 1b).
> I
> > > > fully
> > > > >>>> agree
> > > > >>>> > > >> that
> > > > >>>> > > >> > > >>>>>>> decorator
> > > > >>>> > > >> > > >>>>>>> > > >> interfaces
> > > > >>>> > > >> > > >>>>>>> > > >> > >> should be avoided but I'm also
> not
> > a
> > > > big
> > > > >>>> fan
> > > > >>>> > of
> > > > >>>> > > >> > > >>>>>>> overloading the
> > > > >>>> > > >> > > >>>>>>> > > base
> > > > >>>> > > >> > > >>>>>>> > > >> > >> interfaces (they are hard to
> > > implement
> > > > as
> > > > >>>> > is). The
> > > > >>>> > > >> > > usual
> > > > >>>> > > >> > > >>>>>>> feedback
> > > > >>>> > > >> > > >>>>>>> > > to
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Source-related interfaces is
> > always
> > > > that
> > > > >>>> > they are
> > > > >>>> > > >> > > >>>>>>> overwhelming
> > > > >>>> > > >> > > >>>>>>> > and
> > > > >>>> > > >> > > >>>>>>> > > >> too
> > > > >>>> > > >> > > >>>>>>> > > >> > >> hard to implement. However, I'd
> > also
> > > > not
> > > > >>>> > oppose
> > > > >>>> > > >> 1c)
> > > > >>>> > > >> > as
> > > > >>>> > > >> > > >>>>>>> scattered
> > > > >>>> > > >> > > >>>>>>> > > >> > interfaces
> > > > >>>> > > >> > > >>>>>>> > > >> > >> also have drawbacks. I'd just
> > dislike
> > > > >>>> 1a) and
> > > > >>>> > 1d).
> > > > >>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's
> > capability
> > > > >>>> > approach, I
> > > > >>>> > > >> > fear
> > > > >>>> > > >> > > >>>>>>> that it
> > > > >>>> > > >> > > >>>>>>> > > >> doesn't
> > > > >>>> > > >> > > >>>>>>> > > >> > work
> > > > >>>> > > >> > > >>>>>>> > > >> > >> for this particular use case:
> > Sources
> > > > can
> > > > >>>> > always
> > > > >>>> > > >> be
> > > > >>>> > > >> > > >>>>>>> aligned
> > > > >>>> > > >> > > >>>>>>> > > >> cross-task
> > > > >>>> > > >> > > >>>>>>> > > >> > and
> > > > >>>> > > >> > > >>>>>>> > > >> > >> this is just about intra-task
> > > > alignment.
> > > > >>>> So
> > > > >>>> > it's
> > > > >>>> > > >> > > >>>>>>> plausible to put
> > > > >>>> > > >> > > >>>>>>> > > >> > sources
> > > > >>>> > > >> > > >>>>>>> > > >> > >> into an alignment group even
> though
> > > > they
> > > > >>>> do
> > > > >>>> > not
> > > > >>>> > > >> use
> > > > >>>> > > >> > any
> > > > >>>> > > >> > > >>>>>>> of the
> > > > >>>> > > >> > > >>>>>>> > > >> presented
> > > > >>>> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They should just
> > > > issue a
> > > > >>>> > > >> warning, if
> > > > >>>> > > >> > > >>>>>>> they handle
> > > > >>>> > > >> > > >>>>>>> > > >> > multiple
> > > > >>>> > > >> > > >>>>>>> > > >> > >> splits (see motivation section).
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I think renaming alignSplits to
> > > > >>>> facilitate
> > > > >>>> > future
> > > > >>>> > > >> use
> > > > >>>> > > >> > > >>>>>>> cases makes
> > > > >>>> > > >> > > >>>>>>> > > >> sense
> > > > >>>> > > >> > > >>>>>>> > > >> > but
> > > > >>>> > > >> > > >>>>>>> > > >> > >> then all interfaces (if 1c) is
> > > chosen)
> > > > >>>> should
> > > > >>>> > be
> > > > >>>> > > >> > > adjusted
> > > > >>>> > > >> > > >>>>>>> > > >> accordingly.
> > > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSourceReader could be
> > > > >>>> > PausingSourceReader
> > > > >>>> > > >> and
> > > > >>>> > > >> > > I'd
> > > > >>>> > > >> > > >>>>>>> go for
> > > > >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits (Becket's
> > > proposal
> > > > >>>> > afaik). We
> > > > >>>> > > >> > could
> > > > >>>> > > >> > > >>>>>>> also
> > > > >>>> > > >> > > >>>>>>> > split
> > > > >>>> > > >> > > >>>>>>> > > it
> > > > >>>> > > >> > > >>>>>>> > > >> > into
> > > > >>>> > > >> > > >>>>>>> > > >> > >> pauseSplit and resumeSplit. While
> > > > >>>> > > >> pauseOrResumeSplits
> > > > >>>> > > >> > > >>>>>>> may allow
> > > > >>>> > > >> > > >>>>>>> > > >> Sources
> > > > >>>> > > >> > > >>>>>>> > > >> > to
> > > > >>>> > > >> > > >>>>>>> > > >> > >> just use 1 instead of 2 library
> > calls
> > > > (as
> > > > >>>> > written
> > > > >>>> > > >> in
> > > > >>>> > > >> > > the
> > > > >>>> > > >> > > >>>>>>> > Javadoc),
> > > > >>>> > > >> > > >>>>>>> > > >> both
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Kafka and Pulsar can't use it and
> > I'm
> > > > not
> > > > >>>> > sure if
> > > > >>>> > > >> > there
> > > > >>>> > > >> > > >>>>>>> is a
> > > > >>>> > > >> > > >>>>>>> > system
> > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > >>>> > > >> > > >>>>>>> > > >> > >> can.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Some nits for the FLIP:
> > > > >>>> > > >> > > >>>>>>> > > >> > >> - Please replace "stop" with
> > "pause".
> > > > >>>> > > >> > > >>>>>>> > > >> > >> - Not sure if it's worth it in
> the
> > > > >>>> capability
> > > > >>>> > > >> > section:
> > > > >>>> > > >> > > >>>>>>> Sources
> > > > >>>> > > >> > > >>>>>>> > that
> > > > >>>> > > >> > > >>>>>>> > > >> > adopt
> > > > >>>> > > >> > > >>>>>>> > > >> > >> this interface cannot be used in
> > > > earlier
> > > > >>>> > > >> versions. So
> > > > >>>> > > >> > > it
> > > > >>>> > > >> > > >>>>>>> feels
> > > > >>>> > > >> > > >>>>>>> > like
> > > > >>>> > > >> > > >>>>>>> > > >> we
> > > > >>>> > > >> > > >>>>>>> > > >> > are
> > > > >>>> > > >> > > >>>>>>> > > >> > >> only forward compatible (old
> > sources
> > > > can
> > > > >>>> be
> > > > >>>> > used
> > > > >>>> > > >> > after
> > > > >>>> > > >> > > >>>>>>> the
> > > > >>>> > > >> > > >>>>>>> > change);
> > > > >>>> > > >> > > >>>>>>> > > >> but
> > > > >>>> > > >> > > >>>>>>> > > >> > I
> > > > >>>> > > >> > > >>>>>>> > > >> > >> guess this holds for any API
> > > addition.
> > > > >>>> > > >> > > >>>>>>> > > >> > >> - You might want to add what
> > happens
> > > > >>>> when all
> > > > >>>> > > >> splits
> > > > >>>> > > >> > > are
> > > > >>>> > > >> > > >>>>>>> paused.
> > > > >>>> > > >> > > >>>>>>> > > >> > >> - You may want to describe how
> the
> > 3
> > > > >>>> flavors
> > > > >>>> > of
> > > > >>>> > > >> > > >>>>>>> SourceReaderBase
> > > > >>>> > > >> > > >>>>>>> > > >> > interact
> > > > >>>> > > >> > > >>>>>>> > > >> > >> with the interface.
> > > > >>>> > > >> > > >>>>>>> > > >> > >> - I'm not sure if it makes sense
> to
> > > > >>>> include
> > > > >>>> > Kafka
> > > > >>>> > > >> and
> > > > >>>> > > >> > > >>>>>>> Pulsar in
> > > > >>>> > > >> > > >>>>>>> > the
> > > > >>>> > > >> > > >>>>>>> > > >> > FLIP.
> > > > >>>> > > >> > > >>>>>>> > > >> > >> For me, this is rather immediate
> > > > >>>> follow-up
> > > > >>>> > work.
> > > > >>>> > > >> > (could
> > > > >>>> > > >> > > >>>>>>> be in the
> > > > >>>> > > >> > > >>>>>>> > > >> same
> > > > >>>> > > >> > > >>>>>>> > > >> > >> umbrella ticket)
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Arvid
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM
> > > Dawid
> > > > >>>> > Wysakowicz
> > > > >>>> > > >> <
> > > > >>>> > > >> > > >>>>>>> > > >> > dwysakowicz@apache.org> <
> > > > >>>> dwysakowicz@apache.org>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> wrote:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements
> > > > >>>> SourceReader,
> > > > >>>> > > >> > > >>>>>>> WithSplitsAlignment",
> > > > >>>> > > >> > > >>>>>>> > > >> along
> > > > >>>> > > >> > > >>>>>>> > > >> > >> with "MySplitReader implements
> > > > >>>> SplitReader,
> > > > >>>> > > >> > > >>>>>>> WithSplitsAlignment",
> > > > >>>> > > >> > > >>>>>>> > > or
> > > > >>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements
> > > > >>>> > AlignedSourceReader"
> > > > >>>> > > >> > and
> > > > >>>> > > >> > > >>>>>>> > > "MySplitReader
> > > > >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader",
> or
> > > > >>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements
> > > > >>>> SourceReader"
> > > > >>>> > and
> > > > >>>> > > >> > > >>>>>>> "MySplitReader
> > > > >>>> > > >> > > >>>>>>> > > >> > implements
> > > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReader".
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal
> > according
> > > > to
> > > > >>>> Dawid
> > > > >>>> > > >> would
> > > > >>>> > > >> > > be:
> > > > >>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements
> > > > >>>> SourceReader"
> > > > >>>> > and
> > > > >>>> > > >> > > >>>>>>> "MySplitReader
> > > > >>>> > > >> > > >>>>>>> > > >> > implements
> > > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although
> > > > >>>> personally
> > > > >>>> > > >> > speaking I
> > > > >>>> > > >> > > >>>>>>> think it
> > > > >>>> > > >> > > >>>>>>> > is
> > > > >>>> > > >> > > >>>>>>> > > >> > simpler
> > > > >>>> > > >> > > >>>>>>> > > >> > >> to just add a new method to the
> > split
> > > > >>>> reader
> > > > >>>> > with
> > > > >>>> > > >> > > >>>>>>> default impl.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I think that is a good idea to
> have
> > > it
> > > > >>>> > aligned as
> > > > >>>> > > >> > much
> > > > >>>> > > >> > > as
> > > > >>>> > > >> > > >>>>>>> > possible.
> > > > >>>> > > >> > > >>>>>>> > > >> I'd
> > > > >>>> > > >> > > >>>>>>> > > >> > be
> > > > >>>> > > >> > > >>>>>>> > > >> > >> +1 for your option c). We can
> merge
> > > > >>>> > > >> > AlignedSplitReader
> > > > >>>> > > >> > > >>>>>>> with
> > > > >>>> > > >> > > >>>>>>> > > >> > SplitReader. We
> > > > >>>> > > >> > > >>>>>>> > > >> > >> will update the FLIP shortly.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin
> > > wrote:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the comment, Jark.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to
> align
> > > > other
> > > > >>>> > things in
> > > > >>>> > > >> > the
> > > > >>>> > > >> > > >>>>>>> future?
> > > > >>>> > > >> > > >>>>>>> > For
> > > > >>>> > > >> > > >>>>>>> > > >> > example,
> > > > >>>> > > >> > > >>>>>>> > > >> > >> align read speed, I have
> > > > >>>> > > >> > > >>>>>>> > > >> > >> seen users requesting global rate
> > > > >>>> limits. This
> > > > >>>> > > >> > feature
> > > > >>>> > > >> > > >>>>>>> may also
> > > > >>>> > > >> > > >>>>>>> > > need
> > > > >>>> > > >> > > >>>>>>> > > >> an
> > > > >>>> > > >> > > >>>>>>> > > >> > >> interface like this.
> > > > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this
> > > > >>>> interface to
> > > > >>>> > > >> support
> > > > >>>> > > >> > > >>>>>>> align other
> > > > >>>> > > >> > > >>>>>>> > > >> > things, I
> > > > >>>> > > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
> > > > >>>> > > >> > > >>>>>>> > > >> > >> the purpose of the methods, such
> as
> > > > >>>> > > >> > > >>>>>>> `alignWatermarksForSplits`
> > > > >>>> > > >> > > >>>>>>> > > >> instead
> > > > >>>> > > >> > > >>>>>>> > > >> > of
> > > > >>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> This is a good point. Naming
> wise,
> > it
> > > > >>>> would
> > > > >>>> > > >> usually
> > > > >>>> > > >> > be
> > > > >>>> > > >> > > >>>>>>> more
> > > > >>>> > > >> > > >>>>>>> > > >> extensible
> > > > >>>> > > >> > > >>>>>>> > > >> > to
> > > > >>>> > > >> > > >>>>>>> > > >> > >> just describe what the method
> > > actually
> > > > >>>> does,
> > > > >>>> > > >> instead
> > > > >>>> > > >> > of
> > > > >>>> > > >> > > >>>>>>> assuming
> > > > >>>> > > >> > > >>>>>>> > > the
> > > > >>>> > > >> > > >>>>>>> > > >> > >> purpose of doing this. For
> example,
> > > in
> > > > >>>> this
> > > > >>>> > case,
> > > > >>>> > > >> > > >>>>>>> > > >> pauseOrResumeSplits()
> > > > >>>> > > >> > > >>>>>>> > > >> > >> would be more extensible because
> > this
> > > > >>>> can be
> > > > >>>> > used
> > > > >>>> > > >> for
> > > > >>>> > > >> > > >>>>>>> any kind of
> > > > >>>> > > >> > > >>>>>>> > > >> flow
> > > > >>>> > > >> > > >>>>>>> > > >> > >> control, be it watermark
> alignment
> > or
> > > > >>>> simple
> > > > >>>> > rate
> > > > >>>> > > >> > > >>>>>>> limiting.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on
> > > this.
> > > > I
> > > > >>>> think
> > > > >>>> > > >> they
> > > > >>>> > > >> > > have
> > > > >>>> > > >> > > >>>>>>> their
> > > > >>>> > > >> > > >>>>>>> > own
> > > > >>>> > > >> > > >>>>>>> > > >> > >> advantages.
> > > > >>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use
> > > Interfaces
> > > > >>>> for
> > > > >>>> > > >> extending
> > > > >>>> > > >> > > >>>>>>> abilities
> > > > >>>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for
> > > > TableSource/TableSink,
> > > > >>>> > > >> > > >>>>>>> > > >> > >> and I prefer Interfaces rather
> than
> > > > >>>> methods in
> > > > >>>> > > >> this
> > > > >>>> > > >> > > >>>>>>> case. When
> > > > >>>> > > >> > > >>>>>>> > you
> > > > >>>> > > >> > > >>>>>>> > > >> have
> > > > >>>> > > >> > > >>>>>>> > > >> > a
> > > > >>>> > > >> > > >>>>>>> > > >> > >> bunch of abilities and each
> ability
> > > > >>>> > > >> > > >>>>>>> > > >> > >> has more than one method,
> > Interfaces
> > > > can
> > > > >>>> help
> > > > >>>> > to
> > > > >>>> > > >> > > >>>>>>> organize them
> > > > >>>> > > >> > > >>>>>>> > and
> > > > >>>> > > >> > > >>>>>>> > > >> make
> > > > >>>> > > >> > > >>>>>>> > > >> > >> it clear to users which methods
> > > > >>>> > > >> > > >>>>>>> > > >> > >> they need to implement when they want
> to
> > > have
> > > > >>>> an
> > > > >>>> > > >> ability.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I am OK with decorative
> interfaces
> > if
> > > > >>>> this is
> > > > >>>> > a
> > > > >>>> > > >> > general
> > > > >>>> > > >> > > >>>>>>> design
> > > > >>>> > > >> > > >>>>>>> > > >> pattern
> > > > >>>> > > >> > > >>>>>>> > > >> > in
> > > > >>>> > > >> > > >>>>>>> > > >> > >> the other components in Flink.
> But
> > it
> > > > >>>> looks
> > > > >>>> > like
> > > > >>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> current API
> > > > >>>> > > >> > > >>>>>>> > > >> > proposal
> > > > >>>> > > >> > > >>>>>>> > > >> > >> is not symmetric.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> The current proposal is
> essentially
> > > > >>>> > > >> "MySourceReader
> > > > >>>> > > >> > > >>>>>>> implements
> > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader,
> WithSplitsAlignment",
> > > > >>>> along with
> > > > >>>> > > >> > > >>>>>>> "MySplitReader
> > > > >>>> > > >> > > >>>>>>> > > >> implements
> > > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitsReader".
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Should we make the API symmetric?
> > I'd
> > > > >>>> > consider any
> > > > >>>> > > >> > one
> > > > >>>> > > >> > > >>>>>>> of the
> > > > >>>> > > >> > > >>>>>>> > > >> following
> > > > >>>> > > >> > > >>>>>>> > > >> > as
> > > > >>>> > > >> > > >>>>>>> > > >> > >> symmetric.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements
> > > > >>>> SourceReader,
> > > > >>>> > > >> > > >>>>>>> WithSplitsAlignment",
> > > > >>>> > > >> > > >>>>>>> > > >> along
> > > > >>>> > > >> > > >>>>>>> > > >> > >> with "MySplitReader implements
> > > > >>>> SplitReader,
> > > > >>>> > > >> > > >>>>>>> WithSplitsAlignment",
> > > > >>>> > > >> > > >>>>>>> > > or
> > > > >>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements
> > > > >>>> > AlignedSourceReader"
> > > > >>>> > > >> > and
> > > > >>>> > > >> > > >>>>>>> > > "MySplitReader
> > > > >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader",
> or
> > > > >>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements
> > > > >>>> SourceReader"
> > > > >>>> > and
> > > > >>>> > > >> > > >>>>>>> "MySplitReader
> > > > >>>> > > >> > > >>>>>>> > > >> > implements
> > > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReader".
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal
> > according
> > > > to
> > > > >>>> Dawid
> > > > >>>> > > >> would
> > > > >>>> > > >> > > be:
> > > > >>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements
> > > > >>>> SourceReader"
> > > > >>>> > and
> > > > >>>> > > >> > > >>>>>>> "MySplitReader
> > > > >>>> > > >> > > >>>>>>> > > >> > implements
> > > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although
> > > > >>>> personally
> > > > >>>> > > >> > speaking I
> > > > >>>> > > >> > > >>>>>>> think it
> > > > >>>> > > >> > > >>>>>>> > is
> > > > >>>> > > >> > > >>>>>>> > > >> > simpler
> > > > >>>> > > >> > > >>>>>>> > > >> > >> to just add a new method to the
> > split
> > > > >>>> reader
> > > > >>>> > with
> > > > >>>> > > >> > > >>>>>>> default impl.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> @Dawid Wysakowicz <
> > > > >>>> dwysakowicz@apache.org> <
> > > > >>>> > > >> > > >>>>>>> > dwysakowicz@apache.org
> > > > >>>> > > >> > > >>>>>>> > > >
> > > > >>>> > > >> > > >>>>>>> > > >> <
> > > > >>>> > > >> > > >>>>>>> > > >> > dwysakowicz@apache.org> <
> > > > >>>> dwysakowicz@apache.org
> > > > >>>> > >,
> > > > >>>> > > >> > thanks
> > > > >>>> > > >> > > >>>>>>> for the
> > > > >>>> > > >> > > >>>>>>> > > reply.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't
> have a
> > > > >>>> preference
> > > > >>>> > > >> and I
> > > > >>>> > > >> > > >>>>>>> agree most
> > > > >>>> > > >> > > >>>>>>> > of
> > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> sources will support the
> alignment
> > I
> > > am
> > > > >>>> fine
> > > > >>>> > > >> > following
> > > > >>>> > > >> > > >>>>>>> your
> > > > >>>> > > >> > > >>>>>>> > > >> suggestion
> > > > >>>> > > >> > > >>>>>>> > > >> > to
> > > > >>>> > > >> > > >>>>>>> > > >> > >> have the SourceReader extending
> > from
> > > > >>>> > > >> > > >>>>>>> > WithWatermarksSplitsAlignment,
> > > > >>>> > > >> > > >>>>>>> > > >> but
> > > > >>>> > > >> > > >>>>>>> > > >> > >> would put the "supportsXXX"
> there,
> > > not
> > > > >>>> in the
> > > > >>>> > > >> Source
> > > > >>>> > > >> > to
> > > > >>>> > > >> > > >>>>>>> keep the
> > > > >>>> > > >> > > >>>>>>> > > two
> > > > >>>> > > >> > > >>>>>>> > > >> > >> methods together.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> One benefit of having the
> > > "supportsXXX"
> > > > >>>> in
> > > > >>>> > Source
> > > > >>>> > > >> is
> > > > >>>> > > >> > > >>>>>>> that this
> > > > >>>> > > >> > > >>>>>>> > > allows
> > > > >>>> > > >> > > >>>>>>> > > >> > some
> > > > >>>> > > >> > > >>>>>>> > > >> > >> compile time check. For example,
> > if a
> > > > >>>> user
> > > > >>>> > enabled
> > > > >>>> > > >> > > >>>>>>> watermark
> > > > >>>> > > >> > > >>>>>>> > > >> alignment
> > > > >>>> > > >> > > >>>>>>> > > >> > >> while it is not supported by the
> > > > Source,
> > > > >>>> an
> > > > >>>> > > >> exception
> > > > >>>> > > >> > > >>>>>>> can be
> > > > >>>> > > >> > > >>>>>>> > thrown
> > > > >>>> > > >> > > >>>>>>> > > >> at
> > > > >>>> > > >> > > >>>>>>> > > >> > >> compile time. It seems in general
> > > > >>>> useful. That
> > > > >>>> > > >> said,
> > > > >>>> > > >> > I
> > > > >>>> > > >> > > >>>>>>> agree that
> > > > >>>> > > >> > > >>>>>>> > > API
> > > > >>>> > > >> > > >>>>>>> > > >> > >> cleanliness wise it is better to
> > put
> > > > the
> > > > >>>> two
> > > > >>>> > > >> methods
> > > > >>>> > > >> > > >>>>>>> together.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM
> > Jark
> > > > Wu <
> > > > >>>> > > >> > > >>>>>>> imjark@gmail.com> <
> > > > >>>> > > >> > > >>>>>>> > > >> > imjark@gmail.com> <imjark@gmail.com
> >
> > <
> > > > >>>> > > >> imjark@gmail.com>
> > > > >>>> > > >> > > >>>>>>> wrote:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Thank Dawid for the reminder on
> > > > FLIP-182.
> > > > >>>> > Sorry I
> > > > >>>> > > >> did
> > > > >>>> > > >> > > >>>>>>> miss it.
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't have other concerns then.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Jark
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40,
> Dawid
> > > > >>>> > Wysakowicz <
> > > > >>>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org>
> > > > >>>> > > >> > > >>>>>>> > > >> > <dw...@apache.org> <
> > > > >>>> dwysakowicz@apache.org>
> > > > >>>> > <
> > > > >>>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> wrote:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> @Jark:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always
> align
> > > with
> > > > >>>> > watermarks
> > > > >>>> > > >> > when
> > > > >>>> > > >> > > >>>>>>> the
> > > > >>>> > > >> > > >>>>>>> > source
> > > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface?
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs
> > > > watermark
> > > > >>>> > > >> alignment
> > > > >>>> > > >> > > even
> > > > >>>> > > >> > > >>>>>>> if Kafka
> > > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface,
> > > > >>>> > > >> > > >>>>>>> > > >> > >> and this will affect the
> throughput
> > > > >>>> somehow. I
> > > > >>>> > > >> agree
> > > > >>>> > > >> > > >>>>>>> with Becket
> > > > >>>> > > >> > > >>>>>>> > > >> > >> we may need a
> > > > >>>> > > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method
> > for
> > > > >>>> users to
> > > > >>>> > > >> > > configure
> > > > >>>> > > >> > > >>>>>>> the
> > > > >>>> > > >> > > >>>>>>> > source
> > > > >>>> > > >> > > >>>>>>> > > >> to
> > > > >>>> > > >> > > >>>>>>> > > >> > >> enable/disable the alignment.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework
> calculate
> > > > >>>> > > >> > > maxDesiredWatermark?
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of
> > > > >>>> maxDesiredWatermark
> > > > >>>> > will
> > > > >>>> > > >> > > >>>>>>> greatly affect
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> throughput
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> if the reader is constantly
> > > > >>>> > > >> > > >>>>>>> > > >> > >>  switching between pause and
> > resume.
> > > > Can
> > > > >>>> users
> > > > >>>> > > >> > > configure
> > > > >>>> > > >> > > >>>>>>> the
> > > > >>>> > > >> > > >>>>>>> > > >> alignment
> > > > >>>> > > >> > > >>>>>>> > > >> > >> offset?
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> This is covered in the previous
> > > FLIP[1]
> > > > >>>> which
> > > > >>>> > has
> > > > >>>> > > >> > been
> > > > >>>> > > >> > > >>>>>>> already
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> implemented
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> in 1.15. In short, it must be
> > enabled
> > > > >>>> with the
> > > > >>>> > > >> > > watermark
> > > > >>>> > > >> > > >>>>>>> strategy
> > > > >>>> > > >> > > >>>>>>> > > >> which
> > > > >>>> > > >> > > >>>>>>> > > >> > >> also configures drift and update
> > > > >>>> interval.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this
> > > > >>>> interface to
> > > > >>>> > > >> support
> > > > >>>> > > >> > > >>>>>>> align other
> > > > >>>> > > >> > > >>>>>>> > > >> things,
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
> > > > >>>> > > >> > > >>>>>>> > > >> > >> the purpose of the methods, such
> as
> > > > >>>> > > >> > > >>>>>>> `alignWatermarksForSplits`
> > > > >>>> > > >> > > >>>>>>> > > >> instead
> > > > >>>> > > >> > > >>>>>>> > > >> > of
> > > > >>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Sure let's rename it.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> @Becket:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I understand your point. On the
> > other
> > > > >>>> hand
> > > > >>>> > putting
> > > > >>>> > > >> > all
> > > > >>>> > > >> > > >>>>>>> methods,
> > > > >>>> > > >> > > >>>>>>> > > even
> > > > >>>> > > >> > > >>>>>>> > > >> > with
> > > > >>>> > > >> > > >>>>>>> > > >> > >> "supportsXXX" methods for
> enabling
> > > > >>>> certain
> > > > >>>> > > >> features,
> > > > >>>> > > >> > > >>>>>>> makes the
> > > > >>>> > > >> > > >>>>>>> > > entry
> > > > >>>> > > >> > > >>>>>>> > > >> > >> threshold for writing a new
> source
> > > > >>>> higher.
> > > > >>>> > > >> Instead of
> > > > >>>> > > >> > > >>>>>>> focusing on
> > > > >>>> > > >> > > >>>>>>> > > the
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> basic
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> and required properties of the
> > > Source,
> > > > >>>> the
> > > > >>>> > person
> > > > >>>> > > >> > > >>>>>>> implementing a
> > > > >>>> > > >> > > >>>>>>> > > >> source
> > > > >>>> > > >> > > >>>>>>> > > >> > >> must bother with and need to
> figure
> > > out
> > > > >>>> what
> > > > >>>> > all
> > > > >>>> > > >> of
> > > > >>>> > > >> > the
> > > > >>>> > > >> > > >>>>>>> extra
> > > > >>>> > > >> > > >>>>>>> > > >> features
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> are
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> about and how to deal with them.
> It
> > > > >>>> makes it
> > > > >>>> > also
> > > > >>>> > > >> > > harder
> > > > >>>> > > >> > > >>>>>>> to
> > > > >>>> > > >> > > >>>>>>> > > organize
> > > > >>>> > > >> > > >>>>>>> > > >> > >> methods in coupled groups as Jark
> > > said.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't
> have a
> > > > >>>> preference
> > > > >>>> > > >> and I
> > > > >>>> > > >> > > >>>>>>> agree most
> > > > >>>> > > >> > > >>>>>>> > of
> > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> > > >> > >> sources will support the
> alignment
> > I
> > > am
> > > > >>>> fine
> > > > >>>> > > >> > following
> > > > >>>> > > >> > > >>>>>>> your
> > > > >>>> > > >> > > >>>>>>> > > >> suggestion
> > > > >>>> > > >> > > >>>>>>> > > >> > to
> > > > >>>> > > >> > > >>>>>>> > > >> > >> have the SourceReader extending
> > from
> > > > >>>> > > >> > > >>>>>>> > WithWatermarksSplitsAlignment,
> > > > >>>> > > >> > > >>>>>>> > > >> but
> > > > >>>> > > >> > > >>>>>>> > > >> > >> would put the "supportsXXX"
> there,
> > > not
> > > > >>>> in the
> > > > >>>> > > >> Source
> > > > >>>> > > >> > to
> > > > >>>> > > >> > > >>>>>>> keep the
> > > > >>>> > > >> > > >>>>>>> > > two
> > > > >>>> > > >> > > >>>>>>> > > >> > >> methods together.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Lastly, I agree it is really
> > > > unfortunate
> > > > >>>> the
> > > > >>>> > > >> > > >>>>>>> "alignSplits"
> > > > >>>> > > >> > > >>>>>>> > methods
> > > > >>>> > > >> > > >>>>>>> > > >> > differ
> > > > >>>> > > >> > > >>>>>>> > > >> > >> slightly for SourceReader and
> > > > >>>> SplitReader. The
> > > > >>>> > > >> reason
> > > > >>>> > > >> > > for
> > > > >>>> > > >> > > >>>>>>> that is
> > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase deals only with
> > > > >>>> SplitIds,
> > > > >>>> > whereas
> > > > >>>> > > >> > > >>>>>>> SplitReader
> > > > >>>> > > >> > > >>>>>>> > > needs
> > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> > > >> > >> actual splits to pause them. I
> > found
> > > > the
> > > > >>>> > > >> discrepancy
> > > > >>>> > > >> > > >>>>>>> acceptable
> > > > >>>> > > >> > > >>>>>>> > for
> > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> > > >> > >> sake of simplifying changes
> > > > >>>> significantly,
> > > > >>>> > > >> especially
> > > > >>>> > > >> > > as
> > > > >>>> > > >> > > >>>>>>> they
> > > > >>>> > > >> > > >>>>>>> > would
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> highly
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> likely impact performance as we
> > would
> > > > >>>> have to
> > > > >>>> > > >> perform
> > > > >>>> > > >> > > >>>>>>> additional
> > > > >>>> > > >> > > >>>>>>> > > >> > lookups.
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Moreover the SplitReader is a
> > > secondary
> > > > >>>> > interface.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> [1]
> > > > >>>> > https://cwiki.apache.org/confluence/x/hQYBCw
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu
> wrote:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the effort, Dawid and
> > > > >>>> Sebastian!
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I just have some minor questions
> > > > (maybe I
> > > > >>>> > missed
> > > > >>>> > > >> > > >>>>>>> something).
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always
> align
> > > with
> > > > >>>> > watermarks
> > > > >>>> > > >> > when
> > > > >>>> > > >> > > >>>>>>> the
> > > > >>>> > > >> > > >>>>>>> > source
> > > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface?
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs
> > > > watermark
> > > > >>>> > > >> alignment
> > > > >>>> > > >> > > even
> > > > >>>> > > >> > > >>>>>>> if Kafka
> > > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface,
> > > > >>>> > > >> > > >>>>>>> > > >> > >> and this will affect the
> throughput
> > > > >>>> somehow. I
> > > > >>>> > > >> agree
> > > > >>>> > > >> > > >>>>>>> with Becket
> > > > >>>> > > >> > > >>>>>>> > > >> > >> we may need a
> > > > >>>> > > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method
> > for
> > > > >>>> users to
> > > > >>>> > > >> > > configure
> > > > >>>> > > >> > > >>>>>>> the
> > > > >>>> > > >> > > >>>>>>> > source
> > > > >>>> > > >> > > >>>>>>> > > >> to
> > > > >>>> > > >> > > >>>>>>> > > >> > >> enable/disable the alignment.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly affect
> > > > >>>> > > >> > > >>>>>>> > > >> > >> throughput if the reader is constantly switching between pause and
> > > > >>>> > > >> > > >>>>>>> > > >> > >> resume. Can users configure the alignment offset?
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align other things in the future? For
> > > > >>>> > > >> > > >>>>>>> > > >> > >> example, aligning read speed: I have seen users requesting global rate
> > > > >>>> > > >> > > >>>>>>> > > >> > >> limits, and that feature may also need an interface like this.
> > > > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to support aligning other
> > > > >>>> > > >> > > >>>>>>> > > >> > >> things, I suggest explicitly declaring the purpose of the methods, such
> > > > >>>> > > >> > > >>>>>>> > > >> > >> as `alignWatermarksForSplits` instead of `alignSplits`.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think they have their own
> > > > >>>> > > >> > > >>>>>>> > > >> > >> advantages.
> > > > >>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending abilities
> > > > >>>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink, and I prefer Interfaces rather
> > > > >>>> > > >> > > >>>>>>> > > >> > >> than methods in this case. When you have a bunch of abilities and each
> > > > >>>> > > >> > > >>>>>>> > > >> > >> ability has more than one method, Interfaces help to organize them and
> > > > >>>> > > >> > > >>>>>>> > > >> > >> make it clear to users which methods they need to implement to provide
> > > > >>>> > > >> > > >>>>>>> > > >> > >> an ability.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Jark
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <becket.qin@gmail.com> wrote:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Hi Dawid,
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the explanation.
> > Apologies
> > > > >>>> that I
> > > > >>>> > > >> somehow
> > > > >>>> > > >> > > >>>>>>> misread a
> > > > >>>> > > >> > > >>>>>>> > > bunch
> > > > >>>> > > >> > > >>>>>>> > > >> of
> > > > >>>> > > >> > > >>>>>>> > > >> > >> "align" and thought they were
> > > "assign".
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Regarding 1, by default
> > > implementation,
> > > > >>>> I was
> > > > >>>> > > >> > thinking
> > > > >>>> > > >> > > >>>>>>> of the
> > > > >>>> > > >> > > >>>>>>> > > default
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> no-op
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> implementation. I am a little
> > worried
> > > > >>>> about
> > > > >>>> > the
> > > > >>>> > > >> > > >>>>>>> proliferation of
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> decorative
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> interfaces. I think the most
> > > important
> > > > >>>> thing
> > > > >>>> > about
> > > > >>>> > > >> > > >>>>>>> interfaces is
> > > > >>>> > > >> > > >>>>>>> > > that
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> they
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> are easy to understand. In this
> > > case, I
> > > > >>>> prefer
> > > > >>>> > > >> adding
> > > > >>>> > > >> > > >>>>>>> a new method
> > > > >>>> > > >> > > >>>>>>> > to
> > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> > > >> > >> existing interface for the
> > following
> > > > >>>> reasons:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> a) I feel the biggest drawback of
> > > > >>>> decorative
> > > > >>>> > > >> > interfaces
> > > > >>>> > > >> > > >>>>>>> is which
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> interface
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> they can decorate and which
> > > > combinations
> > > > >>>> of
> > > > >>>> > > >> multiple
> > > > >>>> > > >> > > >>>>>>> decorative
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> interfaces
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> are valid. In the current FLIP,
> the
> > > > >>>> > > >> > withSplitsAlignment
> > > > >>>> > > >> > > >>>>>>> interface
> > > > >>>> > > >> > > >>>>>>> > > is
> > > > >>>> > > >> > > >>>>>>> > > >> > only
> > > > >>>> > > >> > > >>>>>>> > > >> > >> applicable to the SourceReader
> > which
> > > > >>>> means it
> > > > >>>> > > >> can't
> > > > >>>> > > >> > > >>>>>>> decorate any
> > > > >>>> > > >> > > >>>>>>> > > >> other
> > > > >>>> > > >> > > >>>>>>> > > >> > >> interface. From an interface
> design
> > > > >>>> > perspective, a
> > > > >>>> > > >> > > >>>>>>> natural
> > > > >>>> > > >> > > >>>>>>> > question
> > > > >>>> > > >> > > >>>>>>> > > >> is
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> why
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> not let "AlignedSplitReader"
> extend
> > > > >>>> > > >> > > >>>>>>> "withSplitsAlignment"? And it
> > > > >>>> > > >> > > >>>>>>> > > is
> > > > >>>> > > >> > > >>>>>>> > > >> > also
> > > > >>>> > > >> > > >>>>>>> > > >> > >> natural to assume that a split
> > reader
> > > > >>>> > implementing
> > > > >>>> > > >> > both
> > > > >>>> > > >> > > >>>>>>> > SplitReader
> > > > >>>> > > >> > > >>>>>>> > > >> and
> > > > >>>> > > >> > > >>>>>>> > > >> > >> WithSplitsAlignment would work,
> > > because
> > > > a
> > > > >>>> > source
> > > > >>>> > > >> > reader
> > > > >>>> > > >> > > >>>>>>> > implementing
> > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader and
> > withSplitsAlignment
> > > > >>>> works.
> > > > >>>> > So why
> > > > >>>> > > >> > > isn't
> > > > >>>> > > >> > > >>>>>>> there an
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> interface
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> of AlignedSourceReader? In the
> > > future,
> > > > if
> > > > >>>> > there
> > > > >>>> > > >> is a
> > > > >>>> > > >> > > new
> > > > >>>> > > >> > > >>>>>>> feature
> > > > >>>> > > >> > > >>>>>>> > > >> added
> > > > >>>> > > >> > > >>>>>>> > > >> > >> (e.g. sorted or pre-partitioned
> > data
> > > > >>>> aware),
> > > > >>>> > are
> > > > >>>> > > >> we
> > > > >>>> > > >> > > >>>>>>> going to
> > > > >>>> > > >> > > >>>>>>> > create
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> another
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> interface of SplitReader such as
> > > > >>>> > > >> SortedSplitReader or
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> PrePartitionedAware?
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Can they be combined? So I think
> > the
> > > > >>>> > additional
> > > > >>>> > > >> > > >>>>>>> decorative
> > > > >>>> > > >> > > >>>>>>> > > interface
> > > > >>>> > > >> > > >>>>>>> > > >> > like
> > > > >>>> > > >> > > >>>>>>> > > >> > >> withSplitsAlignment actually
> > > increases
> > > > >>>> the
> > > > >>>> > > >> > > understanding
> > > > >>>> > > >> > > >>>>>>> cost of
> > > > >>>> > > >> > > >>>>>>> > > >> users
> > > > >>>> > > >> > > >>>>>>> > > >> > >> because they have to know what
> > > > decorative
> > > > >>>> > > >> interfaces
> > > > >>>> > > >> > > are
> > > > >>>> > > >> > > >>>>>>> there,
> > > > >>>> > > >> > > >>>>>>> > > which
> > > > >>>> > > >> > > >>>>>>> > > >> > >> interface they can decorate and
> > which
> > > > >>>> > > >> combinations of
> > > > >>>> > > >> > > the
> > > > >>>> > > >> > > >>>>>>> > > decorative
> > > > >>>> > > >> > > >>>>>>> > > >> > >> interfaces are valid and which
> are
> > > not.
> > > > >>>> > Ideally we
> > > > >>>> > > >> > want
> > > > >>>> > > >> > > >>>>>>> to avoid
> > > > >>>> > > >> > > >>>>>>> > > >> that.
> > > > >>>> > > >> > > >>>>>>> > > >> > To
> > > > >>>> > > >> > > >>>>>>> > > >> > >> be clear, I am not opposing
> having
> > an
> > > > >>>> > interface of
> > > > >>>> > > >> > > >>>>>>> > > >> withSplitsAlignment,
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> it
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> is completely OK to have it as an
> > > > >>>> internal
> > > > >>>> > > >> interface
> > > > >>>> > > >> > > and
> > > > >>>> > > >> > > >>>>>>> let
> > > > >>>> > > >> > > >>>>>>> > > >> > SourceReader
> > > > >>>> > > >> > > >>>>>>> > > >> > >> and SplitReader both extend it.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> b) Adding a new method to the
> > > > >>>> SourceReader
> > > > >>>> > with a
> > > > >>>> > > >> > > default
> > > > >>>> > > >> > > >>>>>>> > > >> implementation
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> of
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> no-op would help avoid logic
> > > branching
> > > > >>>> in the
> > > > >>>> > > >> source
> > > > >>>> > > >> > > >>>>>>> logic,
> > > > >>>> > > >> > > >>>>>>> > > >> especially
> > > > >>>> > > >> > > >>>>>>> > > >> > >> given that we agree that the vast
> > > > >>>> majority of
> > > > >>>> > the
> > > > >>>> > > >> > > >>>>>>> SourceReader
> > > > >>>> > > >> > > >>>>>>> > > >> > >> implementations, if not all,
> would
> > > just
> > > > >>>> extend
> > > > >>>> > > >> from
> > > > >>>> > > >> > the
> > > > >>>> > > >> > > >>>>>>> > > >> > SourceReaderBase.
> > > > >>>> > > >> > > >>>>>>> > > >> > >> That means adding a new method to
> > the
> > > > >>>> > interface
> > > > >>>> > > >> would
> > > > >>>> > > >> > > >>>>>>> effectively
> > > > >>>> > > >> > > >>>>>>> > > >> give
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> the
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> same user experience, but
> simpler.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> c) A related design principle
> that
> > > may
> > > > be
> > > > >>>> > worth
> > > > >>>> > > >> > > >>>>>>> discussing is how
> > > > >>>> > > >> > > >>>>>>> > > do
> > > > >>>> > > >> > > >>>>>>> > > >> we
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> let
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> the Source implementations tell
> > Flink
> > > > >>>> what
> > > > >>>> > > >> capability
> > > > >>>> > > >> > > is
> > > > >>>> > > >> > > >>>>>>> > supported
> > > > >>>> > > >> > > >>>>>>> > > >> and
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> what
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> is not. Personally speaking I
> feel
> > > the
> > > > >>>> most
> > > > >>>> > > >> intuitive
> > > > >>>> > > >> > > >>>>>>> place to me
> > > > >>>> > > >> > > >>>>>>> > > is
> > > > >>>> > > >> > > >>>>>>> > > >> in
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> the
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Source itself, because that is
> the
> > > > >>>> entrance
> > > > >>>> > of the
> > > > >>>> > > >> > > >>>>>>> entire Source
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> connector
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> logic.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Based on the above thoughts, I am
> > > > >>>> wondering
> > > > >>>> > if the
> > > > >>>> > > >> > > >>>>>>> following
> > > > >>>> > > >> > > >>>>>>> > > >> interface
> > > > >>>> > > >> > > >>>>>>> > > >> > >> would be easier for users to understand.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> - Change "withSplitsAlignment" to
> > > > >>>> internal
> > > > >>>> > > >> interface,
> > > > >>>> > > >> > > >>>>>>> let both
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> and SplitReader extend from it,
> > with
> > > a
> > > > >>>> default
> > > > >>>> > > >> no-op
> > > > >>>> > > >> > > >>>>>>> > > implementation.
> > > > >>>> > > >> > > >>>>>>> > > >> > >> - Add a new method "boolean
> > > > >>>> > > >> supportSplitsAlignment()"
> > > > >>>> > > >> > > to
> > > > >>>> > > >> > > >>>>>>> the
> > > > >>>> > > >> > > >>>>>>> > Source
> > > > >>>> > > >> > > >>>>>>> > > >> > >> interface, with a default
> > > > implementation
> > > > >>>> > returning
> > > > >>>> > > >> > > >>>>>>> false. Sources
> > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> have
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> implemented the alignment logic
> can
> > > > >>>> change
> > > > >>>> > this to
> > > > >>>> > > >> > > >>>>>>> return true,
> > > > >>>> > > >> > > >>>>>>> > and
> > > > >>>> > > >> > > >>>>>>> > > >> > >> override the alignSplits()
> methods
> > in
> > > > the
> > > > >>>> > > >> > SourceReader
> > > > >>>> > > >> > > /
> > > > >>>> > > >> > > >>>>>>> > > SplitReader
> > > > >>>> > > >> > > >>>>>>> > > >> if
> > > > >>>> > > >> > > >>>>>>> > > >> > >> needed.
> > > > >>>> > > >> > > >>>>>>> > > >> > >> - In the future, if a new
> optional
> > > > >>>> feature is
> > > > >>>> > > >> going
> > > > >>>> > > >> > to
> > > > >>>> > > >> > > >>>>>>> be added
> > > > >>>> > > >> > > >>>>>>> > to
> > > > >>>> > > >> > > >>>>>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Source, and that feature requires
> > the
> > > > >>>> > awareness
> > > > >>>> > > >> from
> > > > >>>> > > >> > > >>>>>>> Flink, we
> > > > >>>> > > >> > > >>>>>>> > can
> > > > >>>> > > >> > > >>>>>>> > > >> add
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> more
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> such methods to the Source.
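To make the proposal quoted above easier to picture, here is a minimal sketch of its shape: a capability flag on the source with a default of false, and an alignment hook with a default no-op. All type names here (SketchSource, SketchSourceReader, AligningReader and so on) are hypothetical, simplified stand-ins and not the real Flink interfaces; splits are reduced to plain String ids, and the alignSplits parameters are an assumption of this sketch.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-ins for the interfaces discussed in the
// thread. These are NOT the real Flink types or signatures.
interface WithSplitsAlignment {
    // Default no-op: readers that do not support alignment ignore the call.
    default void alignSplits(Collection<String> splitsToPause,
                             Collection<String> splitsToResume) {}
}

// The reader interface inherits the no-op alignment hook.
interface SketchSourceReader extends WithSplitsAlignment {}

interface SketchSource {
    // Capability flag on the source itself; false unless overridden.
    default boolean supportSplitsAlignment() {
        return false;
    }

    SketchSourceReader createReader();
}

// A source that ignores alignment entirely: nothing to override.
final class PlainSource implements SketchSource {
    @Override
    public SketchSourceReader createReader() {
        return new SketchSourceReader() {};
    }
}

// A reader that actually tracks which splits are paused.
final class AligningReader implements SketchSourceReader {
    final Set<String> paused = new HashSet<>();

    @Override
    public void alignSplits(Collection<String> splitsToPause,
                            Collection<String> splitsToResume) {
        paused.addAll(splitsToPause);
        paused.removeAll(splitsToResume);
    }
}

// A source that opts in by flipping the flag and returning such a reader.
final class AligningSource implements SketchSource {
    @Override
    public boolean supportSplitsAlignment() {
        return true;
    }

    @Override
    public SketchSourceReader createReader() {
        return new AligningReader();
    }
}
```

In this shape the framework would only need to check supportSplitsAlignment() once per source, and calling alignSplits on a reader that never overrode it is harmless. Whether that is preferable to an opt-in decorative interface is exactly the open question in this thread.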
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> What do you think?
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> @Konstantin:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the
> > > > >>>> `AlignedSplitReader`
> > > > >>>> > > >> > > interface
> > > > >>>> > > >> > > >>>>>>> (aka the
> > > > >>>> > > >> > > >>>>>>> > > >> stop &
> > > > >>>> > > >> > > >>>>>>> > > >> > >> resume behavior) will be
> > implemented
> > > > for
> > > > >>>> > Kafka and
> > > > >>>> > > >> > > >>>>>>> Pulsar only,
> > > > >>>> > > >> > > >>>>>>> > > >> correct?
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Correct, as far as I know though,
> > > those
> > > > >>>> are
> > > > >>>> > the
> > > > >>>> > > >> only
> > > > >>>> > > >> > > >>>>>>> sources
> > > > >>>> > > >> > > >>>>>>> > which
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> consume
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> concurrently from multiple splits
> > and
> > > > >>>> thus
> > > > >>>> > > >> alignment
> > > > >>>> > > >> > > >>>>>>> applies.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> @Thomas:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split
> > > > alignment
> > > > >>>> in
> > > > >>>> > > >> > > >>>>>>> SourceReaderBase and
> > > > >>>> > > >> > > >>>>>>> > > then
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> doing
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> nothing if the split reader does
> > not
> > > > >>>> implement
> > > > >>>> > > >> > > >>>>>>> AlignedSplitReader
> > > > >>>> > > >> > > >>>>>>> > > >> could
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> be
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> misleading? Perhaps
> > > WithSplitsAlignment
> > > > >>>> can
> > > > >>>> > > >> instead
> > > > >>>> > > >> > be
> > > > >>>> > > >> > > >>>>>>> added to
> > > > >>>> > > >> > > >>>>>>> > the
> > > > >>>> > > >> > > >>>>>>> > > >> > >> specific source reader (i.e.
> > > > >>>> > KafkaSourceReader) to
> > > > >>>> > > >> > make
> > > > >>>> > > >> > > >>>>>>> it
> > > > >>>> > > >> > > >>>>>>> > explicit
> > > > >>>> > > >> > > >>>>>>> > > >> that
> > > > >>>> > > >> > > >>>>>>> > > >> > >> the source actually supports it.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I understand your concern. Hmm, I
> > > think
> > > > >>>> we
> > > > >>>> > could
> > > > >>>> > > >> > > >>>>>>> actually do
> > > > >>>> > > >> > > >>>>>>> > that.
> > > > >>>> > > >> > > >>>>>>> > > >> Given
> > > > >>>> > > >> > > >>>>>>> > > >> > >> the actual implementation of the
> > > > >>>> > > >> > > >>>>>>> SourceReaderBase#alignSplits is
> > > > >>>> > > >> > > >>>>>>> > > >> rather
> > > > >>>> > > >> > > >>>>>>> > > >> > >> short (just a forward to the
> > > > >>>> corresponding
> > > > >>>> > method
> > > > >>>> > > >> of
> > > > >>>> > > >> > > >>>>>>> > SplitFetcher),
> > > > >>>> > > >> > > >>>>>>> > > >> we
> > > > >>>> > > >> > > >>>>>>> > > >> > >> could reimplement it in the
> actual
> > > > source
> > > > >>>> > > >> > > >>>>>>> implementations. This
> > > > >>>> > > >> > > >>>>>>> > > >> solution
> > > > >>>> > > >> > > >>>>>>> > > >> > >> has a downside, though. Authors
> of
> > > new
> > > > >>>> > sources
> > > > >>>> > > >> would
> > > > >>>> > > >> > > >>>>>>> have to do
> > > > >>>> > > >> > > >>>>>>> > > two
> > > > >>>> > > >> > > >>>>>>> > > >> > >> things: extend from
> > > AlignedSplitReader
> > > > >>>> and
> > > > >>>> > > >> implement
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> WithSplitsAlignment,
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> instead of just extending
> > > > >>>> AlignedSplitReader.
> > > > >>>> > I
> > > > >>>> > > >> would
> > > > >>>> > > >> > > be
> > > > >>>> > > >> > > >>>>>>> fine
> > > > >>>> > > >> > > >>>>>>> > with
> > > > >>>> > > >> > > >>>>>>> > > >> such
> > > > >>>> > > >> > > >>>>>>> > > >> > a
> > > > >>>> > > >> > > >>>>>>> > > >> > >> tradeoff though. What do others think?
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> @Steven:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> For this part from the motivation
> > > > >>>> section, is
> > > > >>>> > it
> > > > >>>> > > >> > > >>>>>>> accurate? Let's
> > > > >>>> > > >> > > >>>>>>> > > >> assume
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> one
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> source task consumes from 3
> > > partitions
> > > > >>>> and
> > > > >>>> > one of
> > > > >>>> > > >> the
> > > > >>>> > > >> > > >>>>>>> partitions
> > > > >>>> > > >> > > >>>>>>> > is
> > > > >>>> > > >> > > >>>>>>> > > >> > >> significantly slower. In this
> > > > situation,
> > > > >>>> > watermark
> > > > >>>> > > >> > for
> > > > >>>> > > >> > > >>>>>>> this
> > > > >>>> > > >> > > >>>>>>> > source
> > > > >>>> > > >> > > >>>>>>> > > >> task
> > > > >>>> > > >> > > >>>>>>> > > >> > >> won't hold back as it is reading
> > > recent
> > > > >>>> data
> > > > >>>> > from
> > > > >>>> > > >> > other
> > > > >>>> > > >> > > >>>>>>> two Kafka
> > > > >>>> > > >> > > >>>>>>> > > >> > >> partitions. As a result, it won't
> > > hold
> > > > >>>> back
> > > > >>>> > the
> > > > >>>> > > >> > overall
> > > > >>>> > > >> > > >>>>>>> > watermark.
> > > > >>>> > > >> > > >>>>>>> > > I
> > > > >>>> > > >> > > >>>>>>> > > >> > >> thought the problem is that we
> may
> > > have
> > > > >>>> late
> > > > >>>> > data
> > > > >>>> > > >> for
> > > > >>>> > > >> > > >>>>>>> this slow
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> partition.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> It will hold back the watermark.
> > > > >>>> Watermark of
> > > > >>>> > an
> > > > >>>> > > >> > > >>>>>>> operator is the
> > > > >>>> > > >> > > >>>>>>> > > >> minimum
> > > > >>>> > > >> > > >>>>>>> > > >> > >> of watermarks of all splits[1]
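A tiny sketch of the rule stated above, under the assumption that each split reports its own watermark as a long timestamp. WatermarkSketch and combinedWatermark are hypothetical names used only for illustration, and returning Long.MAX_VALUE when there are no splits is an assumption of this sketch rather than documented Flink behavior.

```java
import java.util.Map;

// Hypothetical helper, not part of Flink: combines per-split watermarks.
final class WatermarkSketch {
    // The combined watermark is the minimum over all per-split watermarks,
    // so a single slow split holds back the whole operator.
    static long combinedWatermark(Map<String, Long> perSplitWatermarks) {
        return perSplitWatermarks.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(Long.MAX_VALUE); // no splits: assumption of this sketch
    }
}
```

With three partitions at watermarks 1000, 990 and 120, the combined value is 120: the slow partition determines the operator's watermark, which is the situation described in the question above.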
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I have another question about the restart. Say split alignment is
> > > > >>>> > > >> > > >>>>>>> > > >> > >> triggered, a checkpoint is completed, and the job fails and is restored
> > > > >>>> > > >> > > >>>>>>> > > >> > >> from the last checkpoint. Because the alignment decision is not
> > > > >>>> > > >> > > >>>>>>> > > >> > >> checkpointed, alignment won't be enforced initially until we get a cycle
> > > > >>>> > > >> > > >>>>>>> > > >> > >> of watermark aggregation and propagation, right? I'm not saying this
> > > > >>>> > > >> > > >>>>>>> > > >> > >> corner case is a problem; I just want to understand it more.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Your understanding is correct.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> @Becket:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is
> > > sort
> > > > >>>> of a
> > > > >>>> > > >> general
> > > > >>>> > > >> > use
> > > > >>>> > > >> > > >>>>>>> case, so
> > > > >>>> > > >> > > >>>>>>> > > >> should
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> we
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> just add the related methods to
> > > > >>>> SourceReader
> > > > >>>> > > >> directly
> > > > >>>> > > >> > > >>>>>>> instead of
> > > > >>>> > > >> > > >>>>>>> > > >> > >> introducing the new interface of
> > > > >>>> > > >> WithSplitAssignment?
> > > > >>>> > > >> > > We
> > > > >>>> > > >> > > >>>>>>> can
> > > > >>>> > > >> > > >>>>>>> > > provide
> > > > >>>> > > >> > > >>>>>>> > > >> > >> default implementations, so
> > backwards
> > > > >>>> > > >> compatibility
> > > > >>>> > > >> > > >>>>>>> won't be an
> > > > >>>> > > >> > > >>>>>>> > > >> issue.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I don't think we can provide a
> > > default
> > > > >>>> > > >> > implementation.
> > > > >>>> > > >> > > >>>>>>> How would
> > > > >>>> > > >> > > >>>>>>> > we
> > > > >>>> > > >> > > >>>>>>> > > >> do
> > > > >>>> > > >> > > >>>>>>> > > >> > >> that? Would it be just a no-op?
> Is
> > it
> > > > >>>> better
> > > > >>>> > than
> > > > >>>> > > >> > > having
> > > > >>>> > > >> > > >>>>>>> an
> > > > >>>> > > >> > > >>>>>>> > opt-in
> > > > >>>> > > >> > > >>>>>>> > > >> > >> interface? The default
> > implementation
> > > > >>>> would
> > > > >>>> > have
> > > > >>>> > > >> to
> > > > >>>> > > >> > be
> > > > >>>> > > >> > > >>>>>>> added
> > > > >>>> > > >> > > >>>>>>> > > >> exclusively
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> in
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> a *Public* SourceReader
> interface.
> > By
> > > > >>>> the way
> > > > >>>> > > >> notice
> > > > >>>> > > >> > > >>>>>>> > > SourceReaderBase
> > > > >>>> > > >> > > >>>>>>> > > >> > >> does extend from
> > WithSplitsAlignment,
> > > > so
> > > > >>>> > > >> effectively
> > > > >>>> > > >> > > all
> > > > >>>> > > >> > > >>>>>>> > > >> implementations
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> do
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> handle the alignment case. To be
> > > > honest I
> > > > >>>> > think
> > > > >>>> > > >> it is
> > > > >>>> > > >> > > >>>>>>> impossible
> > > > >>>> > > >> > > >>>>>>> > to
> > > > >>>> > > >> > > >>>>>>> > > >> > >> implement the SourceReader
> > interface
> > > > >>>> directly
> > > > >>>> > by
> > > > >>>> > > >> end
> > > > >>>> > > >> > > >>>>>>> users.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the
> > SplitReader
> > > > >>>> interface
> > > > >>>> > > >> > probably
> > > > >>>> > > >> > > >>>>>>> also
> > > > >>>> > > >> > > >>>>>>> > needs
> > > > >>>> > > >> > > >>>>>>> > > >> > >> some change to support throttling at the split granularity. Can you add that
> > > > >>>> > > >> > > >>>>>>> > > >> > >> interface change into the public interface section as well?
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> It has been added from the beginning. See *AlignedSplitReader.*
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits here, given that it
> > > > >>>> > > >> > > >>>>>>> > > >> > >> is not actually changing the split assignments? It seems something like
> > > > >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> The method's called *alignSplits*, not assign. Do you still prefer a
> > > > >>>> > > >> > > >>>>>>> > > >> > >> different name for that? Personally, I am open for suggestions here.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for driving the effort, Sebastian. I think the motivation makes a
> > > > >>>> > > >> > > >>>>>>> > > >> > >> lot of sense. Just a few suggestions / questions.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use case, so should we
> > > > >>>> > > >> > > >>>>>>> > > >> > >> just add the related methods to SourceReader directly instead of
> > > > >>>> > > >> > > >>>>>>> > > >> > >> introducing the new interface of WithSplitAssignment? We can provide
> > > > >>>> > > >> > > >>>>>>> > > >> > >> default implementations, so backwards compatibility won't be an issue.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably also needs some
> > > > >>>> > > >> > > >>>>>>> > > >> > >> change to support throttling at the split granularity. Can you add that
> > > > >>>> > > >> > > >>>>>>> > > >> > >> interface change into the public interface section as well?
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits here, given that it
> > > > >>>> > > >> > > >>>>>>> > > >> > >> is not actually changing the split assignments? It seems something like
> > > > >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <stevenz3wu@gmail.com> wrote:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> However, a single source operator may read data from multiple
> > > > >>>> > > >> > > >>>>>>> > > >> > >> splits/partitions, e.g., multiple Kafka partitions, such that even with
> > > > >>>> > > >> > > >>>>>>> > > >> > >> watermark alignment the source operator may need to buffer excessive amount
> > > > >>>> > > >> > > >>>>>>> > > >> > >> of data if one split emits data faster than another.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> For this part from the motivation section, is it accurate? Let's assume one
> > > > >>>> > > >> > > >>>>>>> > > >> > >> source task consumes from 3 partitions and one of the partitions is
> > > > >>>> > > >> > > >>>>>>> > > >> > >> significantly slower. In this situation, the watermark for this source task
> > > > >>>> > > >> > > >>>>>>> > > >> > >> won't hold back as it is reading recent data from the other two Kafka
> > > > >>>> > > >> > > >>>>>>> > > >> > >> partitions. As a result, it won't hold back the overall watermark. I
> > > > >>>> > > >> > > >>>>>>> > > >> > >> thought the problem is that we may have late data for this slow partition.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I have another question about the restart. Say split alignment is
> > > > >>>> > > >> > > >>>>>>> > > >> > >> triggered, a checkpoint is completed, and the job fails and is restored from
> > > > >>>> > > >> > > >>>>>>> > > >> > >> the last checkpoint. Because the alignment decision is not checkpointed,
> > > > >>>> > > >> > > >>>>>>> > > >> > >> initially alignment won't be enforced until we get a cycle of watermark
> > > > >>>> > > >> > > >>>>>>> > > >> > >> aggregation and propagation, right? Not saying this corner case is a problem.
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Just want to understand it more.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <thw@apache.org> wrote:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for working on this!
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase and then doing
> > > > >>>> > > >> > > >>>>>>> > > >> > >> nothing if the split reader does not implement AlignedSplitReader could be
> > > > >>>> > > >> > > >>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead be added to the
> > > > >>>> > > >> > > >>>>>>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to make it explicit that
> > > > >>>> > > >> > > >>>>>>> > > >> > >> the source actually supports it.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Thomas
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <knaufk@apache.org> wrote:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> > > > >>>> > > >> > > >>>>>>> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> +1 in general. I believe it is valuable to complete the watermark aligned
> > > > >>>> > > >> > > >>>>>>> > > >> > >> story with this FLIP.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Cheers,
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Konstantin
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> To be explicit, having worked on it, I support it ;) I think we can
> > > > >>>> > > >> > > >>>>>>> > > >> > >> start a vote thread soonish, as there are no concerns so far.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Dear Flink developers,
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> I would like to open a discussion on FLIP 217 [1] for an extension of
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Watermark Alignment to perform alignment also in SplitReaders. To do so,
> > > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReaders must be able to suspend and resume reading from split sources
> > > > >>>> > > >> > > >>>>>>> > > >> > >> where the SourceOperator coordinates and controls suspend and resume. To
> > > > >>>> > > >> > > >>>>>>> > > >> > >> gather information about current watermarks of the SplitReaders, we extend
> > > > >>>> > > >> > > >>>>>>> > > >> > >> the internal WatermarkOutputMultiplexer and report watermarks to the
> > > > >>>> > > >> > > >>>>>>> > > >> > >> SourceOperator.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised and
> > > > >>>> > > >> > > >>>>>>> > > >> > >> reworked by Dawid Wysakowicz (He did most of the work.) and me. The changes
> > > > >>>> > > >> > > >>>>>>> > > >> > >> are backwards compatible in a way that if affected components do not
> > > > >>>> > > >> > > >>>>>>> > > >> > >> support split alignment the behavior is as before.
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Sebastian
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> --
> > > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > > >>>> > > >> > > >>>>>>> > > >> > >> Konstantin Knauf
> > > > >>>> > > >> > > >>>>>>> > > >> > >> https://twitter.com/snntrable
> > > > >>>> > > >> > > >>>>>>> > > >> > >> https://github.com/knaufk

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Piotr Nowojski <pn...@apache.org>.
Thanks for the explanation.

> 2. It is fully compatible with FLIP-182, if we consider it as the right
> thing to throw an exception for readers reading from multiple splits
> without supporting split pausing.

I think that's fine. But the question is whether we should provide a
workaround for existing users. IMO if it's easy to do, we should.

> I actually think neither solution 1 nor 2 breaks FLIP-182 users,

They do. A user currently has a working Flink 1.15 deployment in which
watermark alignment may not behave ideally, but it works to some extent,
and you are proposing to throw an exception at them after they upgrade
Flink, without any workaround (short of implementing the feature
themselves, which is a very problematic requirement). Given that costly
upgrades are one of the major complaints, I would definitely be in favor of
option 2. Given that the affected user base is most likely small, I would
be +1 for solution 2 with throwing an exception by default.
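
To make "solution 2 with throwing an exception by default" concrete, here is
a rough sketch of the decision I have in mind. It is only an illustration
under assumptions: the class, the enum and the method names are made up, the
option key is the one Becket suggested, plain java.util.Properties stands in
for Flink's configuration, and treating a single assigned split as always
safe is my assumption rather than something we have agreed on.

```java
import java.util.Properties;

// Hypothetical sketch, not Flink code: decides what to do when alignment
// asks a source reader to pause some of its splits.
final class CoarseGrainedAlignmentSwitch {
    // Option name taken from the proposal in this thread; default is false.
    static final String KEY = "enable.coarse-grained.watermark.alignment";

    enum Action { PAUSE_SPLITS, PAUSE_WHOLE_READER, FAIL }

    static boolean coarseGrainedAllowed(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(KEY, "false"));
    }

    static Action decide(boolean readerCanPauseSplits, int assignedSplits, Properties conf) {
        if (readerCanPauseSplits) {
            return Action.PAUSE_SPLITS;        // new split-level behaviour
        }
        if (assignedSplits <= 1) {
            return Action.PAUSE_WHOLE_READER;  // one split: coarse == split-level (assumption)
        }
        // Several splits and no pausing support: fail unless the user opted in.
        return coarseGrainedAllowed(conf) ? Action.PAUSE_WHOLE_READER : Action.FAIL;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(decide(false, 3, conf)); // FAIL
        conf.setProperty(KEY, "true");
        System.out.println(decide(false, 3, conf)); // PAUSE_WHOLE_READER
    }
}
```

With the flag left at its default, an existing multi-split job fails fast
after the upgrade, and setting the flag restores the FLIP-182 behaviour
without any code change in the source.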

Best,
Piotrek


On Wed, 29 Jun 2022 at 12:55, Becket Qin <be...@gmail.com> wrote:

> Hi Piotr,
>
> Please see the reply inline below:
>
> On Wed, Jun 29, 2022 at 5:11 PM Piotr Nowojski <pn...@apache.org>
> wrote:
>
> > Hi Becket,
> >
> > > My main concern of having a supportsPausingSplits() knob
> >
> > What is the problem with `supportsPausingSplits()` that you see?  Do you
> > want to remove it?
> >
> Just to make sure we are on the same page, I assume we are talking about
> this supportsPausingSplits() method in the Source interface. If we go
> with the obligatory features addition path, having this method seems
> misleading. And also, later on at some point when we see all the sources
> have implemented this feature, we will have to worry about deprecating this
> method, which is backwards incompatible.
>
>
> > Also I don't understand your proposal for Solution 1. How do you want to
> > decide whether to throw an exception? For that we would need to have
> > `supportsPausingSplits()`, right?
> >
>
> What I am thinking is the following:
>
> 1. The Flink framework always assumes split pausing is supported and just
> naively invokes SourceReader#pauseOrResumeSplits().
> 2. The SourceReaderBase will basically again try to ask the SplitReader to
> pause the splits.
> 3. Because the default implementation throws an
> UnsupportedOperationException, if the source developer did not override it,
> this exception will be thrown and bubbled up.
> 4. After catching this exception, the SourceReaderBase will just check if
> there is only one split that is currently assigned to the split reader. If
> so, it swallows the exception, stops polling the split reader and returns
> NOTHING_AVAILABLE. This is the same as the current logic in the
> SourceOperator. If we are not comfortable with moving this logic to the
> SourceReaderBase, we can also just keep the logic there and simply let
> SourceOperator remember whether more than one split is assigned to the
> source reader, when SourceOperator.handleAddSplitsEvent() is invoked.
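
The four steps above could be sketched roughly as follows. This is a
minimal, self-contained illustration and not Flink code: SplitReaderSketch,
SourceReaderBaseSketch and the Status enum are hypothetical stand-ins for
SplitReader, SourceReaderBase and the reader's availability status. Only the
method name pauseOrResumeSplits() and the NOTHING_AVAILABLE status are taken
from the discussion; split ids are plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for SplitReader. The default mirrors step 3:
// a reader that does not override the method cannot pause splits.
interface SplitReaderSketch {
    default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        throw new UnsupportedOperationException("split pausing is not implemented");
    }
}

// Hypothetical stand-in for SourceReaderBase.
final class SourceReaderBaseSketch {
    enum Status { MORE_AVAILABLE, NOTHING_AVAILABLE }

    private final SplitReaderSketch splitReader;
    private final List<String> assignedSplits = new ArrayList<>();
    private boolean pollingStopped = false;

    SourceReaderBaseSketch(SplitReaderSketch splitReader) {
        this.splitReader = splitReader;
    }

    void addSplit(String splitId) {
        assignedSplits.add(splitId);
    }

    // Step 1: the framework calls this unconditionally.
    void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        try {
            // Step 2: delegate to the split reader.
            splitReader.pauseOrResumeSplits(toPause, toResume);
        } catch (UnsupportedOperationException e) {
            // Step 4: with several splits there is no safe fallback, so rethrow.
            if (assignedSplits.size() > 1) {
                throw e;
            }
            // With at most one split, pausing the split equals pausing the reader.
            pollingStopped = !toPause.isEmpty();
        }
    }

    Status pollNext() {
        return pollingStopped ? Status.NOTHING_AVAILABLE : Status.MORE_AVAILABLE;
    }

    public static void main(String[] args) {
        SourceReaderBaseSketch reader = new SourceReaderBaseSketch(new SplitReaderSketch() {});
        reader.addSplit("split-0");
        reader.pauseOrResumeSplits(Collections.singletonList("split-0"), Collections.emptyList());
        System.out.println(reader.pollNext()); // NOTHING_AVAILABLE
    }
}
```

In this sketch a reader with two or more splits and a non-pausable split
reader surfaces the UnsupportedOperationException to the caller, which is
the "let them know" behaviour described in the next paragraph.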
>
> This way the existing FLIP-182 users won't be impacted by this FLIP. For
> those source readers that only have one split assigned, it works fine
> without any change. For those source readers with multiple splits assigned,
> they are already in a limp state with unpredictable side effects. We might
> as well let them know this instead of pretending the
> coarse-grained watermark alignment works fine for them.
>
> The advantage of this solution is that we don't have to do anything after
> this. That would work fine as the final state, as in:
> 1. We have already done the best we can do for the Sources that do not
> support split pausing.
> 2. It is fully compatible with FLIP-182, if we consider it as the right
> thing to throw an exception for readers reading from multiple splits
> without supporting split pausing.
> 3. There is nothing to deprecate in the future.
>
>
> >
> > If so, I would prefer solution 2, to provide a graceful migration path for
> > any users that are already using FLIP-182 with multiple splits per
> > operator. I don't think there are many of those, but such a flag seems easy
> > to implement while making migration easier. Having said that, technically
> > we could change the behaviour and start throwing an exception always in
> > such a case, as this feature is marked as Experimental.
> >
>
> I actually think neither solution 1 nor 2 breaks FLIP-182 users, but
> solution 2 needs a deprecation process for the option in the future.
>
>
> >
> > Best,
> > Piotrek
> >
> > On Wed, 29 Jun 2022 at 02:54, Becket Qin <be...@gmail.com> wrote:
> >
> > > Hi Sebastian,
> > >
> > > Regarding the question,
> > >
> > > >
> > > > @Becket: I'm not sure about the intention of solution 1. Can you explain
> > > > that a bit more? In particular, I don't understand: "The reasoning behind
> > > > this solution is that existing users should only use the
> > > > coarse watermark alignment when a source reader only reads from a single
> > > > split." Why should a user not use coarse watermark alignment when source
> > > > reader reads from multiple splits? The split alignment uses the "coarse
> > > > watermark", i.e., maxDesiredWatermark, as described in the FLIP for
> > > > alignment.
> > >
> > >
> > > Imagine you have a source reader reading from two splits, and the
> > > watermarks look like the following:
> > > 1. Watermark of Split 1: 10:00 AM Jun 29
> > > 2. Watermark of Split 2: 11:00 AM Jun 29
> > > 3. maxDesiredWatermark: 10:30 AM Jun 29
> > >
> > > At this point, the source reader's watermark is 10:00 AM (the minimum
> > > across its splits), which is lower than the maxDesiredWatermark, so the
> > > source reader won't be paused from reading. However, because the source
> > > reader cannot specify which split to read from, if it continues to read,
> > > the watermark gap between the two splits may become even bigger. This
> > > essentially fails the main purpose of watermark alignment - to reduce the
> > > number of records buffered in the state. This does not necessarily
> > > happen, but this is not what FLIP-182 was designed for to begin with. So
> > > I'd rather avoid extending the feature to that case.
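
The numbers in this example can be replayed with a small sketch. It is an
illustration only: the class and method names are made up, times of day stand
in for watermark timestamps, and it assumes the reader-level watermark is the
minimum of its per-split watermarks, which is what the example relies on.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two-split example; not Flink code.
final class AlignmentExample {

    // Reader-level watermark: the minimum over the watermarks of its splits.
    static LocalTime readerWatermark(Map<String, LocalTime> splitWatermarks) {
        return Collections.min(splitWatermarks.values());
    }

    // Coarse-grained (FLIP-182 style): the whole reader is paused only when
    // its combined watermark is ahead of maxDesiredWatermark.
    static boolean pauseWholeReader(Map<String, LocalTime> splitWatermarks, LocalTime maxDesired) {
        return readerWatermark(splitWatermarks).isAfter(maxDesired);
    }

    // Split-level: pause exactly the splits that are ahead of maxDesiredWatermark.
    static List<String> splitsToPause(Map<String, LocalTime> splitWatermarks, LocalTime maxDesired) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, LocalTime> entry : splitWatermarks.entrySet()) {
            if (entry.getValue().isAfter(maxDesired)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, LocalTime> watermarks = new LinkedHashMap<>();
        watermarks.put("split-1", LocalTime.of(10, 0));
        watermarks.put("split-2", LocalTime.of(11, 0));
        LocalTime maxDesired = LocalTime.of(10, 30);

        System.out.println(readerWatermark(watermarks));              // 10:00
        System.out.println(pauseWholeReader(watermarks, maxDesired)); // false
        System.out.println(splitsToPause(watermarks, maxDesired));    // [split-2]
    }
}
```

The coarse-grained check leaves the reader running even though split 2 is
already 30 minutes ahead, while the split-level check would pause only
split 2.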
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Tue, Jun 28, 2022 at 6:53 PM Sebastian Mattheis <
> > > sebastian@ververica.com>
> > > wrote:
> > >
> > > > (Sorry I didn't send to the list but only to Becket. My bad and thanks
> > > > Piotr. Next attempt:)
> > > >
> > > > Hi all,
> > > >
> > > > Thanks for pushing the FLIP. I would drive it and would be happy to get
> > > > back to you, @Thomas, for reviews. (Sorry for low responsiveness, there
> > > > were several efforts with high priority on my side ...) As next step, I
> > > > would revise the FLIP to get the discussion concluded.
> > > >
> > > > However, as Becket mentioned I feel that some things are still not clear
> > > > yet:
> > > >
> > > > Re: Thomas
> > > >>
> > > > >>> However, from a user perspective, should the split level alignment be
> > > > >>> an opt-in feature, at least for a few releases? If yes, then we would
> > > > >>> require a knob similar to supportsPausingSplits(), which I understand
> > > > >>> won't be part of the revised FLIP. Such control may be beneficial:
> > > > >>>
> > > > >>> * Compare runtime behavior with split level alignment on/off
> > > > >>> * Allow use of sources that don't implement pausing splits yet
> > > > >>>
> > > > >>> The second point would, from the user's perspective, be necessary for
> > > > >>> backward compatibility? While the interface aspect and source
> > > > >>> compatibility has been discussed in great detail, I don't think it
> > > > >>> would be desirable if an application that already uses alignment fails
> > > > >>> after upgrading to the new Flink version, forcing users to lock step
> > > > >>> modify sources for the new non-optional split level alignment.
> > > > >>>
> > > > >>> So I think clarification of the compatibility aspect on the FLIP page
> > > > >>> would be necessary.
> > > > >>
> > > > >> That is a good point. Currently FLIP-182 is marked as experimental. So
> > > > >> technically speaking it could change. That said, I agree that it would be
> > > > >> good to not break the existing sources.
> > > > >>
> > > > >> My main concern of having a supportsPausingSplits() knob is that this
> > > > >> might be something requiring code change on future deprecation. I am
> > > > >> thinking of two potential solutions:
> > > > >>
> > > > >> Solution 1:
> > > > >> In the SourceReaderBase, when pauseOrResumeSplits() is invoked, if the
> > > > >> source reader only has one split assigned, the source reader simply stops
> > > > >> polling and just returns NOTHING_AVAILABLE. If more than one split is
> > > > >> assigned, it throws an exception with a message such as "The
> > > > >> unpausable SplitReader CLASS_NAME only works with watermark alignment
> > > > >> when assigned a single split. There is more than one split assigned to the
> > > > >> SplitReader".
> > > > >> The reasoning behind this solution is that existing users should only use
> > > > >> the coarse watermark alignment when a source reader only reads from a
> > > > >> single split. Reading from more than one split might have unwanted side
> > > > >> effects, so we might as well throw an exception in this case.
> > > > >>
> > > > >> Solution 2:
> > > > >> Having a configuration "enable.coarse-grained.watermark.alignment", the
> > > > >> default value is false. Once it is set to true, we will allow
> > > > >> coarse-grained watermark alignment if a SplitReader is pausable.
> > > > >> This solution allows users to keep the current FLIP-182 behavior, with
> > > > >> the risk of side effects.
> > > > >>
> > > > >> Personally speaking, I feel solution 1 seems better because
> > > > >> coarse-grained watermark alignment could be frustrating to the users
> > > > >> when more than one split is assigned. So we might as well not support it at
> > > > >> all. And also there is nothing to deprecate in the future with this
> > > > >> solution.
> > > > >>
> > > > >> What do you think?
> > > >>
> > > >
> > > > @Thomas: My understanding is that you intend a simple switch/knob to test
> > > > w/ and w/o (split) watermark alignment, right? Isn't the coarse-grained w/
> > > > vs w/o watermark alignment sufficient for that? Or do you think that
> > > > switching watermark alignment explicitly on split level is required?
> > > >
> > > > @Becket: I'm not sure about the intention of solution 1. Can you explain
> > > > that a bit more? In particular, I don't understand: "The reasoning behind
> > > > this solution is that existing users should only use the coarse watermark
> > > > alignment when a source reader only reads from a single split." Why
> > > > should a user not use coarse watermark alignment when source reader reads
> > > > from multiple splits? The split alignment uses the "coarse watermark",
> > > > i.e., maxDesiredWatermark, as described in the FLIP for alignment.
> > > >
> > > > Could you please clarify?
> > > >
> > > > Regards,
> > > > Sebastian
> > > >
> > > > On Wed, Jun 22, 2022 at 4:23 AM Becket Qin <be...@gmail.com> wrote:
> > > >
> > > >> Thanks for the feedback, Thomas and Steve. And thanks Piotr for the
> > > >> patient and detailed discussion.
> > > >>
> > > >> Let's move forward with option 1 then.
> > > >>
> > > >> Re: Thomas
> > > >>
> > > >> However, from a user perspective, should the split level alignment
> be
> > > >>> an opt-in feature, at least for a few releases? If yes, then we
> would
> > > >>> require a knob similar to supportsPausingSplits(), which I
> understand
> > > >>> won't be part of the revised FLIP. Such control may be beneficial:
> > > >>
> > > >>
> > > >>> * Compare runtime behavior with split level alignment on/off
> > > >>> * Allow use of sources that don't implement pausing splits yet
> > > >>
> > > >>
> > > >>> The second point would, from the user's perspective, be necessary
> for
> > > >>> backward compatibility? While the interface aspect and source
> > > >>> compatibility has been discussed in great detail, I don't think it
> > > >>> would be desirable if an application that already uses alignment
> > fails
> > > >>> after upgrading to the new Flink version, forcing users to lock
> step
> > > >>> modify sources for the new non-optional split level alignment.
> > > >>
> > > >>
> > > >>> So I think clarification of the compatibility aspect on the FLIP
> page
> > > >>> would be necessary.
> > > >>
> > > >>
> > > >> That is a good point. Currently FLIP-182 is marked as experimental.
> So
> > > >> technically speaking it could change. That said, I agree that it
> would
> > > be
> > > >> good to not break the existing sources.
> > > >>
> > > >> My main concern of having a supportsPausingSplits() knob is that
> this
> > > >> might be something requiring code change on future deprecation. I am
> > > >> thinking of two potential solutions:
> > > >>
> > > >> Solution 1:
> > > >> In the SourceReaderBase, when pauseOrResumeSplits() is invoked, if
> the
> > > >> source reader only has one split assigned, the source reader simply
> > > stops
> > > >> polling but just returns NOTHING_AVAILABLE. If there is more than
> one
> > > >> split assigned, it throws an exception with a message such as "The
> > > >> unpausable SplitReader CLASS_NAME only works with watermark
> alignment
> > > when
> > > >> assigned a single split. There is more than one split assigned to
> the
> > > >> SplitReader".
> > > >> The reasoning behind this solution is that existing users should
> only
> > > use
> > > >> the coarse watermark alignment when a source reader only reads from
> a
> > > >> single split. Reading from more than one split might have unwanted
> > side
> > > >> effects, so we might as well throw an exception in this case.
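
Editorial sketch of Solution 1 as described above, under stated assumptions: SketchSourceReaderBase, SketchInputStatus, addSplit() and pollNext() are hypothetical stand-ins written for illustration and are not Flink's actual SourceReaderBase API. Only pauseOrResumeSplits(), NOTHING_AVAILABLE and the wording of the exception message are taken from the thread.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for the reader's poll status; only the name
// NOTHING_AVAILABLE comes from the thread.
enum SketchInputStatus { MORE_AVAILABLE, NOTHING_AVAILABLE }

// Hypothetical, heavily simplified reader base. This is NOT Flink's SourceReaderBase.
class SketchSourceReaderBase {
    private final List<String> assignedSplits = new ArrayList<>();
    private final String splitReaderClassName;
    private boolean paused = false;

    SketchSourceReaderBase(String splitReaderClassName) {
        this.splitReaderClassName = splitReaderClassName;
    }

    void addSplit(String splitId) {
        assignedSplits.add(splitId);
    }

    // Behaviour proposed for a SplitReader that cannot pause individual splits.
    void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        if (assignedSplits.size() > 1) {
            throw new UnsupportedOperationException(
                    "The unpausable SplitReader " + splitReaderClassName
                            + " only works with watermark alignment when assigned a single split.");
        }
        // With at most one split, pausing that split equals pausing the whole reader.
        if (!toResume.isEmpty()) {
            paused = false;
        } else if (!toPause.isEmpty()) {
            paused = true;
        }
    }

    // A paused reader stops polling and reports that nothing is available.
    SketchInputStatus pollNext() {
        return paused ? SketchInputStatus.NOTHING_AVAILABLE : SketchInputStatus.MORE_AVAILABLE;
    }
}
```

In this sketch a reader with one split degrades to reader-level (coarse) pausing, while a reader with two or more splits fails fast instead of aligning coarsely.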
> > > >>
> > > >> Solution 2:
> > > >> Having a configuration "enable.coarse-grained.watermark.alignment",
> > the
> > > >> default value is false. Once it is set to true, we will allow
> > > >> coarse-grained watermark alignment if a SplitReader is pausable.
> > > >> This solution allows users to keep the current FLIP-182 behavior,
> with
> > > >> the risk of side effects.
> > > >>
> > > >> Personally speaking, I feel solution 1 seems better because
> > > >> coarse-grained watermark alignment could be frustrating to the users
> > > when
> > > >> more than one split is assigned. So we might as well not support it
> at
> > > all.
> > > >> And also there is nothing to deprecate in the future with this
> > solution.
> > > >>
> > > >> What do you think?
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jiangjie (Becket) Qin
> > > >>
> > > >>
> > > >> On Tue, Jun 21, 2022 at 8:48 PM Piotr Nowojski <
> pnowojski@apache.org>
> > > >> wrote:
> > > >>
> > > >>> Hi,
> > > >>>
> > > >>> It looks like option 1 wins overall? So let's go with that.
> > > >>>
> > > >>> Best,
> > > >>> Piotrek
> > > >>>
> > > >>> On Wed, Jun 15, 2022 at 04:13 Steven Wu <st...@gmail.com>
> > wrote:
> > > >>>
> > > >>>> Both option 1 (default impl in base interface) and option 2
> > > (decorative
> > > >>>> interface) are pretty common patterns. I would also be fine with
> > > either.
> > > >>>> The important thing is that an exception is thrown if a source
> > doesn't
> > > >>>> support the alignment capability.
> > > >>>>
> > > >>>> The other point is that we can validate the source capability if
> > > >>>> alignment
> > > >>>> is enabled in WatermarkStrategy. I believe either option can
> achieve
> > > >>>> this
> > > >>>> goal too.
> > > >>>> public interface WatermarkStrategy<T> {
> > > >>>>     WatermarkStrategy<T> withWatermarkAlignment(String
> > watermarkGroup,
> > > >>>> Duration maxAllowedWatermarkDrift);
> > > >>>> }
> > > >>>>
> > > >>>> If I have to pick one, I am slightly favoring option 1 (base
> > > >>>> interface). As
> > > >>>> watermark is already an essential concept of source, maybe
> watermark
> > > >>>> alignment capability can also be a property of the base
> > source/reader
> > > >>>> interface.
> > > >>>>
> > > >>>> On Tue, Jun 14, 2022 at 3:44 PM Thomas Weise <th...@apache.org>
> > wrote:
> > > >>>>
> > > >>>> > Hi everyone,
> > > >>>> >
> > > >>>> > Thank you for all the effort that went into this discussion. The
> > > split
> > > >>>> > level watermark alignment will be an important feature for Flink
> > > that
> > > >>>> > will address operational problems for various use cases. From
> > > reading
> > > >>>> > through this thread it appears that not too much remains to
> bring
> > > this
> > > >>>> > FLIP to acceptance and allow development to move forward. I
> would
> > > like
> > > >>>> > to contribute if possible.
> > > >>>> >
> > > >>>> > Regarding option 1 vs. option 2: I don't have a strong
> preference,
> > > >>>> > perhaps slightly leaning towards option 1.
> > > >>>> >
> > > >>>> > However, from a user perspective, should the split level
> alignment
> > > be
> > > >>>> > an opt-in feature, at least for a few releases? If yes, then we
> > > would
> > > >>>> > require a knob similar to supportsPausingSplits(), which I
> > > understand
> > > >>>> > won't be part of the revised FLIP. Such control may be
> beneficial:
> > > >>>> >
> > > >>>> > * Compare runtime behavior with split level alignment on/off
> > > >>>> > * Allow use of sources that don't implement pausing splits yet
> > > >>>> >
> > > >>>> > The second point would, from the user's perspective, be
> necessary
> > > for
> > > >>>> > backward compatibility? While the interface aspect and source
> > > >>>> > compatibility has been discussed in great detail, I don't think
> it
> > > >>>> > would be desirable if an application that already uses alignment
> > > fails
> > > >>>> > after upgrading to the new Flink version, forcing users to lock
> > step
> > > >>>> > modify sources for the new non-optional split level alignment.
> > > >>>> >
> > > >>>> > So I think clarification of the compatibility aspect on the FLIP
> > > page
> > > >>>> > would be necessary.
> > > >>>> >
> > > >>>> > Thanks,
> > > >>>> > Thomas
> > > >>>> >
> > > >>>> > On Mon, May 30, 2022 at 3:29 AM Piotr Nowojski <
> > > >>>> piotr.nowojski@gmail.com>
> > > >>>> > wrote:
> > > >>>> > >
> > > >>>> > > Hi Becket,
> > > >>>> > >
> > > >>>> > > Thanks for summing this up. Just one correction:
> > > >>>> > >
> > > >>>> > > > Piotr prefers option 2, his opinions are:
> > > >>>> > > >   e) It is OK that the code itself in option 2 indicates the
> > > >>>> developers
> > > >>>> > > that a feature is optional. We will rely on the documentation
> to
> > > >>>> correct
> > > >>>> > > that and clarify that the feature is actually obligatory.
> > > >>>> > >
> > > >>>> > > I would say based on a) and b) that feature would be still
> > > >>>> optional. So
> > > >>>> > > both the implementation and the documentation would be saying
> > > that.
> > > >>>> We
> > > >>>> > > could add a mention to the docs and release notes, that this
> > > >>>> feature will
> > > >>>> > > be obligatory in the next major release and plan such a
> release
> > > >>>> > accordingly.
> > > >>>> > >
> > > >>>> > > Re the option 1., as you mentioned:
> > > >>>> > > > As for option 1: For developers, the feature is still
> optional
> > > >>>> due to
> > > >>>> > the
> > > >>>> > > default implementation in the interface, regardless of what
> the
> > > >>>> default
> > > >>>> > > implementation does, because the code compiles without
> > overriding
> > > >>>> these
> > > >>>> > > methods
> > > >>>> > >
> > > >>>> > > Also importantly, the code will work in most cases.
> > > >>>> > >
> > > >>>> > > > Obligatory: Jobs may fail if these methods are not
> implemented
> > > >>>> > properly.
> > > >>>> > > e.g. SourceReader#pauseOrResumeSplits(). This is a common
> > pattern
> > > in
> > > >>>> > Java,
> > > >>>> > > e.g. Iterator.remove() by default throws
> > > >>>> "UnsupportedOperationException",
> > > >>>> > > informing the implementation that things may go wrong if this
> > > >>>> method is
> > > >>>> > not
> > > >>>> > > implemented.
> > > >>>> > >
> > > >>>> > > For me `Iterator#remove()` is an optional feature.
> Personally, I
> > > >>>> don't
> > > >>>> > > remember if I have ever implemented it.
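
Editorial illustration of the Iterator#remove() behaviour both sides refer to. CountingIterator is a hypothetical class written only for this example. Since Java 8, java.util.Iterator declares remove() as a default method that throws UnsupportedOperationException, so an implementation that never overrides it compiles and iterates normally, and only fails when remove() is actually called.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical iterator over 0 .. limit-1 that does not override remove().
class CountingIterator implements Iterator<Integer> {
    private final int limit;
    private int next = 0;

    CountingIterator(int limit) {
        this.limit = limit;
    }

    @Override
    public boolean hasNext() {
        return next < limit;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return next++;
    }

    // remove() is deliberately not overridden: the inherited default
    // implementation throws UnsupportedOperationException.
}
```

Whether one reads that as "optional" (the code compiles and iteration works) or as a contract that can fail at runtime is the point under dispute in this exchange.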
> > > >>>> > >
> > > >>>> > > Best,
> > > >>>> > > Piotrek
> > > >>>> > >
> > > >>>> > > On Fri, May 27, 2022 at 10:14 Becket Qin <be...@gmail.com>
> > > >>>> wrote:
> > > >>>> > >
> > > >>>> > > > I had an offline discussion with Piotr and here is the
> > summary.
> > > >>>> Please
> > > >>>> > > > correct me if I miss something, Piotr.
> > > >>>> > > >
> > > >>>> > > > There are two things we would like to seek more opinions
> from
> > > the
> > > >>>> > > > community, so we can make progress on this FLIP.
> > > >>>> > > >
> > > >>>> > > > 1. The General pattern to add obligatory features to
> existing
> > > >>>> > interfaces.
> > > >>>> > > >
> > > >>>> > > >
> > > >>>> >
> > > >>>>
> > >
> >
> ***********************************************************************************
> > > >>>> > > > For interfaces exposed to the developers for implementation,
> > > they
> > > >>>> are
> > > >>>> > > > either intended to be *optional* or *obligatory. *While it
> is
> > > >>>> quite
> > > >>>> > clear
> > > >>>> > > > about how to convey that intention when creating the
> > interfaces,
> > > >>>> it is
> > > >>>> > not
> > > >>>> > > > as commonly agreed when we are adding new features to an
> > > existing
> > > >>>> > > > interface. In general, Flink uses decorative interfaces when
> > > >>>> adding
> > > >>>> > > > optional features to existing interfaces. Both Piotr and I
> > agree
> > > >>>> that
> > > >>>> > looks
> > > >>>> > > > good.
> > > >>>> > > >
> > > >>>> > > > Different opinions are mainly about how to add obligatory
> > > >>>> features to
> > > >>>> > the
> > > >>>> > > > existing interfaces, probably due to different
> understandings
> > of
> > > >>>> > > > "obligatory".
> > > >>>> > > >
> > > >>>> > > > We have discussed four options:
> > > >>>> > > >
> > > >>>> > > > *Option 1:*
> > > >>>> > > >
> > > >>>> > > >    - Just add a new method to the existing interface.
> > > >>>> > > >    - For backwards compatibility, the method would have a
> > > default
> > > >>>> > > >    implementation throwing "UnsupportedOperationException".
> > > >>>> > > >    - In the next major version, remove the default
> > > implementation.
> > > >>>> > > >    - For the developers, any method with a default
> > > implementation
> > > >>>> > > >    throwing an "UnsupportedOperationException" should be
> taken
> > > as
> > > >>>> > obligatory.
> > > >>>> > > >
> > > >>>> > > > *Option 2:*
> > > >>>> > > >
> > > >>>> > > >    - Always make the features optional by adding a
> decorative
> > > >>>> > interface,
> > > >>>> > > >    just like ordinary optional features.
> > > >>>> > > >    - Inform the developers via documentation that this
> feature
> > > is
> > > >>>> > > >    obligatory, although it looks like optional from the
> code.
> > > >>>> > > >    - In case the developers did not implement the decorative
> > > >>>> interface,
> > > >>>> > > >    throw an exception
> > > >>>> > > >    - In the next major version, move the methods in the
> > > decorative
> > > >>>> > > >    interface to the base interface, and deprecate the
> > decorative
> > > >>>> > interface.
> > > >>>> > > >
> > > >>>> > > > *Option 3:*
> > > >>>> > > >
> > > >>>> > > >    - Always bump the major version when a new obligatory
> > feature
> > > >>>> is
> > > >>>> > > >    added, even if we may have to do it frequently.
> > > >>>> > > >
> > > >>>> > > > *Option 4:*
> > > >>>> > > >
> > > >>>> > > >    - Add a V2, V3... of the interface affected by the new
> > > >>>> obligatory
> > > >>>> > > >    feature.
> > > >>>> > > >    - In the next major versions, deprecate old versions of
> the
> > > >>>> > interfaces.
> > > >>>> > > >
> > > >>>> > > > Both Piotr and I agreed that option 3 and option 4 have a
> big
> > > >>>> side
> > > >>>> > effect
> > > >>>> > > > and should be avoided. We have different preference between
> > > >>>> option 1
> > > >>>> > and
> > > >>>> > > > option 2.
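
Editorial sketch contrasting option 1 and option 2 as summarized above. All type names here (ReaderOption1, ReaderOption2, WithSplitsAlignment, AlignmentChecks) are hypothetical and chosen for illustration; they are not the interfaces defined in the FLIP. The sketch assumes the framework calls pauseOrResumeSplits() only when watermark alignment is enabled.

```java
import java.util.Collection;
import java.util.Collections;

// Option 1: the new method lives in the base interface with a throwing default.
interface ReaderOption1 {
    String poll();

    default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        throw new UnsupportedOperationException(
                "This reader does not support pausing or resuming splits.");
    }
}

// Option 2: the base interface stays untouched ...
interface ReaderOption2 {
    String poll();
}

// ... and the new capability is expressed by a separate decorative interface.
interface WithSplitsAlignment {
    void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume);
}

// Hypothetical framework-side code that runs when alignment is enabled.
final class AlignmentChecks {
    private AlignmentChecks() {}

    // Option 1: the framework can always call the method; legacy readers throw.
    static void alignOption1(ReaderOption1 reader, Collection<String> toPause) {
        reader.pauseOrResumeSplits(toPause, Collections.<String>emptyList());
    }

    // Option 2: the framework has to check for the decorative interface first.
    static void alignOption2(ReaderOption2 reader, Collection<String> toPause) {
        if (!(reader instanceof WithSplitsAlignment)) {
            throw new UnsupportedOperationException(
                    "This reader does not implement WithSplitsAlignment.");
        }
        ((WithSplitsAlignment) reader)
                .pauseOrResumeSplits(toPause, Collections.<String>emptyList());
    }
}
```

In both variants an existing reader keeps compiling and fails only when alignment is requested. The difference is where the contract is visible: in the base interface itself (option 1) or in a separate type plus a framework-side check (option 2).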
> > > >>>> > > >
> > > >>>> > > > Personally I prefer option 1, the reasons are:
> > > >>>> > > >   a) simple and intuitive. Java 8 introduced the default
> impl
> > in
> > > >>>> > > > interfaces exactly for interface evolving, and this is a
> > common
> > > >>>> > pattern in
> > > >>>> > > > many projects.
> > > >>>> > > >   b) prominent to the developers that the feature is
> expected
> > to
> > > >>>> be
> > > >>>> > > > implemented, because it explicitly throws an exception in
> the
> > > >>>> default
> > > >>>> > impl.
> > > >>>> > > >   c) low maintenance overhead - the Flink framework can
> always
> > > >>>> assume
> > > >>>> > the
> > > >>>> > > > method exists, so no special handling logic is needed.
> > > >>>> > > >   d) communicate a clear semantic boundary between optional
> > and
> > > >>>> > obligatory
> > > >>>> > > > features in Flink to the developers.
> > > >>>> > > >       - Optional: Jobs still run without exception if these
> > > >>>> methods are
> > > >>>> > > > not implemented. e.g. all the SupportsXXXPushDown
> interfaces.
> > > >>>> > > >       - Obligatory: Jobs may fail if these methods are not
> > > >>>> implemented
> > > >>>> > > > properly. e.g. SourceReader#pauseOrResumeSplits(). This is a
> > > >>>> common
> > > >>>> > pattern
> > > >>>> > > > in Java, e.g. Iterator.remove() by default throws
> > > >>>> > > > "UnsupportedOperationException", informing the
> implementation
> > > that
> > > >>>> > things
> > > >>>> > > > may go wrong if this method is not implemented.
> > > >>>> > > >
> > > >>>> > > > As for option 2, although the API itself sounds clean, it
> > > misleads
> > > >>>> > people
> > > >>>> > > > to think of an obligatory feature to be optional - from the
> > code
> > > >>>> the
> > > >>>> > > > feature is optional, but the documents say it is obligatory.
> > We
> > > >>>> > probably
> > > >>>> > > > should avoid such code-doc inconsistency, as people will be
> > > >>>> confused.
> > > >>>> > And I
> > > >>>> > > > would actually be bewildered that sometimes not implementing
> > an
> > > >>>> > "optional"
> > > >>>> > > > feature is fine, but sometimes it causes the jobs to fail.
> > > >>>> > > >
> > > >>>> > > > In response to the argument that the method with a default
> > > >>>> > implementation
> > > >>>> > > > is always optional, if that is true, it actually means all
> the
> > > >>>> > interfaces
> > > >>>> > > > should be immutable once they are created. If we want to
> add a
> > > >>>> method
> > > >>>> > to an
> > > >>>> > > > existing interface, for backwards compatibility, we will
> have
> > to
> > > >>>> > provide a
> > > >>>> > > > default implementation. And the fact it has a default
> > > >>>> implementation
> > > >>>> > > > indicates the method is optional. If that method is
> optional,
> > it
> > > >>>> should
> > > >>>> > > > reside in a separate decorative interface, otherwise it
> clogs
> > > that
> > > >>>> > existing
> > > >>>> > > > interface. Therefore, people should never add a method to an
> > > >>>> existing
> > > >>>> > > > interface. I find this conclusion a bit extreme.
> > > >>>> > > >
> > > >>>> > > > Piotr prefers option 2, his opinions are:
> > > >>>> > > >     a) Obligatory methods are the methods that fail the code
> > > >>>> > compilation
> > > >>>> > > > if not implemented.
> > > >>>> > > >     b) All obligatory methods should reside in the base
> > > interface,
> > > >>>> > without
> > > >>>> > > > a default implementation. And all the optional methods
> should
> > be
> > > >>>> in
> > > >>>> > > > decorative interfaces. This is a clean API.
> > > >>>> > > >     c) due to b), there isn't a viable solution to add an
> > > >>>> obligatory
> > > >>>> > > > method to an existing interface in a backwards compatible
> way.
> > > >>>> Unless
> > > >>>> > we
> > > >>>> > > > are OK with breaking backwards compatibility, all the
> > interfaces
> > > >>>> > should be
> > > >>>> > > > treated as immutable. As a compromise, we might as well just
> > > >>>> treat all
> > > >>>> > the
> > > >>>> > > > features added later as optional features. This way we keep
> > the
> > > >>>> API
> > > >>>> > clean.
> > > >>>> > > >     d) based on b) and c), option 2 has a clean API, while
> > > option
> > > >>>> 1
> > > >>>> > does
> > > >>>> > > > not.
> > > >>>> > > >     e) It is OK that the code itself in option 2 indicates
> the
> > > >>>> > developers
> > > >>>> > > > that a feature is optional. We will rely on the
> documentation
> > to
> > > >>>> > correct
> > > >>>> > > > that and clarify that the feature is actually obligatory.
> > > >>>> > > >     f) Regarding the effectiveness of making people aware
> that
> > > the
> > > >>>> > feature
> > > >>>> > > > is obligatory, Option 1 and Option 2 are similar. For people
> > > that
> > > >>>> do
> > > >>>> > not
> > > >>>> > > > read the release note / documentation, they will mistake the
> > > >>>> feature
> > > >>>> > to be
> > > >>>> > > > optional anyways.
> > > >>>> > > >
> > > >>>> > > > As for option 1: For developers, the feature is still
> optional
> > > >>>> due to
> > > >>>> > the
> > > >>>> > > > default implementation in the interface, regardless of what
> > the
> > > >>>> default
> > > >>>> > > > implementation does, because the code compiles without
> > > overriding
> > > >>>> these
> > > >>>> > > > methods. Also, another problem of this option is that for
> > users
> > > >>>> that
> > > >>>> > do not
> > > >>>> > > > know about the history of the interface, they may be
> confused
> > by
> > > >>>> the
> > > >>>> > > > default implementation throwing an exception.
> > > >>>> > > >
> > > >>>> > > >
> > > >>>> > > > 2. For this particular FLIP, should it be optional or not?
> > > >>>> > > >
> > > >>>> > > >
> > > >>>> >
> > > >>>>
> > >
> >
> ***********************************************************************************
> > > >>>> > > > As mentioned in the previous email, I feel this FLIP should
> be
> > > >>>> > obligatory,
> > > >>>> > > > for the following reasons:
> > > >>>> > > > 1. The Flink framework exposes the watermark alignment API
> to
> > > the
> > > >>>> end
> > > >>>> > > > users. From the end users' perspective, the feature should
> be
> > > >>>> available
> > > >>>> > > > regardless of the implementation details in the pluggables.
> > This
> > > >>>> is
> > > >>>> > true
> > > >>>> > > > for any other methods exposed as the Flink API.
> > > >>>> > > > 2. If a Source is not pausable, the end user should receive
> an
> > > >>>> > exception
> > > >>>> > > > when enabling watermark alignment (both Piotr and I
> agree
> > on
> > > >>>> > this). In
> > > >>>> > > > that case, it meets my criteria of obligatory feature
> because
> > > not
> > > >>>> > > > implementing the feature causes a framework API to throw
> > > >>>> exception and
> > > >>>> > > > fails the job.
> > > >>>> > > >
> > > >>>> > > > On the other hand, Piotr does not have a strong opinion
> > > regarding
> > > >>>> > whether
> > > >>>> > > > this feature should be optional or not.
> > > >>>> > > >
> > > >>>> > > >
> > > >>>> > > > Thanks for reading through this long email. So basically in
> > > order
> > > >>>> to
> > > >>>> > make
> > > >>>> > > > progress on this FLIP, we want to see what people feel
> > about
> > > >>>> the
> > > >>>> > above
> > > >>>> > > > two topics.
> > > >>>> > > >
> > > >>>> > > > Thanks,
> > > >>>> > > >
> > > >>>> > > > Jiangjie (Becket) Qin
> > > >>>> > > >
> > > >>>> > > >
> > > >>>> > > > On Thu, May 26, 2022 at 3:06 PM Piotr Nowojski <
> > > >>>> pnowojski@apache.org>
> > > >>>> > > > wrote:
> > > >>>> > > >
> > > >>>> > > >> Hi Becket,
> > > >>>> > > >>
> > > >>>> > > >> I still sustain what I wrote before:
> > > >>>> > > >> > I think I would still vote soft -1 on this option, but I
> > > >>>> wouldn't
> > > >>>> > block
> > > >>>> > > >> it in case I am out-voted.
> > > >>>> > > >>
> > > >>>> > > >> > I think it might be helpful to agree on the definition of
> > > >>>> optional
> > > >>>> > in
> > > >>>> > > >> our
> > > >>>> > > >> case.
> > > >>>> > > >>
> > > >>>> > > >> For me it doesn't matter whether a default method throwing
> an
> > > >>>> > exception we
> > > >>>> > > >> call optional or non-optional. As long as we keep it this
> > way,
> > > >>>> the
> > > >>>> > effect
> > > >>>> > > >> is the same. It's effectively a method that a user doesn't
> > have
> > > >>>> to
> > > >>>> > > >> implement. If interface/system allows some methods to be
> not
> > > >>>> > implemented,
> > > >>>> > > >> some users will do just that, regardless if we call it and
> > > >>>> document as
> > > >>>> > > >> non-optional. And at the same time it's clogging the base
> > > >>>> interface.
> > > >>>> > > >>
> > > >>>> > > >> By the way, just the need for a java-doc/documentation
> > > >>>> explaining the
> > > >>>> > > >> existence of some construct is a bad smell (code should be
> > > >>>> > > >> self-documenting
> > > >>>> > > >> and default method throwing an
> UnsupportedOperationException
> > is
> > > >>>> not).
> > > >>>> > > >>
> > > >>>> > > >> > Please note that so far we do not assume whether the
> > feature
> > > >>>> is in
> > > >>>> > > >> > the original API or it is added later. A newly added
> > feature
> > > >>>> can
> > > >>>> > also be
> > > >>>> > > >> > non-optional, although it might take some time for all
> the
> > > >>>> pluggable
> > > >>>> > > >> > developers to catch up, and they should still work if the
> > new
> > > >>>> > feature is
> > > >>>> > > >> > not used until they catch up. In contrast, we may never
> > > expect
> > > >>>> an
> > > >>>> > > >> optional
> > > >>>> > > >> > feature to catch up, because leaving it unimplemented is
> > also
> > > >>>> > blessed.
> > > >>>> > > >> >
> > > >>>> > > >> > Let's take the checkpointing as an example. Imagine Flink
> > did
> > > >>>> not
> > > >>>> > > >> support
> > > >>>> > > >> > checkpointing before release 1.16. And now we are trying
> to
> > > add
> > > >>>> > > >> > checkpointing to Flink. So we exposed the checkpoint
> > > >>>> configuration
> > > >>>> > to
> > > >>>> > > >> the
> > > >>>> > > >> > end users. In the meantime, will we tell the pluggable
> > (e.g.
> > > >>>> > operators,
> > > >>>> > > >> > connectors) developers that methods like
> "snapshotState()"
> > are
> > > >>>> > optional?
> > > >>>> > > >> If
> > > >>>> > > >> > we do that, the availability of checkpointing in Flink
> > would
> > > be
> > > >>>> > severely
> > > >>>> > > >> > weakened. But apparently we should still allow the
> existing
> > > >>>> > > >> implementations
> > > >>>> > > >> > to work without checkpointing. It looks to me that adding
> > the
> > > >>>> > method to
> > > >>>> > > >> the
> > > >>>> > > >> > pluggable interfaces with a default implementation
> throwing
> > > >>>> > > >> > "UnsupportedOperationException" would be the solution
> here.
> > > >>>> Please
> > > >>>> > note
> > > >>>> > > >> > that in this case, having the default implementation does
> > not
> > > >>>> mean
> > > >>>> > this
> > > >>>> > > >> is
> > > >>>> > > >> > optional. It is just the technique to support backwards
> > > >>>> > compatibility in
> > > >>>> > > >> > the feature evolution. The fact that this method is in
> the
> > > base
> > > >>>> > > >> interface
> > > >>>> > > >> > suggests it is not optional, so the developers SHOULD
> > > >>>> implement it.
> > > >>>> > > >>
> > > >>>> > > >> I would soft vote -1 for having the default method throwing
> > > >>>> > > >> UnsupportedOperationException as a one-off thing for this
> > > (FLIP-217)
> > > >>>> > special
> > > >>>> > > >> circumstances.
> > > >>>> > > >>
> > > >>>> > > >> At the moment, without thinking this over too much, I would
> > > vote
> > > >>>> > harder -1
> > > >>>> > > >> for having this as a general rule when adding new features.
> > If
> > > we
> > > >>>> > ever end
> > > >>>> > > >> up with an API that is littered with default methods
> throwing
> > > >>>> > > >> UnsupportedOperationException that are documented as "non
> > > >>>> optional" it
> > > >>>> > > >> would be IMO a big design failure. I would be ok-ish with
> > that,
> > > >>>> only
> > > >>>> > if
> > > >>>> > > >> that was a temporary thing and we had an aggressive plan to
> > > >>>> release
> > > >>>> > more
> > > >>>> > > >> often new major Flink versions (2.x.y, 3.x.y, ...) breaking
> > API
> > > >>>> > > >> compatibility, that would get rid of those default methods.
> > > >>>> Adding
> > > >>>> > > >> checkpointing and methods like "snapshotState()" would IMO
> > > easily
> > > >>>> > justify
> > > >>>> > > >> a
> > > >>>> > > >> new major Flink release. In that case we could add those
> > > methods
> > > >>>> with
> > > >>>> > > >> default implementation for some transition period, one or
> > two
> > > >>>> minor
> > > >>>> > > >> releases, followed by a clean up in a major release.
> However
> > I
> > > >>>> would
> > > >>>> > still
> > > >>>> > > >> argue that it would be cleaner/better to add a decorative
> > > >>>> interface
> > > >>>> > like
> > > >>>> > > >> `CheckpointedOperator` interface instead of adding those
> > > default
> > > >>>> > methods
> > > >>>> > > >> to
> > > >>>> > > >> the base `Operator` interface.
> > > >>>> > > >>
> > > >>>> > > >> I think I can sum up our disagreement here is that I would
> > like
> > > >>>> to
> > > >>>> > keep
> > > >>>> > > >> the
> > > >>>> > > >> interfaces simpler, with only obligatory methods/features
> on
> > > one
> > > >>>> side
> > > >>>> > and
> > > >>>> > > >> clearly optional features on the other. While you would
> like
> > to
> > > >>>> add an
> > > >>>> > > >> extra third state in between those two?
> > > >>>> > > >>
> > > >>>> > > >> Best,
> > > >>>> > > >> Piotrek
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >>
> > > >>>> > > >> On Thu, May 12, 2022 at 04:25 Becket Qin <becket.qin@gmail.com
> >
> > > >>>> > wrote:
> > > >>>> > > >>
> > > >>>> > > >> > Thanks for the clarification, Piotr and Sebastian.
> > > >>>> > > >> >
> > > >>>> > > >> > It looks like the key problem is still whether the
> > > >>>> implementation of
> > > >>>> > > >> > pausable splits in the Sources should be optional or not.
> > > >>>> > > >> >
> > > >>>> > > >> > I think it might be helpful to agree on the definition of
> > > >>>> optional
> > > >>>> > in
> > > >>>> > > >> our
> > > >>>> > > >> > case. To me:
> > > >>>> > > >> > Optional = "You CAN leave the method unimplemented, and
> > that
> > > is
> > > >>>> > fine."
> > > >>>> > > >> > Non-Optional = "You CAN leave the method unimplemented,
> but
> > > you
> > > >>>> > SHOULD
> > > >>>> > > >> NOT,
> > > >>>> > > >> > because people assume this works."
> > > >>>> > > >> >
> > > >>>> > > >> > I think one sufficient condition of a Non-Optional
> feature
> > is
> > > >>>> that
> > > >>>> > if
> > > >>>> > > >> the
> > > >>>> > > >> > feature is exposed through the framework API, Flink
> should
> > > >>>> expect
> > > >>>> > the
> > > >>>> > > >> > pluggables to support this feature by default. Otherwise
> > the
> > > >>>> > > >> availability
> > > >>>> > > >> > of that feature becomes undefined.
> > > >>>> > > >> >
> > > >>>> > > >> > Please note that so far we do not assume whether the
> > feature
> > > >>>> is in
> > > >>>> > > >> > the original API or it is added later. A newly added
> > feature
> > > >>>> can
> > > >>>> > also be
> > > >>>> > > >> > non-optional, although it might take some time for all
> the
> > > >>>> pluggable
> > > >>>> > > >> > developers to catch up, and they should still work if the
> > new
> > > >>>> > feature is
> > > >>>> > > >> > not used until they catch up. In contrast, we may never
> > > expect
> > > >>>> an
> > > >>>> > > >> optional
> > > >>>> > > >> > feature to catch up, because leaving it unimplemented is
> > also
> > > >>>> > blessed.
> > > >>>> > > >> >
> > > >>>> > > >> > Let's take the checkpointing as an example. Imagine Flink
> > did
> > > >>>> not
> > > >>>> > > >> support
> > > >>>> > > >> > checkpointing before release 1.16. And now we are trying
> to
> > > add
> > > >>>> > > >> > checkpointing to Flink. So we exposed the checkpoint
> > > >>>> configuration
> > > >>>> > to
> > > >>>> > > >> the
> > > >>>> > > >> > end users. In the meantime, will we tell the pluggable
> > (e.g.
> > > >>>> > operators,
> > > >>>> > > >> > connectors) developers that methods like
> "snapshotState()"
> > are
> > > >>>> > optional?
> > > >>>> > > >> If
> > > >>>> > > >> > we do that, the availability of checkpointing in Flink
> > would
> > > be
> > > >>>> > severely
> > > >>>> > > >> > weakened. But apparently we should still allow the
> existing
> > > >>>> > > >> implementations
> > > >>>> > > >> > to work without checkpointing. It looks to me that adding
> > the
> > > >>>> > method to
> > > >>>> > > >> the
> > > >>>> > > >> > pluggable interfaces with a default implementation
> throwing
> > > >>>> > > >> > "UnsupportedOperationException" would be the solution
> here.
> > > >>>> Please
> > > >>>> > note
> > > >>>> > > >> > that in this case, having the default implementation does
> > not
> > > >>>> mean
> > > >>>> > this
> > > >>>> > > >> is
> > > >>>> > > >> > optional. It is just the technique to support backwards
> > > >>>> > compatibility in
> > > >>>> > > >> > the feature evolution. The fact that this method is in
> the
> > > base
> > > >>>> > > >> interface
> > > >>>> > > >> > suggests it is not optional, so the developers SHOULD
> > > >>>> implement it.
> > > >>>> > > >> >
> > > >>>> > > >> > When it comes to this FLIP, I think it meets the criteria
> > of
> > > >>>> > > >> non-optional
> > > >>>> > > >> > features, so we should just use the evolution path of
> > > >>>> non-optional
> > > >>>> > > >> > features.
> > > >>>> > > >> >
> > > >>>> > > >> > Thanks,
> > > >>>> > > >> >
> > > >>>> > > >> > Jiangjie (Becket) Qin
> > > >>>> > > >> >
> > > >>>> > > >> >
> > > >>>> > > >> >
> > > >>>> > > >> > On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski <
> > > >>>> > pnowojski@apache.org>
> > > >>>> > > >> > wrote:
> > > >>>> > > >> >
> > > >>>> > > >> > > Hi,
> > > >>>> > > >> > >
> > > >>>> > > >> > > Actually previously I thought about having a decorative
> > > >>>> interface
> > > >>>> > and
> > > >>>> > > >> > > whenever watermark alignment is enabled, checking that
> > the
> > > >>>> source
> > > >>>> > > >> > > implements the decorative interface. If not, throwing
> an
> > > >>>> > exception.
> > > >>>> > > >> > >
> > > >>>> > > >> > > The option with default methods in the source
> interfaces
> > > >>>> throwing
> > > >>>> > > >> > > `UnsupportedOperationException` I think still suffers
> > from
> > > >>>> the
> > > >>>> > same
> > > >>>> > > >> > > problems I mentioned before. It's still an optional
> > > >>>> implementation
> > > >>>> > > >> and at
> > > >>>> > > >> > > the same time it's clogging the base interface. I
> think I
> > > >>>> would
> > > >>>> > still
> > > >>>> > > >> > vote
> > > >>>> > > >> > > soft -1 on this option, but I wouldn't block it in
> case I
> > > am
> > > >>>> > > >> out-voted.
> > > >>>> > > >> > >
> > > >>>> > > >> > > Best,
> > > >>>> > > >> > > Piotrek
> > > >>>> > > >> > >
> > > >>>> > > >> > > On Wed, May 11, 2022 at 14:22 Sebastian Mattheis <
> > > >>>> > sebastian@ververica.com>
> > > >>>> > > >> > > wrote:
> > > >>>> > > >> > >
> > > >>>> > > >> > > > Hi Becket,
> > > >>>> > > >> > > >
> > > >>>> > > >> > > > Thanks a lot for your fast and detailed response. For
> > me,
> > > >>>> it
> > > >>>> > > >> converges
> > > >>>> > > >> > > and
> > > >>>> > > >> > > > dropping the supportsX method sounds very reasonable
> to
> > > me.
> > > >>>> > (Side
> > > >>>> > > >> note:
> > > >>>> > > >> > > > With "pausable splits" enabled as "default" I think
> we
> > > >>>> > > >> misunderstood.
> > > >>>> > > >> > As
> > > >>>> > > >> > > > you described now "default" I understand as that it
> > > should
> > > >>>> be
> > > >>>> > the
> > > >>>> > > >> new
> > > >>>> > > >> > > > recommended way of implementation, and I think that
> is
> > > >>>> fully
> > > >>>> > valid.
> > > >>>> > > >> > > Before,
> > > >>>> > > >> > > > I understood "default" here as the default
> > > implementation,
> > > >>>> i.e.,
> > > >>>> > > >> > throwing
> > > >>>> > > >> > > > UnsupportedOperationException, which is the exact
> > > >>>> opposite. :) )
> > > >>>> > > >> > > >
> > > >>>> > > >> > > > Nevertheless: As mentioned, an open question for me
> is
> > if
> > > >>>> > watermark
> > > >>>> > > >> > > > alignment should enforce pausable splits. For
> > > >>>> clarification, the
> > > >>>> > > >> > current
> > > >>>> > > >> > > > documentation [1] says:
> > > >>>> > > >> > > >
> > > >>>> > > >> > > > *Note:* As of 1.15, Flink supports aligning across
> > tasks
> > > >>>> of the
> > > >>>> > same
> > > >>>> > > >> > > >> source and/or different sources. It does not support
> > > >>>> aligning
> > > >>>> > > >> > > >> splits/partitions/shards in the same task.
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >> In a case where there are e.g. two Kafka partitions
> > that
> > > >>>> > produce
> > > >>>> > > >> > > >> watermarks at different pace, that get assigned to
> the
> > > >>>> same
> > > >>>> > task
> > > >>>> > > >> > > watermark
> > > >>>> > > >> > > >> might not behave as expected. Fortunately, worst
> case
> > it
> > > >>>> > should not
> > > >>>> > > >> > > perform
> > > >>>> > > >> > > >> worse than without alignment.
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >> Given the limitation above, we suggest applying
> > > watermark
> > > >>>> > > >> alignment in
> > > >>>> > > >> > > >> two situations:
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >>    1. You have two different sources (e.g. Kafka and
> > > >>>> File) that
> > > >>>> > > >> > produce
> > > >>>> > > >> > > >>    watermarks at different speeds
> > > >>>> > > >> > > >>    2. You run your source with parallelism equal to
> > the
> > > >>>> number
> > > >>>> > of
> > > >>>> > > >> > > >>    splits/shards/partitions, which results in every
> > > >>>> subtask
> > > >>>> > being
> > > >>>> > > >> > > assigned a
> > > >>>> > > >> > > >>    single unit of work.
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >> I personally see no issue in implementing and I see
> no
> > > >>>> reason
> > > >>>> > > >> against
> > > >>>> > > >> > > > implementing this dependency of watermark alignment
> and
> > > >>>> pausable
> > > >>>> > > >> > splits.
> > > >>>> > > >> > > (I
> > > >>>> > > >> > > > think this would even be a good path towards shaping
> > > >>>> watermark
> > > >>>> > > >> > alignment
> > > >>>> > > >> > > in
> > > >>>> > > >> > > > 1.16.) However, "I don't see" means that I would be
> > happy
> > > >>>> to
> > > >>>> > hear
> > > >>>> > > >> > Dawid's
> > > >>>> > > >> > > > and Piotrek's opinions as they implemented watermark
> > > >>>> alignment
> > > >>>> > > >> based on
> > > >>>> > > >> > > > FLIP-182 [2] and I don't want to miss relevant
> > > >>>> > rationale/background
> > > >>>> > > >> > info
> > > >>>> > > >> > > > from their side.
> > > >>>> > > >> > > >
> > > >>>> > > >> > > > *@Piotrek* *@Dawid *What do you think?
> > > >>>> > > >> > > >
> > > >>>> > > >> > > > Regards,
> > > >>>> > > >> > > > Sebastian
> > > >>>> > > >> > > >
> > > >>>> > > >> > > > [1]
> > > >>>> > > >> > > > https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
> > > >>>> > > >> > > > [2]
> > > >>>> > > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
> > > >>>> > > >> > > >
> > > >>>> > > >> > > > On Wed, May 11, 2022 at 1:30 PM Becket Qin <
> > > >>>> > becket.qin@gmail.com>
> > > >>>> > > >> > wrote:
> > > >>>> > > >> > > >
> > > >>>> > > >> > > >> +dev
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >> Hi Sebastian,
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >> Thank you for the summary. Please see the detailed
> > > replies
> > > >>>> > inline.
> > > >>>> > > >> As
> > > >>>> > > >> > a
> > > >>>> > > >> > > >> recap of my suggestions:
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >> 1. Pausable splits API.
> > > >>>> > > >> > > >>   a) Add default implementations to methods
> > > >>>> > "pauseOrResumeSplits"
> > > >>>> > > >> in
> > > >>>> > > >> > > both
> > > >>>> > > >> > > >> SourceReader and SplitReader where both default
> > > >>>> implementations
> > > >>>> > > >> throw
> > > >>>> > > >> > > >>  UnsupportedOperationException.
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >> 2. User story.
> > > >>>> > > >> > > >>     a) We tell users to enable the watermark
> alignment
> > > as
> > > >>>> they
> > > >>>> > > >> like.
> > > >>>> > > >> > > This
> > > >>>> > > >> > > >> is exactly what the current Flink API is.
> > > >>>> > > >> > > >>     b) We tell the source developers, please
> implement
> > > >>>> pausable
> > > >>>> > > >> > splits,
> > > >>>> > > >> > > >> otherwise bad things may happen. Think of it like
> you
> > > are
> > > >>>> > expected
> > > >>>> > > >> to
> > > >>>> > > >> > > >> implement SourceReader#snapshotState() properly,
> > > otherwise
> > > >>>> > > >> exceptions
> > > >>>> > > >> > > will
> > > >>>> > > >> > > >> be thrown when users enable checkpointing.
> > > >>>> > > >> > > >>
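The recap above (default methods that throw, plus an error path once alignment asks a source to pause) can be sketched as follows. This is a minimal sketch under stated assumptions, not the actual Flink interfaces: `SketchSourceReader`, `LegacyReader`, `PausableReader` and `tryPause` are hypothetical names, the real reader interfaces have many more methods, and splits are identified here by plain string IDs.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DefaultMethodSketch {

    /** Hypothetical, heavily simplified stand-in for a source reader interface. */
    interface SketchSourceReader {
        // The default keeps existing readers compiling; they fail only when asked to pause.
        default void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
            throw new UnsupportedOperationException(
                    getClass().getSimpleName() + " does not support pausing or resuming splits");
        }
    }

    /** An existing reader that was never updated and inherits the throwing default. */
    static final class LegacyReader implements SketchSourceReader {}

    /** A reader that overrides the default and tracks which splits are paused. */
    static final class PausableReader implements SketchSourceReader {
        final Set<String> paused = new HashSet<>();

        @Override
        public void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
            paused.addAll(splitsToPause);
            paused.removeAll(splitsToResume);
        }
    }

    /** Returns true if the reader accepted the pause request, false if it hit the error path. */
    static boolean tryPause(SketchSourceReader reader, String splitId) {
        try {
            reader.pauseOrResumeSplits(List.of(splitId), List.of());
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("pausable reader accepted: " + tryPause(new PausableReader(), "split-0"));
        System.out.println("legacy reader accepted: " + tryPause(new LegacyReader(), "split-0"));
    }
}
```

Compared with a no-op default, the throwing default makes an unsupported source visible at the first pause request instead of silently skipping alignment.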
> > > >>>> > > >> > > >> Thanks,
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >> Jiangjie (Becket) Qin
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >> On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis <
> > > >>>> > > >> > > >> sebastian@ververica.com> wrote:
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >>> Hi Becket, Hi everybody,
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >>> I'm sorry if I misread the messages but I could not
> > > >>>> derive an
> > > >>>> > > >> > agreement
> > > >>>> > > >> > > >>> from the mailing list. Nevertheless, if I
> understand
> > > you
> > > >>>> > right the
> > > >>>> > > >> > > >>> suggestion is:
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >>> * Add default implementations to methods
> > > >>>> > "pauseOrResumeSplits" in
> > > >>>> > > >> > both
> > > >>>> > > >> > > >>> SourceReader and SplitReader where both default
> > > >>>> > implementations
> > > >>>> > > >> throw
> > > >>>> > > >> > > >>> UnsupportedOperationException.
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >> Yes.
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >> * Add "supportsPauseOrResumeSplits" to the Source
> > > >>>> interface.
> > > >>>> > (In
> > > >>>> > > >> the
> > > >>>> > > >> > > >>> following, I refer to supporting this as "pausable
> > > >>>> splits".)
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >> We may no longer need this if pausable splits are
> > > >>>> expected to
> > > >>>> > be
> > > >>>> > > >> > > >> implemented by the source developers, i.e.
> > non-optional.
> > > >>>> Having
> > > >>>> > > >> this
> > > >>>> > > >> > > method
> > > >>>> > > >> > > >> would then be somewhat misleading as it looks like
> the
> > > >>>> sources
> > > >>>> > > >> that do
> > > >>>> > > >> > > not
> > > >>>> > > >> > > >> support pausable splits are also acceptable in the
> > long
> > > >>>> term.
> > > >>>> > So
> > > >>>> > > >> API
> > > >>>> > > >> > > wise,
> > > >>>> > > >> > > >> I'd say maybe we should remove this for this FLIP,
> > > >>>> although I
> > > >>>> > > >> believe
> > > >>>> > > >> > > this
> > > >>>> > > >> > > >> supportXXX pattern itself is still attractive for
> > > optional
> > > >>>> > > >> features.
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >>> To make the conclusions explicit:
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >>> 1. The implementation of pauseOrResumeSplits in
> both
> > > >>>> > interfaces
> > > >>>> > > >> > > >>> SourceReader and SplitReader is optional where the
> > > >>>> default is
> > > >>>> > > >> that
> > > >>>> > > >> > it
> > > >>>> > > >> > > >>> doesn't support it. (--> This means that the
> > > >>>> implementation is
> > > >>>> > > >> still
> > > >>>> > > >> > > >>> optional for the source developer.)
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >> It is optional for backwards compatibility with
> > existing
> > > >>>> > sources,
> > > >>>> > > >> as
> > > >>>> > > >> > > they
> > > >>>> > > >> > > >> may still compile without code change. But starting
> > from
> > > >>>> this
> > > >>>> > FLIP,
> > > >>>> > > >> > > Flink
> > > >>>> > > >> > > >> will always optimistically assume that all the
> sources
> > > >>>> support
> > > >>>> > > >> > pausable
> > > >>>> > > >> > > >> splits. If a source does not support pausable
> splits,
> > it
> > > >>>> goes
> > > >>>> > to an
> > > >>>> > > >> > > error
> > > >>>> > > >> > > >> handling path when watermark alignment is enabled on
> > it.
> > > >>>> This
> > > >>>> > is
> > > >>>> > > >> > > different
> > > >>>> > > >> > > >> from a usual optional feature, where no error is
> > > expected.
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >>> 2. If watermark alignment is enabled in the
> > application
> > > >>>> code
> > > >>>> > by
> > > >>>> > > >> > adding
> > > >>>> > > >> > > >>> withWatermarkAlignment to the WatermarkStrategy
> while
> > > >>>> > > >> SourceReader or
> > > >>>> > > >> > > >>> SplitReader do not support pausableSplits, we throw
> > an
> > > >>>> > > >> > > >>> UnsupportedOperationException.
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >> Yes.
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >>> 3. With regard to your statement:
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >>>> [...] basically means watermark alignment is a
> > > >>>> non-optional
> > > >>>> > > >> feature
> > > >>>> > > >> > > to
> > > >>>> > > >> > > >>>> the end users.
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >>> You actually mean that "pausable splits" are
> > > >>>> non-optional for
> > > >>>> > the
> > > >>>> > > >> app
> > > >>>> > > >> > > >>> developer if watermark alignment is enabled.
> However,
> > > >>>> > watermark
> > > >>>> > > >> > > alignment
> > > >>>> > > >> > > >>> is optional and can be enabled/disabled.
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >> Yes, watermark alignment can be enabled/disabled in
> > > >>>> individual
> > > >>>> > > >> sources
> > > >>>> > > >> > > in
> > > >>>> > > >> > > >> Flink jobs, which basically means the code
> supporting
> > > >>>> watermark
> > > >>>> > > >> > > alignment
> > > >>>> > > >> > > >> has to already be there. That again means the Source
> > > >>>> > developers are
> > > >>>> > > >> > also
> > > >>>> > > >> > > >> expected to support pausable splits by default. So
> > this
> > > >>>> way we
> > > >>>> > > >> > > essentially
> > > >>>> > > >> > > >> tell the end users that you may enable / disable
> this
> > > >>>> feature
> > > >>>> > as
> > > >>>> > > >> you
> > > >>>> > > >> > > wish,
> > > >>>> > > >> > > >> and tell the source developers that you SHOULD
> > implement
> > > >>>> this
> > > >>>> > > >> because
> > > >>>> > > >> > > the
> > > >>>> > > >> > > >> end users may turn it on/off at will. And if the
> > source
> > > >>>> does
> > > >>>> > not
> > > >>>> > > >> > support
> > > >>>> > > >> > > >> pausable splits, that goes to an error handling path
> > > when
> > > >>>> > watermark
> > > >>>> > > >> > > >> alignment is enabled on it. So users know they have
> to
> > > >>>> > explicitly
> > > >>>> > > >> > > exclude
> > > >>>> > > >> > > >> this source.
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >>> So far it's totally clear to me and I hope this is
> > what
> > > >>>> you
> > > >>>> > mean.
> > > >>>> > > >> I
> > > >>>> > > >> > > also
> > > >>>> > > >> > > >>> agree with both statements:
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >>> So making that expectation aligned with the source
> > > >>>> developers
> > > >>>> > > >> seems
> > > >>>> > > >> > > >>>> reasonable.
> > > >>>> > > >> > > >>>>
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >>> I think this is a simple and clean solution from
> both
> > > >>>> the end
> > > >>>> > user
> > > >>>> > > >> > and
> > > >>>> > > >> > > >>>> source developers' standpoint.
> > > >>>> > > >> > > >>>>
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >>> However, a last conclusion derives from 3. and is
> an
> > > open
> > > >>>> > question
> > > >>>> > > >> > for
> > > >>>> > > >> > > >>> me:
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >>> 4. The feature of "pausable splits" is now tightly
> > > bound
> > > >>>> to
> > > >>>> > > >> watermark
> > > >>>> > > >> > > >>> alignment, i.e., if sources do not support
> "pausable
> > > >>>> splits"
> > > >>>> > one
> > > >>>> > > >> can
> > > >>>> > > >> > > not
> > > >>>> > > >> > > >>> enable watermark alignment for these sources. This
> > > >>>> dependency
> > > >>>> > is
> > > >>>> > > >> not
> > > >>>> > > >> > > the
> > > >>>> > > >> > > >>> current status of watermark alignment
> implementation
> > > >>>> because
> > > >>>> > it
> > > >>>> > > >> > is/was
> > > >>>> > > >> > > >>> implemented without pausable splits. Do we want to
> > > >>>> introduce
> > > >>>> > this
> > > >>>> > > >> > > >>> dependency? (This is an open question. I cannot
> judge
> > > >>>> that.)
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >> The watermark alignment basically relies on the
> > pausable
> > > >>>> > splits,
> > > >>>> > > >> > right?
> > > >>>> > > >> > > >> So personally I found it quite reasonable that if
> the
> > > >>>> source
> > > >>>> > does
> > > >>>> > > >> not
> > > >>>> > > >> > > >> support pausable splits, end users cannot enable
> > > watermark
> > > >>>> > > >> alignment
> > > >>>> > > >> > on
> > > >>>> > > >> > > it.
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >>
> > > >>>> > > >> > > >>> If something is wrong, please correct me.
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >>> Regards,
> > > >>>> > > >> > > >>> Sebastian
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >>> On Wed, May 11, 2022 at 9:05 AM Becket Qin <
> > > >>>> > becket.qin@gmail.com>
> > > >>>> > > >> > > wrote:
> > > >>>> > > >> > > >>>
> > > >>>> > > >> > > >>>> Hi Sebastian,
> > > >>>> > > >> > > >>>>
> > > >>>> > > >> > > >>>> Thanks for the reply and patient discussion. I
> agree
> > > >>>> this is
> > > >>>> > a
> > > >>>> > > >> > tricky
> > > >>>> > > >> > > >>>> decision.
> > > >>>> > > >> > > >>>>
> > > >>>> > > >> > > >>>>
> > > >>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about
> Option
> > > c)
> > > >>>> > which I
> > > >>>> > > >> see
> > > >>>> > > >> > as
> > > >>>> > > >> > > >>>>> follows:
> > > >>>> > > >> > > >>>>> (1) An interface with default NOOP implementation
> > > >>>> makes the
> > > >>>> > > >> > > >>>>> implementation optional. And in my opinion, a
> > default
> > > >>>> > > >> > implementation
> > > >>>> > > >> > > is and
> > > >>>> > > >> > > >>>>> will remain a way of making implementation
> optional
> > > >>>> because
> > > >>>> > > >> even in
> > > >>>> > > >> > > future
> > > >>>> > > >> > > >>>>> a developer can decide to implement the "old
> > flavor"
> > > >>>> without
> > > >>>> > > >> > support
> > > >>>> > > >> > > for
> > > >>>> > > >> > > >>>>> pausable splits.
> > > >>>> > > >> > > >>>>> (2) It may not be too critical but I also find it
> > > >>>> suboptimal
> > > >>>> > > >> that
> > > >>>> > > >> > > with
> > > >>>> > > >> > > >>>>> a NOOP default implementation there is no way to
> > > check
> > > >>>> at
> > > >>>> > > >> runtime
> > > >>>> > > >> > if
> > > >>>> > > >> > > >>>>> SourceReader or SplitReader actually support
> > pausing.
> > > >>>> (To
> > > >>>> > do so,
> > > >>>> > > >> > one
> > > >>>> > > >> > > would
> > > >>>> > > >> > > >>>>> need a supportsX method which makes it again more
> > > >>>> > complicated.)
> > > >>>> > > >> > > >>>>
> > > >>>> > > >> > > >>>>
> > > >>>> > > >> > > >>>> Based on the last few messages in the mailing
> > list,
> > > >>>> Piotr
> > > >>>> > and I
> > > >>>> > > >> > > agreed
> > > >>>> > > >> > > >>>> that the default implementation should just throw
> an
> > > >>>> > > >> > > >>>> UnsupportedOperationException if the source is
> > > >>>> unpausable. So
> > > >>>> > > >> this
> > > >>>> > > >> > > >>>> basically tells the Source developers that this
> > > feature
> > > >>>> is
> > > >>>> > > >> expected
> > > >>>> > > >> > > to be
> > > >>>> > > >> > > >>>> supported. Because we cannot prevent end users
> from
> > > >>>> putting
> > > >>>> > an
> > > >>>> > > >> > > unpausable
> > > >>>> > > >> > > >>>> source into the watermark alignment group, that
> > > >>>> basically
> > > >>>> > means
> > > >>>> > > >> > > watermark
> > > >>>> > > >> > > >>>> alignment is a non-optional feature to the end
> > users.
> > > >>>> So
> > > >>>> > making
> > > >>>> > > >> > that
> > > >>>> > > >> > > >>>> expectation aligned with the source developers
> seems
> > > >>>> > reasonable.
> > > >>>> > > >> > And
> > > >>>> > > >> > > if a
> > > >>>> > > >> > > >>>> source does not support this feature, the end
> users
> > > >>>> should
> > > >>>> > > >> > explicitly
> > > >>>> > > >> > > >>>> remove that source from the watermark alignment
> > group.
> > > >>>> > > >> > > >>>>
> > > >>>> > > >> > > >>>> Personally speaking I think this is a simple and
> > clean
> > > >>>> > solution
> > > >>>> > > >> from
> > > >>>> > > >> > > >>>> both the end user and source developers'
> standpoint.
> > > >>>> > > >> > > >>>>
> > > >>>> > > >> > > >>>> Does this address your concerns?
> > > >>>> > > >> > > >>>>
> > > >>>> > > >> > > >>>> Thanks,
> > > >>>> > > >> > > >>>>
> > > >>>> > > >> > > >>>> Jiangjie (Becket) Qin
> > > >>>> > > >> > > >>>>
> > > >>>> > > >> > > >>>> On Wed, May 11, 2022 at 2:52 PM Sebastian
> Mattheis <
> > > >>>> > > >> > > >>>> sebastian@ververica.com> wrote:
> > > >>>> > > >> > > >>>>
> > > >>>> > > >> > > >>>>> Hi Piotr, Hi Becket, Hi everybody,
> > > >>>> > > >> > > >>>>>
> > > >>>> > > >> > > >>>>> we, Dawid and I, discussed the various
> > > >>>> suggestions/options
> > > >>>> > and
> > > >>>> > > >> we
> > > >>>> > > >> > > >>>>> would be okay either way because we find neither
> > > >>>> solution is
> > > >>>> > > >> > perfect
> > > >>>> > > >> > > just
> > > >>>> > > >> > > >>>>> because of the already present complexity.
> > > >>>> > > >> > > >>>>>
> > > >>>> > > >> > > >>>>> Option c) Adding methods to the interfaces of
> > > >>>> SourceReader
> > > >>>> > and
> > > >>>> > > >> > > >>>>> SplitReader
> > > >>>> > > >> > > >>>>> Option a) Adding decorative interfaces to be used
> > by
> > > >>>> > > >> SourceReader
> > > >>>> > > >> > and
> > > >>>> > > >> > > >>>>> SplitReader
> > > >>>> > > >> > > >>>>>
> > > >>>> > > >> > > >>>>> As of the current status (v. 12) of the FLIP [1],
> > it
> > > is
> > > >>>> > based on
> > > >>>> > > >> > > >>>>> Option c) which we find acceptable because the
> > > >>>> complexity
> > > >>>> > added
> > > >>>> > > >> is
> > > >>>> > > >> > > only a
> > > >>>> > > >> > > >>>>> single method.
> > > >>>> > > >> > > >>>>>
> > > >>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about
> Option
> > > c)
> > > >>>> > which I
> > > >>>> > > >> see
> > > >>>> > > >> > as
> > > >>>> > > >> > > >>>>> follows:
> > > >>>> > > >> > > >>>>> (1) An interface with default NOOP implementation
> > > >>>> makes the
> > > >>>> > > >> > > >>>>> implementation optional. And in my opinion, a
> > default
> > > >>>> > > >> > implementation
> > > >>>> > > >> > > is and
> > > >>>> > > >> > > >>>>> will remain a way of making implementation
> optional
> > > >>>> because
> > > >>>> > > >> even in
> > > >>>> > > >> > > future
> > > >>>> > > >> > > >>>>> a developer can decide to implement the "old
> > flavor"
> > > >>>> without
> > > >>>> > > >> > support
> > > >>>> > > >> > > for
> > > >>>> > > >> > > >>>>> pausable splits.
> > > >>>> > > >> > > >>>>> (2) It may not be too critical but I also find it
> > > >>>> suboptimal
> > > >>>> > > >> that
> > > >>>> > > >> > > with
> > > >>>> > > >> > > >>>>> a NOOP default implementation there is no way to
> > > check
> > > >>>> at
> > > >>>> > > >> runtime
> > > >>>> > > >> > if
> > > >>>> > > >> > > >>>>> SourceReader or SplitReader actually support
> > pausing.
> > > >>>> (To
> > > >>>> > do so,
> > > >>>> > > >> > one
> > > >>>> > > >> > > would
> > > >>>> > > >> > > >>>>> need a supportsX method which makes it again more
> > > >>>> > complicated.)
> > > >>>> > > >> > > >>>>>
> > > >>>> > > >> > > >>>>> However, we haven't changed it because Option a)
> is
> > > >>>> also not
> > > >>>> > > >> > optimal
> > > >>>> > > >> > > >>>>> or straight-forward:
> > > >>>> > > >> > > >>>>> (1) We need to add two distinct yet similar
> > > decorative
> > > >>>> > > >> interfaces
> > > >>>> > > >> > > >>>>> since, as mentioned, the signatures of the
> methods
> > > are
> > > >>>> > > >> different.
> > > >>>> > > >> > For
> > > >>>> > > >> > > >>>>> example, we would need decorative interfaces like
> > > >>>> > > >> > > >>>>> `SplitReaderWithPausableSplits` and
> > > >>>> > > >> > `SourceReaderWithPausableSplits`.
> > > >>>> > > >> > > >>>>> (2) As a consequence, we would need to somehow
> > > document
> > > >>>> > > >> how/where
> > > >>>> > > >> > to
> > > >>>> > > >> > > >>>>> implement both interfaces and how this relates to
> > > each
> > > >>>> > other.
> > > >>>> > > >> This
> > > >>>> > > >> > > we could
> > > >>>> > > >> > > >>>>> solve by adding a note in the interface of
> > > >>>> SourceReader and
> > > >>>> > > >> > > SplitReader and
> > > >>>> > > >> > > >>>>> reference to the decorative interfaces but it
> still
> > > >>>> > increases
> > > >>>> > > >> > > complexity
> > > >>>> > > >> > > >>>>> too.
> > > >>>> > > >> > > >>>>>
> > > >>>> > > >> > > >>>>> In summary, we see both as acceptable and
> preferred
> > > >>>> over
> > > >>>> > other
> > > >>>> > > >> > > >>>>> options. The question is if we can find a
> solution
> > or
> > > >>>> > compromise
> > > >>>> > > >> > > that is
> > > >>>> > > >> > > >>>>> acceptable for everybody to reach consensus.
> > > >>>> > > >> > > >>>>>
> > > >>>> > > >> > > >>>>> Please let us know what you think because we
> would
> > be
> > > >>>> happy
> > > >>>> > if
> > > >>>> > > >> we
> > > >>>> > > >> > can
> > > >>>> > > >> > > >>>>> conclude the discussion to avoid dropping the
> > > >>>> initiative on
> > > >>>> > this
> > > >>>> > > >> > > FLIP.
> > > >>>> > > >> > > >>>>>
> > > >>>> > > >> > > >>>>> Regards,
> > > >>>> > > >> > > >>>>> Sebastian
> > > >>>> > > >> > > >>>>>
> > > >>>> > > >> > > >>>>> [1]
> > > >>>> > > >> > > >>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
> > > >>>> > > >> > > >>>>> (v. 12)
> > > >>>> > > >> > > >>>>>
> > > >>>> > > >> > > >>>>> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski <
> > > >>>> > > >> > pnowojski@apache.org
> > > >>>> > > >> > > >
> > > >>>> > > >> > > >>>>> wrote:
> > > >>>> > > >> > > >>>>>
> > > >>>> > > >> > > >>>>>> Hi Guowei,
> > > >>>> > > >> > > >>>>>>
> > > >>>> > > >> > > >>>>>> as Dawid wrote a couple of messages back:
> > > >>>> > > >> > > >>>>>>
> > > >>>> > > >> > > >>>>>> > This is covered in the previous FLIP[1] which
> > has
> > > >>>> been
> > > >>>> > > >> already
> > > >>>> > > >> > > >>>>>> implemented in 1.15. In short, it must be
> enabled
> > > >>>> with the
> > > >>>> > > >> > watermark
> > > >>>> > > >> > > >>>>>> strategy which also configures drift and update
> > > >>>> interval
> > > >>>> > > >> > > >>>>>>
> > > >>>> > > >> > > >>>>>> So by default watermark alignment is disabled,
> > > >>>> regardless
> > > >>>> > if a
> > > >>>> > > >> > > source
> > > >>>> > > >> > > >>>>>> supports it or not.
> > > >>>> > > >> > > >>>>>>
> > > >>>> > > >> > > >>>>>> Best,
> > > >>>> > > >> > > >>>>>> Piotrek
> > > >>>> > > >> > > >>>>>>
> > > >>>> > > >> > > >>>>>> On Thu, May 5, 2022 at 09:56 Guowei Ma <
> > > >>>> guowei.mgw@gmail.com>
> > > >>>> > > >> > > wrote:
> > > >>>> > > >> > > >>>>>>
> > > >>>> > > >> > > >>>>>>> Hi,
> > > >>>> > > >> > > >>>>>>>
> > > >>>> > > >> > > >>>>>>> We know that in the case of Bounded input Flink
> > > >>>> supports
> > > >>>> > the
> > > >>>> > > >> > Batch
> > > >>>> > > >> > > >>>>>>> execution mode. Currently in Batch execution
> > mode,
> > > >>>> Flink
> > > >>>> > is
> > > >>>> > > >> > > executed
> > > >>>> > > >> > > >>>>>>> on a
> > > >>>> > > >> > > >>>>>>> stage-by-stage basis. In this way, perhaps
> > > watermark
> > > >>>> > alignment
> > > >>>> > > >> > > might
> > > >>>> > > >> > > >>>>>>> not
> > > >>>> > > >> > > >>>>>>> gain much.
> > > >>>> > > >> > > >>>>>>>
> > > >>>> > > >> > > >>>>>>> So my question is: Is watermark alignment the
> > > default
> > > >>>> > > >> > behavior (for
> > > >>>> > > >> > > >>>>>>> implemented source only)? If so, have you
> > > considered
> > > >>>> > > >> evaluating
> > > >>>> > > >> > the
> > > >>>> > > >> > > >>>>>>> impact
> > > >>>> > > >> > > >>>>>>> of this behavior on the Batch execution mode?
> Or
> > > >>>> do you think
> > > >>>> > it is
> > > >>>> > > >> not
> > > >>>> > > >> > > >>>>>>> necessary?
> > > >>>> > > >> > > >>>>>>>
> > > >>>> > > >> > > >>>>>>> Correct me if I miss something.
> > > >>>> > > >> > > >>>>>>>
> > > >>>> > > >> > > >>>>>>> Best,
> > > >>>> > > >> > > >>>>>>> Guowei
> > > >>>> > > >> > > >>>>>>>
> > > >>>> > > >> > > >>>>>>>
> > > >>>> > > >> > > >>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <
> > > >>>> > > >> > > >>>>>>> piotr.nowojski@gmail.com>
> > > >>>> > > >> > > >>>>>>> wrote:
> > > >>>> > > >> > > >>>>>>>
> > > >>>> > > >> > > >>>>>>> > Hi Becket and Dawid,
> > > >>>> > > >> > > >>>>>>> >
> > > >>>> > > >> > > >>>>>>> > > I feel that no matter which option we
> choose
> > > >>>> this can
> > > >>>> > not
> > > >>>> > > >> be
> > > >>>> > > >> > > >>>>>>> solved
> > > >>>> > > >> > > >>>>>>> > entirely in either of the options, because of
> > the
> > > >>>> point
> > > >>>> > > >> above
> > > >>>> > > >> > and
> > > >>>> > > >> > > >>>>>>> because
> > > >>>> > > >> > > >>>>>>> > the signature of
> > SplitReader#pauseOrResumeSplits
> > > >>>> and
> > > >>>> > > >> > > >>>>>>> > SourceReader#pauseOrResumeSplits are slightly
> > > >>>> different
> > > >>>> > (one
> > > >>>> > > >> > > >>>>>>> identifies
> > > >>>> > > >> > > >>>>>>> > splits with splitId, the other one passes the
> > > splits
> > > >>>> > > >> directly).
> > > >>>> > > >> > > >>>>>>> >
> > > >>>> > > >> > > >>>>>>> > Yes, that's a good point in this case and for
> > > >>>> features
> > > >>>> > that
> > > >>>> > > >> > need
> > > >>>> > > >> > > >>>>>>> to be
> > > >>>> > > >> > > >>>>>>> > implemented in more than one place.
> > > >>>> > > >> > > >>>>>>> >
> > > >>>> > > >> > > >>>>>>> > > Is there any reason for pausing reading
> from
> > a
> > > >>>> split
> > > >>>> > an
> > > >>>> > > >> > > optional
> > > >>>> > > >> > > >>>>>>> feature,
> > > >>>> > > >> > > >>>>>>> > > other than that this was not included in
> the
> > > >>>> original
> > > >>>> > > >> > > interface?
> > > >>>> > > >> > > >>>>>>> >
> > > >>>> > > >> > > >>>>>>> > An additional argument in favor of making it
> > > >>>> optional
> > > >>>> > is to
> > > >>>> > > >> > > >>>>>>> simplify source
> > > >>>> > > >> > > >>>>>>> > implementation. But on its own I'm not sure
> if
> > > that
> > > >>>> > would be
> > > >>>> > > >> > > >>>>>>> enough to
> > > >>>> > > >> > > >>>>>>> > justify making this feature optional. Maybe.
> > > >>>> > > >> > > >>>>>>> >
> > > >>>> > > >> > > >>>>>>> > > I think it would be way simpler and clearer
> > to
> > > >>>> just
> > > >>>> > let
> > > >>>> > > >> end
> > > >>>> > > >> > > >>>>>>> users and
> > > >>>> > > >> > > >>>>>>> > Flink
> > > >>>> > > >> > > >>>>>>> > > assume all the connectors will implement
> this
> > > >>>> feature.
> > > >>>> > > >> > > >>>>>>> >
> > > >>>> > > >> > > >>>>>>> > As I wrote above that would be an interesting
> > > >>>> choice to
> > > >>>> > make
> > > >>>> > > >> > > (ease
> > > >>>> > > >> > > >>>>>>> of
> > > >>>> > > >> > > >>>>>>> > implementation for new users, vs system
> > > >>>> consistency).
> > > >>>> > > >> > Regardless
> > > >>>> > > >> > > >>>>>>> of that,
> > > >>>> > > >> > > >>>>>>> > yes, for me the main argument is the API
> > backward
> > > >>>> > > >> > compatibility.
> > > >>>> > > >> > > >>>>>>> But let's
> > > >>>> > > >> > > >>>>>>> > clear a couple of points:
> > > >>>> > > >> > > >>>>>>> > - The current proposal adding methods to the
> > base
> > > >>>> > interface
> > > >>>> > > >> > with
> > > >>>> > > >> > > >>>>>>> default
> > > >>>> > > >> > > >>>>>>> > implementations is an OPTIONAL feature. Same
> as
> > > the
> > > >>>> > > >> decorative
> > > >>>> > > >> > > >>>>>>> version
> > > >>>> > > >> > > >>>>>>> > would be.
> > > >>>> > > >> > > >>>>>>> > - Decorative version could implement "throw
> > > >>>> > > >> > > >>>>>>> UnsupportedOperationException"
> > > >>>> > > >> > > >>>>>>> > if user enabled watermark alignment just as
> > well
> > > >>>> and I
> > > >>>> > agree
> > > >>>> > > >> > > >>>>>>> that's a
> > > >>>> > > >> > > >>>>>>> > better option compared to logging a warning.
> > > >>>> > > >> > > >>>>>>> >
> > > >>>> > > >> > > >>>>>>> > Best,
> > > >>>> > > >> > > >>>>>>> > Piotrek
> > > >>>> > > >> > > >>>>>>> >
> > > >>>> > > >> > > >>>>>>> >
> > > >>>> > > >> > > >>>>>>> > On Wed, May 4, 2022 at 15:40 Becket Qin <
> > > >>>> > becket.qin@gmail.com>
> > > >>>> > > >> > > >>>>>>> wrote:
> > > >>>> > > >> > > >>>>>>> >
> > > >>>> > > >> > > >>>>>>> > > Thanks for the reply and patient
> discussion,
> > > >>>> Piotr and
> > > >>>> > > >> Dawid.
> > > >>>> > > >> > > >>>>>>> > >
> > > >>>> > > >> > > >>>>>>> > > Is there any reason for pausing reading
> from
> > a
> > > >>>> split
> > > >>>> > an
> > > >>>> > > >> > > optional
> > > >>>> > > >> > > >>>>>>> feature,
> > > >>>> > > >> > > >>>>>>> > > other than that this was not included in
> the
> > > >>>> original
> > > >>>> > > >> > > interface?
> > > >>>> > > >> > > >>>>>>> > >
> > > >>>> > > >> > > >>>>>>> > > To be honest I am really worried about the
> > > >>>> complexity
> > > >>>> > of
> > > >>>> > > >> the
> > > >>>> > > >> > > >>>>>>> user story
> > > >>>> > > >> > > >>>>>>> > > here. Optional features like this have a
> high
> > > >>>> > overhead.
> > > >>>> > > >> > Imagine
> > > >>>> > > >> > > >>>>>>> this
> > > >>>> > > >> > > >>>>>>> > > feature is optional, now a user enabled
> > > watermark
> > > >>>> > > >> alignment
> > > >>>> > > >> > and
> > > >>>> > > >> > > >>>>>>> defined a
> > > >>>> > > >> > > >>>>>>> > > few watermark groups. Would it work? Hmm,
> > that
> > > >>>> > depends on
> > > >>>> > > >> > > >>>>>>> whether the
> > > >>>> > > >> > > >>>>>>> > > involved Source has implemented this
> > feature.
> > > >>>> If the
> > > >>>> > > >> Sources
> > > >>>> > > >> > > >>>>>>> are well
> > > >>>> > > >> > > >>>>>>> > > documented, good luck. Otherwise end users
> > may
> > > >>>> have to
> > > >>>> > > >> look
> > > >>>> > > >> > > into
> > > >>>> > > >> > > >>>>>>> the code
> > > >>>> > > >> > > >>>>>>> > > of the Source to see whether the feature is
> > > >>>> supported.
> > > >>>> > > >> Which
> > > >>>> > > >> > is
> > > >>>> > > >> > > >>>>>>> something
> > > >>>> > > >> > > >>>>>>> > > they shouldn't have to do.
> > > >>>> > > >> > > >>>>>>> > >
> > > >>>> > > >> > > >>>>>>> > > I think it would be way simpler and clearer
> > to
> > > >>>> just
> > > >>>> > let
> > > >>>> > > >> end
> > > >>>> > > >> > > >>>>>>> users and
> > > >>>> > > >> > > >>>>>>> > Flink
> > > >>>> > > >> > > >>>>>>> > > assume all the connectors will implement
> this
> > > >>>> feature.
> > > >>>> > > >> After
> > > >>>> > > >> > > all
> > > >>>> > > >> > > >>>>>>> the
> > > >>>> > > >> > > >>>>>>> > > watermark group is not optional to the end
> > > >>>> users. If
> > > >>>> > in
> > > >>>> > > >> some
> > > >>>> > > >> > > >>>>>>> rare cases,
> > > >>>> > > >> > > >>>>>>> > > the feature cannot be supported, a clear
> > > >>>> > > >> > > >>>>>>> UnsupportedOperationException
> > > >>>> > > >> > > >>>>>>> > will
> > > >>>> > > >> > > >>>>>>> > > be thrown to tell users to explicitly
> remove
> > > this
> > > >>>> > Source
> > > >>>> > > >> from
> > > >>>> > > >> > > the
> > > >>>> > > >> > > >>>>>>> > watermark
> > > >>>> > > >> > > >>>>>>> > > group. I don't think we should have a
> warning
> > > >>>> message
> > > >>>> > > >> here,
> > > >>>> > > >> > as
> > > >>>> > > >> > > >>>>>>> they tend
> > > >>>> > > >> > > >>>>>>> > to
> > > >>>> > > >> > > >>>>>>> > > be ignored in many cases. If we do this, we
> > > >>>> don't even
> > > >>>> > > >> need
> > > >>>> > > >> > the
> > > >>>> > > >> > > >>>>>>> > supportXXX
> > > >>>> > > >> > > >>>>>>> > > method in the Source for this feature. In
> > fact
> > > >>>> this is
> > > >>>> > > >> > exactly
> > > >>>> > > >> > > >>>>>>> how many
> > > >>>> > > >> > > >>>>>>> > > interfaces work today. For example,
> > > >>>> > > >> > > >>>>>>> SplitEnumerator#addSplitsBack() is
> > > >>>> > > >> > > >>>>>>> > not
> > > >>>> > > >> > > >>>>>>> > > supported by Pravega source because it does
> > not
> > > >>>> > support
> > > >>>> > > >> > partial
> > > >>>> > > >> > > >>>>>>> failover.
> > > >>>> > > >> > > >>>>>>> > > In that case, it simply throws an exception
> > to
> > > >>>> > trigger a
> > > >>>> > > >> > global
> > > >>>> > > >> > > >>>>>>> recovery.
> > > >>>> > > >> > > >>>>>>> > >
> > > >>>> > > >> > > >>>>>>> > > The reason we add a default implementation
> in
> > > >>>> this
> > > >>>> > case
> > > >>>> > > >> would
> > > >>>> > > >> > > >>>>>>> be just for
> > > >>>> > > >> > > >>>>>>> > the
> > > >>>> > > >> > > >>>>>>> > > sake of backwards compatibility so the old
> > > >>>> source can
> > > >>>> > > >> still
> > > >>>> > > >> > > >>>>>>> compile.
> > > >>>> > > >> > > >>>>>>> > Sure,
> > > >>>> > > >> > > >>>>>>> > > in short term, this feature might not be
> > > >>>> supported by
> > > >>>> > many
> > > >>>> > > >> > > >>>>>>> existing
> > > >>>> > > >> > > >>>>>>> > > sources. That is OK, and it is quite
> visible
> > to
> > > >>>> the
> > > >>>> > source
> > > >>>> > > >> > > >>>>>>> developers
> > > >>>> > > >> > > >>>>>>> > that
> > > >>>> > > >> > > >>>>>>> > > they did not override the default impl
> which
> > > >>>> throws an
> > > >>>> > > >> > > >>>>>>> > > UnsupportedOperationException.
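For illustration, the default-method approach described above might look roughly like the following sketch. It is not Flink's actual API: SketchReader, LegacyReader and DefaultMethodSketch are hypothetical names made up for this example, and the method signature is an assumption modelled loosely on the pauseOrResumeSplits method discussed in this thread.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified stand-in for a reader interface (not Flink's API).
interface SketchReader {
    String poll();

    // The default body exists only so that older implementations still compile.
    // Calling it fails loudly instead of silently ignoring the request.
    default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        throw new UnsupportedOperationException(
                "Pausing splits is not supported; remove this source from the watermark group.");
    }
}

// An "old" reader written before the new method existed; it does not override it.
class LegacyReader implements SketchReader {
    @Override
    public String poll() {
        return "record";
    }
}

class DefaultMethodSketch {
    // Returns true if the reader rejects the pause/resume request.
    static boolean rejectsPause(SketchReader reader) {
        try {
            reader.pauseOrResumeSplits(List.of("split-0"), List.of());
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsPause(new LegacyReader()));
    }
}
```

In this sketch the legacy reader compiles unchanged, and the missing support only surfaces when the framework actually asks it to pause a split.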
> > > >>>> > > >> > > >>>>>>> > >
> > > >>>> > > >> > > >>>>>>> > > @Dawid,
> > > >>>> > > >> > > >>>>>>> > >
> > > >>>> > > >> > > >>>>>>> > > the Java doc of the SupportXXX() method in
> > the
> > > >>>> Source
> > > >>>> > > >> would
> > > >>>> > > >> > be
> > > >>>> > > >> > > >>>>>>> the single
> > > >>>> > > >> > > >>>>>>> > > >> source of truth regarding how to
> implement
> > > >>>> this
> > > >>>> > > >> feature.
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > >
> > > >>>> > > >> > > >>>>>>> > > I also don't find it entirely true. Half of
> > the
> > > >>>> > classes
> > > >>>> > > >> are
> > > >>>> > > >> > > >>>>>>> theoretically
> > > >>>> > > >> > > >>>>>>> > > > optional and are utility classes from the
> > > >>>> point of
> > > >>>> > view
> > > >>>> > > >> how
> > > >>>> > > >> > > the
> > > >>>> > > >> > > >>>>>>> > > interfaces
> > > >>>> > > >> > > >>>>>>> > > > are organized. Theoretically users do not
> > > need
> > > >>>> to
> > > >>>> > use
> > > >>>> > > >> any
> > > >>>> > > >> > of
> > > >>>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be
> > > weird
> > > >>>> to
> > > >>>> > list
> > > >>>> > > >> > their
> > > >>>> > > >> > > >>>>>>> methods in
> > > >>>> > > >> > > >>>>>>> > > the
> > > >>>> > > >> > > >>>>>>> > > > Source interface.
> > > >>>> > > >> > > >>>>>>> > >
> > > >>>> > > >> > > >>>>>>> > > I think the ultimate goal of java docs is
> to
> > > >>>> guide
> > > >>>> > users
> > > >>>> > > >> to
> > > >>>> > > >> > > >>>>>>> implement the
> > > >>>> > > >> > > >>>>>>> > > Source. If SourceReaderBase is the
> preferred
> > > way
> > > >>>> to
> > > >>>> > > >> > implement a
> > > >>>> > > >> > > >>>>>>> > > SourceReader, it seems worth mentioning
> that.
> > > >>>> Even the
> > > >>>> > > >> Java
> > > >>>> > > >> > > >>>>>>> language
> > > >>>> > > >> > > >>>>>>> > > documentation of interfaces lists the known
> > > >>>> > implementations
> > > >>>> > > >> [1]
> > > >>>> > > >> > so
> > > >>>> > > >> > > >>>>>>> people
> > > >>>> > > >> > > >>>>>>> > can
> > > >>>> > > >> > > >>>>>>> > > leverage them. But for this particular
> case,
> > if
> > > >>>> we
> > > >>>> > make
> > > >>>> > > >> the
> > > >>>> > > >> > > >>>>>>> feature
> > > >>>> > > >> > > >>>>>>> > > non-optional, we don't even need the
> > > supportXXX()
> > > >>>> > method
> > > >>>> > > >> for
> > > >>>> > > >> > > now.
> > > >>>> > > >> > > >>>>>>> > >
> > > >>>> > > >> > > >>>>>>> > > Thanks,
> > > >>>> > > >> > > >>>>>>> > >
> > > >>>> > > >> > > >>>>>>> > > Jiangjie (Becket) Qin
> > > >>>> > > >> > > >>>>>>> > >
> > > >>>> > > >> > > >>>>>>> > >
> > > >>>> > > >> > > >>>>>>> > >
> > > >>>> > > >> > > >>>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid
> > > Wysakowicz <
> > > >>>> > > >> > > >>>>>>> dwysakowicz@apache.org>
> > > >>>> > > >> > > >>>>>>> > > wrote:
> > > >>>> > > >> > > >>>>>>> > >
> > > >>>> > > >> > > >>>>>>> > > > Hey Piotr and Becket,
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > First of all, let me say I am happy with
> > > >>>> whichever
> > > >>>> > > >> option
> > > >>>> > > >> > is
> > > >>>> > > >> > > >>>>>>> agreed in
> > > >>>> > > >> > > >>>>>>> > > the
> > > >>>> > > >> > > >>>>>>> > > > discussion.
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > I wanted to clarify a few points from the
> > > >>>> discussion
> > > >>>> > > >> > though:
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > @Becket:
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > The main argument for adding the methods
> to
> > > the
> > > >>>> > > >> > SourceReader
> > > >>>> > > >> > > >>>>>>> is that
> > > >>>> > > >> > > >>>>>>> > > these
> > > >>>> > > >> > > >>>>>>> > > > methods are effectively NON-OPTIONAL to
> > > >>>> SourceReader
> > > >>>> > > >> impl,
> > > >>>> > > >> > > i.e.
> > > >>>> > > >> > > >>>>>>> > starting
> > > >>>> > > >> > > >>>>>>> > > > from this FLIP, all the SourceReaders
> impl
> > > are
> > > >>>> > expected
> > > >>>> > > >> to
> > > >>>> > > >> > > >>>>>>> support this
> > > >>>> > > >> > > >>>>>>> > > > method, although some old implementations
> > may
> > > >>>> not
> > > >>>> > have
> > > >>>> > > >> > > >>>>>>> implemented this
> > > >>>> > > >> > > >>>>>>> > > > feature. I think we should distinguish
> the
> > > new
> > > >>>> > features
> > > >>>> > > >> > from
> > > >>>> > > >> > > >>>>>>> the
> > > >>>> > > >> > > >>>>>>> > optional
> > > >>>> > > >> > > >>>>>>> > > > features. While the public decorative
> > > >>>> interface is a
> > > >>>> > > >> > solution
> > > >>>> > > >> > > >>>>>>> to the
> > > >>>> > > >> > > >>>>>>> > > > optional features. We should not use it
> for
> > > the
> > > >>>> > features
> > > >>>> > > >> > that
> > > >>>> > > >> > > >>>>>>> are
> > > >>>> > > >> > > >>>>>>> > > > non-optional.
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > I don't think that this feature is
> > > >>>> NON-OPTIONAL.
> > > >>>> > Even
> > > >>>> > > >> > though
> > > >>>> > > >> > > >>>>>>> > preferred, I
> > > >>>> > > >> > > >>>>>>> > > > still think it can be simply optional.
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > the Java doc of the SupportXXX() method
> in
> > > the
> > > >>>> > Source
> > > >>>> > > >> would
> > > >>>> > > >> > > be
> > > >>>> > > >> > > >>>>>>> the
> > > >>>> > > >> > > >>>>>>> > single
> > > >>>> > > >> > > >>>>>>> > > > source of truth regarding how to
> implement
> > > this
> > > >>>> > feature.
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > I also don't find it entirely true. Half
> of
> > > the
> > > >>>> > classes
> > > >>>> > > >> are
> > > >>>> > > >> > > >>>>>>> > theoretically
> > > >>>> > > >> > > >>>>>>> > > > optional and are utility classes from the
> > > >>>> point of
> > > >>>> > view
> > > >>>> > > >> how
> > > >>>> > > >> > > the
> > > >>>> > > >> > > >>>>>>> > > interfaces
> > > >>>> > > >> > > >>>>>>> > > > are organized. Theoretically users do not
> > > need
> > > >>>> to
> > > >>>> > use
> > > >>>> > > >> any
> > > >>>> > > >> > of
> > > >>>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be
> > > weird
> > > >>>> to
> > > >>>> > list
> > > >>>> > > >> > their
> > > >>>> > > >> > > >>>>>>> methods in
> > > >>>> > > >> > > >>>>>>> > > the
> > > >>>> > > >> > > >>>>>>> > > > Source interface.
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > @Piotr
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > If we have all of the methods with
> default
> > > >>>> > > >> implementation
> > > >>>> > > >> > in
> > > >>>> > > >> > > >>>>>>> the base
> > > >>>> > > >> > > >>>>>>> > > > interface, the API doesn't give any clue
> to
> > > >>>> the user
> > > >>>> > > >> which
> > > >>>> > > >> > > set
> > > >>>> > > >> > > >>>>>>> of
> > > >>>> > > >> > > >>>>>>> > methods
> > > >>>> > > >> > > >>>>>>> > > > are required to be implemented at the
> same
> > > >>>> time.
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > I feel that no matter which option we
> > choose
> > > >>>> this
> > > >>>> > can
> > > >>>> > > >> not
> > > >>>> > > >> > be
> > > >>>> > > >> > > >>>>>>> solved
> > > >>>> > > >> > > >>>>>>> > > > entirely in either of the options,
> because
> > of
> > > >>>> the
> > > >>>> > point
> > > >>>> > > >> > above
> > > >>>> > > >> > > >>>>>>> and
> > > >>>> > > >> > > >>>>>>> > because
> > > >>>> > > >> > > >>>>>>> > > > the signature of
> > > >>>> SplitReader#pauseOrResumeSplits and
> > > >>>> > > >> > > >>>>>>> > > > SourceReader#pauseOrResumeSplits are
> > slightly
> > > >>>> > different
> > > >>>> > > >> > (one
> > > >>>> > > >> > > >>>>>>> identifies
> > > >>>> > > >> > > >>>>>>> > > > splits with splitId the other one passes
> > the
> > > >>>> splits
> > > >>>> > > >> > > directly).
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > Best,
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > Dawid
> > > >>>> > > >> > > >>>>>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > Hi Piotr,
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > Thanks for the comment.
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > Just to clarify, I am not against the
> > > >>>> decorative
> > > >>>> > > >> > interfaces,
> > > >>>> > > >> > > >>>>>>> but I do
> > > >>>> > > >> > > >>>>>>> > > > think we should use them with caution.
> The
> > > main
> > > >>>> > argument
> > > >>>> > > >> > for
> > > >>>> > > >> > > >>>>>>> adding the
> > > >>>> > > >> > > >>>>>>> > > > methods to the SourceReader is that these
> > > >>>> methods
> > > >>>> > are
> > > >>>> > > >> > > >>>>>>> > > > effectively NON-OPTIONAL to SourceReader
> > > impl,
> > > >>>> i.e.
> > > >>>> > > >> > starting
> > > >>>> > > >> > > >>>>>>> from this
> > > >>>> > > >> > > >>>>>>> > > > FLIP, all the SourceReaders impl are
> > expected
> > > >>>> to
> > > >>>> > support
> > > >>>> > > >> > this
> > > >>>> > > >> > > >>>>>>> > > > method, although some old implementations
> > may
> > > >>>> not
> > > >>>> > have
> > > >>>> > > >> > > >>>>>>> implemented this
> > > >>>> > > >> > > >>>>>>> > > > feature. I think we should distinguish
> the
> > > new
> > > >>>> > features
> > > >>>> > > >> > from
> > > >>>> > > >> > > >>>>>>> the
> > > >>>> > > >> > > >>>>>>> > optional
> > > >>>> > > >> > > >>>>>>> > > > features. While the public decorative
> > > >>>> interface is a
> > > >>>> > > >> > solution
> > > >>>> > > >> > > >>>>>>> to the
> > > >>>> > > >> > > >>>>>>> > > > optional features. We should not use it
> for
> > > the
> > > >>>> > features
> > > >>>> > > >> > that
> > > >>>> > > >> > > >>>>>>> are
> > > >>>> > > >> > > >>>>>>> > > > non-optional.
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > That said, this feature is optional for
> > > >>>> > SplitReaders.
> > > >>>> > > >> > > Arguably
> > > >>>> > > >> > > >>>>>>> we can
> > > >>>> > > >> > > >>>>>>> > > have
> > > >>>> > > >> > > >>>>>>> > > > a decorative interface for that, but for
> > > >>>> simplicity
> > > >>>> > and
> > > >>>> > > >> > > >>>>>>> symmetry of the
> > > >>>> > > >> > > >>>>>>> > > > interface, personally I prefer just
> adding
> > a
> > > >>>> new
> > > >>>> > method.
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > Regarding the advantages you mentioned
> > about
> > > >>>> the
> > > >>>> > > >> decorative
> > > >>>> > > >> > > >>>>>>> interfaces,
> > > >>>> > > >> > > >>>>>>> > > > they would make sense if:
> > > >>>> > > >> > > >>>>>>> > > > 1. The feature is optional.
> > > >>>> > > >> > > >>>>>>> > > > 2. There is only one decorative interface
> > > >>>> involved
> > > >>>> > for a
> > > >>>> > > >> > > >>>>>>> feature.
> > > >>>> > > >> > > >>>>>>> > > > Otherwise the argument that all the
> methods
> > > are
> > > >>>> > grouped
> > > >>>> > > >> > > >>>>>>> together will
> > > >>>> > > >> > > >>>>>>> > not
> > > >>>> > > >> > > >>>>>>> > > > stand.
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > Compared with that, I think the current
> > > >>>> solution
> > > >>>> > works
> > > >>>> > > >> fine
> > > >>>> > > >> > > in
> > > >>>> > > >> > > >>>>>>> all
> > > >>>> > > >> > > >>>>>>> > cases,
> > > >>>> > > >> > > >>>>>>> > > > i.e. "having supportXXX() method in
> Source,
> > > and
> > > >>>> > default
> > > >>>> > > >> > > >>>>>>> methods /
> > > >>>> > > >> > > >>>>>>> > > > decorative interfaces in base
> interfaces.".
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > The advantages are:
> > > >>>> > > >> > > >>>>>>> > > >> - clean and easy to implement base
> > interface
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > In the current approach, the Java doc of
> > the
> > > >>>> > > >> SupportXXX()
> > > >>>> > > >> > > >>>>>>> method in the
> > > >>>> > > >> > > >>>>>>> > > > Source would be the single source of
> truth
> > > >>>> regarding
> > > >>>> > > >> how to
> > > >>>> > > >> > > >>>>>>> implement
> > > >>>> > > >> > > >>>>>>> > > this
> > > >>>> > > >> > > >>>>>>> > > > feature. It lists the method that has to
> be
> > > >>>> > implemented
> > > >>>> > > >> to
> > > >>>> > > >> > > >>>>>>> support this
> > > >>>> > > >> > > >>>>>>> > > > feature, regardless of how many classes /
> > > >>>> > interfaces are
> > > >>>> > > >> > > >>>>>>> involved.
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > When implementing the base interface,
> users
> > > do
> > > >>>> not
> > > >>>> > need
> > > >>>> > > >> to
> > > >>>> > > >> > > >>>>>>> implement a
> > > >>>> > > >> > > >>>>>>> > > > method with default implementation. If
> they
> > > are
> > > >>>> > curious
> > > >>>> > > >> > what
> > > >>>> > > >> > > >>>>>>> the method
> > > >>>> > > >> > > >>>>>>> > > is
> > > >>>> > > >> > > >>>>>>> > > > for, the java doc of that method simply
> > > points
> > > >>>> > users to
> > > >>>> > > >> the
> > > >>>> > > >> > > >>>>>>> > SupportXXX()
> > > >>>> > > >> > > >>>>>>> > > > method in the Source. This does not seem to add
> > > work
> > > >>>> to
> > > >>>> > the
> > > >>>> > > >> users
> > > >>>> > > >> > > >>>>>>> compared
> > > >>>> > > >> > > >>>>>>> > with
> > > >>>> > > >> > > >>>>>>> > > > decorative interfaces, but gives much
> > better
> > > >>>> > > >> > discoverability.
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > - all of the methods from a single
> feature
> > > are
> > > >>>> > grouped
> > > >>>> > > >> in a
> > > >>>> > > >> > > >>>>>>> single
> > > >>>> > > >> > > >>>>>>> > > >> decorator interface, together with their
> > > >>>> dedicated
> > > >>>> > java
> > > >>>> > > >> > doc.
> > > >>>> > > >> > > >>>>>>> It's also
> > > >>>> > > >> > > >>>>>>> > > >> easier to google search for help using
> the
> > > >>>> > decorator
> > > >>>> > > >> name
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > - if an optional feature requires two
> > methods
> > > >>>> to be
> > > >>>> > > >> > > >>>>>>> implemented at
> > > >>>> > > >> > > >>>>>>> > once,
> > > >>>> > > >> > > >>>>>>> > > >> decorator can guarantee that
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > These two points are not true when
> multiple
> > > >>>> > components
> > > >>>> > > >> and
> > > >>>> > > >> > > >>>>>>> classes are
> > > >>>> > > >> > > >>>>>>> > > > involved collaboratively to provide a
> > > feature.
> > > >>>> In
> > > >>>> > our
> > > >>>> > > >> case,
> > > >>>> > > >> > > we
> > > >>>> > > >> > > >>>>>>> have
> > > >>>> > > >> > > >>>>>>> > both
> > > >>>> > > >> > > >>>>>>> > > > SourceReader and SplitReader involved.
> And
> > > >>>> there
> > > >>>> > might
> > > >>>> > > >> be
> > > >>>> > > >> > > other
> > > >>>> > > >> > > >>>>>>> > > interfaces
> > > >>>> > > >> > > >>>>>>> > > > on the JM side involved for some future
> > > >>>> features.
> > > >>>> > So the
> > > >>>> > > >> > > >>>>>>> relevant
> > > >>>> > > >> > > >>>>>>> > methods
> > > >>>> > > >> > > >>>>>>> > > > can actually be scattered over the
> places.
> > > That
> > > >>>> > said, we
> > > >>>> > > >> > may
> > > >>>> > > >> > > >>>>>>> still use
> > > >>>> > > >> > > >>>>>>> > > > decorative interfaces for each component,
> > if
> > > >>>> the
> > > >>>> > > >> feature is
> > > >>>> > > >> > > >>>>>>> optional,
> > > >>>> > > >> > > >>>>>>> > > given
> > > >>>> > > >> > > >>>>>>> > > > there is a single source of truth for the
> > > >>>> feature.
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > Here I would strongly lean towards making
> > > life
> > > >>>> > easier
> > > >>>> > > >> for
> > > >>>> > > >> > new
> > > >>>> > > >> > > >>>>>>> users,
> > > >>>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo)
> > > >>>> slight
> > > >>>> > expense
> > > >>>> > > >> > for
> > > >>>> > > >> > > >>>>>>> the power
> > > >>>> > > >> > > >>>>>>> > > >> users.
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > I actually think the current approach is
> > > >>>> simpler,
> > > >>>> > more
> > > >>>> > > >> > > >>>>>>> extensible and
> > > >>>> > > >> > > >>>>>>> > > more
> > > >>>> > > >> > > >>>>>>> > > > general for all the users. Can you
> > articulate
> > > >>>> a bit
> > > >>>> > > >> more on
> > > >>>> > > >> > > >>>>>>> which part
> > > >>>> > > >> > > >>>>>>> > > you
> > > >>>> > > >> > > >>>>>>> > > > think is harder for users to understand?
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > There is another benefit of the
> decorative
> > > >>>> > interfaces
> > > >>>> > > >> which
> > > >>>> > > >> > > is
> > > >>>> > > >> > > >>>>>>> not
> > > >>>> > > >> > > >>>>>>> > > > mentioned, but might be worth considering
> > > here.
> > > >>>> > Usually
> > > >>>> > > >> the
> > > >>>> > > >> > > >>>>>>> decorative
> > > >>>> > > >> > > >>>>>>> > > > interfaces give slightly better backwards
> > > >>>> > compatibility
> > > >>>> > > >> > than
> > > >>>> > > >> > > >>>>>>> the new
> > > >>>> > > >> > > >>>>>>> > > > default method in the interfaces. That is
> > > when
> > > >>>> > users are
> > > >>>> > > >> > > using
> > > >>>> > > >> > > >>>>>>> a jar
> > > >>>> > > >> > > >>>>>>> > that
> > > >>>> > > >> > > >>>>>>> > > > was compiled with an older version of
> Flink
> > > >>>> which
> > > >>>> > does
> > > >>>> > > >> not
> > > >>>> > > >> > > >>>>>>> have the
> > > >>>> > > >> > > >>>>>>> > > default
> > > >>>> > > >> > > >>>>>>> > > > method in the interfaces in question. A
> > > >>>> decorative
> > > >>>> > > >> > interface
> > > >>>> > > >> > > >>>>>>> may still
> > > >>>> > > >> > > >>>>>>> > > > provide backwards compatibility in that
> > case,
> > > >>>> while
> > > >>>> > > >> default
> > > >>>> > > >> > > >>>>>>> method impl
> > > >>>> > > >> > > >>>>>>> > > > cannot.
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > I think in Flink we in general do not
> > > guarantee
> > > >>>> > custom
> > > >>>> > > >> > > >>>>>>> components
> > > >>>> > > >> > > >>>>>>> > > compiled
> > > >>>> > > >> > > >>>>>>> > > > with an older version can run with a
> newer
> > > >>>> version
> > > >>>> > of
> > > >>>> > > >> > Flink.
> > > >>>> > > >> > > A
> > > >>>> > > >> > > >>>>>>> > recompile
> > > >>>> > > >> > > >>>>>>> > > > with a newer version would be required.
> > That
> > > >>>> said,
> > > >>>> > if
> > > >>>> > > >> we do
> > > >>>> > > >> > > >>>>>>> care about
> > > >>>> > > >> > > >>>>>>> > > > this, we can just change the
> "supportXXX()"
> > > >>>> method
> > > >>>> > in
> > > >>>> > > >> the
> > > >>>> > > >> > > >>>>>>> Source
> > > >>>> > > >> > > >>>>>>> > > interface
> > > >>>> > > >> > > >>>>>>> > > > to use decorative interfaces, and leave
> the
> > > >>>> other
> > > >>>> > parts
> > > >>>> > > >> > > >>>>>>> unchanged.
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > Thanks,
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > Jiangjie (Becket) Qin
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr
> > > Nowojski <
> > > >>>> > > >> > > >>>>>>> pnowojski@apache.org>
> > > >>>> > > >> > > >>>>>>> > > > wrote:
> > > >>>> > > >> > > >>>>>>> > > >
> > > >>>> > > >> > > >>>>>>> > > >> Hi,
> > > >>>> > > >> > > >>>>>>> > > >>
> > > >>>> > > >> > > >>>>>>> > > >> Sorry for chipping in so late, but I was
> > OoO
> > > >>>> for
> > > >>>> > the
> > > >>>> > > >> last
> > > >>>> > > >> > > two
> > > >>>> > > >> > > >>>>>>> weeks.
> > > >>>> > > >> > > >>>>>>> > > >> Regarding the interfaces, I would be
> > > actually
> > > >>>> > against
> > > >>>> > > >> > adding
> > > >>>> > > >> > > >>>>>>> those
> > > >>>> > > >> > > >>>>>>> > > methods
> > > >>>> > > >> > > >>>>>>> > > >> to the base interfaces for the reasons
> > > >>>> mentioned
> > > >>>> > above.
> > > >>>> > > >> > > >>>>>>> Clogging the
> > > >>>> > > >> > > >>>>>>> > > base
> > > >>>> > > >> > > >>>>>>> > > >> interface for new users with tons of
> > methods
> > > >>>> that
> > > >>>> > they
> > > >>>> > > >> do
> > > >>>> > > >> > > not
> > > >>>> > > >> > > >>>>>>> need, do
> > > >>>> > > >> > > >>>>>>> > > not
> > > >>>> > > >> > > >>>>>>> > > >> understand and do not know what to do
> with
> > > >>>> them.
> > > >>>> > > >> Moreover,
> > > >>>> > > >> > > >>>>>>> such
> > > >>>> > > >> > > >>>>>>> > > decorative
> > > >>>> > > >> > > >>>>>>> > > >> interfaces are solving a problem if a
> > > feature
> > > >>>> > requires
> > > >>>> > > >> two
> > > >>>> > > >> > > or
> > > >>>> > > >> > > >>>>>>> more
> > > >>>> > > >> > > >>>>>>> > > methods
> > > >>>> > > >> > > >>>>>>> > > >> to be implemented at the same time. If
> we
> > > >>>> have all
> > > >>>> > of
> > > >>>> > > >> the
> > > >>>> > > >> > > >>>>>>> methods with
> > > >>>> > > >> > > >>>>>>> > > >> default implementation in the base
> > > interface,
> > > >>>> the
> > > >>>> > API
> > > >>>> > > >> > > doesn't
> > > >>>> > > >> > > >>>>>>> give any
> > > >>>> > > >> > > >>>>>>> > > >> clue
> > > >>>> > > >> > > >>>>>>> > > >> to the user which set of methods are
> > > required
> > > >>>> to be
> > > >>>> > > >> > > >>>>>>> implemented at the
> > > >>>> > > >> > > >>>>>>> > > >> same
> > > >>>> > > >> > > >>>>>>> > > >> time.
> > > >>>> > > >> > > >>>>>>> > > >>
> > > >>>> > > >> > > >>>>>>> > > >> > a) I feel the biggest drawback of
> > > decorative
> > > >>>> > > >> interfaces
> > > >>>> > > >> > is
> > > >>>> > > >> > > >>>>>>> which
> > > >>>> > > >> > > >>>>>>> > > >> interface
> > > >>>> > > >> > > >>>>>>> > > >> > they can decorate and which
> combinations
> > > of
> > > >>>> > multiple
> > > >>>> > > >> > > >>>>>>> decorative
> > > >>>> > > >> > > >>>>>>> > > >> interfaces
> > > >>>> > > >> > > >>>>>>> > > >> > are valid. (...)
> > > >>>> > > >> > > >>>>>>> > > >> > In the future, if there is a new
> feature
> > > >>>> added
> > > >>>> > > >> > > >>>>>>> > > >> > (e.g. sorted or pre-partitioned data
> > > >>>> aware), are
> > > >>>> > we
> > > >>>> > > >> > going
> > > >>>> > > >> > > >>>>>>> to create
> > > >>>> > > >> > > >>>>>>> > > >> another
> > > >>>> > > >> > > >>>>>>> > > >> > interface of SplitReader such as
> > > >>>> > SortedSplitReader or
> > > >>>> > > >> > > >>>>>>> > > >> PrePartitionedAware?
> > > >>>> > > >> > > >>>>>>> > > >> > Can they be combined? So I think the
> > > >>>> additional
> > > >>>> > > >> > decorative
> > > >>>> > > >> > > >>>>>>> interface
> > > >>>> > > >> > > >>>>>>> > > >> like
> > > >>>> > > >> > > >>>>>>> > > >> > withSplitsAlignment actually increases
> > the
> > > >>>> > > >> understanding
> > > >>>> > > >> > > >>>>>>> cost of
> > > >>>> > > >> > > >>>>>>> > users
> > > >>>> > > >> > > >>>>>>> > > >> > because they have to know what
> > decorative
> > > >>>> > interfaces
> > > >>>> > > >> are
> > > >>>> > > >> > > >>>>>>> there,
> > > >>>> > > >> > > >>>>>>> > which
> > > >>>> > > >> > > >>>>>>> > > >> > interface they can decorate and which
> > > >>>> > combinations of
> > > >>>> > > >> > the
> > > >>>> > > >> > > >>>>>>> decorative
> > > >>>> > > >> > > >>>>>>> > > >> > interfaces are valid and which are
> not.
> > > >>>> Ideally
> > > >>>> > we
> > > >>>> > > >> want
> > > >>>> > > >> > to
> > > >>>> > > >> > > >>>>>>> avoid
> > > >>>> > > >> > > >>>>>>> > that.
> > > >>>> > > >> > > >>>>>>> > > >>
> > > >>>> > > >> > > >>>>>>> > > >> I'm not sure if I understand how
> embedding
> > > >>>> default
> > > >>>> > > >> methods
> > > >>>> > > >> > > in
> > > >>>> > > >> > > >>>>>>> the base
> > > >>>> > > >> > > >>>>>>> > > >> interface is solving the problem: what
> can
> > > be
> > > >>>> > combined
> > > >>>> > > >> or
> > > >>>> > > >> > > >>>>>>> not? If
> > > >>>> > > >> > > >>>>>>> > there
> > > >>>> > > >> > > >>>>>>> > > >> are
> > > >>>> > > >> > > >>>>>>> > > >> two conflicting features, having
> > decorative
> > > >>>> > interfaces
> > > >>>> > > >> > that
> > > >>>> > > >> > > >>>>>>> can not be
> > > >>>> > > >> > > >>>>>>> > > >> mixed together actually makes much more
> > > sense
> > > >>>> to me
> > > >>>> > > >> rather
> > > >>>> > > >> > > >>>>>>> than having
> > > >>>> > > >> > > >>>>>>> > > >> them
> > > >>>> > > >> > > >>>>>>> > > >> all in one base class. How would you
> allow
> > > >>>> users to
> > > >>>> > > >> > > implement
> > > >>>> > > >> > > >>>>>>> only one
> > > >>>> > > >> > > >>>>>>> > > of
> > > >>>> > > >> > > >>>>>>> > > >> those two features?
> > > >>>> > > >> > > >>>>>>> > > >>
> > > >>>> > > >> > > >>>>>>> > > >> To reiterate on the issue. Yes, there
> are
> > > >>>> > drawbacks:
> > > >>>> > > >> > > >>>>>>> > > >> - how can a user discover what
> decorators
> > > are
> > > >>>> > there?
> > > >>>> > > >> > > >>>>>>> > > >> - how can a user know where the
> decorator
> > > can
> > > >>>> be
> > > >>>> > > >> applied
> > > >>>> > > >> > to?
> > > >>>> > > >> > > >>>>>>> > > >>
> > > >>>> > > >> > > >>>>>>> > > >> However those are drawbacks for more
> power
> > > >>>> users,
> > > >>>> > that
> > > >>>> > > >> can
> > > >>>> > > >> > > be
> > > >>>> > > >> > > >>>>>>> > mitigated
> > > >>>> > > >> > > >>>>>>> > > by
> > > >>>> > > >> > > >>>>>>> > > >> the documentation. For example listing
> all
> > > of
> > > >>>> the
> > > >>>> > > >> > decorators
> > > >>>> > > >> > > >>>>>>> with
> > > >>>> > > >> > > >>>>>>> > > >> detailed explanation both in the docs
> and
> > in
> > > >>>> the
> > > >>>> > java
> > > >>>> > > >> > docs.
> > > >>>> > > >> > > >>>>>>> More
> > > >>>> > > >> > > >>>>>>> > > >> experienced users will be able to deal
> > with
> > > >>>> those
> > > >>>> > > >> issues
> > > >>>> > > >> > > >>>>>>> easier, as
> > > >>>> > > >> > > >>>>>>> > they
> > > >>>> > > >> > > >>>>>>> > > >> will already have some basic
> understanding
> > > of
> > > >>>> > Flink.
> > > >>>> > > >> Also
> > > >>>> > > >> > if
> > > >>>> > > >> > > >>>>>>> user has
> > > >>>> > > >> > > >>>>>>> > a
> > > >>>> > > >> > > >>>>>>> > > >> problem that he wants to solve, he will
> > > google
> > > >>>> > search a
> > > >>>> > > >> > > >>>>>>> potential
> > > >>>> > > >> > > >>>>>>> > > solution
> > > >>>> > > >> > > >>>>>>> > > >> to his problem anyway, and while doing
> > that
> > > >>>> he is
> > > >>>> > very
> > > >>>> > > >> > > likely
> > > >>>> > > >> > > >>>>>>> to
> > > >>>> > > >> > > >>>>>>> > > discover
> > > >>>> > > >> > > >>>>>>> > > >> the decorator that he needs anyway in
> the
> > > >>>> docs.
> > > >>>> > > >> > > >>>>>>> > > >>
> > > >>>> > > >> > > >>>>>>> > > >> The advantages are:
> > > >>>> > > >> > > >>>>>>> > > >> - clean and easy to implement base
> > interface
> > > >>>> > > >> > > >>>>>>> > > >> - all of the methods from a single
> feature
> > > are
> > > >>>> > grouped
> > > >>>> > > >> in
> > > >>>> > > >> > a
> > > >>>> > > >> > > >>>>>>> single
> > > >>>> > > >> > > >>>>>>> > > >> decorator interface, together with their
> > > >>>> dedicated
> > > >>>> > java
> > > >>>> > > >> > doc.
> > > >>>> > > >> > > >>>>>>> It's also
> > > >>>> > > >> > > >>>>>>> > > >> easier to google search for help using
> the
> > > >>>> > decorator
> > > >>>> > > >> name
> > > >>>> > > >> > > >>>>>>> > > >> - if an optional feature requires two
> > > methods
> > > >>>> to be
> > > >>>> > > >> > > >>>>>>> implemented at
> > > >>>> > > >> > > >>>>>>> > once,
> > > >>>> > > >> > > >>>>>>> > > >> decorator can guarantee that
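As a rough illustration of the decorative-interface option listed above, the sketch below keeps the base interface minimal and moves the optional capability into a separate interface that the framework can detect with instanceof. All names (BaseSplitReaderSketch, SplitsAlignableSketch, PlainReaderSketch, AlignableReaderSketch, DecoratorSketch) are hypothetical and chosen only for this example; none of them are Flink classes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Minimal base interface: new implementers only see what they must implement.
interface BaseSplitReaderSketch {
    String fetch();
}

// Hypothetical decorative interface grouping everything the optional feature needs.
interface SplitsAlignableSketch {
    void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume);
}

// A reader that does not opt in to the optional feature.
class PlainReaderSketch implements BaseSplitReaderSketch {
    @Override
    public String fetch() {
        return "record";
    }
}

// A reader that opts in by implementing the decorative interface as well.
class AlignableReaderSketch implements BaseSplitReaderSketch, SplitsAlignableSketch {
    final List<String> paused = new ArrayList<>();

    @Override
    public String fetch() {
        return "record";
    }

    @Override
    public void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        paused.addAll(toPause);
        paused.removeAll(toResume);
    }
}

class DecoratorSketch {
    // The framework can discover the capability without a supportsXXX() method.
    static boolean supportsAlignment(BaseSplitReaderSketch reader) {
        return reader instanceof SplitsAlignableSketch;
    }
}
```

The trade-off debated in this thread is visible here: the base interface stays small, but a user has to know that the decorative interface exists and which base interface it is meant to accompany.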
> > > >>>> > > >> > > >>>>>>> > > >>
> > > >>>> > > >> > > >>>>>>> > > >> Here I would strongly lean towards
> making
> > > life
> > > >>>> > easier
> > > >>>> > > >> for
> > > >>>> > > >> > > new
> > > >>>> > > >> > > >>>>>>> users,
> > > >>>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo)
> > > >>>> slight
> > > >>>> > expense
> > > >>>> > > >> > for
> > > >>>> > > >> > > >>>>>>> the power
> > > >>>> > > >> > > >>>>>>> > > >> users.
> > > >>>> > > >> > > >>>>>>> > > >>
> > > >>>> > > >> > > >>>>>>> > > >> Best,
> > > >>>> > > >> > > >>>>>>> > > >> Piotrek
> > > >>>> > > >> > > >>>>>>> > > >>
> > > >>>> > > >> > > >>>>>>> > > >>
> > > >>>> > > >> > > >>>>>>> > > >> On Tue, Apr 26, 2022 at 15:32, Becket Qin <
> > > >>>> > > >> becket.qin@gmail.com
> > > >>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > wrote:
> > > >>>> > > >> > > >>>>>>> > > >>
> > > >>>> > > >> > > >>>>>>> > > >> > Thanks for the reply Sebastian and
> > Dawid,
> > > >>>> > > >> > > >>>>>>> > > >> >
> > > >>>> > > >> > > >>>>>>> > > >> > I think Sebastian has a good summary.
> > This
> > > >>>> is a
> > > >>>> > > >> really
> > > >>>> > > >> > > >>>>>>> helpful
> > > >>>> > > >> > > >>>>>>> > > >> discussion.
> > > >>>> > > >> > > >>>>>>> > > >> >
> > > >>>> > > >> > > >>>>>>> > > >> > Thinking a bit more, I feel that it
> > might
> > > >>>> still
> > > >>>> > be
> > > >>>> > > >> > better
> > > >>>> > > >> > > >>>>>>> to add the
> > > >>>> > > >> > > >>>>>>> > > >> > supportsXXX() method in the Source
> > rather
> > > >>>> than
> > > >>>> > > >> > > SourceReader.
> > > >>>> > > >> > > >>>>>>> > > >> >
> > > >>>> > > >> > > >>>>>>> > > >> > Generally speaking, what we are trying
> > to
> > > do
> > > >>>> > here is
> > > >>>> > > >> to
> > > >>>> > > >> > > let
> > > >>>> > > >> > > >>>>>>> the
> > > >>>> > > >> > > >>>>>>> > Flink
> > > >>>> > > >> > > >>>>>>> > > >> > framework know what the Source is
> > capable
> > > >>>> of. In
> > > >>>> > this
> > > >>>> > > >> > > FLIP,
> > > >>>> > > >> > > >>>>>>> it
> > > >>>> > > >> > > >>>>>>> > happens
> > > >>>> > > >> > > >>>>>>> > > >> to
> > > >>>> > > >> > > >>>>>>> > > >> > be the capability that only involves
> > > >>>> > SourceReader.
> > > >>>> > > >> But
> > > >>>> > > >> > in
> > > >>>> > > >> > > >>>>>>> the
> > > >>>> > > >> > > >>>>>>> > future,
> > > >>>> > > >> > > >>>>>>> > > >> it is
> > > >>>> > > >> > > >>>>>>> > > >> > possible that another functionality
> > > involves
> > > >>>> > both the
> > > >>>> > > >> > > >>>>>>> > SplitEnumerator
> > > >>>> > > >> > > >>>>>>> > > >> and
> > > >>>> > > >> > > >>>>>>> > > >> > SourceReader. In that case, following
> > the
> > > >>>> current
> > > >>>> > > >> > > approach,
> > > >>>> > > >> > > >>>>>>> we
> > > >>>> > > >> > > >>>>>>> > should
> > > >>>> > > >> > > >>>>>>> > > >> put
> > > >>>> > > >> > > >>>>>>> > > >> > the "supportsXXX()" method in both
> > > >>>> > SplitEnumerator
> > > >>>> > > >> and
> > > >>>> > > >> > > >>>>>>> SourceReader.
> > > >>>> > > >> > > >>>>>>> > > >> > Because if we only put this in the
> > > >>>> SourceReader,
> > > >>>> > then
> > > >>>> > > >> > the
> > > >>>> > > >> > > >>>>>>> JM would
> > > >>>> > > >> > > >>>>>>> > > have
> > > >>>> > > >> > > >>>>>>> > > >> to
> > > >>>> > > >> > > >>>>>>> > > >> > create a SourceReader in order to know
> > > >>>> whether
> > > >>>> > this
> > > >>>> > > >> > > feature
> > > >>>> > > >> > > >>>>>>> is
> > > >>>> > > >> > > >>>>>>> > > >> supported,
> > > >>>> > > >> > > >>>>>>> > > >> > which is a little ugly. But if we put
> > the
> > > >>>> > > >> > "supportsXXX()"
> > > >>>> > > >> > > >>>>>>> method in
> > > >>>> > > >> > > >>>>>>> > > the
> > > >>>> > > >> > > >>>>>>> > > >> > Source, we will break the "symmetric"
> > > design
> > > >>>> > because
> > > >>>> > > >> > this
> > > >>>> > > >> > > >>>>>>> FLIP
> > > >>>> > > >> > > >>>>>>> > chose a
> > > >>>> > > >> > > >>>>>>> > > >> > different way.
> > > >>>> > > >> > > >>>>>>> > > >> >
> > > >>>> > > >> > > >>>>>>> > > >> > This is also why I think supportsXXX()
> > > >>>> method
> > > >>>> > seems a
> > > >>>> > > >> > good
> > > >>>> > > >> > > >>>>>>> thing to
> > > >>>> > > >> > > >>>>>>> > > >> have,
> > > >>>> > > >> > > >>>>>>> > > >> > because when there are a few
> interfaces
> > /
> > > >>>> methods
> > > >>>> > > >> that
> > > >>>> > > >> > are
> > > >>>> > > >> > > >>>>>>> expected
> > > >>>> > > >> > > >>>>>>> > to
> > > >>>> > > >> > > >>>>>>> > > >> be
> > > >>>> > > >> > > >>>>>>> > > >> > implemented at the same time in order
> to
> > > >>>> deliver
> > > >>>> > a
> > > >>>> > > >> > > feature,
> > > >>>> > > >> > > >>>>>>> it is
> > > >>>> > > >> > > >>>>>>> > > always
> > > >>>> > > >> > > >>>>>>> > > >> > good to have a single source of truth
> to
> > > >>>> tell the
> > > >>>> > > >> > > framework
> > > >>>> > > >> > > >>>>>>> what to
> > > >>>> > > >> > > >>>>>>> > > do,
> > > >>>> > > >> > > >>>>>>> > > >> so
> > > >>>> > > >> > > >>>>>>> > > >> > the framework can do consistent things
> > in
> > > >>>> > different
> > > >>>> > > >> > parts.
> > > >>>> > > >> > > >>>>>>> > > >> >
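A minimal sketch of the "single source of truth" idea described above, under
the assumption that the capability flag lives on the Source. The types here
(DemoSource, DemoSourceReader, AlignmentPlanner) are hypothetical simplified
stand-ins, not Flink's actual interfaces; only the method name
supportsPausingSplits is taken from this thread. The point is that the
framework can query the capability without creating a reader first.

```java
// Hypothetical, simplified stand-ins; not Flink's actual interfaces.
interface DemoSourceReader { }

interface DemoSource {
    DemoSourceReader createReader();

    // Capability flag with a conservative default: not supported.
    default boolean supportsPausingSplits() {
        return false;
    }
}

// A source that declares the capability.
class PausableDemoSource implements DemoSource {
    int readersCreated = 0;

    @Override
    public DemoSourceReader createReader() {
        readersCreated++;
        return new DemoSourceReader() { };
    }

    @Override
    public boolean supportsPausingSplits() {
        return true;
    }
}

// A source written before the flag existed; it inherits the default.
class LegacyDemoSource implements DemoSource {
    @Override
    public DemoSourceReader createReader() {
        return new DemoSourceReader() { };
    }
}

// Hypothetical framework-side decision that never instantiates a reader.
class AlignmentPlanner {
    static String plan(DemoSource source) {
        return source.supportsPausingSplits()
                ? "intra-task split alignment"
                : "cross-task alignment only";
    }
}
```

If a later feature also involved the enumerator, the same flag on the source
could cover both parts, which is the consistency argument made above.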
> > > >>>> > > >> > > >>>>>>> > > >> > @Sebastian Mattheis <
> > > >>>> sebastian@ververica.com>
> > > >>>> > > >> > > >>>>>>> > > >> >
> > > >>>> > > >> > > >>>>>>> > > >> > Regarding interface flavor b), i.e.
> > > >>>> > > >> AlignedSourceReader
> > > >>>> > > >> > +
> > > >>>> > > >> > > >>>>>>> > > >> > AlignedSplitReader, what I feel
> awkward
> > > >>>> about is
> > > >>>> > > >> that we
> > > >>>> > > >> > > are
> > > >>>> > > >> > > >>>>>>> > > essentially
> > > >>>> > > >> > > >>>>>>> > > >> > expecting almost all the SourceReader
> > > >>>> > > >> implementations to
> > > >>>> > > >> > > >>>>>>> extend
> > > >>>> > > >> > > >>>>>>> > > >> > SourceReaderBase, which effectively
> > makes
> > > >>>> the
> > > >>>> > > >> > SourceReader
> > > >>>> > > >> > > >>>>>>> interface
> > > >>>> > > >> > > >>>>>>> > > >> > without the pausing support useless.
> So
> > > this
> > > >>>> > > >> indicates
> > > >>>> > > >> > > that
> > > >>>> > > >> > > >>>>>>> public
> > > >>>> > > >> > > >>>>>>> > > >> > decorative interfaces (or
> sub-interfaces
> > > >>>> for the
> > > >>>> > same
> > > >>>> > > >> > > >>>>>>> purpose) only
> > > >>>> > > >> > > >>>>>>> > > >> > make sense if the original interface
> is
> > > also
> > > >>>> > > >> expected to
> > > >>>> > > >> > > be
> > > >>>> > > >> > > >>>>>>> used.
> > > >>>> > > >> > > >>>>>>> > > >> > Otherwise, it seems to make more sense
> to
> > > add
> > > >>>> the
> > > >>>> > > >> method
> > > >>>> > > >> > to
> > > >>>> > > >> > > >>>>>>> the
> > > >>>> > > >> > > >>>>>>> > original
> > > >>>> > > >> > > >>>>>>> > > >> > interface itself.
> > > >>>> > > >> > > >>>>>>> > > >> >
> > > >>>> > > >> > > >>>>>>> > > >> > Cheers,
> > > >>>> > > >> > > >>>>>>> > > >> >
> > > >>>> > > >> > > >>>>>>> > > >> > Jiangjie (Becket) Qin
> > > >>>> > > >> > > >>>>>>> > > >> >
> > > >>>> > > >> > > >>>>>>> > > >> >
> > > >>>> > > >> > > >>>>>>> > > >> >
> > > >>>> > > >> > > >>>>>>> > > >> >
> > > >>>> > > >> > > >>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid
> > > >>>> Wysakowicz
> > > >>>> > <
> > > >>>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org>
> > > >>>> > > >> > > >>>>>>> > > >> > wrote:
> > > >>>> > > >> > > >>>>>>> > > >> >
> > > >>>> > > >> > > >>>>>>> > > >> > > Thanks @Sebastian for the nice
> > summary.
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > I think most of your points aligned
> > with
> > > >>>> the
> > > >>>> > > >> > suggestions
> > > >>>> > > >> > > >>>>>>> I made to
> > > >>>> > > >> > > >>>>>>> > > the
> > > >>>> > > >> > > >>>>>>> > > >> > > FLIP, while you were writing your
> > reply
> > > (I
> > > >>>> > believe
> > > >>>> > > >> we
> > > >>>> > > >> > > hit
> > > >>>> > > >> > > >>>>>>> enter
> > > >>>> > > >> > > >>>>>>> > > >> nearly at
> > > >>>> > > >> > > >>>>>>> > > >> > > the same time ;) )
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > Two points after we synced offline
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > 1. I also changed the
> > > >>>> > > >> supportsWatermarksSplitAlignment
> > > >>>> > > >> > > to
> > > >>>> > > >> > > >>>>>>> > > >> > > supportsPausingSplits to express the
> > > >>>> general
> > > >>>> > > >> > capability
> > > >>>> > > >> > > of
> > > >>>> > > >> > > >>>>>>> > pausing.
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > 2. As for whether we should add
> > > >>>> > > >> > > >>>>>>> PausingSourceReader/PausingSplitReader
> > > >>>> > > >> > > >>>>>>> > > (option
> > > >>>> > > >> > > >>>>>>> > > >> b)
> > > >>>> > > >> > > >>>>>>> > > >> > > or if we should just add the methods
> > > >>>> (option
> > > >>>> > c), I
> > > >>>> > > >> > > >>>>>>> suggest to
> > > >>>> > > >> > > >>>>>>> > simply
> > > >>>> > > >> > > >>>>>>> > > >> add
> > > >>>> > > >> > > >>>>>>> > > >> > > the two methods, as I felt this is the approach much preferred by
> > > >>>> > > >> > > >>>>>>> > > >> > > Becket, and one to which others do not object. Unless there
> is
> > an
> > > >>>> > opposition
> > > >>>> > > >> > > let's
> > > >>>> > > >> > > >>>>>>> go with
> > > >>>> > > >> > > >>>>>>> > > this
> > > >>>> > > >> > > >>>>>>> > > >> > > option c.
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > Best,
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > Dawid
> > > >>>> > > >> > > >>>>>>> > > >> > > On 26/04/2022 10:06, Sebastian
> > Mattheis
> > > >>>> wrote:
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > Hi folks,
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > Sorry for being a bit silent. Many
> > > thanks
> > > >>>> for
> > > >>>> > all
> > > >>>> > > >> the
> > > >>>> > > >> > > >>>>>>> input and
> > > >>>> > > >> > > >>>>>>> > > >> > > suggestions. As I'm a bit new, I
> > needed
> > > >>>> some
> > > >>>> > time
> > > >>>> > > >> to
> > > >>>> > > >> > > >>>>>>> catch up and
> > > >>>> > > >> > > >>>>>>> > > >> > structure
> > > >>>> > > >> > > >>>>>>> > > >> > > (for myself) the discussion and I
> > wanted
> > > >>>> to
> > > >>>> > find a
> > > >>>> > > >> way
> > > >>>> > > >> > > to
> > > >>>> > > >> > > >>>>>>> > structure
> > > >>>> > > >> > > >>>>>>> > > >> the
> > > >>>> > > >> > > >>>>>>> > > >> > > conclusions. (Also because I had the
> > > >>>> feeling
> > > >>>> > that
> > > >>>> > > >> some
> > > >>>> > > >> > > >>>>>>> concerns
> > > >>>> > > >> > > >>>>>>> > got
> > > >>>> > > >> > > >>>>>>> > > >> lost
> > > >>>> > > >> > > >>>>>>> > > >> > in
> > > >>>> > > >> > > >>>>>>> > > >> > > the discussion.) This is my attempt
> > and
> > > >>>> please
> > > >>>> > > >> correct
> > > >>>> > > >> > > me
> > > >>>> > > >> > > >>>>>>> if
> > > >>>> > > >> > > >>>>>>> > > >> something is
> > > >>>> > > >> > > >>>>>>> > > >> > > wrong or misunderstood. I tried to
> > > >>>> collect and
> > > >>>> > > >> > assemble
> > > >>>> > > >> > > >>>>>>> the
> > > >>>> > > >> > > >>>>>>> > > opinions,
> > > >>>> > > >> > > >>>>>>> > > >> > > suggestions, and conclusions (to the
> > > best
> > > >>>> of my
> > > >>>> > > >> > > >>>>>>> knowledge):
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > # Top A: Should split alignment
> > > >>>> (pause/resume
> > > >>>> > > >> > behavior)
> > > >>>> > > >> > > >>>>>>> be a
> > > >>>> > > >> > > >>>>>>> > general
> > > >>>> > > >> > > >>>>>>> > > >> > > capability?
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > I personally don't see any reason not
> > to
> > > >>>> have
> > > >>>> > it a
> > > >>>> > > >> > > general
> > > >>>> > > >> > > >>>>>>> > capability
> > > >>>> > > >> > > >>>>>>> > > >> > > because for the alignSplit method it
> > is
> > > >>>> > actually
> > > >>>> > > >> > > >>>>>>> independent of
> > > >>>> > > >> > > >>>>>>> > the
> > > >>>> > > >> > > >>>>>>> > > >> > > watermarks. If we agree here to have
> > it
> > > a
> > > >>>> > general
> > > >>>> > > >> > > >>>>>>> capability, we
> > > >>>> > > >> > > >>>>>>> > > >> should
> > > >>>> > > >> > > >>>>>>> > > >> > > also agree on the right wording.
> Does
> > > >>>> > > >> > > >>>>>>> "alignSplits(splitsToResume,
> > > >>>> > > >> > > >>>>>>> > > >> > > splitsToPause)" refer to what is
> then
> > > >>>> actually
> > > >>>> > > >> meant?
> > > >>>> > > >> > (I
> > > >>>> > > >> > > >>>>>>> see it as
> > > >>>> > > >> > > >>>>>>> > > >> okay.
> > > >>>> > > >> > > >>>>>>> > > >> > I
> > > >>>> > > >> > > >>>>>>> > > >> > > don't have any better idea whilst
> > Arvid
> > > >>>> > suggested
> > > >>>> > > >> > > >>>>>>> > > >> "pauseOrResumeSplits".)
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > # Top B: Should it be possible to
> > > >>>> > enable/disable
> > > >>>> > > >> split
> > > >>>> > > >> > > >>>>>>> alignment?
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > I would personally not disable the
> > split
> > > >>>> > alignment
> > > >>>> > > >> on
> > > >>>> > > >> > > the
> > > >>>> > > >> > > >>>>>>> source
> > > >>>> > > >> > > >>>>>>> > > >> reader
> > > >>>> > > >> > > >>>>>>> > > >> > > side because if split alignment is
> > used
> > > >>>> for
> > > >>>> > some
> > > >>>> > > >> other
> > > >>>> > > >> > > >>>>>>> use case
> > > >>>> > > >> > > >>>>>>> > (see
> > > >>>> > > >> > > >>>>>>> > > >> A)
> > > >>>> > > >> > > >>>>>>> > > >> > it
> > > >>>> > > >> > > >>>>>>> > > >> > > could have nasty side effects on
> > > >>>> other/future
> > > >>>> > use
> > > >>>> > > >> > cases.
> > > >>>> > > >> > > >>>>>>> Instead,
> > > >>>> > > >> > > >>>>>>> > I
> > > >>>> > > >> > > >>>>>>> > > >> would
> > > >>>> > > >> > > >>>>>>> > > >> > > disable "watermark split alignment"
> > > where
> > > >>>> I
> > > >>>> > think
> > > >>>> > > >> it
> > > >>>> > > >> > > >>>>>>> should
> > > >>>> > > >> > > >>>>>>> > disable
> > > >>>> > > >> > > >>>>>>> > > >> the
> > > >>>> > > >> > > >>>>>>> > > >> > > watermark-dependent trigger for
> split
> > > >>>> > alignment.
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > # Top C: Should we add a supportsX
> > > method?
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > I find it difficult to define the
> > scope
> > > >>>> of a
> > > >>>> > > >> supportsX
> > > >>>> > > >> > > >>>>>>> method
> > > >>>> > > >> > > >>>>>>> > w.r.t.
> > > >>>> > > >> > > >>>>>>> > > >> > > the following questions: a) Where is
> > it
> > > >>>> used?
> > > >>>> > and
> > > >>>> > > >> b)
> > > >>>> > > >> > > What
> > > >>>> > > >> > > >>>>>>> is the
> > > >>>> > > >> > > >>>>>>> > > >> expected
> > > >>>> > > >> > > >>>>>>> > > >> > > output? To b), it's not
> > straight-forward
> > > >>>> to
> > > >>>> > > >> provide a
> > > >>>> > > >> > > >>>>>>> meaningful
> > > >>>> > > >> > > >>>>>>> > > >> output,
> > > >>>> > > >> > > >>>>>>> > > >> > > e.g., if SourceReader supports split
> > > >>>> alignment
> > > >>>> > but
> > > >>>> > > >> > > >>>>>>> SplitReader
> > > >>>> > > >> > > >>>>>>> > does not.
> > > >>>> > > >> > > >>>>>>> > > >> This
> > > >>>> > > >> > > >>>>>>> > > >> > is
> > > >>>> > > >> > > >>>>>>> > > >> > > because with the current
> > implementation,
> > > >>>> we can
> > > >>>> > > >> > > determine
> > > >>>> > > >> > > >>>>>>> whether
> > > >>>> > > >> > > >>>>>>> > > >> split
> > > >>>> > > >> > > >>>>>>> > > >> > > alignment is fully supported only
> > during
> > > >>>> > runtime
> > > >>>> > > >> and
> > > >>>> > > >> > > >>>>>>> specifically
> > > >>>> > > >> > > >>>>>>> > > >> > actually
> > > >>>> > > >> > > >>>>>>> > > >> > > only when calling alignSplits down
> the
> > > >>>> call
> > > >>>> > > >> hierarchy
> > > >>>> > > >> > up
> > > >>>> > > >> > > >>>>>>> to the
> > > >>>> > > >> > > >>>>>>> > > actual
> > > >>>> > > >> > > >>>>>>> > > >> > > SplitReaders.
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > Therefore, I would suggest to either
> > > >>>> raise an
> > > >>>> > > >> error or
> > > >>>> > > >> > > >>>>>>> warning if
> > > >>>> > > >> > > >>>>>>> > > the
> > > >>>> > > >> > > >>>>>>> > > >> > > alignment is called but not
> supported
> > at
> > > >>>> some
> > > >>>> > > >> point. I
> > > >>>> > > >> > > >>>>>>> know we
> > > >>>> > > >> > > >>>>>>> > > should
> > > >>>> > > >> > > >>>>>>> > > >> > > carefully think about when this
> could
> > be
> > > >>>> the
> > > >>>> > case
> > > >>>> > > >> > > because
> > > >>>> > > >> > > >>>>>>> we don't
> > > >>>> > > >> > > >>>>>>> > > >> want
> > > >>>> > > >> > > >>>>>>> > > >> > to
> > > >>>> > > >> > > >>>>>>> > > >> > > flood anybody with such warnings.
> > > However,
> > > >>>> > warnings
> > > >>>> > > >> > > could
> > > >>>> > > >> > > >>>>>>> be an
> > > >>>> > > >> > > >>>>>>> > > >> indicator
> > > >>>> > > >> > > >>>>>>> > > >> > > for the user that for watermark
> split
> > > >>>> > alignment use
> > > >>>> > > >> > case
> > > >>>> > > >> > > >>>>>>> split
> > > >>>> > > >> > > >>>>>>> > > >> reading is
> > > >>>> > > >> > > >>>>>>> > > >> > > imbalanced with the conclusion to
> > either
> > > >>>> > disable
> > > >>>> > > >> the
> > > >>>> > > >> > > >>>>>>> trigger for
> > > >>>> > > >> > > >>>>>>> > > >> > watermark
> > > >>>> > > >> > > >>>>>>> > > >> > > split alignment (see Top B) or to
> > > >>>> > use/implement a
> > > >>>> > > >> > source
> > > >>>> > > >> > > >>>>>>> and
> > > >>>> > > >> > > >>>>>>> > reader
> > > >>>> > > >> > > >>>>>>> > > >> that
> > > >>>> > > >> > > >>>>>>> > > >> > > fully supports split alignment.
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > # Top D: How to design interfaces?
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > Thanks for structuring the
> discussion
> > > >>>> with the
> > > >>>> > > >> > > various
> > > >>>> > > >> > > >>>>>>> > > >> possibilities
> > > >>>> > > >> > > >>>>>>> > > >> > > (a-d). From the discussion and
> > emails, I
> > > >>>> would
> > > >>>> > > >> like to
> > > >>>> > > >> > > >>>>>>> summarize
> > > >>>> > > >> > > >>>>>>> > the
> > > >>>> > > >> > > >>>>>>> > > >> > > following requirements:
> > > >>>> > > >> > > >>>>>>> > > >> > > - Interfaces should be consistent
> > > >>>> > ("symmetric"),
> > > >>>> > > >> i.e.,
> > > >>>> > > >> > > >>>>>>> similar
> > > >>>> > > >> > > >>>>>>> > > >> semantics
> > > >>>> > > >> > > >>>>>>> > > >> > > should have similar interfaces with
> > > >>>> similar
> > > >>>> > usage.
> > > >>>> > > >> > > >>>>>>> > > >> > > - Make explicit which
> implementations
> > > >>>> implement
> > > >>>> > > >> > > >>>>>>> interfaces/support
> > > >>>> > > >> > > >>>>>>> > > >> > > behavior.
> > > >>>> > > >> > > >>>>>>> > > >> > > - Make clear what are default
> > > >>>> implementations
> > > >>>> > and
> > > >>>> > > >> how
> > > >>>> > > >> > to
> > > >>>> > > >> > > >>>>>>> implement
> > > >>>> > > >> > > >>>>>>> > > >> > > interfaces with desired behavior.
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > This is a simplified view of the
> > > relations
> > > >>>> > between
> > > >>>> > > >> > > >>>>>>> relevant
> > > >>>> > > >> > > >>>>>>> > classes
> > > >>>> > > >> > > >>>>>>> > > of
> > > >>>> > > >> > > >>>>>>> > > >> > the
> > > >>>> > > >> > > >>>>>>> > > >> > > PoC implementation:
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > SourceReader (Public) <|--
> > > >>>> SourceReaderBase
> > > >>>> > > >> (Internal)
> > > >>>> > > >> > > >>>>>>> <|-- ..
> > > >>>> > > >> > > >>>>>>> > <|--
> > > >>>> > > >> > > >>>>>>> > > >> > > MySourceReader
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > MySourceReader <>--
> > SplitFetcherManager
> > > >>>> > (Internal)
> > > >>>> > > >> > <>--
> > > >>>> > > >> > > >>>>>>> > SplitFetcher
> > > >>>> > > >> > > >>>>>>> > > >> > > (Internal) <>-- SplitReader (Public)
> > > <|--
> > > >>>> > > >> > MySplitReader
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > (A <|-- B: B inherits from A; A <>--
> > B:
> > > A
> > > >>>> "has
> > > >>>> > a"
> > > >>>> > > >> B)
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > Note that SourceReaderBase and
> > > >>>> > SplitFetcherManager
> > > >>>> > > >> > > >>>>>>> implement most
> > > >>>> > > >> > > >>>>>>> > of
> > > >>>> > > >> > > >>>>>>> > > >> the
> > > >>>> > > >> > > >>>>>>> > > >> > > "logic" for split alignment just
> > because
> > > >>>> we
> > > >>>> > wanted
> > > >>>> > > >> to
> > > >>>> > > >> > > >>>>>>> implement
> > > >>>> > > >> > > >>>>>>> > > split
> > > >>>> > > >> > > >>>>>>> > > >> > > alignment and wanted it to be
> > available
> > > as
> > > >>>> > kind of
> > > >>>> > > >> a
> > > >>>> > > >> > > >>>>>>> default. As a
> > > >>>> > > >> > > >>>>>>> > > >> > > consequence, we have a "default
> > > >>>> > implementation" for
> > > >>>> > > >> > > >>>>>>> SourceReader
> > > >>>> > > >> > > >>>>>>> > > that
> > > >>>> > > >> > > >>>>>>> > > >> > > implements the actual logic for
> split
> > > >>>> > alignment.
> > > >>>> > > >> For
> > > >>>> > > >> > > that
> > > >>>> > > >> > > >>>>>>> reason,
> > > >>>> > > >> > > >>>>>>> > I
> > > >>>> > > >> > > >>>>>>> > > >> find
> > > >>>> > > >> > > >>>>>>> > > >> > it
> > > >>>> > > >> > > >>>>>>> > > >> > > very confusing to have a NOOP
> default
> > > >>>> > > >> implementation
> > > >>>> > > >> > in
> > > >>>> > > >> > > >>>>>>> the
> > > >>>> > > >> > > >>>>>>> > > interface
> > > >>>> > > >> > > >>>>>>> > > >> for
> > > >>>> > > >> > > >>>>>>> > > >> > > the SourceReader. As a consequence,
> > > >>>> interface
> > > >>>> > > >> strategy
> > > >>>> > > >> > > c)
> > > >>>> > > >> > > >>>>>>> is
> > > >>>> > > >> > > >>>>>>> > > difficult
> > > >>>> > > >> > > >>>>>>> > > >> > > because this would require NOOP
> > default
> > > >>>> > > >> > implementations
> > > >>>> > > >> > > >>>>>>> in the
> > > >>>> > > >> > > >>>>>>> > > public
> > > >>>> > > >> > > >>>>>>> > > >> > > interfaces of SourceReader and
> > > >>>> SplitReader.
> > > >>>> > This is
> > > >>>> > > >> > the
> > > >>>> > > >> > > >>>>>>> same for
> > > >>>> > > >> > > >>>>>>> > > >> strategy
> > > >>>> > > >> > > >>>>>>> > > >> > > d) because it would require NOOP
> > default
> > > >>>> > > >> > implementation
> > > >>>> > > >> > > >>>>>>> in the
> > > >>>> > > >> > > >>>>>>> > > >> > > SourceReader. Further, as Dawid
> > > described
> > > >>>> > method
> > > >>>> > > >> > > >>>>>>> signatures of
> > > >>>> > > >> > > >>>>>>> > > >> alignSplit
> > > >>>> > > >> > > >>>>>>> > > >> > > for SourceReader and SplitReader
> > differ
> > > >>>> and it
> > > >>>> > > >> would
> > > >>>> > > >> > be
> > > >>>> > > >> > > >>>>>>> extremely
> > > >>>> > > >> > > >>>>>>> > > >> > difficult
> > > >>>> > > >> > > >>>>>>> > > >> > > to make the signatures the same
> (with
> > > even
> > > >>>> > > >> potential
> > > >>>> > > >> > > >>>>>>> performance
> > > >>>> > > >> > > >>>>>>> > > >> impact
> > > >>>> > > >> > > >>>>>>> > > >> > > because of additional look-ups of
> > split
> > > >>>> ids).
> > > >>>> > > >> > Therefore,
> > > >>>> > > >> > > >>>>>>> having a
> > > >>>> > > >> > > >>>>>>> > > >> > symmetric
> > > >>>> > > >> > > >>>>>>> > > >> > > decorative interface as of strategy
> a)
> > > is
> > > >>>> > actually
> > > >>>> > > >> not
> > > >>>> > > >> > > >>>>>>> possible
> > > >>>> > > >> > > >>>>>>> > and
> > > >>>> > > >> > > >>>>>>> > > >> > having
> > > >>>> > > >> > > >>>>>>> > > >> > > two decorative interfaces with
> > different
> > > >>>> method
> > > >>>> > > >> > > >>>>>>> signatures is
> > > >>>> > > >> > > >>>>>>> > > >> confusing.
> > > >>>> > > >> > > >>>>>>> > > >> > My
> > > >>>> > > >> > > >>>>>>> > > >> > > conclusion is that we are best with
> > > >>>> strategy b)
> > > >>>> > > >> which
> > > >>>> > > >> > > >>>>>>> means to
> > > >>>> > > >> > > >>>>>>> > have
> > > >>>> > > >> > > >>>>>>> > > >> > > specializing sub-interfaces that
> > inherit
> > > >>>> from
> > > >>>> > the
> > > >>>> > > >> > parent
> > > >>>> > > >> > > >>>>>>> > interface:
> > > >>>> > > >> > > >>>>>>> > > >> > > SourceReader <|--
> AlignedSourceReader,
> > > >>>> > SplitReader
> > > >>>> > > >> > <|--
> > > >>>> > > >> > > >>>>>>> > > >> > AlignedSplitReader
> > > >>>> > > >> > > >>>>>>> > > >> > > With this option, I'm not 100% sure
> > what
> > > >>>> the
> > > >>>> > > >> > > implications
> > > >>>> > > >> > > >>>>>>> are and
> > > >>>> > > >> > > >>>>>>> > if
> > > >>>> > > >> > > >>>>>>> > > >> this
> > > >>>> > > >> > > >>>>>>> > > >> > > could get nasty. I would suggest
> that
> > > >>>> Dawid
> > > >>>> > and I
> > > >>>> > > >> just
> > > >>>> > > >> > > >>>>>>> try to
> > > >>>> > > >> > > >>>>>>> > > >> implement
> > > >>>> > > >> > > >>>>>>> > > >> > and
> > > >>>> > > >> > > >>>>>>> > > >> > > see if we like it. :)
> > > >>>> > > >> > > >>>>>>> > > >> > >
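A rough sketch of what strategy b) could look like, to illustrate why the two
method signatures differ. Everything here is an assumption made for
illustration: the interface names carry a "Sketch" suffix, the parameter lists
are guessed (split ids at the reader level, split objects at the split-reader
level), and DemoSplit, RecordingSplitReader and DelegatingSourceReader are
made-up helpers. The id-to-split look-up in the delegating reader is the extra
step mentioned above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical base interfaces, reduced to markers for this sketch.
interface BaseSourceReader { }

interface BaseSplitReader<SplitT> { }

// Sub-interface at reader level: assumed to work on split ids.
interface AlignedSourceReaderSketch extends BaseSourceReader {
    void alignSplits(Collection<String> splitIdsToPause, Collection<String> splitIdsToResume);
}

// Sub-interface at split-reader level: assumed to work on split objects.
interface AlignedSplitReaderSketch<SplitT> extends BaseSplitReader<SplitT> {
    void alignSplits(Collection<SplitT> splitsToPause, Collection<SplitT> splitsToResume);
}

// Made-up split type that only carries an id.
class DemoSplit {
    final String id;

    DemoSplit(String id) {
        this.id = id;
    }
}

// Split reader that records which splits are currently paused.
class RecordingSplitReader implements AlignedSplitReaderSketch<DemoSplit> {
    final List<String> pausedIds = new ArrayList<>();

    @Override
    public void alignSplits(Collection<DemoSplit> toPause, Collection<DemoSplit> toResume) {
        for (DemoSplit split : toResume) {
            pausedIds.remove(split.id);
        }
        for (DemoSplit split : toPause) {
            pausedIds.add(split.id);
        }
    }
}

// Source reader that translates ids into splits before delegating.
class DelegatingSourceReader implements AlignedSourceReaderSketch {
    private final Map<String, DemoSplit> assigned = new HashMap<>();
    private final RecordingSplitReader splitReader;

    DelegatingSourceReader(RecordingSplitReader splitReader, Collection<DemoSplit> splits) {
        this.splitReader = splitReader;
        for (DemoSplit split : splits) {
            assigned.put(split.id, split);
        }
    }

    @Override
    public void alignSplits(Collection<String> pauseIds, Collection<String> resumeIds) {
        splitReader.alignSplits(lookUp(pauseIds), lookUp(resumeIds));
    }

    // Resolves ids to assigned splits; unknown ids are silently skipped.
    private List<DemoSplit> lookUp(Collection<String> ids) {
        List<DemoSplit> result = new ArrayList<>();
        for (String id : ids) {
            DemoSplit split = assigned.get(id);
            if (split != null) {
                result.add(split);
            }
        }
        return result;
    }
}
```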
> > > >>>> > > >> > > >>>>>>> > > >> > > # Summary
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > In conclusion, please let me know
> your
> > > >>>> > > >> perspectives.
> > > >>>> > > >> > > >>>>>>> Please
> > > >>>> > > >> > > >>>>>>> > correct
> > > >>>> > > >> > > >>>>>>> > > >> me,
> > > >>>> > > >> > > >>>>>>> > > >> > if
> > > >>>> > > >> > > >>>>>>> > > >> > > something is wrong or if I
> > misunderstood
> > > >>>> > > >> something. My
> > > >>>> > > >> > > >>>>>>> perspective
> > > >>>> > > >> > > >>>>>>> > > >> would
> > > >>>> > > >> > > >>>>>>> > > >> > be:
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > Top A: Yes
> > > >>>> > > >> > > >>>>>>> > > >> > > Top B: Yes (but disable watermark
> > > trigger
> > > >>>> for
> > > >>>> > split
> > > >>>> > > >> > > >>>>>>> alignment)
> > > >>>> > > >> > > >>>>>>> > > >> > > Top C: No
> > > >>>> > > >> > > >>>>>>> > > >> > > Top D: b)
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > Best,
> > > >>>> > > >> > > >>>>>>> > > >> > > Sebastian
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM
> Dawid
> > > >>>> > Wysakowicz <
> > > >>>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > > wrote:
> > > >>>> > > >> > > >>>>>>> > > >> > >
> > > >>>> > > >> > > >>>>>>> > > >> > >> @Arvid:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's
> capability
> > > >>>> > approach, I
> > > >>>> > > >> > fear
> > > >>>> > > >> > > >>>>>>> that it
> > > >>>> > > >> > > >>>>>>> > > >> doesn't
> > > >>>> > > >> > > >>>>>>> > > >> > work
> > > >>>> > > >> > > >>>>>>> > > >> > >> for this particular use case:
> Sources
> > > can
> > > >>>> > always
> > > >>>> > > >> be
> > > >>>> > > >> > > >>>>>>> aligned
> > > >>>> > > >> > > >>>>>>> > > >> cross-task
> > > >>>> > > >> > > >>>>>>> > > >> > and
> > > >>>> > > >> > > >>>>>>> > > >> > >> this is just about intra-task
> > > alignment.
> > > >>>> So
> > > >>>> > it's
> > > >>>> > > >> > > >>>>>>> plausible to put
> > > >>>> > > >> > > >>>>>>> > > >> > sources
> > > >>>> > > >> > > >>>>>>> > > >> > >> into an alignment group even though
> > > they
> > > >>>> do
> > > >>>> > not
> > > >>>> > > >> use
> > > >>>> > > >> > any
> > > >>>> > > >> > > >>>>>>> of the
> > > >>>> > > >> > > >>>>>>> > > >> presented
> > > >>>> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They should just
> > > issue a
> > > >>>> > > >> warning, if
> > > >>>> > > >> > > >>>>>>> they handle
> > > >>>> > > >> > > >>>>>>> > > >> > multiple
> > > >>>> > > >> > > >>>>>>> > > >> > >> splits (see motivation section).
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Yes, but the "supportXXX" method
> > would
> > > >>>> be for
> > > >>>> > > >> telling
> > > >>>> > > >> > > if
> > > >>>> > > >> > > >>>>>>> it
> > > >>>> > > >> > > >>>>>>> > > supports
> > > >>>> > > >> > > >>>>>>> > > >> > that
> > > >>>> > > >> > > >>>>>>> > > >> > >> intra-task alignment. Cross-task
> > > >>>> alignment
> > > >>>> > would
> > > >>>> > > >> > always
> > > >>>> > > >> > > >>>>>>> be
> > > >>>> > > >> > > >>>>>>> > > supported.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> I updated interfaces to what I
> > believe
> > > >>>> to be
> > > >>>> > > >> closest
> > > >>>> > > >> > > to a
> > > >>>> > > >> > > >>>>>>> > consensus
> > > >>>> > > >> > > >>>>>>> > > >> > >> between all participants. Do you
> mind
> > > >>>> taking a
> > > >>>> > > >> look?
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> @Sebastian Do you mind addressing
> the
> > > >>>> nits?
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise
> > wrote:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for pushing this effort.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> I'd actually be in favor of 1b). I
> > > fully
> > > >>>> agree
> > > >>>> > > >> that
> > > >>>> > > >> > > >>>>>>> decorator
> > > >>>> > > >> > > >>>>>>> > > >> interfaces
> > > >>>> > > >> > > >>>>>>> > > >> > >> should be avoided but I'm also not
> a
> > > big
> > > >>>> fan
> > > >>>> > of
> > > >>>> > > >> > > >>>>>>> overloading the
> > > >>>> > > >> > > >>>>>>> > > base
> > > >>>> > > >> > > >>>>>>> > > >> > >> interfaces (they are hard to
> > implement
> > > as
> > > >>>> > is). The
> > > >>>> > > >> > > usual
> > > >>>> > > >> > > >>>>>>> feedback
> > > >>>> > > >> > > >>>>>>> > > to
> > > >>>> > > >> > > >>>>>>> > > >> > >> Source-related interfaces is
> always
> > > that
> > > >>>> > they are
> > > >>>> > > >> > > >>>>>>> overwhelming
> > > >>>> > > >> > > >>>>>>> > and
> > > >>>> > > >> > > >>>>>>> > > >> too
> > > >>>> > > >> > > >>>>>>> > > >> > >> hard to implement. However, I'd
> also
> > > not
> > > >>>> > oppose
> > > >>>> > > >> 1c)
> > > >>>> > > >> > as
> > > >>>> > > >> > > >>>>>>> scattered
> > > >>>> > > >> > > >>>>>>> > > >> > interfaces
> > > >>>> > > >> > > >>>>>>> > > >> > >> also have drawbacks. I'd just
> dislike
> > > >>>> 1a) and
> > > >>>> > 1d).
> > > >>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's
> capability
> > > >>>> > approach, I
> > > >>>> > > >> > fear
> > > >>>> > > >> > > >>>>>>> that it
> > > >>>> > > >> > > >>>>>>> > > >> doesn't
> > > >>>> > > >> > > >>>>>>> > > >> > work
> > > >>>> > > >> > > >>>>>>> > > >> > >> for this particular use case:
> Sources
> > > can
> > > >>>> > always
> > > >>>> > > >> be
> > > >>>> > > >> > > >>>>>>> aligned
> > > >>>> > > >> > > >>>>>>> > > >> cross-task
> > > >>>> > > >> > > >>>>>>> > > >> > and
> > > >>>> > > >> > > >>>>>>> > > >> > >> this is just about intra-task
> > > alignment.
> > > >>>> So
> > > >>>> > it's
> > > >>>> > > >> > > >>>>>>> plausible to put
> > > >>>> > > >> > > >>>>>>> > > >> > sources
> > > >>>> > > >> > > >>>>>>> > > >> > >> into an alignment group even though
> > > they
> > > >>>> do
> > > >>>> > not
> > > >>>> > > >> use
> > > >>>> > > >> > any
> > > >>>> > > >> > > >>>>>>> of the
> > > >>>> > > >> > > >>>>>>> > > >> presented
> > > >>>> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They should just
> > > issue a
> > > >>>> > > >> warning, if
> > > >>>> > > >> > > >>>>>>> they handle
> > > >>>> > > >> > > >>>>>>> > > >> > multiple
> > > >>>> > > >> > > >>>>>>> > > >> > >> splits (see motivation section).
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> I think renaming alignSplits to
> > > >>>> facilitate
> > > >>>> > future
> > > >>>> > > >> use
> > > >>>> > > >> > > >>>>>>> cases makes
> > > >>>> > > >> > > >>>>>>> > > >> sense
> > > >>>> > > >> > > >>>>>>> > > >> > but
> > > >>>> > > >> > > >>>>>>> > > >> > >> then all interfaces (if 1c) is
> > chosen)
> > > >>>> should
> > > >>>> > be
> > > >>>> > > >> > > adjusted
> > > >>>> > > >> > > >>>>>>> > > >> accordingly.
> > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSourceReader could be
> > > >>>> > PausingSourceReader
> > > >>>> > > >> and
> > > >>>> > > >> > > I'd
> > > >>>> > > >> > > >>>>>>> go for
> > > >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits (Becket's
> > proposal
> > > >>>> > afaik). We
> > > >>>> > > >> > could
> > > >>>> > > >> > > >>>>>>> also
> > > >>>> > > >> > > >>>>>>> > split
> > > >>>> > > >> > > >>>>>>> > > it
> > > >>>> > > >> > > >>>>>>> > > >> > into
> > > >>>> > > >> > > >>>>>>> > > >> > >> pauseSplit and resumeSplit. While
> > > >>>> > > >> pauseOrResumeSplits
> > > >>>> > > >> > > >>>>>>> may allow
> > > >>>> > > >> > > >>>>>>> > > >> Sources
> > > >>>> > > >> > > >>>>>>> > > >> > to
> > > >>>> > > >> > > >>>>>>> > > >> > >> just use 1 instead of 2 library
> calls
> > > (as
> > > >>>> > written
> > > >>>> > > >> in
> > > >>>> > > >> > > the
> > > >>>> > > >> > > >>>>>>> > Javadoc),
> > > >>>> > > >> > > >>>>>>> > > >> both
> > > >>>> > > >> > > >>>>>>> > > >> > >> Kafka and Pulsar can't use it and
> I'm
> > > not
> > > >>>> > sure if
> > > >>>> > > >> > there
> > > >>>> > > >> > > >>>>>>> is a
> > > >>>> > > >> > > >>>>>>> > system
> > > >>>> > > >> > > >>>>>>> > > >> that
> > > >>>> > > >> > > >>>>>>> > > >> > >> can.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Some nits for the FLIP:
> > > >>>> > > >> > > >>>>>>> > > >> > >> - Please replace "stop" with
> "pause".
> > > >>>> > > >> > > >>>>>>> > > >> > >> - Not sure if it's worth it in the
> > > >>>> capability
> > > >>>> > > >> > section:
> > > >>>> > > >> > > >>>>>>> Sources
> > > >>>> > > >> > > >>>>>>> > that
> > > >>>> > > >> > > >>>>>>> > > >> > adopt
> > > >>>> > > >> > > >>>>>>> > > >> > >> this interface cannot be used in
> > > earlier
> > > >>>> > > >> versions. So
> > > >>>> > > >> > > it
> > > >>>> > > >> > > >>>>>>> feels
> > > >>>> > > >> > > >>>>>>> > like
> > > >>>> > > >> > > >>>>>>> > > >> we
> > > >>>> > > >> > > >>>>>>> > > >> > are
> > > >>>> > > >> > > >>>>>>> > > >> > >> only forward compatible (old
> sources
> > > can
> > > >>>> be
> > > >>>> > used
> > > >>>> > > >> > after
> > > >>>> > > >> > > >>>>>>> the
> > > >>>> > > >> > > >>>>>>> > change);
> > > >>>> > > >> > > >>>>>>> > > >> but
> > > >>>> > > >> > > >>>>>>> > > >> > I
> > > >>>> > > >> > > >>>>>>> > > >> > >> guess this holds for any API
> > addition.
> > > >>>> > > >> > > >>>>>>> > > >> > >> - You might want to add what
> happens
> > > >>>> when all
> > > >>>> > > >> splits
> > > >>>> > > >> > > are
> > > >>>> > > >> > > >>>>>>> paused.
> > > >>>> > > >> > > >>>>>>> > > >> > >> - You may want to describe how the
> 3
> > > >>>> flavors
> > > >>>> > of
> > > >>>> > > >> > > >>>>>>> SourceReaderBase
> > > >>>> > > >> > > >>>>>>> > > >> > interact
> > > >>>> > > >> > > >>>>>>> > > >> > >> with the interface.
> > > >>>> > > >> > > >>>>>>> > > >> > >> - I'm not sure if it makes sense to
> > > >>>> include
> > > >>>> > Kafka
> > > >>>> > > >> and
> > > >>>> > > >> > > >>>>>>> Pulsar in
> > > >>>> > > >> > > >>>>>>> > the
> > > >>>> > > >> > > >>>>>>> > > >> > FLIP.
> > > >>>> > > >> > > >>>>>>> > > >> > >> For me, this is rather immediate
> > > >>>> follow-up
> > > >>>> > work.
> > > >>>> > > >> > (could
> > > >>>> > > >> > > >>>>>>> be in the
> > > >>>> > > >> > > >>>>>>> > > >> same
> > > >>>> > > >> > > >>>>>>> > > >> > >> umbrella ticket)
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Arvid
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz
> > > >>>> > > >> > > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements
> > > >>>> SourceReader,
> > > >>>> > > >> > > >>>>>>> WithSplitsAlignment",
> > > >>>> > > >> > > >>>>>>> > > >> along
> > > >>>> > > >> > > >>>>>>> > > >> > >> with "MySplitReader implements
> > > >>>> SplitReader,
> > > >>>> > > >> > > >>>>>>> WithSplitsAlignment",
> > > >>>> > > >> > > >>>>>>> > > or
> > > >>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements
> > > >>>> > AlignedSourceReader"
> > > >>>> > > >> > and
> > > >>>> > > >> > > >>>>>>> > > "MySplitReader
> > > >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
> > > >>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements
> > > >>>> SourceReader"
> > > >>>> > and
> > > >>>> > > >> > > >>>>>>> "MySplitReader
> > > >>>> > > >> > > >>>>>>> > > >> > implements
> > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReader".
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal
> according
> > > to
> > > >>>> Dawid
> > > >>>> > > >> would
> > > >>>> > > >> > > be:
> > > >>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements
> > > >>>> SourceReader"
> > > >>>> > and
> > > >>>> > > >> > > >>>>>>> "MySplitReader
> > > >>>> > > >> > > >>>>>>> > > >> > implements
> > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
> > > >>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although
> > > >>>> personally
> > > >>>> > > >> > speaking I
> > > >>>> > > >> > > >>>>>>> think it
> > > >>>> > > >> > > >>>>>>> > is
> > > >>>> > > >> > > >>>>>>> > > >> > simpler
> > > >>>> > > >> > > >>>>>>> > > >> > >> to just add a new method to the
> split
> > > >>>> reader
> > > >>>> > with
> > > >>>> > > >> > > >>>>>>> default impl.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> I think it is a good idea to have
> > it
> > > >>>> > aligned as
> > > >>>> > > >> > much
> > > >>>> > > >> > > as
> > > >>>> > > >> > > >>>>>>> > possible.
> > > >>>> > > >> > > >>>>>>> > > >> I'd
> > > >>>> > > >> > > >>>>>>> > > >> > be
> > > >>>> > > >> > > >>>>>>> > > >> > >> +1 for your option c). We can merge
> > > >>>> > > >> > AlignedSplitReader
> > > >>>> > > >> > > >>>>>>> with
> > > >>>> > > >> > > >>>>>>> > > >> > SplitReader. We
> > > >>>> > > >> > > >>>>>>> > > >> > >> will update the FLIP shortly.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin
> > wrote:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the comment, Jark.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align
> > > other
> > > >>>> > things in
> > > >>>> > > >> > the
> > > >>>> > > >> > > >>>>>>> future?
> > > >>>> > > >> > > >>>>>>> > For
> > > >>>> > > >> > > >>>>>>> > > >> > example,
> > > >>>> > > >> > > >>>>>>> > > >> > >> align read speed, I have
> > > >>>> > > >> > > >>>>>>> > > >> > >> seen users requesting global rate
> > > >>>> limits. This
> > > >>>> > > >> > feature
> > > >>>> > > >> > > >>>>>>> may also
> > > >>>> > > >> > > >>>>>>> > > need
> > > >>>> > > >> > > >>>>>>> > > >> an
> > > >>>> > > >> > > >>>>>>> > > >> > >> interface like this.
> > > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this
> > > >>>> interface to
> > > >>>> > > >> support
> > > >>>> > > >> > > >>>>>>> aligning other
> > > >>>> > > >> > > >>>>>>> > > >> > things, I
> > > >>>> > > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
> > > >>>> > > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
> > > >>>> > > >> > > >>>>>>> `alignWatermarksForSplits`
> > > >>>> > > >> > > >>>>>>> > > >> instead
> > > >>>> > > >> > > >>>>>>> > > >> > of
> > > >>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> This is a good point. Naming-wise,
> it
> > > >>>> would
> > > >>>> > > >> usually
> > > >>>> > > >> > be
> > > >>>> > > >> > > >>>>>>> more
> > > >>>> > > >> > > >>>>>>> > > >> extensible
> > > >>>> > > >> > > >>>>>>> > > >> > to
> > > >>>> > > >> > > >>>>>>> > > >> > >> just describe what the method
> > actually
> > > >>>> does,
> > > >>>> > > >> instead
> > > >>>> > > >> > of
> > > >>>> > > >> > > >>>>>>> assuming
> > > >>>> > > >> > > >>>>>>> > > the
> > > >>>> > > >> > > >>>>>>> > > >> > >> purpose of doing this. For example,
> > in
> > > >>>> this
> > > >>>> > case,
> > > >>>> > > >> > > >>>>>>> > > >> pauseOrResumeSplits()
> > > >>>> > > >> > > >>>>>>> > > >> > >> would be more extensible because
> this
> > > >>>> can be
> > > >>>> > used
> > > >>>> > > >> for
> > > >>>> > > >> > > >>>>>>> any kind of
> > > >>>> > > >> > > >>>>>>> > > >> flow
> > > >>>> > > >> > > >>>>>>> > > >> > >> control, be it watermark alignment
> or
> > > >>>> simple
> > > >>>> > rate
> > > >>>> > > >> > > >>>>>>> limiting.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on
> > this.
> > > I
> > > >>>> think
> > > >>>> > > >> they
> > > >>>> > > >> > > have
> > > >>>> > > >> > > >>>>>>> their
> > > >>>> > > >> > > >>>>>>> > own
> > > >>>> > > >> > > >>>>>>> > > >> > >> advantages.
> > > >>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use
> > Interfaces
> > > >>>> for
> > > >>>> > > >> extending
> > > >>>> > > >> > > >>>>>>> abilities
> > > >>>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for
> > > TableSource/TableSink,
> > > >>>> > > >> > > >>>>>>> > > >> > >> and I prefer Interfaces rather than
> > > >>>> methods in
> > > >>>> > > >> this
> > > >>>> > > >> > > >>>>>>> case. When
> > > >>>> > > >> > > >>>>>>> > you
> > > >>>> > > >> > > >>>>>>> > > >> have
> > > >>>> > > >> > > >>>>>>> > > >> > a
> > > >>>> > > >> > > >>>>>>> > > >> > >> bunch of abilities and each ability
> > > >>>> > > >> > > >>>>>>> > > >> > >> has more than one method,
> Interfaces
> > > can
> > > >>>> help
> > > >>>> > to
> > > >>>> > > >> > > >>>>>>> organize them
> > > >>>> > > >> > > >>>>>>> > and
> > > >>>> > > >> > > >>>>>>> > > >> make
> > > >>>> > > >> > > >>>>>>> > > >> > >> it clear to users which methods
> > > >>>> > > >> > > >>>>>>> > > >> > >> they need to implement when they want to
> > have
> > > >>>> an
> > > >>>> > > >> ability.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> I am OK with decorative interfaces
> if
> > > >>>> this is
> > > >>>> > a
> > > >>>> > > >> > general
> > > >>>> > > >> > > >>>>>>> design
> > > >>>> > > >> > > >>>>>>> > > >> pattern
> > > >>>> > > >> > > >>>>>>> > > >> > in
> > > >>>> > > >> > > >>>>>>> > > >> > >> the other components in Flink. But
> it
> > > >>>> looks
> > > >>>> > like
> > > >>>> > > >> the
> > > >>>> > > >> > > >>>>>>> current API
> > > >>>> > > >> > > >>>>>>> > > >> > proposal
> > > >>>> > > >> > > >>>>>>> > > >> > >> is not symmetric.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> The current proposal is essentially
> > > >>>> > > >> "MySourceReader
> > > >>>> > > >> > > >>>>>>> implements
> > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader, WithSplitsAlignment",
> > > >>>> along with
> > > >>>> > > >> > > >>>>>>> "MySplitReader
> > > >>>> > > >> > > >>>>>>> > > >> implements
> > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitsReader".
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Should we make the API symmetric?
> I'd
> > > >>>> > consider any
> > > >>>> > > >> > one
> > > >>>> > > >> > > >>>>>>> of the
> > > >>>> > > >> > > >>>>>>> > > >> following
> > > >>>> > > >> > > >>>>>>> > > >> > as
> > > >>>> > > >> > > >>>>>>> > > >> > >> symmetric.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements
> > > >>>> SourceReader,
> > > >>>> > > >> > > >>>>>>> WithSplitsAlignment",
> > > >>>> > > >> > > >>>>>>> > > >> along
> > > >>>> > > >> > > >>>>>>> > > >> > >> with "MySplitReader implements
> > > >>>> SplitReader,
> > > >>>> > > >> > > >>>>>>> WithSplitsAlignment",
> > > >>>> > > >> > > >>>>>>> > > or
> > > >>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements
> > > >>>> > AlignedSourceReader"
> > > >>>> > > >> > and
> > > >>>> > > >> > > >>>>>>> > > "MySplitReader
> > > >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
> > > >>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements
> > > >>>> SourceReader"
> > > >>>> > and
> > > >>>> > > >> > > >>>>>>> "MySplitReader
> > > >>>> > > >> > > >>>>>>> > > >> > implements
> > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReader".
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal
> according
> > > to
> > > >>>> Dawid
> > > >>>> > > >> would
> > > >>>> > > >> > > be:
> > > >>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements
> > > >>>> SourceReader"
> > > >>>> > and
> > > >>>> > > >> > > >>>>>>> "MySplitReader
> > > >>>> > > >> > > >>>>>>> > > >> > implements
> > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
> > > >>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although
> > > >>>> personally
> > > >>>> > > >> > speaking I
> > > >>>> > > >> > > >>>>>>> think it
> > > >>>> > > >> > > >>>>>>> > is
> > > >>>> > > >> > > >>>>>>> > > >> > simpler
> > > >>>> > > >> > > >>>>>>> > > >> > >> to just add a new method to the
> split
> > > >>>> reader
> > > >>>> > with
> > > >>>> > > >> > > >>>>>>> default impl.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
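The difference between the options listed above can be sketched in a few
lines of Java. This is only an illustration: BaseSplitReader,
AlignedSplitReaderSketch, SplitReaderWithDefault and AlignmentOptionsSketch
are hypothetical, simplified stand-ins invented for this example, not the
interfaces proposed in the FLIP.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified stand-ins for the interfaces discussed in this thread.

// Shape of options b)/d): the capability lives in a separate decorative interface.
interface BaseSplitReader<SplitT> {
    String name();
}

interface AlignedSplitReaderSketch<SplitT> extends BaseSplitReader<SplitT> {
    void alignSplits(Collection<SplitT> toPause, Collection<SplitT> toResume);
}

// Shape of option c): the base interface itself carries a no-op default method.
interface SplitReaderWithDefault<SplitT> {
    String name();

    default void alignSplits(Collection<SplitT> toPause, Collection<SplitT> toResume) {
        // Default: do nothing. Implementers override this method to opt in.
    }
}

final class AlignmentOptionsSketch {
    // With a decorative interface, support is visible in the reader's type.
    static boolean supportsAlignment(BaseSplitReader<?> reader) {
        return reader instanceof AlignedSplitReaderSketch;
    }

    // With a default method, the call always succeeds, whether or not the
    // reader actually pauses anything.
    static <T> void align(SplitReaderWithDefault<T> reader, List<T> toPause, List<T> toResume) {
        reader.alignSplits(toPause, toResume);
    }
}
```

In the first shape a reader that does not implement the extra interface is
detectable up front; in the second shape the call is always legal and an
unsupported reader simply ignores it.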
> > > >>>> > > >> > > >>>>>>> > > >> > >> @Dawid Wysakowicz <dwysakowicz@apache.org>, thanks for the
> > > >>>> > > >> > > >>>>>>> > > >> > >> reply.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a
> > > >>>> preference
> > > >>>> > > >> and I
> > > >>>> > > >> > > >>>>>>> agree most
> > > >>>> > > >> > > >>>>>>> > of
> > > >>>> > > >> > > >>>>>>> > > >> the
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> sources will support the alignment,
> I
> > am
> > > >>>> fine
> > > >>>> > > >> > following
> > > >>>> > > >> > > >>>>>>> your
> > > >>>> > > >> > > >>>>>>> > > >> suggestion
> > > >>>> > > >> > > >>>>>>> > > >> > to
> > > >>>> > > >> > > >>>>>>> > > >> > >> have the SourceReader extending
> from
> > > >>>> > > >> > > >>>>>>> > WithWatermarksSplitsAlignment,
> > > >>>> > > >> > > >>>>>>> > > >> but
> > > >>>> > > >> > > >>>>>>> > > >> > >> would put the "supportsXXX" there,
> > not
> > > >>>> in the
> > > >>>> > > >> Source
> > > >>>> > > >> > to
> > > >>>> > > >> > > >>>>>>> keep the
> > > >>>> > > >> > > >>>>>>> > > two
> > > >>>> > > >> > > >>>>>>> > > >> > >> methods together.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> One benefit of having the
> > "supportsXXX"
> > > >>>> in
> > > >>>> > Source
> > > >>>> > > >> is
> > > >>>> > > >> > > >>>>>>> that this
> > > >>>> > > >> > > >>>>>>> > > allows
> > > >>>> > > >> > > >>>>>>> > > >> > some
> > > >>>> > > >> > > >>>>>>> > > >> > >> compile time check. For example,
> if a
> > > >>>> user
> > > >>>> > enabled
> > > >>>> > > >> > > >>>>>>> watermark
> > > >>>> > > >> > > >>>>>>> > > >> alignment
> > > >>>> > > >> > > >>>>>>> > > >> > >> while it is not supported by the
> > > Source,
> > > >>>> an
> > > >>>> > > >> exception
> > > >>>> > > >> > > >>>>>>> can be
> > > >>>> > > >> > > >>>>>>> > thrown
> > > >>>> > > >> > > >>>>>>> > > >> at
> > > >>>> > > >> > > >>>>>>> > > >> > >> compile time. It seems in general
> > > >>>> useful. That
> > > >>>> > > >> said,
> > > >>>> > > >> > I
> > > >>>> > > >> > > >>>>>>> agree that
> > > >>>> > > >> > > >>>>>>> > > API
> > > >>>> > > >> > > >>>>>>> > > >> > >> cleanliness wise it is better to
> put
> > > the
> > > >>>> two
> > > >>>> > > >> methods
> > > >>>> > > >> > > >>>>>>> together.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
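The early check described above could look roughly like the following
sketch. It assumes that "compile time" refers to the point where the job is
assembled, before any records are read. SourceSketch,
supportsWatermarkSplitsAlignment and AlignmentValidation are hypothetical
names made up for this example; they are not Flink API.

```java
// Hypothetical sketch; all names are illustrative and not part of Flink.
interface SourceSketch {
    // A "supportsXXX"-style capability flag with a conservative default.
    default boolean supportsWatermarkSplitsAlignment() {
        return false;
    }
}

final class AlignmentValidation {
    // Fails fast while the job is being assembled, instead of at runtime.
    static void validate(SourceSketch source, boolean alignmentEnabled) {
        if (alignmentEnabled && !source.supportsWatermarkSplitsAlignment()) {
            throw new UnsupportedOperationException(
                    "Watermark alignment is enabled, but the source does not support it.");
        }
    }
}
```

A source that does not override the flag is rejected as soon as alignment
is switched on, which is the benefit mentioned for keeping the flag on the
Source.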
> > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <imjark@gmail.com>
> > > >>>> > > >> > > >>>>>>> > > >> > >> wrote:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks, Dawid, for the reminder on
> > > FLIP-182.
> > > >>>> > Sorry I
> > > >>>> > > >> did
> > > >>>> > > >> > > >>>>>>> miss it.
> > > >>>> > > >> > > >>>>>>> > > >> > >> I don't have other concerns then.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > >>>> > > >> > > >>>>>>> > > >> > >> Jark
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz
> > > >>>> > > >> > > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> @Jark:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align
> > with
> > > >>>> > watermarks
> > > >>>> > > >> > when
> > > >>>> > > >> > > >>>>>>> the
> > > >>>> > > >> > > >>>>>>> > source
> > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface?
> > > >>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs
> > > watermark
> > > >>>> > > >> alignment
> > > >>>> > > >> > > even
> > > >>>> > > >> > > >>>>>>> if Kafka
> > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface,
> > > >>>> > > >> > > >>>>>>> > > >> > >> and this will affect the throughput
> > > >>>> somehow. I
> > > >>>> > > >> agree
> > > >>>> > > >> > > >>>>>>> with Becket
> > > >>>> > > >> > > >>>>>>> > > >> > >> we may need a
> > > >>>> > > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method
> for
> > > >>>> users to
> > > >>>> > > >> > > configure
> > > >>>> > > >> > > >>>>>>> the
> > > >>>> > > >> > > >>>>>>> > source
> > > >>>> > > >> > > >>>>>>> > > >> to
> > > >>>> > > >> > > >>>>>>> > > >> > >> enable/disable the alignment.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate
> > > >>>> > > >> > > maxDesiredWatermark?
> > > >>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of
> > > >>>> maxDesiredWatermark
> > > >>>> > will
> > > >>>> > > >> > > >>>>>>> greatly affect
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> throughput
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> if the reader is constantly
> > > >>>> > > >> > > >>>>>>> > > >> > >>  switching between pause and
> resume.
> > > Can
> > > >>>> users
> > > >>>> > > >> > > configure
> > > >>>> > > >> > > >>>>>>> the
> > > >>>> > > >> > > >>>>>>> > > >> alignment
> > > >>>> > > >> > > >>>>>>> > > >> > >> offset?
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> This is covered in the previous
> > FLIP[1]
> > > >>>> which
> > > >>>> > has
> > > >>>> > > >> > been
> > > >>>> > > >> > > >>>>>>> already
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> implemented
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> in 1.15. In short, it must be
> enabled
> > > >>>> with the
> > > >>>> > > >> > > watermark
> > > >>>> > > >> > > >>>>>>> strategy
> > > >>>> > > >> > > >>>>>>> > > >> which
> > > >>>> > > >> > > >>>>>>> > > >> > >> also configures drift and update
> > > >>>> interval.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this
> > > >>>> interface to
> > > >>>> > > >> support
> > > >>>> > > >> > > >>>>>>> aligning other
> > > >>>> > > >> > > >>>>>>> > > >> things,
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> I
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
> > > >>>> > > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
> > > >>>> > > >> > > >>>>>>> `alignWatermarksForSplits`
> > > >>>> > > >> > > >>>>>>> > > >> instead
> > > >>>> > > >> > > >>>>>>> > > >> > of
> > > >>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Sure let's rename it.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> @Becket:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> I understand your point. On the
> other
> > > >>>> hand
> > > >>>> > putting
> > > >>>> > > >> > all
> > > >>>> > > >> > > >>>>>>> methods,
> > > >>>> > > >> > > >>>>>>> > > even
> > > >>>> > > >> > > >>>>>>> > > >> > with
> > > >>>> > > >> > > >>>>>>> > > >> > >> "supportsXXX" methods for enabling
> > > >>>> certain
> > > >>>> > > >> features,
> > > >>>> > > >> > > >>>>>>> makes the
> > > >>>> > > >> > > >>>>>>> > > entry
> > > >>>> > > >> > > >>>>>>> > > >> > >> threshold for writing a new source
> > > >>>> higher.
> > > >>>> > > >> Instead of
> > > >>>> > > >> > > >>>>>>> focusing on
> > > >>>> > > >> > > >>>>>>> > > the
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> basic
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> and required properties of the
> > Source,
> > > >>>> the
> > > >>>> > person
> > > >>>> > > >> > > >>>>>>> implementing a
> > > >>>> > > >> > > >>>>>>> > > >> source
> > > >>>> > > >> > > >>>>>>> > > >> > >> must bother with and figure
> > out
> > > >>>> what
> > > >>>> > all
> > > >>>> > > >> of
> > > >>>> > > >> > the
> > > >>>> > > >> > > >>>>>>> extra
> > > >>>> > > >> > > >>>>>>> > > >> features
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> are
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> about and how to deal with them. It
> > > >>>> makes it
> > > >>>> > also
> > > >>>> > > >> > > harder
> > > >>>> > > >> > > >>>>>>> to
> > > >>>> > > >> > > >>>>>>> > > organize
> > > >>>> > > >> > > >>>>>>> > > >> > >> methods in coupled groups as Jark
> > said.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a
> > > >>>> preference
> > > >>>> > > >> and I
> > > >>>> > > >> > > >>>>>>> agree most
> > > >>>> > > >> > > >>>>>>> > of
> > > >>>> > > >> > > >>>>>>> > > >> the
> > > >>>> > > >> > > >>>>>>> > > >> > >> sources will support the alignment,
> I
> > am
> > > >>>> fine
> > > >>>> > > >> > following
> > > >>>> > > >> > > >>>>>>> your
> > > >>>> > > >> > > >>>>>>> > > >> suggestion
> > > >>>> > > >> > > >>>>>>> > > >> > to
> > > >>>> > > >> > > >>>>>>> > > >> > >> have the SourceReader extending
> from
> > > >>>> > > >> > > >>>>>>> > WithWatermarksSplitsAlignment,
> > > >>>> > > >> > > >>>>>>> > > >> but
> > > >>>> > > >> > > >>>>>>> > > >> > >> would put the "supportsXXX" there,
> > not
> > > >>>> in the
> > > >>>> > > >> Source
> > > >>>> > > >> > to
> > > >>>> > > >> > > >>>>>>> keep the
> > > >>>> > > >> > > >>>>>>> > > two
> > > >>>> > > >> > > >>>>>>> > > >> > >> methods together.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Lastly, I agree it is really
> > > unfortunate
> > > >>>> the
> > > >>>> > > >> > > >>>>>>> "alignSplits"
> > > >>>> > > >> > > >>>>>>> > methods
> > > >>>> > > >> > > >>>>>>> > > >> > differ
> > > >>>> > > >> > > >>>>>>> > > >> > >> slightly for SourceReader and
> > > >>>> SplitReader. The
> > > >>>> > > >> reason
> > > >>>> > > >> > > for
> > > >>>> > > >> > > >>>>>>> that is
> > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase deals only with
> > > >>>> SplitIds,
> > > >>>> > whereas
> > > >>>> > > >> > > >>>>>>> SplitReader
> > > >>>> > > >> > > >>>>>>> > > needs
> > > >>>> > > >> > > >>>>>>> > > >> the
> > > >>>> > > >> > > >>>>>>> > > >> > >> actual splits to pause them. I
> found
> > > the
> > > >>>> > > >> discrepancy
> > > >>>> > > >> > > >>>>>>> acceptable
> > > >>>> > > >> > > >>>>>>> > for
> > > >>>> > > >> > > >>>>>>> > > >> the
> > > >>>> > > >> > > >>>>>>> > > >> > >> sake of simplifying changes
> > > >>>> significantly,
> > > >>>> > > >> especially
> > > >>>> > > >> > > as
> > > >>>> > > >> > > >>>>>>> they
> > > >>>> > > >> > > >>>>>>> > would
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> highly
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> likely impact performance as we
> would
> > > >>>> have to
> > > >>>> > > >> perform
> > > >>>> > > >> > > >>>>>>> additional
> > > >>>> > > >> > > >>>>>>> > > >> > lookups.
> > > >>>> > > >> > > >>>>>>> > > >> > >> Moreover the SplitReader is a
> > secondary
> > > >>>> > interface.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
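The asymmetry described above (split IDs at the source reader level, split
objects at the split reader level) can be illustrated with a small sketch.
Everything here is hypothetical: SplitSketch, SplitReaderSketch and
SourceReaderSketch are invented for the example, and the thread does not
say where the real implementation translates IDs into splits, so placing
that step in the source reader is an assumption.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these types are the real Flink classes.
final class SplitSketch {
    final String id;

    SplitSketch(String id) {
        this.id = id;
    }
}

// Lower level: works with the split objects themselves.
final class SplitReaderSketch {
    final Set<SplitSketch> paused = new HashSet<>();

    void alignSplits(Collection<SplitSketch> toPause, Collection<SplitSketch> toResume) {
        paused.removeAll(toResume);
        paused.addAll(toPause);
    }
}

// Upper level: only receives split IDs.
final class SourceReaderSketch {
    private final Map<String, SplitSketch> assigned = new HashMap<>();
    final SplitReaderSketch splitReader = new SplitReaderSketch();

    void addSplit(SplitSketch split) {
        assigned.put(split.id, split);
    }

    void alignSplits(Collection<String> idsToPause, Collection<String> idsToResume) {
        splitReader.alignSplits(resolve(idsToPause), resolve(idsToResume));
    }

    // The ID-to-split translation between the two levels; unknown IDs are skipped.
    private List<SplitSketch> resolve(Collection<String> ids) {
        List<SplitSketch> result = new ArrayList<>();
        for (String id : ids) {
            SplitSketch split = assigned.get(id);
            if (split != null) {
                result.add(split);
            }
        }
        return result;
    }
}
```

Giving both levels the same parameter type would require a lookup of this
kind on one side or the other, which is the kind of extra work the message
above refers to.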
> > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> [1]
> > > >>>> > https://cwiki.apache.org/confluence/x/hQYBCw
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the effort, Dawid and
> > > >>>> Sebastian!
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> I just have some minor questions
> > > (maybe I
> > > >>>> > missed
> > > >>>> > > >> > > >>>>>>> something).
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align
> > with
> > > >>>> > watermarks
> > > >>>> > > >> > when
> > > >>>> > > >> > > >>>>>>> the
> > > >>>> > > >> > > >>>>>>> > source
> > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface?
> > > >>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs
> > > watermark
> > > >>>> > > >> alignment
> > > >>>> > > >> > > even
> > > >>>> > > >> > > >>>>>>> if Kafka
> > > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface,
> > > >>>> > > >> > > >>>>>>> > > >> > >> and this will affect the throughput
> > > >>>> somehow. I
> > > >>>> > > >> agree
> > > >>>> > > >> > > >>>>>>> with Becket
> > > >>>> > > >> > > >>>>>>> > > >> > >> we may need a
> > > >>>> > > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method
> for
> > > >>>> users to
> > > >>>> > > >> > > configure
> > > >>>> > > >> > > >>>>>>> the
> > > >>>> > > >> > > >>>>>>> > source
> > > >>>> > > >> > > >>>>>>> > > >> to
> > > >>>> > > >> > > >>>>>>> > > >> > >> enable/disable the alignment.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate
> > > >>>> > > >> > > maxDesiredWatermark?
> > > >>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of
> > > >>>> maxDesiredWatermark
> > > >>>> > will
> > > >>>> > > >> > > >>>>>>> greatly affect
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> throughput
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> if the reader is constantly
> > > >>>> > > >> > > >>>>>>> > > >> > >>  switching between pause and
> resume.
> > > Can
> > > >>>> users
> > > >>>> > > >> > > configure
> > > >>>> > > >> > > >>>>>>> the
> > > >>>> > > >> > > >>>>>>> > > >> alignment
> > > >>>> > > >> > > >>>>>>> > > >> > >> offset?
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > > >>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align
> > > other
> > > >>>> > things in
> > > >>>> > > >> > the
> > > >>>> > > >> > > >>>>>>> future?
> > > >>>> > > >> > > >>>>>>> > For
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> example,
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> align read speed, I have
> > > >>>> > > >> > > >>>>>>> > > >> > >> seen users requesting global rate
> > > >>>> limits. This
> > > >>>> > > >> > feature
> > > >>>> > > >> > > >>>>>>> may also
> > > >>>> > > >> > > >>>>>>> > > need
> > > >>>> > > >> > > >>>>>>> > > >> an
> > > >>>> > > >> > > >>>>>>> > > >> > >> interface like this.
> > > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this
> > > >>>> interface to
> > > >>>> > > >> support
> > > >>>> > > >> > > >>>>>>> aligning other
> > > >>>> > > >> > > >>>>>>> > > >> things,
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> I
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
> > > >>>> > > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
> > > >>>> > > >> > > >>>>>>> `alignWatermarksForSplits`
> > > >>>> > > >> > > >>>>>>> > > >> instead
> > > >>>> > > >> > > >>>>>>> > > >> > of
> > > >>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > > >>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on
> > this.
> > > I
> > > >>>> think
> > > >>>> > > >> they
> > > >>>> > > >> > > have
> > > >>>> > > >> > > >>>>>>> their
> > > >>>> > > >> > > >>>>>>> > own
> > > >>>> > > >> > > >>>>>>> > > >> > >> advantages.
> > > >>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use
> > Interfaces
> > > >>>> for
> > > >>>> > > >> extending
> > > >>>> > > >> > > >>>>>>> abilities
> > > >>>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for
> > > TableSource/TableSink,
> > > >>>> > > >> > > >>>>>>> > > >> > >> and I prefer Interfaces rather than
> > > >>>> methods in
> > > >>>> > > >> this
> > > >>>> > > >> > > >>>>>>> case. When
> > > >>>> > > >> > > >>>>>>> > you
> > > >>>> > > >> > > >>>>>>> > > >> have
> > > >>>> > > >> > > >>>>>>> > > >> > a
> > > >>>> > > >> > > >>>>>>> > > >> > >> bunch of abilities and each ability
> > > >>>> > > >> > > >>>>>>> > > >> > >> has more than one method,
> Interfaces
> > > can
> > > >>>> help
> > > >>>> > to
> > > >>>> > > >> > > >>>>>>> organize them
> > > >>>> > > >> > > >>>>>>> > and
> > > >>>> > > >> > > >>>>>>> > > >> make
> > > >>>> > > >> > > >>>>>>> > > >> > >> it clear to users which methods
> > > >>>> > > >> > > >>>>>>> > > >> > >> they need to implement when they want to
> > have
> > > >>>> an
> > > >>>> > > >> ability.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > >>>> > > >> > > >>>>>>> > > >> > >> Jark
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin
> > > >>>> > > >> > > >>>>>>> > > >> > >> <becket.qin@gmail.com> wrote:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Hi Dawid,
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the explanation. Apologies that I somehow misread a
> > > >>>> > > >> > > >>>>>>> > > >> > >> bunch of "align" and thought they were "assign".
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Regarding 1, by default implementation, I was thinking of the
> > > >>>> > > >> > > >>>>>>> > > >> > >> default no-op implementation. I am a little worried about the
> > > >>>> > > >> > > >>>>>>> > > >> > >> proliferation of decorative interfaces. I think the most important
> > > >>>> > > >> > > >>>>>>> > > >> > >> thing about interfaces is that they are easy to understand. In
> > > >>>> > > >> > > >>>>>>> > > >> > >> this case, I prefer adding a new method to the existing interface
> > > >>>> > > >> > > >>>>>>> > > >> > >> for the following reasons:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> a) I feel the biggest drawback of decorative interfaces is knowing
> > > >>>> > > >> > > >>>>>>> > > >> > >> which interface they can decorate and which combinations of
> > > >>>> > > >> > > >>>>>>> > > >> > >> multiple decorative interfaces are valid. In the current FLIP, the
> > > >>>> > > >> > > >>>>>>> > > >> > >> withSplitsAlignment interface is only applicable to the
> > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader, which means it can't decorate any other interface.
> > > >>>> > > >> > > >>>>>>> > > >> > >> From an interface design perspective, a natural question is why
> > > >>>> > > >> > > >>>>>>> > > >> > >> not let "AlignedSplitReader" extend "withSplitsAlignment"? And it
> > > >>>> > > >> > > >>>>>>> > > >> > >> is also natural to assume that a split reader implementing both
> > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReader and withSplitsAlignment would work, because a source
> > > >>>> > > >> > > >>>>>>> > > >> > >> reader implementing SourceReader and withSplitsAlignment works. So
> > > >>>> > > >> > > >>>>>>> > > >> > >> why isn't there an interface of AlignedSourceReader? In the
> > > >>>> > > >> > > >>>>>>> > > >> > >> future, if there is a new feature added (e.g. sorted or
> > > >>>> > > >> > > >>>>>>> > > >> > >> pre-partitioned data aware), are we going to create another
> > > >>>> > > >> > > >>>>>>> > > >> > >> interface of SplitReader such as SortedSplitReader or
> > > >>>> > > >> > > >>>>>>> > > >> > >> PrePartitionedAware? Can they be combined? So I think the
> > > >>>> > > >> > > >>>>>>> > > >> > >> additional decorative interface like withSplitsAlignment actually
> > > >>>> > > >> > > >>>>>>> > > >> > >> increases the understanding cost of users because they have to
> > > >>>> > > >> > > >>>>>>> > > >> > >> know what decorative interfaces are there, which interface they
> > > >>>> > > >> > > >>>>>>> > > >> > >> can decorate and which combinations of the decorative interfaces
> > > >>>> > > >> > > >>>>>>> > > >> > >> are valid and which are not. Ideally we want to avoid that. To be
> > > >>>> > > >> > > >>>>>>> > > >> > >> clear, I am not opposing having an interface of
> > > >>>> > > >> > > >>>>>>> > > >> > >> withSplitsAlignment; it is completely OK to have it as an internal
> > > >>>> > > >> > > >>>>>>> > > >> > >> interface and let SourceReader and SplitReader both extend it.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> b) Adding a new method to the SourceReader with a default
> > > >>>> > > >> > > >>>>>>> > > >> > >> implementation of no-op would help avoid logic branching in the
> > > >>>> > > >> > > >>>>>>> > > >> > >> source logic, especially given that we agree that the vast
> > > >>>> > > >> > > >>>>>>> > > >> > >> majority of the SourceReader implementations, if not all, would
> > > >>>> > > >> > > >>>>>>> > > >> > >> just extend from the SourceReaderBase. That means adding a new
> > > >>>> > > >> > > >>>>>>> > > >> > >> method to the interface would effectively give the same user
> > > >>>> > > >> > > >>>>>>> > > >> > >> experience, but simpler.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> c) A related design principle that may be worth discussing is how
> > > >>>> > > >> > > >>>>>>> > > >> > >> we let the Source implementations tell Flink which capabilities
> > > >>>> > > >> > > >>>>>>> > > >> > >> are supported and which are not. Personally speaking, the most
> > > >>>> > > >> > > >>>>>>> > > >> > >> intuitive place to me is the Source itself, because that is the
> > > >>>> > > >> > > >>>>>>> > > >> > >> entry point of the entire Source connector logic.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Based on the above thoughts, I am wondering if the following
> > > >>>> > > >> > > >>>>>>> > > >> > >> interface would be easier for users to understand.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> - Change "withSplitsAlignment" to an internal interface, and let
> > > >>>> > > >> > > >>>>>>> > > >> > >> both SourceReader and SplitReader extend from it, with a default
> > > >>>> > > >> > > >>>>>>> > > >> > >> no-op implementation.
> > > >>>> > > >> > > >>>>>>> > > >> > >> - Add a new method "boolean supportSplitsAlignment()" to the
> > > >>>> > > >> > > >>>>>>> > > >> > >> Source interface, with a default implementation returning false.
> > > >>>> > > >> > > >>>>>>> > > >> > >> Sources that have implemented the alignment logic can change this
> > > >>>> > > >> > > >>>>>>> > > >> > >> to return true, and override the alignSplits() methods in the
> > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader / SplitReader if needed.
> > > >>>> > > >> > > >>>>>>> > > >> > >> - In the future, if a new optional feature is going to be added to
> > > >>>> > > >> > > >>>>>>> > > >> > >> the Source, and that feature requires awareness from Flink, we can
> > > >>>> > > >> > > >>>>>>> > > >> > >> add more such methods to the Source.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
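A rough sketch of what the three bullets above might look like in code. The interfaces here are simplified stand-ins written for this example (`SplitsAlignment`, `SketchSource`, `SketchSourceReader` and the two-collection `alignSplits` signature are assumptions, not Flink's actual API); only the method name `supportSplitsAlignment()` and the "default no-op" idea are taken from the proposal.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Assumed internal interface with a default no-op, as in the first bullet.
interface SplitsAlignment {
    default void alignSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        // no-op by default: readers that do not care simply ignore the call
    }
}

// Simplified stand-in for SourceReader; it inherits the no-op.
interface SketchSourceReader extends SplitsAlignment {}

// Simplified stand-in for Source, with the capability flag from the second bullet.
interface SketchSource {
    SketchSourceReader createReader();

    default boolean supportSplitsAlignment() {
        return false;
    }
}

// A source that does nothing special keeps both defaults.
final class PlainSource implements SketchSource {
    @Override
    public SketchSourceReader createReader() {
        return new SketchSourceReader() {};
    }
}

// A reader that implements alignment overrides the no-op.
final class AligningReader implements SketchSourceReader {
    final List<String> paused = new ArrayList<>();

    @Override
    public void alignSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.removeAll(splitsToResume);
        paused.addAll(splitsToPause);
    }
}

// A source that supports alignment flips the flag and returns such a reader.
final class AligningSource implements SketchSource {
    @Override
    public SketchSourceReader createReader() {
        return new AligningReader();
    }

    @Override
    public boolean supportSplitsAlignment() {
        return true;
    }
}
```

With this shape the framework can call `alignSplits` unconditionally and consult the flag only to decide whether alignment is actually in effect, which is the "no logic branching" argument from point b).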
> > > >>>> > > >> > > >>>>>>> > > >> > >> What do you think?
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> @Konstantin:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the
> > > >>>> > > >> > > >>>>>>> > > >> > >> stop & resume behavior) will be implemented for Kafka and Pulsar
> > > >>>> > > >> > > >>>>>>> > > >> > >> only, correct?
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Correct. As far as I know, though, those are the only sources that
> > > >>>> > > >> > > >>>>>>> > > >> > >> consume concurrently from multiple splits, and thus the only ones
> > > >>>> > > >> > > >>>>>>> > > >> > >> to which alignment applies.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> @Thomas:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase and
> > > >>>> > > >> > > >>>>>>> > > >> > >> then doing nothing if the split reader does not implement
> > > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader could be misleading? Perhaps
> > > >>>> > > >> > > >>>>>>> > > >> > >> WithSplitsAlignment can instead be added to the specific source
> > > >>>> > > >> > > >>>>>>> > > >> > >> reader (i.e. KafkaSourceReader) to make it explicit that the
> > > >>>> > > >> > > >>>>>>> > > >> > >> source actually supports it.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> I understand your concern. Hmm, I think we could actually do that.
> > > >>>> > > >> > > >>>>>>> > > >> > >> Given that the actual implementation of
> > > >>>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase#alignSplits is rather short (just a forward to
> > > >>>> > > >> > > >>>>>>> > > >> > >> the corresponding method of SplitFetcher), we could reimplement it
> > > >>>> > > >> > > >>>>>>> > > >> > >> in the actual source implementations. This solution has a downside
> > > >>>> > > >> > > >>>>>>> > > >> > >> though. Authors of new sources would have to do two things: extend
> > > >>>> > > >> > > >>>>>>> > > >> > >> from AlignedSplitReader and implement WithSplitsAlignment, instead
> > > >>>> > > >> > > >>>>>>> > > >> > >> of just extending AlignedSplitReader. I would be fine with such a
> > > >>>> > > >> > > >>>>>>> > > >> > >> tradeoff though. What do others think?
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> @Steven:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> For this part from the motivation section, is it accurate? Let's
> > > >>>> > > >> > > >>>>>>> > > >> > >> assume one source task consumes from 3 partitions and one of the
> > > >>>> > > >> > > >>>>>>> > > >> > >> partitions is significantly slower. In this situation, the
> > > >>>> > > >> > > >>>>>>> > > >> > >> watermark for this source task won't hold back as it is reading
> > > >>>> > > >> > > >>>>>>> > > >> > >> recent data from the other two Kafka partitions. As a result, it
> > > >>>> > > >> > > >>>>>>> > > >> > >> won't hold back the overall watermark. I thought the problem is
> > > >>>> > > >> > > >>>>>>> > > >> > >> that we may have late data for this slow partition.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> It will hold back the watermark. The watermark of an operator is
> > > >>>> > > >> > > >>>>>>> > > >> > >> the minimum of the watermarks of all its splits [1].
> > > >>>> > > >> > > >>>>>>> > > >> > >>
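To make the "minimum of all splits" rule concrete for the three-partition example discussed above, here is a small illustrative sketch. `WatermarkSketch` and `operatorWatermark` are hypothetical names, the timestamps are made up, and returning `Long.MIN_VALUE` for an operator with no splits is a convention of this sketch only, not a statement about Flink's behavior.

```java
import java.util.Map;

final class WatermarkSketch {
    // The operator's watermark is the minimum over the per-split watermarks,
    // so a single slow split holds the whole operator back.
    static long operatorWatermark(Map<String, Long> splitWatermarks) {
        if (splitWatermarks.isEmpty()) {
            // Sketch convention: no splits means no watermark is known yet.
            return Long.MIN_VALUE;
        }
        long min = Long.MAX_VALUE;
        for (long watermark : splitWatermarks.values()) {
            min = Math.min(min, watermark);
        }
        return min;
    }
}
```

For example, with per-split watermarks of 1000 and 990 for two fast partitions and 200 for the slow one, `operatorWatermark` returns 200: the fast partitions keep being read (and their records buffered downstream) while the operator's watermark stays at the slow partition's value.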
> > > >>>> > > >> > > >>>>>>> > > >> > >> I have another question about the restart. Say split alignment is
> > > >>>> > > >> > > >>>>>>> > > >> > >> triggered, a checkpoint is completed, and the job fails and is
> > > >>>> > > >> > > >>>>>>> > > >> > >> restored from the last checkpoint. Because the alignment decision
> > > >>>> > > >> > > >>>>>>> > > >> > >> is not checkpointed, alignment initially won't be enforced until
> > > >>>> > > >> > > >>>>>>> > > >> > >> we get a cycle of watermark aggregation and propagation, right?
> > > >>>> > > >> > > >>>>>>> > > >> > >> Not saying this corner case is a problem. Just want to understand
> > > >>>> > > >> > > >>>>>>> > > >> > >> it more.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Your understanding is correct.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> @Becket:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use case, so
> > > >>>> > > >> > > >>>>>>> > > >> > >> should we just add the related methods to SourceReader directly
> > > >>>> > > >> > > >>>>>>> > > >> > >> instead of introducing the new interface of WithSplitAssignment?
> > > >>>> > > >> > > >>>>>>> > > >> > >> We can provide default implementations, so backwards compatibility
> > > >>>> > > >> > > >>>>>>> > > >> > >> won't be an issue.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> I don't think we can provide a default implementation. How would
> > > >>>> > > >> > > >>>>>>> > > >> > >> we do that? Would it be just a no-op? Is it better than having an
> > > >>>> > > >> > > >>>>>>> > > >> > >> opt-in interface? The default implementation would have to be
> > > >>>> > > >> > > >>>>>>> > > >> > >> added exclusively in a *Public* SourceReader interface. By the
> > > >>>> > > >> > > >>>>>>> > > >> > >> way, notice that SourceReaderBase does extend from
> > > >>>> > > >> > > >>>>>>> > > >> > >> WithSplitsAlignment, so effectively all implementations do handle
> > > >>>> > > >> > > >>>>>>> > > >> > >> the alignment case. To be honest, I think it is impossible for end
> > > >>>> > > >> > > >>>>>>> > > >> > >> users to implement the SourceReader interface directly.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably also needs
> > > >>>> > > >> > > >>>>>>> > > >> > >> some change to support throttling at the split granularity. Can
> > > >>>> > > >> > > >>>>>>> > > >> > >> you add that interface change into the public interface section as
> > > >>>> > > >> > > >>>>>>> > > >> > >> well?
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> It has been added from the beginning. See *AlignedSplitReader.*
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits here,
> > > >>>> > > >> > > >>>>>>> > > >> > >> given that it is not actually changing the split assignments? It
> > > >>>> > > >> > > >>>>>>> > > >> > >> seems something like pauseOrResumeSplits(), or
> > > >>>> > > >> > > >>>>>>> > > >> > >> adjustSplitsThrottling() is more accurate.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> The method's called *alignSplits*, not assign. Do you still prefer
> > > >>>> > > >> > > >>>>>>> > > >> > >> a different name for that? Personally, I am open to suggestions
> > > >>>> > > >> > > >>>>>>> > > >> > >> here.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for driving the effort, Sebastian. I think the motivation
> > > >>>> > > >> > > >>>>>>> > > >> > >> makes a lot of sense. Just a few suggestions / questions.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use case, so
> > > >>>> > > >> > > >>>>>>> > > >> > >> should we just add the related methods to SourceReader directly
> > > >>>> > > >> > > >>>>>>> > > >> > >> instead of introducing the new interface of WithSplitAssignment?
> > > >>>> > > >> > > >>>>>>> > > >> > >> We can provide default implementations, so backwards compatibility
> > > >>>> > > >> > > >>>>>>> > > >> > >> won't be an issue.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably also needs
> > > >>>> > > >> > > >>>>>>> > > >> > >> some change to support throttling at the split granularity. Can
> > > >>>> > > >> > > >>>>>>> > > >> > >> you add that interface change into the public interface section as
> > > >>>> > > >> > > >>>>>>> > > >> > >> well?
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits here,
> > > >>>> > > >> > > >>>>>>> > > >> > >> given that it is not actually changing the split assignments? It
> > > >>>> > > >> > > >>>>>>> > > >> > >> seems something like pauseOrResumeSplits(), or
> > > >>>> > > >> > > >>>>>>> > > >> > >> adjustSplitsThrottling() is more accurate.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <stevenz3wu@gmail.com> wrote:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> However, a single source operator may read data from multiple
> > > >>>> > > >> > > >>>>>>> > > >> > >> splits/partitions, e.g., multiple Kafka partitions, such that even
> > > >>>> > > >> > > >>>>>>> > > >> > >> with watermark alignment the source operator may need to buffer an
> > > >>>> > > >> > > >>>>>>> > > >> > >> excessive amount of data if one split emits data faster than
> > > >>>> > > >> > > >>>>>>> > > >> > >> another.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> For this part from the motivation section, is it accurate? Let's
> > > >>>> > > >> > > >>>>>>> > > >> > >> assume one source task consumes from 3 partitions and one of the
> > > >>>> > > >> > > >>>>>>> > > >> > >> partitions is significantly slower. In this situation, the
> > > >>>> > > >> > > >>>>>>> > > >> > >> watermark for this source task won't hold back as it is reading
> > > >>>> > > >> > > >>>>>>> > > >> > >> recent data from the other two Kafka partitions. As a result, it
> > > >>>> > > >> > > >>>>>>> > > >> > >> won't hold back the overall watermark. I thought the problem is
> > > >>>> > > >> > > >>>>>>> > > >> > >> that we may have late data for this slow partition.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> I have another question about the
> > > >>>> restart. Say
> > > >>>> > > >> split
> > > >>>> > > >> > > >>>>>>> alignment is
> > > >>>> > > >> > > >>>>>>> > > >> > >> triggered. checkpoint is completed.
> > job
> > > >>>> > failed and
> > > >>>> > > >> > > >>>>>>> restored from
> > > >>>> > > >> > > >>>>>>> > > the
> > > >>>> > > >> > > >>>>>>> > > >> > last
> > > >>>> > > >> > > >>>>>>> > > >> > >> checkpoint. because alignment
> > decision
> > > >>>> is not
> > > >>>> > > >> > > >>>>>>> checkpointed,
> > > >>>> > > >> > > >>>>>>> > > initially
> > > >>>> > > >> > > >>>>>>> > > >> > >> alignment won't be enforced until
> we
> > > get
> > > >>>> a
> > > >>>> > cycle
> > > >>>> > > >> of
> > > >>>> > > >> > > >>>>>>> watermark
> > > >>>> > > >> > > >>>>>>> > > >> > aggregation
> > > >>>> > > >> > > >>>>>>> > > >> > >> and propagation, right? Not saying
> > this
> > > >>>> > corner is
> > > >>>> > > >> a
> > > >>>> > > >> > > >>>>>>> problem. Just
> > > >>>> > > >> > > >>>>>>> > > >> want
> > > >>>> > > >> > > >>>>>>> > > >> > to
> > > >>>> > > >> > > >>>>>>> > > >> > >> understand it more.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM
> > Thomas
> > > >>>> Weise <
> > > >>>> > > >> > > >>>>>>> thw@apache.org> <
> > > >>>> > > >> > > >>>>>>> > > >> > thw@apache.org> <th...@apache.org> <
> > > >>>> thw@apache.org>
> > > >>>> > <
> > > >>>> > > >> > > >>>>>>> > > >> > >> thw@apache.org> <
> > > >>>> > > >> > > >>>>>>> > > >> > >> thw@apache.org> wrote:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for working on this!
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split
> > > alignment
> > > >>>> in
> > > >>>> > > >> > > >>>>>>> SourceReaderBase and
> > > >>>> > > >> > > >>>>>>> > > then
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> doing
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> nothing if the split reader does
> not
> > > >>>> implement
> > > >>>> > > >> > > >>>>>>> AlignedSplitReader
> > > >>>> > > >> > > >>>>>>> > > >> could
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> be
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> misleading? Perhaps
> > WithSplitsAlignment
> > > >>>> can
> > > >>>> > > >> instead
> > > >>>> > > >> > be
> > > >>>> > > >> > > >>>>>>> added to
> > > >>>> > > >> > > >>>>>>> > the
> > > >>>> > > >> > > >>>>>>> > > >> > >> specific source reader (i.e.
> > > >>>> > KafkaSourceReader) to
> > > >>>> > > >> > make
> > > >>>> > > >> > > >>>>>>> it
> > > >>>> > > >> > > >>>>>>> > explicit
> > > >>>> > > >> > > >>>>>>> > > >> that
> > > >>>> > > >> > > >>>>>>> > > >> > >> the source actually supports it.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > >>>> > > >> > > >>>>>>> > > >> > >> Thomas
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM
> > > >>>> Konstantin
> > > >>>> > Knauf <
> > > >>>> > > >> > > >>>>>>> > > knaufk@apache.org>
> > > >>>> > > >> > > >>>>>>> > > >> <
> > > >>>> > > >> > > >>>>>>> > > >> > knaufk@apache.org> <knaufk@apache.org
> >
> > <
> > > >>>> > > >> > knaufk@apache.org
> > > >>>> > > >> > > >
> > > >>>> > > >> > > >>>>>>> <
> > > >>>> > > >> > > >>>>>>> > > >> > >> knaufk@apache.org> <
> > > >>>> > > >> > > >>>>>>> > > >> > >> knaufk@apache.org>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> wrote:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the
> > > >>>> `AlignedSplitReader`
> > > >>>> > > >> > > interface
> > > >>>> > > >> > > >>>>>>> (aka the
> > > >>>> > > >> > > >>>>>>> > > >> stop
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> &
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> resume behavior) will be
> implemented
> > > for
> > > >>>> > Kafka and
> > > >>>> > > >> > > >>>>>>> Pulsar only,
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> correct?
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> +1 in general. I believe it is
> > valuable
> > > >>>> to
> > > >>>> > > >> complete
> > > >>>> > > >> > the
> > > >>>> > > >> > > >>>>>>> watermark
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> aligned
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> story with this FLIP.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Cheers,
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Konstantin
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM
> > Dawid
> > > >>>> > Wysakowicz
> > > >>>> > > >> <
> > > >>>> > > >> > > >>>>>>> > > >> > dwysakowicz@apache.org> <
> > > >>>> dwysakowicz@apache.org>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> wrote:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> To be explicit, having worked on
> it,
> > I
> > > >>>> > support it
> > > >>>> > > >> ;)
> > > >>>> > > >> > I
> > > >>>> > > >> > > >>>>>>> think we
> > > >>>> > > >> > > >>>>>>> > can
> > > >>>> > > >> > > >>>>>>> > > >> > >> start a vote thread soonish, as
> there
> > > >>>> are no
> > > >>>> > > >> concerns
> > > >>>> > > >> > > so
> > > >>>> > > >> > > >>>>>>> far.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian
> > Mattheis
> > > >>>> wrote:
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Dear Flink developers,
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> I would like to open a discussion
> on
> > > >>>> FLIP 217
> > > >>>> > [1]
> > > >>>> > > >> for
> > > >>>> > > >> > > an
> > > >>>> > > >> > > >>>>>>> > extension
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> of
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Watermark Alignment to perform
> > > alignment
> > > >>>> also
> > > >>>> > in
> > > >>>> > > >> > > >>>>>>> SplitReaders. To
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> do
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> so,
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> SplitReaders must be able to
> suspend
> > > and
> > > >>>> > resume
> > > >>>> > > >> > reading
> > > >>>> > > >> > > >>>>>>> from
> > > >>>> > > >> > > >>>>>>> > split
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> sources
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> where the SourceOperator
> coordinates
> > > and
> > > >>>> > controls
> > > >>>> > > >> > > >>>>>>> suspend and
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> resume.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> To
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> gather information about current
> > > >>>> watermarks
> > > >>>> > of the
> > > >>>> > > >> > > >>>>>>> SplitReaders,
> > > >>>> > > >> > > >>>>>>> > we
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> extend
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> the internal
> > WatermarkOutputMultiplexer
> > > >>>> and
> > > >>>> > report
> > > >>>> > > >> > > >>>>>>> watermarks to
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> the
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> SourceOperator.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> There is a PoC for this FLIP [2],
> > > >>>> prototyped
> > > >>>> > by
> > > >>>> > > >> Arvid
> > > >>>> > > >> > > >>>>>>> Heise and
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> revised
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> and
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> reworked by Dawid Wysakowicz (He
> did
> > > >>>> most of
> > > >>>> > the
> > > >>>> > > >> > work.)
> > > >>>> > > >> > > >>>>>>> and me.
> > > >>>> > > >> > > >>>>>>> > The
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> changes
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> are backwards compatible in a way
> > that
> > > if
> > > >>>> > affected
> > > >>>> > > >> > > >>>>>>> components do
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> not
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> support split alignment the
> behavior
> > is
> > > >>>> as
> > > >>>> > before.
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > > >>>> > > >> > > >>>>>>> > > >> > >> Sebastian
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> [1]
> > > >>>> > > >> > > >>>>>>> > > >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> [2]
> > > >>>> > > >> > > >>>>>>> > > >> > >> https://github.com/dawidwys/flink/tree/aligned-splits
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> --
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >> Konstantin Knauf
> > > >>>> > > >> > > >>>>>>> > > >> > >> https://twitter.com/snntrable
> > > >>>> > > >> > > >>>>>>> > > >> > >> https://github.com/knaufk
> > > >>>> > > >> > > >>>>>>> > > >> > >>
> > > >>>> > > >> > > >>>>>>> > > >> > >>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Becket Qin <be...@gmail.com>.
Hi Piotr,

Please see the reply inline below:

On Wed, Jun 29, 2022 at 5:11 PM Piotr Nowojski <pn...@apache.org> wrote:

> Hi Becket,
>
> > My main concern of having a supportsPausingSplits() knob
>
> What is the problem with `supportsPausingSplits()` that you see?  Do you
> want to remove it?
>
Just to make sure we are on the same page, I assume we are talking about
the supportsPausingSplits() method in the Source interface. If we go
with the obligatory features addition path, having this method seems
misleading. Also, at some point later on, when all the sources
have implemented this feature, we will have to worry about deprecating this
method, which is backwards incompatible.


> Also I don't understand your proposal for Solution 1. How do you want to
> decide whether to throw an exception? For that we would need to have
> `supportsPausingSplits()`, right?
>

What I am thinking is the following:

1. The Flink framework always assumes split pausing is supported and just
naively invokes SourceReader#pauseOrResumeSplits().
2. The SourceReaderBase will basically again try to ask the SplitReader to
pause the splits.
3. Because the default implementation throws an
UnsupportedOperationException, if the source developer did not override it,
this exception will be thrown and bubbled up.
4. After catching this exception, the SourceReaderBase will just check if
there is only one split that is currently assigned to the split reader. If
so, it swallows the exception, stops polling the split reader and returns
NOTHING_AVAILABLE. This is the same as the current logic in the
SourceOperator. If we are not comfortable with moving this logic to the
SourceReaderBase, we can also just keep the logic there and simply let
SourceOperator remember whether more than one split is assigned to the
source reader when SourceOperator.handleAddSplitsEvent() is invoked.
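
To make steps 1-4 a bit more concrete, here is a rough sketch of the
fallback. The types below (SketchSplitReader, SketchSourceReaderBase,
SketchStatus) are hypothetical stand-ins that I made up for illustration;
they are not the actual Flink interfaces, and the real change would of
course live in SourceReaderBase or SourceOperator.

```java
import java.util.Collection;
import java.util.List;

/** Availability result, loosely modelled on the NOTHING_AVAILABLE status mentioned above. */
enum SketchStatus { MORE_AVAILABLE, NOTHING_AVAILABLE }

/** Hypothetical stand-in for a split reader; NOT the real Flink SplitReader interface. */
interface SketchSplitReader {
    // Step 3: the default implementation throws, so a reader that does not
    // override this method lets the exception bubble up.
    default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        throw new UnsupportedOperationException("This split reader cannot pause splits.");
    }
}

/** Hypothetical stand-in for the SourceReaderBase side of steps 2 and 4. */
final class SketchSourceReaderBase {
    private final SketchSplitReader splitReader;
    private final List<String> assignedSplits;
    private boolean pollingStopped;

    SketchSourceReaderBase(SketchSplitReader splitReader, List<String> assignedSplits) {
        this.splitReader = splitReader;
        this.assignedSplits = assignedSplits;
    }

    /** Step 1 would be the framework calling this method unconditionally. */
    void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        try {
            // Step 2: simply ask the split reader to pause or resume the splits.
            splitReader.pauseOrResumeSplits(toPause, toResume);
        } catch (UnsupportedOperationException e) {
            // Step 4: with more than one split we cannot emulate per-split pausing.
            if (assignedSplits.size() > 1) {
                throw e;
            }
            // With a single split, pausing the whole reader is equivalent to
            // pausing that split, so swallow the exception and stop polling.
            pollingStopped = !toPause.isEmpty();
        }
    }

    SketchStatus pollNext() {
        return pollingStopped ? SketchStatus.NOTHING_AVAILABLE : SketchStatus.MORE_AVAILABLE;
    }
}
```

With a single assigned split, a pause request makes pollNext() return
NOTHING_AVAILABLE until a resume request arrives; with two or more splits
the UnsupportedOperationException reaches the caller.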

This way the existing FLIP-182 users won't be impacted by this FLIP. For
those source readers that only have one split assigned, it works fine
without any change. For those source readers with multiple splits assigned,
they are already in a limp state with unpredictable side effects. We might
as well let them know this instead of pretending the
coarse-grained watermark alignment works fine for them.

The advantage of this solution is that we don't have to do anything after
this. That would work fine as the final state, as in:
1. We have already done the best we can do for the Sources that do not
support split pausing.
2. It is fully compatible with FLIP-182, if we consider throwing an
exception to be the right thing to do for readers that read from multiple
splits without supporting split pausing.
3. There is nothing to deprecate in the future.


>
> If so, I would prefer solution 2, to provide a graceful migration path for
> any users that are already using FLIP-182 with multiple splits per
> operator. I don't think there are many of those, but such a flag seems easy
> to implement while making migration easier. Having said that, technically
> we could change the behaviour and start throwing an exception always in
> such a case, as this feature is marked as Experimental.
>

I actually think neither solution 1 nor 2 breaks FLIP-182 users, but
solution 2 needs a deprecation process for the option in the future.


>
> Best,
> Piotrek
>
> śr., 29 cze 2022 o 02:54 Becket Qin <be...@gmail.com> napisał(a):
>
> > Hi Sebastian,
> >
> > Regarding the question,
> >
> > >
> > > @Becket: I'm not sure about the intention of solution 1. Can you
> explain
> > > that a bit more? In particular, I don't understand: "The reasoning
> behind
> > > this solution is that existing users should only use the
> > > coarse watermark alignment when a source reader only reads from a
> single
> > > split." Why should a user not use coarse watermark alignment when
> source
> > > reader reads from multiple splits? The split alignment uses the "coarse
> > > watermark", i.e., maxDesiredWatermark, as described in the FLIP for
> > > alignment.
> >
> >
> > Imagine you have a source reader reading from two splits, and the
> > watermarks look like the following:
> > 1. Watermark of Split 1: 10:00 AM Jun 29
> > 2. Watermark of Split 2: 11:00 AM Jun 29
> > 3. maxDesiredWatermark: 10:30 AM Jun 29
> >
> > At this point, the source reader's watermark is 10:00 AM which is lower
> > than the maxDesiredWatermark, so the source reader won't be paused from
> > reading. However, because the source reader cannot specify which split to
> > read from, if it continues to read, the watermark gap between the two
> > splits may become even bigger. This essentially defeats the main purpose of
> > watermark alignment - to reduce the number of records buffered in the
> > state. This does not necessarily happen, but it is not what FLIP-182 was
> > designed for to begin with. So I'd rather avoid extending the feature
> > to that case.
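
The numbers in this example can be checked with a small sketch. It assumes
that the reader-level watermark is the minimum over its splits and that the
coarse-grained check pauses the reader only when that minimum is ahead of
maxDesiredWatermark. The class and method names are made up for
illustration and are not Flink APIs.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of coarse-grained vs. per-split alignment checks. */
final class CoarseAlignmentExample {

    /** Assumption: the reader's watermark is the minimum of its split watermarks. */
    static LocalTime readerWatermark(List<LocalTime> splitWatermarks) {
        return Collections.min(splitWatermarks);
    }

    /** Coarse-grained check: the whole reader pauses only if it is ahead of the limit. */
    static boolean readerIsPaused(List<LocalTime> splitWatermarks, LocalTime maxDesired) {
        return readerWatermark(splitWatermarks).isAfter(maxDesired);
    }

    /** Per-split check: indices of the splits that are individually ahead of the limit. */
    static List<Integer> splitsAhead(List<LocalTime> splitWatermarks, LocalTime maxDesired) {
        List<Integer> ahead = new ArrayList<>();
        for (int i = 0; i < splitWatermarks.size(); i++) {
            if (splitWatermarks.get(i).isAfter(maxDesired)) {
                ahead.add(i);
            }
        }
        return ahead;
    }

    public static void main(String[] args) {
        // Split 1 at 10:00, split 2 at 11:00, maxDesiredWatermark at 10:30.
        List<LocalTime> watermarks = List.of(LocalTime.of(10, 0), LocalTime.of(11, 0));
        LocalTime maxDesired = LocalTime.of(10, 30);
        System.out.println("reader watermark: " + readerWatermark(watermarks));
        System.out.println("reader paused:    " + readerIsPaused(watermarks, maxDesired));
        System.out.println("splits ahead:     " + splitsAhead(watermarks, maxDesired));
    }
}
```

The reader as a whole is not paused because its watermark is 10:00, even
though the second split (index 1) is already ahead of the limit. A reader
that cannot pause individual splits has no way to act on that.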
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Tue, Jun 28, 2022 at 6:53 PM Sebastian Mattheis <
> > sebastian@ververica.com>
> > wrote:
> >
> > > (Sorry I didn't send to the list but only to Becket. My bad and thanks
> > > Piotr. Next attempt:)
> > >
> > > Hi all,
> > >
> > > Thanks for pushing the FLIP. I would drive it and would be happy to get
> > > back to you, @Thomas, for reviews. (Sorry for low responsiveness, there
> > > were several efforts with high priority on my side ...) As next step, I
> > > would revise the FLIP to get the discussion concluded.
> > >
> > > However, as Becket mentioned I feel that some things are still not
> clear
> > > yet:
> > >
> > > Re: Thomas
> > >>
> > >> However, from a user perspective, should the split level alignment be
> > >>> an opt-in feature, at least for a few releases? If yes, then we would
> > >>> require a knob similar to supportsPausingSplits(), which I understand
> > >>> won't be part of the revised FLIP. Such control may be beneficial:
> > >>
> > >>
> > >>> * Compare runtime behavior with split level alignment on/off
> > >>> * Allow use of sources that don't implement pausing splits yet
> > >>
> > >>
> > >>> The second point would, from the user's perspective, be necessary for
> > >>> backward compatibility? While the interface aspect and source
> > >>> compatibility has been discussed in great detail, I don't think it
> > >>> would be desirable if an application that already uses alignment
> fails
> > >>> after upgrading to the new Flink version, forcing users to lock step
> > >>> modify sources for the new non-optional split level alignment.
> > >>
> > >>
> > >>> So I think clarification of the compatibility aspect on the FLIP page
> > >>> would be necessary.
> > >>
> > >>
> > >> That is a good point. Currently FLIP-182 is marked as experimental. So
> > >> technically speaking it could change. That said, I agree that it would
> > be
> > >> good to not break the existing sources.
> > >>
> > >> My main concern of having a supportsPausingSplits() knob is that this
> > >> might be something requiring a code change on future deprecation. I am
> > >> thinking of two potential solutions:
> > >>
> > >> Solution 1:
> > >> In the SourceReaderBase, when pauseOrResumeSplits() is invoked, if the
> > >> source reader only has one split assigned, the source reader simply stops
> > >> polling and just returns NOTHING_AVAILABLE. If more than one split is
> > >> assigned, it throws an exception with a message such as "The
> > >> unpausable SplitReader CLASS_NAME only works with watermark alignment
> > >> when assigned a single split. There is more than one split assigned to the
> > >> SplitReader".
> > >> The reasoning behind this solution is that existing users should only
> > use
> > >> the coarse watermark alignment when a source reader only reads from a
> > >> single split. Reading from more than one split might have unwanted
> side
> > >> effects, so we might as well throw an exception in this case.
> > >>
> > >> Solution 2:
> > >> Having a configuration "enable.coarse-grained.watermark.alignment",
> the
> > >> default value is false. Once it is set to true, we will allow
> > >> coarse-grained watermark alignment if a SplitReader is pausable.
> > >> This solution allows users to keep the current FLIP-182 behavior, with
> > >> the risk of side effects.
> > >>
> > >> Personally speaking, I feel solution 1 seems better because
> > >> coarse-grained watermark alignment could be frustrating to the users
> > >> when more than one split is assigned. So we might as well not support
> > it at
> > >> all. And also there is nothing to deprecate in the future with this
> > >> solution.
> > >>
> > >> What do you think?
> > >>
> > >
> > > @Thomas: My understanding is that you intend a simple switch/knob to
> test
> > > w/ and w/o (split) watermark alignment, right? Isn't the coarse-grained
> > w/
> > > vs w/o watermark alignment sufficient for that? Or do you think that
> > > switching watermark alignment explicitly on split level is required?
> > >
> > > @Becket: I'm not sure about the intention of solution 1. Can you
> explain
> > > that a bit more? In particular, I don't understand: "The reasoning
> behind
> > > this solution is that existing users should only use the coarse
> watermark
> > > alignment when a source reader only reads from a single split." Why
> > > should a user not use coarse watermark alignment when source reader
> reads
> > > from multiple splits? The split alignment uses the "coarse watermark",
> > > i.e., maxDesiredWatermark, as described in the FLIP for alignment.
> > >
> > > Could you please clarify?
> > >
> > > Regards,
> > > Sebastian
> > >
> > > On Wed, Jun 22, 2022 at 4:23 AM Becket Qin <be...@gmail.com>
> wrote:
> > >
> > >> Thanks for the feedback, Thomas and Steve. And thanks Piotr for the
> > >> patient and detailed discussion.
> > >>
> > >> Let's move forward with option 1 then.
> > >>
> > >> Re: Thomas
> > >>
> > >> However, from a user perspective, should the split level alignment be
> > >>> an opt-in feature, at least for a few releases? If yes, then we would
> > >>> require a knob similar to supportsPausingSplits(), which I understand
> > >>> won't be part of the revised FLIP. Such control may be beneficial:
> > >>
> > >>
> > >>> * Compare runtime behavior with split level alignment on/off
> > >>> * Allow use of sources that don't implement pausing splits yet
> > >>
> > >>
> > >>> The second point would, from the user's perspective, be necessary for
> > >>> backward compatibility? While the interface aspect and source
> > >>> compatibility has been discussed in great detail, I don't think it
> > >>> would be desirable if an application that already uses alignment
> fails
> > >>> after upgrading to the new Flink version, forcing users to lock step
> > >>> modify sources for the new non-optional split level alignment.
> > >>
> > >>
> > >>> So I think clarification of the compatibility aspect on the FLIP page
> > >>> would be necessary.
> > >>
> > >>
> > >> That is a good point. Currently FLIP-182 is marked as experimental. So
> > >> technically speaking it could change. That said, I agree that it would
> > be
> > >> good to not break the existing sources.
> > >>
> > >> My main concern of having a supportsPausingSplits() knob is that this
> > >> might be something requiring a code change on future deprecation. I am
> > >> thinking of two potential solutions:
> > >>
> > >> Solution 1:
> > >> In the SourceReaderBase, when pauseOrResumeSplits() is invoked, if the
> > >> source reader only has one split assigned, the source reader simply stops
> > >> polling and just returns NOTHING_AVAILABLE. If more than one split is
> > >> assigned, it throws an exception with a message such as "The
> > >> unpausable SplitReader CLASS_NAME only works with watermark alignment
> > >> when assigned a single split. There is more than one split assigned to the
> > >> SplitReader".
> > >> The reasoning behind this solution is that existing users should only
> > use
> > >> the coarse watermark alignment when a source reader only reads from a
> > >> single split. Reading from more than one split might have unwanted
> side
> > >> effects, so we might as well throw an exception in this case.
> > >>
> > >> Solution 2:
> > >> Having a configuration "enable.coarse-grained.watermark.alignment",
> the
> > >> default value is false. Once it is set to true, we will allow
> > >> coarse-grained watermark alignment if a SplitReader is pausable.
> > >> This solution allows users to keep the current FLIP-182 behavior, with
> > >> the risk of side effects.
> > >>
> > >> Personally speaking, I feel solution 1 seems better because
> > >> coarse-grained watermark alignment could be frustrating to the users
> > when
> > >> more than one split is assigned. So we might as well not support it at
> > all.
> > >> And also there is nothing to deprecate in the future with this
> solution.
> > >>
> > >> What do you think?
> > >>
> > >> Thanks,
> > >>
> > >> Jiangjie (Becket) Qin
> > >>
> > >>
> > >> On Tue, Jun 21, 2022 at 8:48 PM Piotr Nowojski <pn...@apache.org>
> > >> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> It looks like option 1 wins overall? So let's go with that.
> > >>>
> > >>> Best,
> > >>> Piotrek
> > >>>
> > >>> śr., 15 cze 2022 o 04:13 Steven Wu <st...@gmail.com>
> napisał(a):
> > >>>
> > >>>> Both option 1 (default impl in base interface) and option 2
> > (decorative
> > >>>> interface) are pretty common patterns. I would also be fine with
> > either.
> > >>>> The important thing is that an exception is thrown if a source
> doesn't
> > >>>> support the alignment capability.
> > >>>>
> > >>>> The other point is that we can validate the source capability if
> > >>>> alignment
> > >>>> is enabled in WatermarkStrategy. I believe either option can achieve
> > >>>> this
> > >>>> goal too.
> > >>>> public interface WatermarkStrategy<T> {
> > >>>>     WatermarkStrategy<T> withWatermarkAlignment(String
> watermarkGroup,
> > >>>> Duration maxAllowedWatermarkDrift);
> > >>>> }
> > >>>>
> > >>>> If I have to pick one, I am slightly favoring option 1 (base
> > >>>> interface). As
> > >>>> watermark is already an essential concept of source, maybe watermark
> > >>>> alignment capability can also be a property of the base
> source/reader
> > >>>> interface.
> > >>>>
> > >>>> On Tue, Jun 14, 2022 at 3:44 PM Thomas Weise <th...@apache.org>
> wrote:
> > >>>>
> > >>>> > Hi everyone,
> > >>>> >
> > >>>> > Thank you for all the effort that went into this discussion. The
> > split
> > >>>> > level watermark alignment will be an important feature for Flink
> > that
> > >>>> > will address operational problems for various use cases. From
> > reading
> > >>>> > through this thread it appears that not too much remains to bring
> > this
> > >>>> > FLIP to acceptance and allow development to move forward. I would
> > like
> > >>>> > to contribute if possible.
> > >>>> >
> > >>>> > Regarding option 1 vs. option 2: I don't have a strong preference,
> > >>>> > perhaps slightly leaning towards option 1.
> > >>>> >
> > >>>> > However, from a user perspective, should the split level alignment
> > be
> > >>>> > an opt-in feature, at least for a few releases? If yes, then we
> > would
> > >>>> > require a knob similar to supportsPausingSplits(), which I
> > understand
> > >>>> > won't be part of the revised FLIP. Such control may be beneficial:
> > >>>> >
> > >>>> > * Compare runtime behavior with split level alignment on/off
> > >>>> > * Allow use of sources that don't implement pausing splits yet
> > >>>> >
> > >>>> > The second point would, from the user's perspective, be necessary
> > for
> > >>>> > backward compatibility? While the interface aspect and source
> > >>>> > compatibility has been discussed in great detail, I don't think it
> > >>>> > would be desirable if an application that already uses alignment
> > fails
> > >>>> > after upgrading to the new Flink version, forcing users to lock
> step
> > >>>> > modify sources for the new non-optional split level alignment.
> > >>>> >
> > >>>> > So I think clarification of the compatibility aspect on the FLIP
> > page
> > >>>> > would be necessary.
> > >>>> >
> > >>>> > Thanks,
> > >>>> > Thomas
> > >>>> >
> > >>>> > On Mon, May 30, 2022 at 3:29 AM Piotr Nowojski <
> > >>>> piotr.nowojski@gmail.com>
> > >>>> > wrote:
> > >>>> > >
> > >>>> > > Hi Becket,
> > >>>> > >
> > >>>> > > Thanks for summing this up. Just one correction:
> > >>>> > >
> > >>>> > > > Piotr prefers option 2, his opinions are:
> > >>>> > > >   e) It is OK that the code itself in option 2 indicates the
> > >>>> developers
> > >>>> > > that a feature is optional. We will rely on the documentation to
> > >>>> correct
> > >>>> > > that and clarify that the feature is actually obligatory.
> > >>>> > >
> > >>>> > > I would say based on a) and b) that feature would be still
> > >>>> optional. So
> > >>>> > > both the implementation and the documentation would be saying
> > that.
> > >>>> We
> > >>>> > > could add a mention to the docs and release notes, that this
> > >>>> feature will
> > >>>> > > be obligatory in the next major release and plan such a release
> > >>>> > accordingly.
> > >>>> > >
> > >>>> > > Re the option 1., as you mentioned:
> > >>>> > > > As for option 1: For developers, the feature is still optional
> > >>>> due to
> > >>>> > the
> > >>>> > > default implementation in the interface, regardless of what the
> > >>>> default
> > >>>> > > implementation does, because the code compiles without
> overriding
> > >>>> these
> > >>>> > > methods
> > >>>> > >
> > >>>> > > Also importantly, the code will work in most cases.
> > >>>> > >
> > >>>> > > > Obligatory: Jobs may fail if these methods are not implemented
> > >>>> > properly.
> > >>>> > > e.g. SourceReader#pauseOrResumeSplits(). This is a common
> pattern
> > in
> > >>>> > Java,
> > >>>> > > e.g. Iterator.remove() by default throws
> > >>>> "UnsupportedOperationException",
> > >>>> > > informing the implementation that things may go wrong if this
> > >>>> method is
> > >>>> > not
> > >>>> > > implemented.
> > >>>> > >
> > >>>> > > For me `Iterator#remove()` is an optional feature. Personally, I
> > >>>> don't
> > >>>> > > remember if I have ever implemented it.
> > >>>> > >
> > >>>> > > Best,
> > >>>> > > Piotrek
> > >>>> > >
> > >>>> > > pt., 27 maj 2022 o 10:14 Becket Qin <be...@gmail.com>
> > >>>> napisał(a):
> > >>>> > >
> > >>>> > > > I had an offline discussion with Piotr and here is the
> summary.
> > >>>> Please
> > >>>> > > > correct me if I miss something, Piotr.
> > >>>> > > >
> > >>>> > > > There are two things we would like to seek more opinions from
> > the
> > >>>> > > > community, so we can make progress on this FLIP.
> > >>>> > > >
> > >>>> > > > 1. The General pattern to add obligatory features to existing
> > >>>> > interfaces.
> > >>>> > > >
> > >>>> > > >
> > >>>> >
> > >>>>
> >
> ***********************************************************************************
> > >>>> > > > For interfaces exposed to the developers for implementation,
> > they
> > >>>> are
> > >>>> > > > either intended to be *optional* or *obligatory*. While it is
> > >>>> quite
> > >>>> > clear
> > >>>> > > > about how to convey that intention when creating the
> interfaces,
> > >>>> it is
> > >>>> > not
> > >>>> > > > as commonly agreed when we are adding new features to an
> > existing
> > >>>> > > > interface. In general, Flink uses decorative interfaces when
> > >>>> adding
> > >>>> > > > optional features to existing interfaces. Both Piotr and I
> agree
> > >>>> that
> > >>>> > looks
> > >>>> > > > good.
> > >>>> > > >
> > >>>> > > > Different opinions are mainly about how to add obligatory
> > >>>> features to
> > >>>> > the
> > >>>> > > > existing interfaces, probably due to different understandings
> of
> > >>>> > > > "obligatory".
> > >>>> > > >
> > >>>> > > > We have discussed four options:
> > >>>> > > >
> > >>>> > > > *Option 1:*
> > >>>> > > >
> > >>>> > > >    - Just add a new method to the existing interface.
> > >>>> > > >    - For backwards compatibility, the method would have a
> > default
> > >>>> > > >    implementation throwing "UnsupportedOperationException".
> > >>>> > > >    - In the next major version, remove the default
> > implementation.
> > >>>> > > >    - For the developers, any method with a default
> > implementation
> > >>>> > > >    throwing an "UnsupportedOperationException" should be taken
> > as
> > >>>> > obligatory.
> > >>>> > > >
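
A minimal sketch of what Option 1 could look like in code. `SketchSourceReader` and `LegacyReader` are hypothetical, simplified stand-ins and not the actual Flink interfaces; the method name `pauseOrResumeSplits` is taken from this thread, while its parameters here are an assumption.

```java
import java.util.Collection;
import java.util.Collections;

public class Option1Sketch {

    /** Hypothetical, simplified stand-in for an existing reader interface. */
    interface SketchSourceReader {
        String pollNext();

        // Added in a later release. The default body keeps existing
        // implementations compiling, but fails when the method is called.
        default void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume) {
            throw new UnsupportedOperationException(
                    "This reader does not support pausing or resuming splits.");
        }
    }

    /** Written before the new method existed; compiles without changes. */
    static class LegacyReader implements SketchSourceReader {
        @Override
        public String pollNext() {
            return "record";
        }
    }

    /** Returns true if the legacy reader falls through to the throwing default. */
    static boolean legacyReaderRejectsPause() {
        try {
            new LegacyReader().pauseOrResumeSplits(
                    Collections.singletonList("split-0"), Collections.<String>emptyList());
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(new LegacyReader().pollNext());
        System.out.println("pause rejected = " + legacyReaderRejectsPause());
    }
}
```

In this sketch the framework can always call the method, and a reader that has not caught up fails only at the point where the feature is actually used.
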
> > >>>> > > > *Option 2:*
> > >>>> > > >
> > >>>> > > >    - Always make the features optional by adding a decorative
> > >>>> > interface,
> > >>>> > > >    just like ordinary optional features.
> > >>>> > > >    - Inform the developers via documentation that this feature
> > is
> > >>>> > > >    obligatory, although it looks optional from the code.
> > >>>> > > >    - In case the developers did not implement the decorative
> > >>>> interface,
> > >>>> > > >    throw an exception.
> > >>>> > > >    - In the next major version, move the methods in the
> > decorative
> > >>>> > > >    interface to the base interface, and deprecate the
> decorative
> > >>>> > interface.
> > >>>> > > >
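
A minimal sketch of what Option 2 could look like in code, under the same caveat: `SketchSourceReader`, `SupportsPausableSplits`, and `validate` are hypothetical names chosen for illustration and are not Flink APIs. The base interface stays untouched, and the framework checks for the decorative interface only when the feature is switched on.

```java
import java.util.Collection;

public class Option2Sketch {

    /** Hypothetical base interface; unchanged when the feature is added. */
    interface SketchSourceReader {
        String pollNext();
    }

    /** Hypothetical decorative interface carrying the new method. */
    interface SupportsPausableSplits {
        void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume);
    }

    /** Existing reader that does not implement the decorative interface. */
    static class PlainReader implements SketchSourceReader {
        @Override
        public String pollNext() {
            return "record";
        }
    }

    /** Reader that opted in to the new feature. */
    static class PausableReader implements SketchSourceReader, SupportsPausableSplits {
        @Override
        public String pollNext() {
            return "record";
        }

        @Override
        public void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume) {
            // A real reader would stop or restart fetching for the given splits.
        }
    }

    /** Framework-side check, relevant only when watermark alignment is enabled. */
    static void validate(SketchSourceReader reader, boolean alignmentEnabled) {
        if (alignmentEnabled && !(reader instanceof SupportsPausableSplits)) {
            throw new UnsupportedOperationException(
                    "Watermark alignment requires a reader with pausable splits.");
        }
    }

    /** Convenience wrapper that reports whether validation passes. */
    static boolean isAccepted(SketchSourceReader reader, boolean alignmentEnabled) {
        try {
            validate(reader, alignmentEnabled);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAccepted(new PlainReader(), false));
        System.out.println(isAccepted(new PlainReader(), true));
        System.out.println(isAccepted(new PausableReader(), true));
    }
}
```

Compared with the Option 1 sketch, the special handling moves from the default method into the framework, which has to test for the interface before relying on it.
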
> > >>>> > > > *Option 3:*
> > >>>> > > >
> > >>>> > > >    - Always bump the major version when a new obligatory
> feature
> > >>>> is
> > >>>> > > >    added, even if we may have to do it frequently.
> > >>>> > > >
> > >>>> > > > *Option 4:*
> > >>>> > > >
> > >>>> > > >    - Add a V2, V3... of the interface affected by the new
> > >>>> obligatory
> > >>>> > > >    feature.
> > >>>> > > >    - In the next major versions, deprecate old versions of the
> > >>>> > interfaces.
> > >>>> > > >
> > >>>> > > > Both Piotr and I agreed that option 3 and option 4 have a big
> > >>>> side
> > >>>> > effect
> > >>>> > > > and should be avoided. We have different preferences between
> > >>>> option 1
> > >>>> > and
> > >>>> > > > option 2.
> > >>>> > > >
> > >>>> > > > Personally I prefer option 1, the reasons are:
> > >>>> > > >   a) simple and intuitive. Java 8 introduced the default impl
> in
> > >>>> > > > interfaces exactly for interface evolving, and this is a
> common
> > >>>> > pattern in
> > >>>> > > > many projects.
> > >>>> > > >   b) prominent to the developers that the feature is expected
> to
> > >>>> be
> > >>>> > > > implemented, because it explicitly throws an exception in the
> > >>>> default
> > >>>> > impl.
> > >>>> > > >   c) low maintenance overhead - the Flink framework can always
> > >>>> assume
> > >>>> > the
> > >>>> > > > method exists, so no special handling logic is needed.
> > >>>> > > >   d) communicate a clear semantic boundary between optional
> and
> > >>>> > obligatory
> > >>>> > > > features in Flink to the developers.
> > >>>> > > >       - Optional: Jobs still run without exception if these
> > >>>> methods are
> > >>>> > > > not implemented. e.g. all the SupportsXXXPushDown interfaces.
> > >>>> > > >       - Obligatory: Jobs may fail if these methods are not
> > >>>> implemented
> > >>>> > > > properly. e.g. SourceReader#pauseOrResumeSplits(). This is a
> > >>>> common
> > >>>> > pattern
> > >>>> > > > in Java, e.g. Iterator.remove() by default throws
> > >>>> > > > "UnsupportedOperationException", informing the implementation
> > that
> > >>>> > things
> > >>>> > > > may go wrong if this method is not implemented.
> > >>>> > > >
> > >>>> > > > As for option 2, although the API itself sounds clean, it
> > misleads
> > >>>> > people
> > >>>> > > > to think of an obligatory feature as optional - from the
> code
> > >>>> the
> > >>>> > > > feature is optional, but the documents say it is obligatory.
> We
> > >>>> > probably
> > >>>> > > > should avoid such code-doc inconsistency, as people will be
> > >>>> confused.
> > >>>> > And I
> > >>>> > > > would actually be bewildered that sometimes not implementing
> an
> > >>>> > "optional"
> > >>>> > > > feature is fine, but sometimes it causes the jobs to fail.
> > >>>> > > >
> > >>>> > > > In response to the argument that the method with a default
> > >>>> > implementation
> > >>>> > > > is always optional, if that is true, it actually means all the
> > >>>> > interfaces
> > >>>> > > > should be immutable once they are created. If we want to add a
> > >>>> method
> > >>>> > to an
> > >>>> > > > existing interface, for backwards compatibility, we will have
> to
> > >>>> > provide a
> > >>>> > > > default implementation. And the fact it has a default
> > >>>> implementation
> > >>>> > > > indicates the method is optional. If that method is optional,
> it
> > >>>> should
> > >>>> > > > reside in a separate decorative interface, otherwise it clogs
> > that
> > >>>> > existing
> > >>>> > > > interface. Therefore, people should never add a method to an
> > >>>> existing
> > >>>> > > > interface. I find this conclusion a bit extreme.
> > >>>> > > >
> > >>>> > > > Piotr prefers option 2, his opinions are:
> > >>>> > > >     a) Obligatory methods are the methods that fail the code
> > >>>> > compilation
> > >>>> > > > if not implemented.
> > >>>> > > >     b) All obligatory methods should reside in the base
> > interface,
> > >>>> > without
> > >>>> > > > a default implementation. And all the optional methods should
> be
> > >>>> in
> > >>>> > > > decorative interfaces. This is a clean API.
> > >>>> > > >     c) due to b), there isn't a viable solution to add an
> > >>>> obligatory
> > >>>> > > > method to an existing interface in a backwards compatible way.
> > >>>> Unless
> > >>>> > we
> > >>>> > > > are OK with breaking backwards compatibility, all the
> interfaces
> > >>>> > should be
> > >>>> > > > treated as immutable. As a compromise, we might as well just
> > >>>> treat all
> > >>>> > the
> > >>>> > > > features added later as optional features. This way we keep
> the
> > >>>> API
> > >>>> > clean.
> > >>>> > > >     d) based on b) and c), option 2 has a clean API, while
> > option
> > >>>> 1
> > >>>> > does
> > >>>> > > > not.
> > >>>> > > >     e) It is OK that the code itself in option 2 indicates to the
> > >>>> > developers
> > >>>> > > > that a feature is optional. We will rely on the documentation
> to
> > >>>> > correct
> > >>>> > > > that and clarify that the feature is actually obligatory.
> > >>>> > > >     f) Regarding the effectiveness of making people aware that
> > the
> > >>>> > feature
> > >>>> > > > is obligatory, Option 1 and Option 2 are similar. For people
> > that
> > >>>> do
> > >>>> > not
> > >>>> > > > read the release notes / documentation, they will mistake the
> > >>>> feature
> > >>>> > to be
> > >>>> > > > optional anyways.
> > >>>> > > >
> > >>>> > > > As for option 1: For developers, the feature is still optional
> > >>>> due to
> > >>>> > the
> > >>>> > > > default implementation in the interface, regardless of what
> the
> > >>>> default
> > >>>> > > > implementation does, because the code compiles without
> > overriding
> > >>>> these
> > >>>> > > > methods. Also, another problem of this option is that for
> users
> > >>>> that
> > >>>> > do not
> > >>>> > > > know about the history of the interface, they may be confused
> by
> > >>>> the
> > >>>> > > > default implementation throwing an exception.
> > >>>> > > >
> > >>>> > > >
> > >>>> > > > 2. For this particular FLIP, should it be optional or not?
> > >>>> > > >
> > >>>> > > >
> > >>>> >
> > >>>>
> >
> ***********************************************************************************
> > >>>> > > > As mentioned in the previous email, I feel this FLIP should be
> > >>>> > obligatory,
> > >>>> > > > for the following reasons:
> > >>>> > > > 1. The Flink framework exposes the watermark alignment API to
> > the
> > >>>> end
> > >>>> > > > users. From the end users' perspective, the feature should be
> > >>>> available
> > >>>> > > > regardless of the implementation details in the pluggables.
> This
> > >>>> is
> > >>>> > true
> > >>>> > > > for any other methods exposed as the Flink API.
> > >>>> > > > 2. If a Source is not pausable, the end user should receive an
> > >>>> > exception
> > >>>> > > > when enabling watermark alignment (both Piotr and I agree
> on
> > >>>> > this). In
> > >>>> > > > that case, it meets my criteria for an obligatory feature because
> > not
> > >>>> > > > implementing the feature causes a framework API to throw
> > >>>> an exception and
> > >>>> > > > fail the job.
> > >>>> > > >
> > >>>> > > > On the other hand, Piotr does not have a strong opinion
> > regarding
> > >>>> > whether
> > >>>> > > > this feature should be optional or not.
> > >>>> > > >
> > >>>> > > >
> > >>>> > > > Thanks for reading through this long email. So basically in
> > order
> > >>>> to
> > >>>> > make
> > >>>> > > > progress on this FLIP, we want to see how people feel
> about
> > >>>> the
> > >>>> > above
> > >>>> > > > two topics.
> > >>>> > > >
> > >>>> > > > Thanks,
> > >>>> > > >
> > >>>> > > > Jiangjie (Becket) Qin
> > >>>> > > >
> > >>>> > > >
> > >>>> > > > On Thu, May 26, 2022 at 3:06 PM Piotr Nowojski <
> > >>>> pnowojski@apache.org>
> > >>>> > > > wrote:
> > >>>> > > >
> > >>>> > > >> Hi Becket,
> > >>>> > > >>
> > >>>> > > >> I still stand by what I wrote before:
> > >>>> > > >> > I think I would still vote soft -1 on this option, but I
> > >>>> wouldn't
> > >>>> > block
> > >>>> > > >> it in case I am out-voted.
> > >>>> > > >>
> > >>>> > > >> > I think it might be helpful to agree on the definition of
> > >>>> optional
> > >>>> > in
> > >>>> > > >> our
> > >>>> > > >> case.
> > >>>> > > >>
> > >>>> > > >> For me it doesn't matter whether we call a default method throwing
> > >>>> > an exception
> > >>>> > > >> optional or non-optional. As long as we keep it this
> way,
> > >>>> the
> > >>>> > effect
> > >>>> > > >> is the same. It's effectively a method that a user doesn't
> have
> > >>>> to
> > >>>> > > >> implement. If interface/system allows some methods to be not
> > >>>> > implemented,
> > >>>> > > >> some users will do just that, regardless of whether we call it and
> > >>>> document it as
> > >>>> > > >> non-optional. And at the same time it's clogging the base
> > >>>> interface.
> > >>>> > > >>
> > >>>> > > >> By the way, just the need for a java-doc/documentation
> > >>>> explaining the
> > >>>> > > >> existence of some construct is a bad smell (code should be
> > >>>> > > >> self-documenting
> > >>>> > > >> and a default method throwing an UnsupportedOperationException
> is
> > >>>> not).
> > >>>> > > >>
> > >>>> > > >> > Please note that so far we do not assume whether the
> feature
> > >>>> is in
> > >>>> > > >> > the original API or it is added later. A newly added
> feature
> > >>>> can
> > >>>> > also be
> > >>>> > > >> > non-optional, although it might take some time for all the
> > >>>> pluggable
> > >>>> > > >> > developers to catch up, and they should still work if the
> new
> > >>>> > feature is
> > >>>> > > >> > not used until they catch up. In contrast, we may never
> > expect
> > >>>> an
> > >>>> > > >> optional
> > >>>> > > >> > feature to catch up, because leaving it unimplemented is
> also
> > >>>> > blessed.
> > >>>> > > >> >
> > >>>> > > >> > Let's take the checkpointing as an example. Imagine Flink
> did
> > >>>> not
> > >>>> > > >> support
> > >>>> > > >> > checkpointing before release 1.16. And now we are trying to
> > add
> > >>>> > > >> > checkpointing to Flink. So we exposed the checkpoint
> > >>>> configuration
> > >>>> > to
> > >>>> > > >> the
> > >>>> > > >> > end users. In the meantime, will we tell the pluggable
> (e.g.
> > >>>> > operators,
> > >>>> > > >> > connectors) developers that methods like "snapshotState()"
> are
> > >>>> > optional?
> > >>>> > > >> If
> > >>>> > > >> > we do that, the availability of checkpointing in Flink
> would
> > be
> > >>>> > severely
> > >>>> > > >> > weakened. But apparently we should still allow the existing
> > >>>> > > >> implementations
> > >>>> > > >> > to work without checkpointing. It looks to me that adding
> the
> > >>>> > method to
> > >>>> > > >> the
> > >>>> > > >> > pluggable interfaces with a default implementation throwing
> > >>>> > > >> > "UnsupportedOperationException" would be the solution here.
> > >>>> Please
> > >>>> > note
> > >>>> > > >> > that in this case, having the default implementation does
> not
> > >>>> mean
> > >>>> > this
> > >>>> > > >> is
> > >>>> > > >> > optional. It is just the technique to support backwards
> > >>>> > compatibility in
> > >>>> > > >> > the feature evolution. The fact that this method is in the
> > base
> > >>>> > > >> interface
> > >>>> > > >> > suggests it is not optional, so the developers SHOULD
> > >>>> implement it.
> > >>>> > > >>
> > >>>> > > >> I would soft vote -1 for having the default method throwing
> > >>>> > > >> UnsupportedOperationException as a one-off thing for this
> > (FLIP-217)
> > >>>> > special
> > >>>> > > >> circumstance.
> > >>>> > > >>
> > >>>> > > >> At the moment, without thinking this over too much, I would
> > vote
> > >>>> > harder -1
> > >>>> > > >> for having this as a general rule when adding new features.
> If
> > we
> > >>>> > ever end
> > >>>> > > >> up with an API that is littered with default methods throwing
> > >>>> > > >> UnsupportedOperationException that are documented as "non
> > >>>> optional" it
> > >>>> > > >> would be IMO a big design failure. I would be ok-ish with
> that,
> > >>>> only
> > >>>> > if
> > >>>> > > >> that was a temporary thing and we had an aggressive plan to
> > >>>> release
> > >>>> > more
> > >>>> > > >> often new major Flink versions (2.x.y, 3.x.y, ...) breaking
> API
> > >>>> > > >> compatibility, that would get rid of those default methods.
> > >>>> Adding
> > >>>> > > >> checkpointing and methods like "snapshotState()" would IMO
> > easily
> > >>>> > justify
> > >>>> > > >> a
> > >>>> > > >> new major Flink release. In that case we could add those
> > methods
> > >>>> with
> > >>>> > > >> default implementation for some transition period, one or
> two
> > >>>> minor
> > >>>> > > >> releases, followed by a clean up in a major release. However
> I
> > >>>> would
> > >>>> > still
> > >>>> > > >> argue that it would be cleaner/better to add a decorative
> > >>>> interface
> > >>>> > like
> > >>>> > > >> `CheckpointedOperator` interface instead of adding those
> > default
> > >>>> > methods
> > >>>> > > >> to
> > >>>> > > >> the base `Operator` interface.
> > >>>> > > >>
> > >>>> > > >> I think our disagreement can be summed up as follows: I would
> like
> > >>>> to
> > >>>> > keep
> > >>>> > > >> the
> > >>>> > > >> interfaces simpler, with only obligatory methods/features on
> > one
> > >>>> side
> > >>>> > and
> > >>>> > > >> clearly optional features on the other. While you would like
> to
> > >>>> add an
> > >>>> > > >> extra third state in between those two?
> > >>>> > > >>
> > >>>> > > >> Best,
> > >>>> > > >> Piotrek
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >>
> > >>>> > > >> On Thu, May 12, 2022 at 04:25, Becket Qin <be...@gmail.com>
> > >>>> > wrote:
> > >>>> > > >>
> > >>>> > > >> > Thanks for the clarification, Piotr and Sebastian.
> > >>>> > > >> >
> > >>>> > > >> > It looks like the key problem is still whether the
> > >>>> implementation of
> > >>>> > > >> > pausable splits in the Sources should be optional or not.
> > >>>> > > >> >
> > >>>> > > >> > I think it might be helpful to agree on the definition of
> > >>>> optional
> > >>>> > in
> > >>>> > > >> our
> > >>>> > > >> > case. To me:
> > >>>> > > >> > Optional = "You CAN leave the method unimplemented, and
> that
> > is
> > >>>> > fine."
> > >>>> > > >> > Non-Optional = "You CAN leave the method unimplemented, but
> > you
> > >>>> > SHOULD
> > >>>> > > >> NOT,
> > >>>> > > >> > because people assume this works."
> > >>>> > > >> >
> > >>>> > > >> > I think one sufficient condition of a Non-Optional feature
> is
> > >>>> that
> > >>>> > if
> > >>>> > > >> the
> > >>>> > > >> > feature is exposed through the framework API, Flink should
> > >>>> expect
> > >>>> > the
> > >>>> > > >> > pluggables to support this feature by default. Otherwise
> the
> > >>>> > > >> availability
> > >>>> > > >> > of that feature becomes undefined.
> > >>>> > > >> >
> > >>>> > > >> > Please note that so far we do not assume whether the
> feature
> > >>>> is in
> > >>>> > > >> > the original API or it is added later. A newly added
> feature
> > >>>> can
> > >>>> > also be
> > >>>> > > >> > non-optional, although it might take some time for all the
> > >>>> pluggable
> > >>>> > > >> > developers to catch up, and they should still work if the
> new
> > >>>> > feature is
> > >>>> > > >> > not used until they catch up. In contrast, we may never
> > expect
> > >>>> an
> > >>>> > > >> optional
> > >>>> > > >> > feature to catch up, because leaving it unimplemented is
> also
> > >>>> > blessed.
> > >>>> > > >> >
> > >>>> > > >> > Let's take the checkpointing as an example. Imagine Flink
> did
> > >>>> not
> > >>>> > > >> support
> > >>>> > > >> > checkpointing before release 1.16. And now we are trying to
> > add
> > >>>> > > >> > checkpointing to Flink. So we exposed the checkpoint
> > >>>> configuration
> > >>>> > to
> > >>>> > > >> the
> > >>>> > > >> > end users. In the meantime, will we tell the pluggable
> (e.g.
> > >>>> > operators,
> > >>>> > > >> > connectors) developers that methods like "snapshotState()"
> are
> > >>>> > optional?
> > >>>> > > >> If
> > >>>> > > >> > we do that, the availability of checkpointing in Flink
> would
> > be
> > >>>> > severely
> > >>>> > > >> > weakened. But apparently we should still allow the existing
> > >>>> > > >> implementations
> > >>>> > > >> > to work without checkpointing. It looks to me that adding
> the
> > >>>> > method to
> > >>>> > > >> the
> > >>>> > > >> > pluggable interfaces with a default implementation throwing
> > >>>> > > >> > "UnsupportedOperationException" would be the solution here.
> > >>>> Please
> > >>>> > note
> > >>>> > > >> > that in this case, having the default implementation does
> not
> > >>>> mean
> > >>>> > this
> > >>>> > > >> is
> > >>>> > > >> > optional. It is just the technique to support backwards
> > >>>> > compatibility in
> > >>>> > > >> > the feature evolution. The fact that this method is in the
> > base
> > >>>> > > >> interface
> > >>>> > > >> > suggests it is not optional, so the developers SHOULD
> > >>>> implement it.
> > >>>> > > >> >
> > >>>> > > >> > When it comes to this FLIP, I think it meets the criteria
> of
> > >>>> > > >> non-optional
> > >>>> > > >> > features, so we should just use the evolution path of
> > >>>> non-optional
> > >>>> > > >> > features.
> > >>>> > > >> >
> > >>>> > > >> > Thanks,
> > >>>> > > >> >
> > >>>> > > >> > Jiangjie (Becket) Qin
> > >>>> > > >> >
> > >>>> > > >> >
> > >>>> > > >> >
> > >>>> > > >> > On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski <
> > >>>> > pnowojski@apache.org>
> > >>>> > > >> > wrote:
> > >>>> > > >> >
> > >>>> > > >> > > Hi,
> > >>>> > > >> > >
> > >>>> > > >> > > Actually previously I thought about having a decorative
> > >>>> interface
> > >>>> > and
> > >>>> > > >> > > whenever watermark alignment is enabled, checking that
> the
> > >>>> source
> > >>>> > > >> > > implements the decorative interface. If not, throwing an
> > >>>> > exception.
> > >>>> > > >> > >
> > >>>> > > >> > > The option with default methods in the source interfaces
> > >>>> throwing
> > >>>> > > >> > > `UnsupportedOperationException` I think still suffers
> from
> > >>>> the
> > >>>> > same
> > >>>> > > >> > > problems I mentioned before. It's still an optional
> > >>>> implementation
> > >>>> > > >> and at
> > >>>> > > >> > > the same time it's clogging the base interface. I think I
> > >>>> would
> > >>>> > still
> > >>>> > > >> > vote
> > >>>> > > >> > > soft -1 on this option, but I wouldn't block it in case I
> > am
> > >>>> > > >> out-voted.
> > >>>> > > >> > >
> > >>>> > > >> > > Best,
> > >>>> > > >> > > Piotrek
> > >>>> > > >> > >
> > >>>> > > >> > > On Wed, May 11, 2022 at 14:22, Sebastian Mattheis <
> > >>>> > sebastian@ververica.com>
> > >>>> > > >> > > wrote:
> > >>>> > > >> > >
> > >>>> > > >> > > > Hi Becket,
> > >>>> > > >> > > >
> > >>>> > > >> > > > Thanks a lot for your fast and detailed response. For
> me,
> > >>>> it
> > >>>> > > >> converges
> > >>>> > > >> > > and
> > >>>> > > >> > > > dropping the supportsX method sounds very reasonable to
> > me.
> > >>>> > (Side
> > >>>> > > >> note:
> > >>>> > > >> > > > With "pausable splits" enabled as "default" I think we
> > >>>> > > >> misunderstood.
> > >>>> > > >> > As
> > >>>> > > >> > > > you describe it now, I understand "default" to mean that it
> > should
> > >>>> be
> > >>>> > the
> > >>>> > > >> new
> > >>>> > > >> > > > recommended way of implementation, and I think that is
> > >>>> fully
> > >>>> > valid.
> > >>>> > > >> > > Before,
> > >>>> > > >> > > > I understood "default" here as the default
> > implementation,
> > >>>> i.e.,
> > >>>> > > >> > throwing
> > >>>> > > >> > > > UnsupportedOperationException, which is the exact
> > >>>> opposite. :) )
> > >>>> > > >> > > >
> > >>>> > > >> > > > Nevertheless: As mentioned, an open question for me is
> if
> > >>>> > watermark
> > >>>> > > >> > > > alignment should enforce pausable splits. For
> > >>>> clarification, the
> > >>>> > > >> > current
> > >>>> > > >> > > > documentation [1] says:
> > >>>> > > >> > > >
> > >>>> > > >> > > > *Note:* As of 1.15, Flink supports aligning across
> tasks
> > >>>> of the
> > >>>> > same
> > >>>> > > >> > > >> source and/or different sources. It does not support
> > >>>> aligning
> > >>>> > > >> > > >> splits/partitions/shards in the same task.
> > >>>> > > >> > > >>
> > >>>> > > >> > > >> In a case where there are e.g. two Kafka partitions
> that
> > >>>> > produce
> > >>>> > > >> > > >> watermarks at different pace, that get assigned to the
> > >>>> same
> > >>>> > task
> > >>>> > > >> > > watermark
> > >>>> > > >> > > >> might not behave as expected. Fortunately, worst case
> it
> > >>>> > should not
> > >>>> > > >> > > perform
> > >>>> > > >> > > >> worse than without alignment.
> > >>>> > > >> > > >>
> > >>>> > > >> > > >> Given the limitation above, we suggest applying
> > watermark
> > >>>> > > >> alignment in
> > >>>> > > >> > > >> two situations:
> > >>>> > > >> > > >>
> > >>>> > > >> > > >>    1. You have two different sources (e.g. Kafka and
> > >>>> File) that
> > >>>> > > >> > produce
> > >>>> > > >> > > >>    watermarks at different speeds
> > >>>> > > >> > > >>    2. You run your source with parallelism equal to
> the
> > >>>> number
> > >>>> > of
> > >>>> > > >> > > >>    splits/shards/partitions, which results in every
> > >>>> subtask
> > >>>> > being
> > >>>> > > >> > > assigned a
> > >>>> > > >> > > >>    single unit of work.
> > >>>> > > >> > > >>
> > >>>> > > >> > > >> I personally see no issue in implementing and I see no
> > >>>> reason
> > >>>> > > >> against
> > >>>> > > >> > > > implementing this dependency of watermark alignment and
> > >>>> pausable
> > >>>> > > >> > splits.
> > >>>> > > >> > > (I
> > >>>> > > >> > > > think this would even be a good path towards shaping
> > >>>> watermark
> > >>>> > > >> > alignment
> > >>>> > > >> > > in
> > >>>> > > >> > > > 1.16.) However, "I don't see" means that I would be
> happy
> > >>>> to
> > >>>> > hear
> > >>>> > > >> > Dawid's
> > >>>> > > >> > > > and Piotrek's opinions as they implemented watermark
> > >>>> alignment
> > >>>> > > >> based on
> > >>>> > > >> > > > FLIP-182 [2] and I don't want to miss relevant
> > >>>> > rationale/background
> > >>>> > > >> > info
> > >>>> > > >> > > > from their side.
> > >>>> > > >> > > >
> > >>>> > > >> > > > *@Piotrek* *@Dawid *What do you think?
> > >>>> > > >> > > >
> > >>>> > > >> > > > Regards,
> > >>>> > > >> > > > Sebastian
> > >>>> > > >> > > >
> > >>>> > > >> > > > [1]
> > >>>> > > >> > > >
> > >>>> > > >> > >
> > >>>> > > >> >
> > >>>> > > >>
> > >>>> >
> > >>>>
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
> > >>>> > > >> > > > [2]
> > >>>> > > >> > > >
> > >>>> > > >> > >
> > >>>> > > >> >
> > >>>> > > >>
> > >>>> >
> > >>>>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
> > >>>> > > >> > > >
> > >>>> > > >> > > > On Wed, May 11, 2022 at 1:30 PM Becket Qin <
> > >>>> > becket.qin@gmail.com>
> > >>>> > > >> > wrote:
> > >>>> > > >> > > >
> > >>>> > > >> > > >> +dev
> > >>>> > > >> > > >>
> > >>>> > > >> > > >> Hi Sebastian,
> > >>>> > > >> > > >>
> > >>>> > > >> > > >> Thank you for the summary. Please see the detailed
> > replies
> > >>>> > inline.
> > >>>> > > >> As
> > >>>> > > >> > a
> > >>>> > > >> > > >> recap of my suggestions:
> > >>>> > > >> > > >>
> > >>>> > > >> > > >> 1. Pausable splits API.
> > >>>> > > >> > > >>   a) Add default implementations to methods
> > >>>> > "pauseOrResumeSplits"
> > >>>> > > >> in
> > >>>> > > >> > > both
> > >>>> > > >> > > >> SourceReader and SplitReader where both default
> > >>>> implementations
> > >>>> > > >> throw
> > >>>> > > >> > > >>  UnsupportedOperationException.
> > >>>> > > >> > > >>
> > >>>> > > >> > > >> 2. User story.
> > >>>> > > >> > > >>     a) We tell users to enable the watermark alignment
> > as
> > >>>> they
> > >>>> > > >> like.
> > >>>> > > >> > > This
> > >>>> > > >> > > >> is exactly what the current Flink API is.
> > >>>> > > >> > > >>     b) We tell the source developers, please implement
> > >>>> pausable
> > >>>> > > >> > splits,
> > >>>> > > >> > > >> otherwise bad things may happen. Think of it like you
> > are
> > >>>> > expected
> > >>>> > > >> to
> > >>>> > > >> > > >> implement SourceReader#snapshotState() properly,
> > otherwise
> > >>>> > > >> exceptions
> > >>>> > > >> > > will
> > >>>> > > >> > > >> be thrown when users enable checkpointing.
> > >>>> > > >> > > >>
> > >>>> > > >> > > >> Thanks,
> > >>>> > > >> > > >>
> > >>>> > > >> > > >> Jiangjie (Becket) Qin
> > >>>> > > >> > > >>
> > >>>> > > >> > > >> On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis <
> > >>>> > > >> > > >> sebastian@ververica.com> wrote:
> > >>>> > > >> > > >>
> > >>>> > > >> > > >>> Hi Becket, Hi everybody,
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >>> I'm sorry if I misread the messages but I could not
> > >>>> derive an
> > >>>> > > >> > agreement
> > >>>> > > >> > > >>> from the mailing list. Nevertheless, if I understand
> > you
> > >>>> > right the
> > >>>> > > >> > > >>> suggestion is:
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >>> * Add default implementations to methods
> > >>>> > "pauseOrResumeSplits" in
> > >>>> > > >> > both
> > >>>> > > >> > > >>> SourceReader and SplitReader where both default
> > >>>> > implementations
> > >>>> > > >> throw
> > >>>> > > >> > > >>> UnsupportedOperationException.
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >> Yes.
> > >>>> > > >> > > >>
> > >>>> > > >> > > >> * Add "supportsPauseOrResumeSplits" to the Source
> > >>>> interface.
> > >>>> > (In
> > >>>> > > >> the
> > >>>> > > >> > > >>> following, I refer to supporting this as "pausable
> > >>>> splits".)
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >> We may no longer need this if pausable splits are
> > >>>> expected to
> > >>>> > be
> > >>>> > > >> > > >> implemented by the source developers, i.e.
> non-optional.
> > >>>> Having
> > >>>> > > >> this
> > >>>> > > >> > > method
> > >>>> > > >> > > >> would then be somewhat misleading as it looks like the
> > >>>> sources
> > >>>> > > >> that do
> > >>>> > > >> > > not
> > >>>> > > >> > > >> support pausable splits are also acceptable in the
> long
> > >>>> term.
> > >>>> > So
> > >>>> > > >> API
> > >>>> > > >> > > wise,
> > >>>> > > >> > > >> I'd say maybe we should remove this for this FLIP,
> > >>>> although I
> > >>>> > > >> believe
> > >>>> > > >> > > this
> > >>>> > > >> > > >> supportXXX pattern itself is still attractive for
> > optional
> > >>>> > > >> features.
> > >>>> > > >> > > >>
> > >>>> > > >> > > >>
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >>> To make the conclusions explicit:
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >>> 1. The implementation of pauseOrResumeSplits in both
> > >>>> > interfaces
> > >>>> > > >> > > >>> SourceReader and SplitReader is optional, where the
> > >>>> default is
> > >>>> > > >> that
> > >>>> > > >> > it
> > >>>> > > >> > > >>> doesn't support it. (--> This means that the
> > >>>> implementation is
> > >>>> > > >> still
> > >>>> > > >> > > >>> optional for the source developer.)
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >> It is optional for backwards compatibility with
> existing
> > >>>> > sources,
> > >>>> > > >> as
> > >>>> > > >> > > they
> > >>>> > > >> > > >> may still compile without code change. But starting
> from
> > >>>> this
> > >>>> > FLIP,
> > >>>> > > >> > > Flink
> > >>>> > > >> > > >> will always optimistically assume that all the sources
> > >>>> support
> > >>>> > > >> > pausable
> > >>>> > > >> > > >> splits. If a source does not support pausable splits,
> it
> > >>>> goes
> > >>>> > to an
> > >>>> > > >> > > error
> > >>>> > > >> > > >> handling path when watermark alignment is enabled on
> it.
> > >>>> This
> > >>>> > is
> > >>>> > > >> > > different
> > >>>> > > >> > > >> from a usual optional feature, where no error is
> > expected.
> > >>>> > > >> > > >>
> > >>>> > > >> > > >>
> > >>>> > > >> > > >>> 2. If watermark alignment is enabled in the
> application
> > >>>> code
> > >>>> > by
> > >>>> > > >> > adding
> > >>>> > > >> > > >>> withWatermarkAlignment to the WatermarkStrategy while
> > >>>> > > >> SourceReader or
> > >>>> > > >> > > >>> SplitReader do not support pausableSplits, we throw
> an
> > >>>> > > >> > > >>> UnsupportedOperationException.
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >> Yes.
> > >>>> > > >> > > >>
> > >>>> > > >> > > >>
> > >>>> > > >> > > >>> 3. With regard to your statement:
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >>>> [...] basically means watermark alignment is a
> > >>>> non-optional
> > >>>> > > >> feature
> > >>>> > > >> > > to
> > >>>> > > >> > > >>>> the end users.
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >>> You actually mean that "pausable splits" are
> > >>>> non-optional for
> > >>>> > the
> > >>>> > > >> app
> > >>>> > > >> > > >>> developer if watermark alignment is enabled. However,
> > >>>> > watermark
> > >>>> > > >> > > alignment
> > >>>> > > >> > > >>> is optional and can be enabled/disabled.
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >> Yes, watermark alignment can be enabled/disabled in
> > >>>> individual
> > >>>> > > >> sources
> > >>>> > > >> > > in
> > >>>> > > >> > > >> Flink jobs, which basically means the code supporting
> > >>>> watermark
> > >>>> > > >> > > alignment
> > >>>> > > >> > > >> has to already be there. That again means the Source
> > >>>> > developers are
> > >>>> > > >> > also
> > >>>> > > >> > > >> expected to support pausable splits by default. So
> this
> > >>>> way we
> > >>>> > > >> > > essentially
> > >>>> > > >> > > >> tell the end users that you may enable / disable this
> > >>>> feature
> > >>>> > as
> > >>>> > > >> you
> > >>>> > > >> > > wish,
> > >>>> > > >> > > >> and tell the source developers that you SHOULD
> implement
> > >>>> this
> > >>>> > > >> because
> > >>>> > > >> > > the
> > >>>> > > >> > > >> end users may turn it on/off at will. And if the
> source
> > >>>> does
> > >>>> > not
> > >>>> > > >> > support
> > >>>> > > >> > > >> pausable splits, that goes to an error handling path
> > when
> > >>>> > watermark
> > >>>> > > >> > > >> alignment is enabled on it. So users know they have to
> > >>>> > explicitly
> > >>>> > > >> > > exclude
> > >>>> > > >> > > >> this source.
> > >>>> > > >> > > >>
> > >>>> > > >> > > >>
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >>> So far it's totally clear to me and I hope this is
> what
> > >>>> you
> > >>>> > mean.
> > >>>> > > >> I
> > >>>> > > >> > > also
> > >>>> > > >> > > >>> agree with both statements:
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >>> So making that expectation aligned with the source
> > >>>> developers
> > >>>> > > >> seems
> > >>>> > > >> > > >>>> reasonable.
> > >>>> > > >> > > >>>>
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >>> I think this is a simple and clean solution from both
> > >>>> the end
> > >>>> > user
> > >>>> > > >> > and
> > >>>> > > >> > > >>>> source developers' standpoint.
> > >>>> > > >> > > >>>>
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >>> However, a last conclusion derives from 3. and is an
> > open
> > >>>> > question
> > >>>> > > >> > for
> > >>>> > > >> > > >>> me:
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >>> 4. The feature of "pausable splits" is now tightly
> > bound
> > >>>> to
> > >>>> > > >> watermark
> > >>>> > > >> > > >>> alignment, i.e., if sources do not support "pausable
> > >>>> splits"
> > >>>> > one
> > >>>> > > >> can
> > >>>> > > >> > > not
> > >>>> > > >> > > >>> enable watermark alignment for these sources. This
> > >>>> dependency
> > >>>> > is
> > >>>> > > >> not
> > >>>> > > >> > > the
> > >>>> > > >> > > >>> current status of watermark alignment implementation
> > >>>> because
> > >>>> > it
> > >>>> > > >> > is/was
> > >>>> > > >> > > >>> implemented without pausable splits. Do we want to
> > >>>> introduce
> > >>>> > this
> > >>>> > > >> > > >>> dependency? (This is an open question. I cannot judge
> > >>>> that.)
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >> The watermark alignment basically relies on the
> pausable
> > >>>> > splits,
> > >>>> > > >> > right?
> > >>>> > > >> > > >> So personally I found it quite reasonable that if the
> > >>>> source
> > >>>> > does
> > >>>> > > >> not
> > >>>> > > >> > > >> support pausable splits, end users cannot enable
> > watermark
> > >>>> > > >> alignment
> > >>>> > > >> > on
> > >>>> > > >> > > it.
> > >>>> > > >> > > >>
> > >>>> > > >> > > >>
> > >>>> > > >> > > >>> If something is wrong, please correct me.
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >>> Regards,
> > >>>> > > >> > > >>> Sebastian
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >>> On Wed, May 11, 2022 at 9:05 AM Becket Qin <
> > >>>> > becket.qin@gmail.com>
> > >>>> > > >> > > wrote:
> > >>>> > > >> > > >>>
> > >>>> > > >> > > >>>> Hi Sebastian,
> > >>>> > > >> > > >>>>
> > >>>> > > >> > > >>>> Thanks for the reply and patient discussion. I agree
> > >>>> this is
> > >>>> > a
> > >>>> > > >> > tricky
> > >>>> > > >> > > >>>> decision.
> > >>>> > > >> > > >>>>
> > >>>> > > >> > > >>>>
> > >>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about Option
> > c)
> > >>>> > which I
> > >>>> > > >> see
> > >>>> > > >> > as
> > >>>> > > >> > > >>>>> follows:
> > >>>> > > >> > > >>>>> (1) An interface with default NOOP implementation
> > >>>> makes the
> > >>>> > > >> > > >>>>> implementation optional. And in my opinion, a
> default
> > >>>> > > >> > implementation
> > >>>> > > >> > > is and
> > >>>> > > >> > > >>>>> will remain a way of making implementation optional
> > >>>> because
> > >>>> > > >> even in
> > >>>> > > >> > > future
> > >>>> > > >> > > >>>>> a developer can decide to implement the "old
> flavor"
> > >>>> without
> > >>>> > > >> > support
> > >>>> > > >> > > for
> > >>>> > > >> > > >>>>> pausable splits.
> > >>>> > > >> > > >>>>> (2) It may not be too critical but I also find it
> > >>>> suboptimal
> > >>>> > > >> that
> > >>>> > > >> > > with
> > >>>> > > >> > > >>>>> a NOOP default implementation there is no way to
> > check
> > >>>> at
> > >>>> > > >> runtime
> > >>>> > > >> > if
> > >>>> > > >> > > >>>>> SourceReader or SplitReader actually support
> pausing.
> > >>>> (To
> > >>>> > do so,
> > >>>> > > >> > one
> > >>>> > > >> > > would
> > >>>> > > >> > > >>>>> need a supportsX method which makes it again more
> > >>>> > complicated.)
> > >>>> > > >> > > >>>>
> > >>>> > > >> > > >>>>
> > >>>> > > >> > > >>>> Based on the last few messages in the mailing list,
> > >>>> Piotr
> > >>>> > and I
> > >>>> > > >> > > agreed
> > >>>> > > >> > > >>>> that the default implementation should just throw an
> > >>>> > > >> > > >>>> UnsupportedOperationException if the source is
> > >>>> unpausable. So
> > >>>> > > >> this
> > >>>> > > >> > > >>>> basically tells the Source developers that this
> > feature
> > >>>> is
> > >>>> > > >> expected
> > >>>> > > >> > > to be
> > >>>> > > >> > > >>>> supported. Because we cannot prevent end users from
> > >>>> putting
> > >>>> > an
> > >>>> > > >> > > unpausable
> > >>>> > > >> > > >>>> source into the watermark alignment group, that
> > >>>> basically
> > >>>> > means
> > >>>> > > >> > > watermark
> > >>>> > > >> > > >>>> alignment is a non-optional feature to the end
> users.
> > >>>> So
> > >>>> > making
> > >>>> > > >> > that
> > >>>> > > >> > > >>>> expectation aligned with the source developers seems
> > >>>> > reasonable.
> > >>>> > > >> > And
> > >>>> > > >> > > if a
> > >>>> > > >> > > >>>> source does not support this feature, the end users
> > >>>> should
> > >>>> > > >> > explicitly
> > >>>> > > >> > > >>>> remove that source from the watermark alignment
> group.
> > >>>> > > >> > > >>>>
> > >>>> > > >> > > >>>> Personally speaking I think this is a simple and
> clean
> > >>>> > solution
> > >>>> > > >> from
> > >>>> > > >> > > >>>> both the end user and source developers' standpoint.
> > >>>> > > >> > > >>>>
> > >>>> > > >> > > >>>> Does this address your concerns?
> > >>>> > > >> > > >>>>
> > >>>> > > >> > > >>>> Thanks,
> > >>>> > > >> > > >>>>
> > >>>> > > >> > > >>>> Jiangjie (Becket) Qin
> > >>>> > > >> > > >>>>
> > >>>> > > >> > > >>>> On Wed, May 11, 2022 at 2:52 PM Sebastian Mattheis <
> > >>>> > > >> > > >>>> sebastian@ververica.com> wrote:
> > >>>> > > >> > > >>>>
> > >>>> > > >> > > >>>>> Hi Piotr, Hi Becket, Hi everybody,
> > >>>> > > >> > > >>>>>
> > >>>> > > >> > > >>>>> we, Dawid and I, discussed the various
> > >>>> suggestions/options
> > >>>> > and
> > >>>> > > >> we
> > >>>> > > >> > > >>>>> would be okay either way because we find neither
> > >>>> solution is
> > >>>> > > >> > perfect
> > >>>> > > >> > > just
> > >>>> > > >> > > >>>>> because of the already present complexity.
> > >>>> > > >> > > >>>>>
> > >>>> > > >> > > >>>>> Option c) Adding methods to the interfaces of
> > >>>> SourceReader
> > >>>> > and
> > >>>> > > >> > > >>>>> SplitReader
> > >>>> > > >> > > >>>>> Option a) Adding decorative interfaces to be used
> by
> > >>>> > > >> SourceReader
> > >>>> > > >> > and
> > >>>> > > >> > > >>>>> SplitReader
> > >>>> > > >> > > >>>>>
> > >>>> > > >> > > >>>>> As of the current status (v. 12) of the FLIP [1],
> it
> > is
> > >>>> > based on
> > >>>> > > >> > > >>>>> Option c) which we find acceptable because the
> > >>>> complexity
> > >>>> > added
> > >>>> > > >> is
> > >>>> > > >> > > only a
> > >>>> > > >> > > >>>>> single method.
> > >>>> > > >> > > >>>>>
> > >>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about Option
> > c)
> > >>>> > which I
> > >>>> > > >> see
> > >>>> > > >> > as
> > >>>> > > >> > > >>>>> follows:
> > >>>> > > >> > > >>>>> (1) An interface with default NOOP implementation
> > >>>> makes the
> > >>>> > > >> > > >>>>> implementation optional. And in my opinion, a
> default
> > >>>> > > >> > implementation
> > >>>> > > >> > > is and
> > >>>> > > >> > > >>>>> will remain a way of making implementation optional
> > >>>> because
> > >>>> > > >> even in
> > >>>> > > >> > > future
> > >>>> > > >> > > >>>>> a developer can decide to implement the "old
> flavor"
> > >>>> without
> > >>>> > > >> > support
> > >>>> > > >> > > for
> > >>>> > > >> > > >>>>> pausable splits.
> > >>>> > > >> > > >>>>> (2) It may not be too critical but I also find it
> > >>>> suboptimal
> > >>>> > > >> that
> > >>>> > > >> > > with
> > >>>> > > >> > > >>>>> a NOOP default implementation there is no way to
> > check
> > >>>> at
> > >>>> > > >> runtime
> > >>>> > > >> > if
> > >>>> > > >> > > >>>>> SourceReader or SplitReader actually support
> pausing.
> > >>>> (To
> > >>>> > do so,
> > >>>> > > >> > one
> > >>>> > > >> > > would
> > >>>> > > >> > > >>>>> need a supportsX method which makes it again more
> > >>>> > complicated.)
> > >>>> > > >> > > >>>>>
> > >>>> > > >> > > >>>>> However, we haven't changed it because Option a) is
> > >>>> also not
> > >>>> > > >> > optimal
> > >>>> > > >> > > >>>>> or straightforward:
> > >>>> > > >> > > >>>>> (1) We need to add two distinct yet similar
> > decorative
> > >>>> > > >> interfaces
> > >>>> > > >> > > >>>>> since, as mentioned, the signatures of the methods
> > are
> > >>>> > > >> different.
> > >>>> > > >> > For
> > >>>> > > >> > > >>>>> example, we would need decorative interfaces like
> > >>>> > > >> > > >>>>> `SplitReaderWithPausableSplits` and
> > >>>> > > >> > `SourceReaderWithPausableSplits`.
> > >>>> > > >> > > >>>>> (2) As a consequence, we would need to somehow
> > document
> > >>>> > > >> how/where
> > >>>> > > >> > to
> > >>>> > > >> > > >>>>> implement both interfaces and how this relates to
> > each
> > >>>> > other.
> > >>>> > > >> This
> > >>>> > > >> > > we could
> > >>>> > > >> > > >>>>> solve by adding a note in the interface of
> > >>>> SourceReader and
> > >>>> > > >> > > SplitReader and
> > >>>> > > >> > > >>>>> reference to the decorative interfaces but it still
> > >>>> > increases
> > >>>> > > >> > > complexity
> > >>>> > > >> > > >>>>> too.
> > >>>> > > >> > > >>>>>
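A rough Java sketch of what Option a) could look like on the SourceReader side. The name `SourceReaderWithPausableSplits` is taken from the message above. `SketchSourceReader`, `AlignmentCheck` and the method signature are assumptions made for illustration only.

```java
import java.util.Collection;

// Hypothetical, simplified base interface; the real SourceReader has more methods.
interface SketchSourceReader {
    String name();
}

// Decorative interface named as in the thread; the method shape is assumed.
interface SourceReaderWithPausableSplits extends SketchSourceReader {
    void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume);
}

final class AlignmentCheck {
    // With a decorative interface, support can be detected via instanceof.
    static boolean supportsPausing(SketchSourceReader reader) {
        return reader instanceof SourceReaderWithPausableSplits;
    }

    // Fail fast if alignment is enabled for a reader without the decoration.
    static void validate(SketchSourceReader reader, boolean alignmentEnabled) {
        if (alignmentEnabled && !supportsPausing(reader)) {
            throw new UnsupportedOperationException(
                    reader.name() + " does not support pausable splits");
        }
    }
}
```

This addresses concern (2) about Option c), since support can be checked at runtime without a supportsX method. The cost is the second, parallel interface for SplitReader noted in point (1).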
> > >>>> > > >> > > >>>>> In summary, we see both as acceptable and preferred
> > >>>> over
> > >>>> > other
> > >>>> > > >> > > >>>>> options. The question is if we can find a solution
> or
> > >>>> > compromise
> > >>>> > > >> > > that is
> > >>>> > > >> > > >>>>> acceptable for everybody to reach consensus.
> > >>>> > > >> > > >>>>>
> > >>>> > > >> > > >>>>> Please let us know what you think because we would
> be
> > >>>> happy
> > >>>> > if
> > >>>> > > >> we
> > >>>> > > >> > can
> > >>>> > > >> > > >>>>> conclude the discussion to avoid dropping the
> > >>>> initiative on
> > >>>> > this
> > >>>> > > >> > > FLIP.
> > >>>> > > >> > > >>>>>
> > >>>> > > >> > > >>>>> Regards,
> > >>>> > > >> > > >>>>> Sebastian
> > >>>> > > >> > > >>>>>
> > >>>> > > >> > > >>>>> [1]
> > >>>> > > >> > > >>>>>
> > >>>> > > >> > >
> > >>>> > > >> >
> > >>>> > > >>
> > >>>> >
> > >>>>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
> > >>>> > > >> > > >>>>> (v. 12)
> > >>>> > > >> > > >>>>>
> > >>>> > > >> > > >>>>> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski <
> > >>>> > > >> > pnowojski@apache.org
> > >>>> > > >> > > >
> > >>>> > > >> > > >>>>> wrote:
> > >>>> > > >> > > >>>>>
> > >>>> > > >> > > >>>>>> Hi Guowei,
> > >>>> > > >> > > >>>>>>
> > >>>> > > >> > > >>>>>> as Dawid wrote a couple of messages back:
> > >>>> > > >> > > >>>>>>
> > >>>> > > >> > > >>>>>> > This is covered in the previous FLIP[1] which
> has
> > >>>> been
> > >>>> > > >> already
> > >>>> > > >> > > >>>>>> implemented in 1.15. In short, it must be enabled
> > >>>> with the
> > >>>> > > >> > watermark
> > >>>> > > >> > > >>>>>> strategy which also configures drift and update
> > >>>> interval
> > >>>> > > >> > > >>>>>>
> > >>>> > > >> > > >>>>>> So by default watermark alignment is disabled,
> > >>>> regardless
> > >>>> > if a
> > >>>> > > >> > > source
> > >>>> > > >> > > >>>>>> supports it or not.
> > >>>> > > >> > > >>>>>>
> > >>>> > > >> > > >>>>>> Best,
> > >>>> > > >> > > >>>>>> Piotrek
> > >>>> > > >> > > >>>>>>
> > >>>> > > >> > > >>>>>> On Thu, May 5, 2022 at 09:56, Guowei Ma <
> > >>>> guowei.mgw@gmail.com>
> > >>>> > > >> > > wrote:
> > >>>> > > >> > > >>>>>>
> > >>>> > > >> > > >>>>>>> Hi,
> > >>>> > > >> > > >>>>>>>
> > >>>> > > >> > > >>>>>>> We know that in the case of Bounded input Flink
> > >>>> supports
> > >>>> > the
> > >>>> > > >> > Batch
> > >>>> > > >> > > >>>>>>> execution mode. Currently in Batch execution
> mode,
> > >>>> Flink
> > >>>> > is
> > >>>> > > >> > > executed
> > >>>> > > >> > > >>>>>>> on a
> > >>>> > > >> > > >>>>>>> stage-by-stage basis. In this way, perhaps
> > watermark
> > >>>> > alignment
> > >>>> > > >> > > might
> > >>>> > > >> > > >>>>>>> not
> > >>>> > > >> > > >>>>>>> gain much.
> > >>>> > > >> > > >>>>>>>
> > >>>> > > >> > > >>>>>>> So my question is: Is watermark alignment the
> > default
> > >>>> > > >> > behavior (for
> > >>>> > > >> > > >>>>>>> implemented source only)? If so, have you
> > considered
> > >>>> > > >> evaluating
> > >>>> > > >> > the
> > >>>> > > >> > > >>>>>>> impact
> > >>>> > > >> > > >>>>>>> of this behavior on the Batch execution mode? Or
> > >>>> do you think
> > >>>> > it is
> > >>>> > > >> not
> > >>>> > > >> > > >>>>>>> necessary?
> > >>>> > > >> > > >>>>>>>
> > >>>> > > >> > > >>>>>>> Correct me if I miss something.
> > >>>> > > >> > > >>>>>>>
> > >>>> > > >> > > >>>>>>> Best,
> > >>>> > > >> > > >>>>>>> Guowei
> > >>>> > > >> > > >>>>>>>
> > >>>> > > >> > > >>>>>>>
> > >>>> > > >> > > >>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <
> > >>>> > > >> > > >>>>>>> piotr.nowojski@gmail.com>
> > >>>> > > >> > > >>>>>>> wrote:
> > >>>> > > >> > > >>>>>>>
> > >>>> > > >> > > >>>>>>> > Hi Becket and Dawid,
> > >>>> > > >> > > >>>>>>> >
> > >>>> > > >> > > >>>>>>> > > I feel that no matter which option we choose
> > >>>> this can
> > >>>> > not
> > >>>> > > >> be
> > >>>> > > >> > > >>>>>>> solved
> > >>>> > > >> > > >>>>>>> > entirely in either of the options, because of
> the
> > >>>> point
> > >>>> > > >> above
> > >>>> > > >> > and
> > >>>> > > >> > > >>>>>>> because
> > >>>> > > >> > > >>>>>>> > the signature of
> SplitReader#pauseOrResumeSplits
> > >>>> and
> > >>>> > > >> > > >>>>>>> > SourceReader#pauseOrResumeSplits are slightly
> > >>>> different
> > >>>> > (one
> > >>>> > > >> > > >>>>>>> identifies
> > >>>> > > >> > > >>>>>>> > splits with splitId the other one passes the
> > splits
> > >>>> > > >> directly).
> > >>>> > > >> > > >>>>>>> >
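To illustrate the signature difference mentioned above, here is a small Java sketch in which a source-level reader receives split ids and resolves them to split objects before delegating to a split-level reader. Which of the two interfaces takes ids and which takes splits is an assumption here. `DemoSplit`, `DemoSplitReader` and `DemoSourceReader` are hypothetical names, not Flink classes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical split type, used only for this sketch.
final class DemoSplit {
    final String splitId;

    DemoSplit(String splitId) {
        this.splitId = splitId;
    }
}

// Split-level reader: receives the split objects themselves.
interface DemoSplitReader {
    void pauseOrResumeSplits(
            Collection<DemoSplit> splitsToPause, Collection<DemoSplit> splitsToResume);
}

// Source-level reader: receives split ids and resolves them before delegating.
final class DemoSourceReader {
    private final Map<String, DemoSplit> assigned = new HashMap<>();
    private final DemoSplitReader splitReader;

    DemoSourceReader(DemoSplitReader splitReader) {
        this.splitReader = splitReader;
    }

    void addSplit(DemoSplit split) {
        assigned.put(split.splitId, split);
    }

    void pauseOrResumeSplits(
            Collection<String> idsToPause, Collection<String> idsToResume) {
        splitReader.pauseOrResumeSplits(resolve(idsToPause), resolve(idsToResume));
    }

    private List<DemoSplit> resolve(Collection<String> ids) {
        List<DemoSplit> result = new ArrayList<>();
        for (String id : ids) {
            DemoSplit split = assigned.get(id);
            if (split != null) {
                result.add(split); // ids not assigned to this reader are skipped
            }
        }
        return result;
    }
}
```

Because the two methods take different argument types, a single shared decorative interface cannot cover both levels, which is the point raised in the quoted paragraph.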
> > >>>> > > >> > > >>>>>>> > Yes, that's a good point in this case and for
> > >>>> features
> > >>>> > that
> > >>>> > > >> > need
> > >>>> > > >> > > >>>>>>> to be
> > >>>> > > >> > > >>>>>>> > implemented in more than one place.
> > >>>> > > >> > > >>>>>>> >
> > >>>> > > >> > > >>>>>>> > > Is there any reason for pausing reading from
> a
> > >>>> split
> > >>>> > an
> > >>>> > > >> > > optional
> > >>>> > > >> > > >>>>>>> feature,
> > >>>> > > >> > > >>>>>>> > > other than that this was not included in the
> > >>>> original
> > >>>> > > >> > > interface?
> > >>>> > > >> > > >>>>>>> >
> > >>>> > > >> > > >>>>>>> > An additional argument in favor of making it
> > >>>> optional
> > >>>> > is to
> > >>>> > > >> > > >>>>>>> simplify source
> > >>>> > > >> > > >>>>>>> > implementation. But on its own I'm not sure if
> > that
> > >>>> > would be
> > >>>> > > >> > > >>>>>>> enough to
> > >>>> > > >> > > >>>>>>> > justify making this feature optional. Maybe.
> > >>>> > > >> > > >>>>>>> >
> > >>>> > > >> > > >>>>>>> > > I think it would be way simpler and clearer
> to
> > >>>> just
> > >>>> > let
> > >>>> > > >> end
> > >>>> > > >> > > >>>>>>> users and
> > >>>> > > >> > > >>>>>>> > Flink
> > >>>> > > >> > > >>>>>>> > > assume all the connectors will implement this
> > >>>> feature.
> > >>>> > > >> > > >>>>>>> >
> > >>>> > > >> > > >>>>>>> > As I wrote above that would be an interesting
> > >>>> choice to
> > >>>> > make
> > >>>> > > >> > > (ease
> > >>>> > > >> > > >>>>>>> of
> > >>>> > > >> > > >>>>>>> > implementation for new users, vs system
> > >>>> consistency).
> > >>>> > > >> > Regardless
> > >>>> > > >> > > >>>>>>> of that,
> > >>>> > > >> > > >>>>>>> > yes, for me the main argument is the API
> backward
> > >>>> > > >> > compatibility.
> > >>>> > > >> > > >>>>>>> But let's
> > >>>> > > >> > > >>>>>>> > clear a couple of points:
> > >>>> > > >> > > >>>>>>> > - The current proposal adding methods to the
> base
> > >>>> > interface
> > >>>> > > >> > with
> > >>>> > > >> > > >>>>>>> default
> > >>>> > > >> > > >>>>>>> > implementations is an OPTIONAL feature. Same as
> > the
> > >>>> > > >> decorative
> > >>>> > > >> > > >>>>>>> version
> > >>>> > > >> > > >>>>>>> > would be.
> > >>>> > > >> > > >>>>>>> > - Decorative version could implement "throw
> > >>>> > > >> > > >>>>>>> UnsupportedOperationException"
> > >>>> > > >> > > >>>>>>> > if user enabled watermark alignment just as
> well
> > >>>> and I
> > >>>> > agree
> > >>>> > > >> > > >>>>>>> that's a
> > >>>> > > >> > > >>>>>>> > better option compared to logging a warning.
> > >>>> > > >> > > >>>>>>> >
> > >>>> > > >> > > >>>>>>> > Best,
> > >>>> > > >> > > >>>>>>> > Piotrek
> > >>>> > > >> > > >>>>>>> >
> > >>>> > > >> > > >>>>>>> >
> > >>>> > > >> > > >>>>>>> > On Wed, May 4, 2022 at 15:40, Becket Qin <
> > >>>> > becket.qin@gmail.com>
> > >>>> > > >> > > >>>>>>> wrote:
> > >>>> > > >> > > >>>>>>> >
> > >>>> > > >> > > >>>>>>> > > Thanks for the reply and patient discussion,
> > >>>> Piotr and
> > >>>> > > >> Dawid.
> > >>>> > > >> > > >>>>>>> > >
> > >>>> > > >> > > >>>>>>> > > Is there any reason for pausing reading from
> a
> > >>>> split
> > >>>> > an
> > >>>> > > >> > > optional
> > >>>> > > >> > > >>>>>>> feature,
> > >>>> > > >> > > >>>>>>> > > other than that this was not included in the
> > >>>> original
> > >>>> > > >> > > interface?
> > >>>> > > >> > > >>>>>>> > >
> > >>>> > > >> > > >>>>>>> > > To be honest I am really worried about the
> > >>>> complexity
> > >>>> > of
> > >>>> > > >> the
> > >>>> > > >> > > >>>>>>> user story
> > >>>> > > >> > > >>>>>>> > > here. Optional features like this have a high
> > >>>> > overhead.
> > >>>> > > >> > Imagine
> > >>>> > > >> > > >>>>>>> this
> > >>>> > > >> > > >>>>>>> > > feature is optional, now a user enabled
> > watermark
> > >>>> > > >> alignment
> > >>>> > > >> > and
> > >>>> > > >> > > >>>>>>> defined a
> > >>>> > > >> > > >>>>>>> > > few watermark groups. Would it work? Hmm,
> that
> > >>>> > depends on
> > >>>> > > >> > > >>>>>>> whether the
> > >>>> > > >> > > >>>>>>> > > involved Source has implemented this
> feature.
> > >>>> If the
> > >>>> > > >> Sources
> > >>>> > > >> > > >>>>>>> are well
> > >>>> > > >> > > >>>>>>> > > documented, good luck. Otherwise end users
> may
> > >>>> have to
> > >>>> > > >> look
> > >>>> > > >> > > into
> > >>>> > > >> > > >>>>>>> the code
> > >>>> > > >> > > >>>>>>> > > of the Source to see whether the feature is
> > >>>> supported.
> > >>>> > > >> Which
> > >>>> > > >> > is
> > >>>> > > >> > > >>>>>>> something
> > >>>> > > >> > > >>>>>>> > > they shouldn't have to do.
> > >>>> > > >> > > >>>>>>> > >
> > >>>> > > >> > > >>>>>>> > > I think it would be way simpler and clearer
> to
> > >>>> just
> > >>>> > let
> > >>>> > > >> end
> > >>>> > > >> > > >>>>>>> users and
> > >>>> > > >> > > >>>>>>> > Flink
> > >>>> > > >> > > >>>>>>> > > assume all the connectors will implement this
> > >>>> feature.
> > >>>> > > >> After
> > >>>> > > >> > > all
> > >>>> > > >> > > >>>>>>> the
> > >>>> > > >> > > >>>>>>> > > watermark group is not optional to the end
> > >>>> users. If
> > >>>> > in
> > >>>> > > >> some
> > >>>> > > >> > > >>>>>>> rare cases,
> > >>>> > > >> > > >>>>>>> > > the feature cannot be supported, a clear
> > >>>> > > >> > > >>>>>>> UnsupportedOperationException
> > >>>> > > >> > > >>>>>>> > will
> > >>>> > > >> > > >>>>>>> > > be thrown to tell users to explicitly remove
> > this
> > >>>> > Source
> > >>>> > > >> from
> > >>>> > > >> > > the
> > >>>> > > >> > > >>>>>>> > watermark
> > >>>> > > >> > > >>>>>>> > > group. I don't think we should have a warning
> > >>>> message
> > >>>> > > >> here,
> > >>>> > > >> > as
> > >>>> > > >> > > >>>>>>> they tend
> > >>>> > > >> > > >>>>>>> > to
> > >>>> > > >> > > >>>>>>> > > be ignored in many cases. If we do this, we
> > >>>> don't even
> > >>>> > > >> need
> > >>>> > > >> > the
> > >>>> > > >> > > >>>>>>> > supportXXX
> > >>>> > > >> > > >>>>>>> > > method in the Source for this feature. In
> fact
> > >>>> this is
> > >>>> > > >> > exactly
> > >>>> > > >> > > >>>>>>> how many
> > >>>> > > >> > > >>>>>>> > > interfaces work today. For example,
> > >>>> > > >> > > >>>>>>> SplitEnumerator#addSplitsBack() is
> > >>>> > > >> > > >>>>>>> > not
> > >>>> > > >> > > >>>>>>> > > supported by Pravega source because it does
> not
> > >>>> > support
> > >>>> > > >> > partial
> > >>>> > > >> > > >>>>>>> failover.
> > >>>> > > >> > > >>>>>>> > > In that case, it simply throws an exception
> to
> > >>>> > trigger a
> > >>>> > > >> > global
> > >>>> > > >> > > >>>>>>> recovery.
> > >>>> > > >> > > >>>>>>> > >
> > >>>> > > >> > > >>>>>>> > > The reason we add a default implementation in
> > >>>> this
> > >>>> > case
> > >>>> > > >> would
> > >>>> > > >> > > >>>>>>> just be for
> > >>>> > > >> > > >>>>>>> > the
> > >>>> > > >> > > >>>>>>> > > sake of backwards compatibility so the old
> > >>>> source can
> > >>>> > > >> still
> > >>>> > > >> > > >>>>>>> compile.
> > >>>> > > >> > > >>>>>>> > Sure,
> > >>>> > > >> > > >>>>>>> > > in the short term, this feature might not be
> > >>>> supported by
> > >>>> > many
> > >>>> > > >> > > >>>>>>> existing
> > >>>> > > >> > > >>>>>>> > > sources. That is OK, and it is quite visible
> to
> > >>>> the
> > >>>> > source
> > >>>> > > >> > > >>>>>>> developers
> > >>>> > > >> > > >>>>>>> > that
> > >>>> > > >> > > >>>>>>> > > they did not override the default impl which
> > >>>> throws an
> > >>>> > > >> > > >>>>>>> > > UnsupportedOperationException.
> > >>>> > > >> > > >>>>>>> > >
> > >>>> > > >> > > >>>>>>> > > @Dawid,
> > >>>> > > >> > > >>>>>>> > >
> > >>>> > > >> > > >>>>>>> > > the Java doc of the SupportXXX() method in
> the
> > >>>> Source
> > >>>> > > >> would
> > >>>> > > >> > be
> > >>>> > > >> > > >>>>>>> the single
> > >>>> > > >> > > >>>>>>> > > >> source of truth regarding how to implement
> > >>>> this
> > >>>> > > >> feature.
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > >
> > >>>> > > >> > > >>>>>>> > > I also don't find it entirely true. Half of
> the
> > >>>> > classes
> > >>>> > > >> are
> > >>>> > > >> > > >>>>>>> theoretically
> > >>>> > > >> > > >>>>>>> > > > optional and are utility classes from the
> > >>>> point of
> > >>>> > view
> > >>>> > > >> how
> > >>>> > > >> > > the
> > >>>> > > >> > > >>>>>>> > > interfaces
> > >>>> > > >> > > >>>>>>> > > > are organized. Theoretically users do not
> > need
> > >>>> to
> > >>>> > use
> > >>>> > > >> any
> > >>>> > > >> > of
> > >>>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be
> > weird
> > >>>> to
> > >>>> > list
> > >>>> > > >> > their
> > >>>> > > >> > > >>>>>>> methods in
> > >>>> > > >> > > >>>>>>> > > the
> > >>>> > > >> > > >>>>>>> > > > Source interface.
> > >>>> > > >> > > >>>>>>> > >
> > >>>> > > >> > > >>>>>>> > > I think the ultimate goal of java docs is to
> > >>>> guide
> > >>>> > users
> > >>>> > > >> to
> > >>>> > > >> > > >>>>>>> implement the
> > >>>> > > >> > > >>>>>>> > > Source. If SourceReaderBase is the preferred
> > way
> > >>>> to
> > >>>> > > >> > implement a
> > >>>> > > >> > > >>>>>>> > > SourceReader, it seems worth mentioning that.
> > >>>> Even the
> > >>>> > > >> Java
> > >>>> > > >> > > >>>>>>> language
> > >>>> > > >> > > >>>>>>> > > documentation of interfaces lists the known
> > >>>> > implementations
> > >>>> > > >> [1]
> > >>>> > > >> > so
> > >>>> > > >> > > >>>>>>> people
> > >>>> > > >> > > >>>>>>> > can
> > >>>> > > >> > > >>>>>>> > > leverage them. But for this particular case,
> if
> > >>>> we
> > >>>> > make
> > >>>> > > >> the
> > >>>> > > >> > > >>>>>>> feature
> > >>>> > > >> > > >>>>>>> > > non-optional, we don't even need the
> > supportXXX()
> > >>>> > method
> > >>>> > > >> for
> > >>>> > > >> > > now.
> > >>>> > > >> > > >>>>>>> > >
> > >>>> > > >> > > >>>>>>> > > Thanks,
> > >>>> > > >> > > >>>>>>> > >
> > >>>> > > >> > > >>>>>>> > > Jiangjie (Becket) Qin
> > >>>> > > >> > > >>>>>>> > >
> > >>>> > > >> > > >>>>>>> > >
> > >>>> > > >> > > >>>>>>> > >
> > >>>> > > >> > > >>>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid
> > Wysakowicz <
> > >>>> > > >> > > >>>>>>> dwysakowicz@apache.org>
> > >>>> > > >> > > >>>>>>> > > wrote:
> > >>>> > > >> > > >>>>>>> > >
> > >>>> > > >> > > >>>>>>> > > > Hey Piotr and Becket,
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > First of all, let me say I am happy with
> > >>>> whichever
> > >>>> > > >> option
> > >>>> > > >> > is
> > >>>> > > >> > > >>>>>>> agreed in
> > >>>> > > >> > > >>>>>>> > > the
> > >>>> > > >> > > >>>>>>> > > > discussion.
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > I wanted to clarify a few points from the
> > >>>> discussion
> > >>>> > > >> > though:
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > @Becket:
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > The main argument for adding the methods to
> > the
> > >>>> > > >> > SourceReader
> > >>>> > > >> > > >>>>>>> is that
> > >>>> > > >> > > >>>>>>> > > these
> > >>>> > > >> > > >>>>>>> > > > methods are effectively NON-OPTIONAL to
> > >>>> SourceReader
> > >>>> > > >> impl,
> > >>>> > > >> > > i.e.
> > >>>> > > >> > > >>>>>>> > starting
> > >>>> > > >> > > >>>>>>> > > > from this FLIP, all the SourceReaders impl
> > are
> > >>>> > expected
> > >>>> > > >> to
> > >>>> > > >> > > >>>>>>> support this
> > >>>> > > >> > > >>>>>>> > > > method, although some old implementations
> may
> > >>>> not
> > >>>> > have
> > >>>> > > >> > > >>>>>>> implemented this
> > >>>> > > >> > > >>>>>>> > > > feature. I think we should distinguish the
> > new
> > >>>> > features
> > >>>> > > >> > from
> > >>>> > > >> > > >>>>>>> the
> > >>>> > > >> > > >>>>>>> > optional
> > >>>> > > >> > > >>>>>>> > > > features. While the public decorative
> > >>>> interface is a
> > >>>> > > >> > solution
> > >>>> > > >> > > >>>>>>> to the
> > >>>> > > >> > > >>>>>>> > > > optional features, we should not use it for
> > the
> > >>>> > features
> > >>>> > > >> > that
> > >>>> > > >> > > >>>>>>> are
> > >>>> > > >> > > >>>>>>> > > > non-optional.
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > I don't think that this feature is
> > >>>> NON-OPTIONAL.
> > >>>> > Even
> > >>>> > > >> > though
> > >>>> > > >> > > >>>>>>> > preferred, I
> > >>>> > > >> > > >>>>>>> > > > still think it can be simply optional.
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > the Java doc of the SupportXXX() method in
> > the
> > >>>> > Source
> > >>>> > > >> would
> > >>>> > > >> > > be
> > >>>> > > >> > > >>>>>>> the
> > >>>> > > >> > > >>>>>>> > single
> > >>>> > > >> > > >>>>>>> > > > source of truth regarding how to implement
> > this
> > >>>> > feature.
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > I also don't find it entirely true. Half of
> > the
> > >>>> > classes
> > >>>> > > >> are
> > >>>> > > >> > > >>>>>>> > theoretically
> > >>>> > > >> > > >>>>>>> > > > optional and are utility classes from the
> > >>>> point of
> > >>>> > view
> > >>>> > > >> how
> > >>>> > > >> > > the
> > >>>> > > >> > > >>>>>>> > > interfaces
> > >>>> > > >> > > >>>>>>> > > > are organized. Theoretically users do not
> > need
> > >>>> to
> > >>>> > use
> > >>>> > > >> any
> > >>>> > > >> > of
> > >>>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be
> > weird
> > >>>> to
> > >>>> > list
> > >>>> > > >> > their
> > >>>> > > >> > > >>>>>>> methods in
> > >>>> > > >> > > >>>>>>> > > the
> > >>>> > > >> > > >>>>>>> > > > Source interface.
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > @Piotr
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > If we have all of the methods with default
> > >>>> > > >> implementation
> > >>>> > > >> > in
> > >>>> > > >> > > >>>>>>> the base
> > >>>> > > >> > > >>>>>>> > > > interface, the API doesn't give any clue to
> > >>>> the user
> > >>>> > > >> which
> > >>>> > > >> > > set
> > >>>> > > >> > > >>>>>>> of
> > >>>> > > >> > > >>>>>>> > methods
> > >>>> > > >> > > >>>>>>> > > > are required to be implemented at the same
> > >>>> time.
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > I feel that no matter which option we
> choose
> > >>>> this
> > >>>> > can
> > >>>> > > >> not
> > >>>> > > >> > be
> > >>>> > > >> > > >>>>>>> solved
> > >>>> > > >> > > >>>>>>> > > > entirely in either of the options, because
> of
> > >>>> the
> > >>>> > point
> > >>>> > > >> > above
> > >>>> > > >> > > >>>>>>> and
> > >>>> > > >> > > >>>>>>> > because
> > >>>> > > >> > > >>>>>>> > > > the signature of
> > >>>> SplitReader#pauseOrResumeSplits and
> > >>>> > > >> > > >>>>>>> > > > SourceReader#pauseOrResumeSplits are
> slightly
> > >>>> > different
> > >>>> > > >> > (one
> > >>>> > > >> > > >>>>>>> identifies
> > >>>> > > >> > > >>>>>>> > > > splits with splitId the other one passes
> the
> > >>>> splits
> > >>>> > > >> > > directly).
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > Best,
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > Dawid
> > >>>> > > >> > > >>>>>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > Hi Piotr,
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > Thanks for the comment.
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > Just to clarify, I am not against the
> > >>>> decorative
> > >>>> > > >> > interfaces,
> > >>>> > > >> > > >>>>>>> but I do
> > >>>> > > >> > > >>>>>>> > > > think we should use them with caution. The
> > main
> > >>>> > argument
> > >>>> > > >> > for
> > >>>> > > >> > > >>>>>>> adding the
> > >>>> > > >> > > >>>>>>> > > > methods to the SourceReader is that these
> > >>>> methods
> > >>>> > are
> > >>>> > > >> > > >>>>>>> > > > effectively NON-OPTIONAL to SourceReader
> > impl,
> > >>>> i.e.
> > >>>> > > >> > starting
> > >>>> > > >> > > >>>>>>> from this
> > >>>> > > >> > > >>>>>>> > > > FLIP, all the SourceReaders impl are
> expected
> > >>>> to
> > >>>> > support
> > >>>> > > >> > this
> > >>>> > > >> > > >>>>>>> > > > method, although some old implementations
> may
> > >>>> not
> > >>>> > have
> > >>>> > > >> > > >>>>>>> implemented this
> > >>>> > > >> > > >>>>>>> > > > feature. I think we should distinguish the
> > new
> > >>>> > features
> > >>>> > > >> > from
> > >>>> > > >> > > >>>>>>> the
> > >>>> > > >> > > >>>>>>> > optional
> > >>>> > > >> > > >>>>>>> > > > features. While the public decorative
> > >>>> interface is a
> > >>>> > > >> > solution
> > >>>> > > >> > > >>>>>>> to the
> > >>>> > > >> > > >>>>>>> > > > optional features, we should not use it for
> > the
> > >>>> > features
> > >>>> > > >> > that
> > >>>> > > >> > > >>>>>>> are
> > >>>> > > >> > > >>>>>>> > > > non-optional.
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > That said, this feature is optional for
> > >>>> > SplitReaders.
> > >>>> > > >> > > Arguably
> > >>>> > > >> > > >>>>>>> we can
> > >>>> > > >> > > >>>>>>> > > have
> > >>>> > > >> > > >>>>>>> > > > a decorative interface for that, but for
> > >>>> simplicity
> > >>>> > and
> > >>>> > > >> > > >>>>>>> symmetry of the
> > >>>> > > >> > > >>>>>>> > > > interface, personally I prefer just adding
> a
> > >>>> new
> > >>>> > method.
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > Regarding the advantages you mentioned
> about
> > >>>> the
> > >>>> > > >> decorative
> > >>>> > > >> > > >>>>>>> interfaces,
> > >>>> > > >> > > >>>>>>> > > > they would make sense if:
> > >>>> > > >> > > >>>>>>> > > > 1. The feature is optional.
> > >>>> > > >> > > >>>>>>> > > > 2. There is only one decorative interface
> > >>>> involved
> > >>>> > for a
> > >>>> > > >> > > >>>>>>> feature.
> > >>>> > > >> > > >>>>>>> > > > Otherwise the argument that all the methods
> > are
> > >>>> > grouped
> > >>>> > > >> > > >>>>>>> together will
> > >>>> > > >> > > >>>>>>> > not
> > >>>> > > >> > > >>>>>>> > > > stand.
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > Compared with that, I think the current
> > >>>> solution
> > >>>> > works
> > >>>> > > >> fine
> > >>>> > > >> > > in
> > >>>> > > >> > > >>>>>>> all
> > >>>> > > >> > > >>>>>>> > cases,
> > >>>> > > >> > > >>>>>>> > > > i.e. "having supportXXX() method in Source,
> > and
> > >>>> > default
> > >>>> > > >> > > >>>>>>> methods /
> > >>>> > > >> > > >>>>>>> > > > decorative interfaces in base interfaces.".
> > >>>> > > >> > > >>>>>>> > > >
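The approach described above (a supportXXX() method in the Source plus default methods in the base interfaces) can be sketched roughly as follows. This is only an illustration under stated assumptions: `Source`, `Reader`, `supportsPausingSplits` and `pauseOrResumeSplits` here are hypothetical stand-ins written for this example, not the actual Flink interfaces or the signatures proposed in the FLIP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class CapabilityFlagSketch {

    /** Hypothetical reader interface; a stand-in, not Flink's actual SourceReader. */
    interface Reader {
        String poll();

        // Default method: readers written before the feature existed still compile.
        default void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume) {
            throw new UnsupportedOperationException("pausing splits is not supported");
        }
    }

    /** Hypothetical source interface carrying the capability flag. */
    interface Source {
        Reader createReader();

        // Single place that tells the framework whether the feature is implemented.
        default boolean supportsPausingSplits() {
            return false;
        }
    }

    /** An older source that knows nothing about pausing. */
    static final class LegacySource implements Source {
        @Override
        public Reader createReader() {
            return () -> "record";
        }
    }

    /** A source whose reader overrides the new method. */
    static final class PausingSource implements Source {
        final List<String> paused = new ArrayList<>();

        @Override
        public boolean supportsPausingSplits() {
            return true;
        }

        @Override
        public Reader createReader() {
            return new Reader() {
                @Override
                public String poll() {
                    return "record";
                }

                @Override
                public void pauseOrResumeSplits(
                        Collection<String> splitsToPause, Collection<String> splitsToResume) {
                    paused.addAll(splitsToPause);
                    paused.removeAll(splitsToResume);
                }
            };
        }
    }

    /** Framework side: consult the flag before calling the optional method. */
    static String align(Source source, Collection<String> splitsToPause) {
        if (!source.supportsPausingSplits()) {
            return "skipped";
        }
        source.createReader().pauseOrResumeSplits(splitsToPause, Collections.emptyList());
        return "paused " + splitsToPause;
    }

    public static void main(String[] args) {
        System.out.println(align(new LegacySource(), Arrays.asList("split-1")));
        System.out.println(align(new PausingSource(), Arrays.asList("split-1")));
    }
}
```

In this sketch the legacy source needs no changes, and the framework never has to create a reader just to learn whether the capability exists.
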
> > >>>> > > >> > > >>>>>>> > > > The advantages are:
> > >>>> > > >> > > >>>>>>> > > >> - clean and easy to implement base
> interface
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > In the current approach, the Java doc of
> the
> > >>>> > > >> SupportXXX()
> > >>>> > > >> > > >>>>>>> method in the
> > >>>> > > >> > > >>>>>>> > > > Source would be the single source of truth
> > >>>> regarding
> > >>>> > > >> how to
> > >>>> > > >> > > >>>>>>> implement
> > >>>> > > >> > > >>>>>>> > > this
> > >>>> > > >> > > >>>>>>> > > > feature. It lists the method that has to be
> > >>>> > implemented
> > >>>> > > >> to
> > >>>> > > >> > > >>>>>>> support this
> > >>>> > > >> > > >>>>>>> > > > feature, regardless of how many classes /
> > >>>> > interfaces are
> > >>>> > > >> > > >>>>>>> involved.
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > When implementing the base interface, users
> > do
> > >>>> not
> > >>>> > need
> > >>>> > > >> to
> > >>>> > > >> > > >>>>>>> implement a
> > >>>> > > >> > > >>>>>>> > > > method with default implementation. If they
> > are
> > >>>> > curious
> > >>>> > > >> > what
> > >>>> > > >> > > >>>>>>> the method
> > >>>> > > >> > > >>>>>>> > > is
> > >>>> > > >> > > >>>>>>> > > > for, the java doc of that method simply
> > points
> > >>>> > users to
> > >>>> > > >> the
> > >>>> > > >> > > >>>>>>> > SupportXXX()
> > >>>> > > >> > > >>>>>>> > > > method in the Source. It does not seem to add
> > work
> > >>>> to
> > >>>> > the
> > >>>> > > >> users
> > >>>> > > >> > > >>>>>>> compared
> > >>>> > > >> > > >>>>>>> > with
> > >>>> > > >> > > >>>>>>> > > > decorative interfaces, but gives much
> better
> > >>>> > > >> > discoverability.
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > - all of the methods from a single feature
> > are
> > >>>> > grouped
> > >>>> > > >> in a
> > >>>> > > >> > > >>>>>>> single
> > >>>> > > >> > > >>>>>>> > > >> decorator interface, together with their
> > >>>> dedicated
> > >>>> > java
> > >>>> > > >> > doc.
> > >>>> > > >> > > >>>>>>> It's also
> > >>>> > > >> > > >>>>>>> > > >> easier to google search for help using the
> > >>>> > decorator
> > >>>> > > >> name
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > - if an optional feature requires two
> methods
> > >>>> to be
> > >>>> > > >> > > >>>>>>> implemented at
> > >>>> > > >> > > >>>>>>> > once,
> > >>>> > > >> > > >>>>>>> > > >> decorator can guarantee that
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > These two points are not true when multiple
> > >>>> > components
> > >>>> > > >> and
> > >>>> > > >> > > >>>>>>> classes are
> > >>>> > > >> > > >>>>>>> > > > involved collaboratively to provide a
> > feature.
> > >>>> In
> > >>>> > our
> > >>>> > > >> case,
> > >>>> > > >> > > we
> > >>>> > > >> > > >>>>>>> have
> > >>>> > > >> > > >>>>>>> > both
> > >>>> > > >> > > >>>>>>> > > > SourceReader and SplitReader involved. And
> > >>>> there
> > >>>> > might
> > >>>> > > >> be
> > >>>> > > >> > > other
> > >>>> > > >> > > >>>>>>> > > interfaces
> > >>>> > > >> > > >>>>>>> > > > on the JM side involved for some future
> > >>>> features.
> > >>>> > So the
> > >>>> > > >> > > >>>>>>> relevant
> > >>>> > > >> > > >>>>>>> > methods
> > >>>> > > >> > > >>>>>>> > > > can actually be scattered over the places.
> > That
> > >>>> > said, we
> > >>>> > > >> > may
> > >>>> > > >> > > >>>>>>> still use
> > >>>> > > >> > > >>>>>>> > > > decorative interfaces for each component,
> if
> > >>>> the
> > >>>> > > >> feature is
> > >>>> > > >> > > >>>>>>> optional,
> > >>>> > > >> > > >>>>>>> > > given
> > >>>> > > >> > > >>>>>>> > > > there is a single source of truth for the
> > >>>> feature.
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > Here I would strongly lean towards making
> > life
> > >>>> > easier
> > >>>> > > >> for
> > >>>> > > >> > new
> > >>>> > > >> > > >>>>>>> users,
> > >>>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo)
> > >>>> slight
> > >>>> > expense
> > >>>> > > >> > for
> > >>>> > > >> > > >>>>>>> the power
> > >>>> > > >> > > >>>>>>> > > >> users.
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > I actually think the current approach is
> > >>>> simpler,
> > >>>> > more
> > >>>> > > >> > > >>>>>>> extensible and
> > >>>> > > >> > > >>>>>>> > > more
> > >>>> > > >> > > >>>>>>> > > > general for all the users. Can you
> articulate
> > >>>> a bit
> > >>>> > > >> more on
> > >>>> > > >> > > >>>>>>> which part
> > >>>> > > >> > > >>>>>>> > > you
> > >>>> > > >> > > >>>>>>> > > > think makes it harder for users to understand?
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > There is another benefit of the decorative
> > >>>> > interfaces
> > >>>> > > >> which
> > >>>> > > >> > > is
> > >>>> > > >> > > >>>>>>> not
> > >>>> > > >> > > >>>>>>> > > > mentioned, but might be worth considering
> > here.
> > >>>> > Usually
> > >>>> > > >> the
> > >>>> > > >> > > >>>>>>> decorative
> > >>>> > > >> > > >>>>>>> > > > interfaces give slightly better backwards
> > >>>> > compatibility
> > >>>> > > >> > than
> > >>>> > > >> > > >>>>>>> the new
> > >>>> > > >> > > >>>>>>> > > > default method in the interfaces. That is
> > when
> > >>>> > users are
> > >>>> > > >> > > using
> > >>>> > > >> > > >>>>>>> a jar
> > >>>> > > >> > > >>>>>>> > that
> > >>>> > > >> > > >>>>>>> > > > was compiled with an older version of Flink
> > >>>> which
> > >>>> > does
> > >>>> > > >> not
> > >>>> > > >> > > >>>>>>> have the
> > >>>> > > >> > > >>>>>>> > > default
> > >>>> > > >> > > >>>>>>> > > > method in the interfaces in question. A
> > >>>> decorative
> > >>>> > > >> > interface
> > >>>> > > >> > > >>>>>>> may still
> > >>>> > > >> > > >>>>>>> > > > provide backwards compatibility in that
> case,
> > >>>> while
> > >>>> > > >> default
> > >>>> > > >> > > >>>>>>> method impl
> > >>>> > > >> > > >>>>>>> > > > cannot.
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > I think in Flink we in general do not
> > guarantee
> > >>>> > custom
> > >>>> > > >> > > >>>>>>> components
> > >>>> > > >> > > >>>>>>> > > compiled
> > >>>> > > >> > > >>>>>>> > > > with an older version can run with a newer
> > >>>> version
> > >>>> > of
> > >>>> > > >> > Flink.
> > >>>> > > >> > > A
> > >>>> > > >> > > >>>>>>> > recompile
> > >>>> > > >> > > >>>>>>> > > > with a newer version would be required.
> That
> > >>>> said,
> > >>>> > if
> > >>>> > > >> we do
> > >>>> > > >> > > >>>>>>> care about
> > >>>> > > >> > > >>>>>>> > > > this, we can just change the "supportXXX()"
> > >>>> method
> > >>>> > in
> > >>>> > > >> the
> > >>>> > > >> > > >>>>>>> Source
> > >>>> > > >> > > >>>>>>> > > interface
> > >>>> > > >> > > >>>>>>> > > > to use decorative interfaces, and leave the
> > >>>> other
> > >>>> > parts
> > >>>> > > >> > > >>>>>>> unchanged.
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > Thanks,
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > Jiangjie (Becket) Qin
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr
> > Nowojski <
> > >>>> > > >> > > >>>>>>> pnowojski@apache.org>
> > >>>> > > >> > > >>>>>>> > > > wrote:
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > > >> Hi,
> > >>>> > > >> > > >>>>>>> > > >>
> > >>>> > > >> > > >>>>>>> > > >> Sorry for chipping in so late, but I was
> OoO
> > >>>> for
> > >>>> > the
> > >>>> > > >> last
> > >>>> > > >> > > two
> > >>>> > > >> > > >>>>>>> weeks.
> > >>>> > > >> > > >>>>>>> > > >> Regarding the interfaces, I would be
> > actually
> > >>>> > against
> > >>>> > > >> > adding
> > >>>> > > >> > > >>>>>>> those
> > >>>> > > >> > > >>>>>>> > > methods
> > >>>> > > >> > > >>>>>>> > > >> to the base interfaces for the reasons
> > >>>> mentioned
> > >>>> > above.
> > >>>> > > >> > > >>>>>>> Clogging the
> > >>>> > > >> > > >>>>>>> > > base
> > >>>> > > >> > > >>>>>>> > > >> interface for new users with tons of
> methods
> > >>>> that
> > >>>> > they
> > >>>> > > >> do
> > >>>> > > >> > > not
> > >>>> > > >> > > >>>>>>> need, do
> > >>>> > > >> > > >>>>>>> > > not
> > >>>> > > >> > > >>>>>>> > > >> understand and do not know what to do with
> > >>>> them.
> > >>>> > > >> Moreover,
> > >>>> > > >> > > >>>>>>> such
> > >>>> > > >> > > >>>>>>> > > decorative
> > >>>> > > >> > > >>>>>>> > > >> interfaces are solving a problem if a
> > feature
> > >>>> > requires
> > >>>> > > >> two
> > >>>> > > >> > > or
> > >>>> > > >> > > >>>>>>> more
> > >>>> > > >> > > >>>>>>> > > methods
> > >>>> > > >> > > >>>>>>> > > >> to be implemented at the same time. If we
> > >>>> have all
> > >>>> > of
> > >>>> > > >> the
> > >>>> > > >> > > >>>>>>> methods with
> > >>>> > > >> > > >>>>>>> > > >> default implementation in the base
> > interface,
> > >>>> the
> > >>>> > API
> > >>>> > > >> > > doesn't
> > >>>> > > >> > > >>>>>>> give any
> > >>>> > > >> > > >>>>>>> > > >> clue
> > >>>> > > >> > > >>>>>>> > > >> to the user which set of methods are
> > required
> > >>>> to be
> > >>>> > > >> > > >>>>>>> implemented at the
> > >>>> > > >> > > >>>>>>> > > >> same
> > >>>> > > >> > > >>>>>>> > > >> time.
> > >>>> > > >> > > >>>>>>> > > >>
> > >>>> > > >> > > >>>>>>> > > >> > a) I feel the biggest drawback of
> > decorative
> > >>>> > > >> interfaces
> > >>>> > > >> > is
> > >>>> > > >> > > >>>>>>> which
> > >>>> > > >> > > >>>>>>> > > >> interface
> > >>>> > > >> > > >>>>>>> > > >> > they can decorate and which combinations
> > of
> > >>>> > multiple
> > >>>> > > >> > > >>>>>>> decorative
> > >>>> > > >> > > >>>>>>> > > >> interfaces
> > >>>> > > >> > > >>>>>>> > > >> > are valid. (...)
> > >>>> > > >> > > >>>>>>> > > >> > In the future, if there is a new feature
> > >>>> added
> > >>>> > > >> > > >>>>>>> > > >> > (e.g. sorted or pre-partitioned data
> > >>>> aware), are
> > >>>> > we
> > >>>> > > >> > going
> > >>>> > > >> > > >>>>>>> to create
> > >>>> > > >> > > >>>>>>> > > >> another
> > >>>> > > >> > > >>>>>>> > > >> > interface of SplitReader such as
> > >>>> > SortedSplitReader or
> > >>>> > > >> > > >>>>>>> > > >> PrePartitionedAware?
> > >>>> > > >> > > >>>>>>> > > >> > Can they be combined? So I think the
> > >>>> additional
> > >>>> > > >> > decorative
> > >>>> > > >> > > >>>>>>> interface
> > >>>> > > >> > > >>>>>>> > > >> like
> > >>>> > > >> > > >>>>>>> > > >> > withSplitsAlignment actually increases
> the
> > >>>> > > >> understanding
> > >>>> > > >> > > >>>>>>> cost of
> > >>>> > > >> > > >>>>>>> > users
> > >>>> > > >> > > >>>>>>> > > >> > because they have to know what
> decorative
> > >>>> > interfaces
> > >>>> > > >> are
> > >>>> > > >> > > >>>>>>> there,
> > >>>> > > >> > > >>>>>>> > which
> > >>>> > > >> > > >>>>>>> > > >> > interface they can decorate and which
> > >>>> > combinations of
> > >>>> > > >> > the
> > >>>> > > >> > > >>>>>>> decorative
> > >>>> > > >> > > >>>>>>> > > >> > interfaces are valid and which are not.
> > >>>> Ideally
> > >>>> > we
> > >>>> > > >> want
> > >>>> > > >> > to
> > >>>> > > >> > > >>>>>>> avoid
> > >>>> > > >> > > >>>>>>> > that.
> > >>>> > > >> > > >>>>>>> > > >>
> > >>>> > > >> > > >>>>>>> > > >> I'm not sure if I understand how embedding
> > >>>> default
> > >>>> > > >> methods
> > >>>> > > >> > > in
> > >>>> > > >> > > >>>>>>> the base
> > >>>> > > >> > > >>>>>>> > > >> interface is solving the problem: what can
> > be
> > >>>> > combined
> > >>>> > > >> or
> > >>>> > > >> > > >>>>>>> not? If
> > >>>> > > >> > > >>>>>>> > there
> > >>>> > > >> > > >>>>>>> > > >> are
> > >>>> > > >> > > >>>>>>> > > >> two conflicting features, having
> decorative
> > >>>> > interfaces
> > >>>> > > >> > that
> > >>>> > > >> > > >>>>>>> can not be
> > >>>> > > >> > > >>>>>>> > > >> mixed together actually makes much more
> > sense
> > >>>> to me
> > >>>> > > >> rather
> > >>>> > > >> > > >>>>>>> than having
> > >>>> > > >> > > >>>>>>> > > >> them
> > >>>> > > >> > > >>>>>>> > > >> all in one base class. How would you allow
> > >>>> users to
> > >>>> > > >> > > implement
> > >>>> > > >> > > >>>>>>> only one
> > >>>> > > >> > > >>>>>>> > > of
> > >>>> > > >> > > >>>>>>> > > >> those two features?
> > >>>> > > >> > > >>>>>>> > > >>
> > >>>> > > >> > > >>>>>>> > > >> To reiterate on the issue. Yes, there are
> > >>>> > drawbacks:
> > >>>> > > >> > > >>>>>>> > > >> - how can a user discover what decorators
> > are
> > >>>> > there?
> > >>>> > > >> > > >>>>>>> > > >> - how can a user know where the decorator
> > can
> > >>>> be
> > >>>> > > >> applied
> > >>>> > > >> > to?
> > >>>> > > >> > > >>>>>>> > > >>
> > >>>> > > >> > > >>>>>>> > > >> However those are drawbacks for more power
> > >>>> users,
> > >>>> > that
> > >>>> > > >> can
> > >>>> > > >> > > be
> > >>>> > > >> > > >>>>>>> > mitigated
> > >>>> > > >> > > >>>>>>> > > by
> > >>>> > > >> > > >>>>>>> > > >> the documentation. For example listing all
> > of
> > >>>> the
> > >>>> > > >> > decorators
> > >>>> > > >> > > >>>>>>> with
> > >>>> > > >> > > >>>>>>> > > >> detailed explanation both in the docs and
> in
> > >>>> the
> > >>>> > java
> > >>>> > > >> > docs.
> > >>>> > > >> > > >>>>>>> More
> > >>>> > > >> > > >>>>>>> > > >> experienced users will be able to deal
> with
> > >>>> those
> > >>>> > > >> issues
> > >>>> > > >> > > >>>>>>> more easily, as
> > >>>> > > >> > > >>>>>>> > they
> > >>>> > > >> > > >>>>>>> > > >> will already have some basic understanding
> > of
> > >>>> > Flink.
> > >>>> > > >> Also
> > >>>> > > >> > if
> > >>>> > > >> > > >>>>>>> a user has
> > >>>> > > >> > > >>>>>>> > a
> > >>>> > > >> > > >>>>>>> > > >> problem that he wants to solve, he will
> > google
> > >>>> > search a
> > >>>> > > >> > > >>>>>>> potential
> > >>>> > > >> > > >>>>>>> > > solution
> > >>>> > > >> > > >>>>>>> > > >> to his problem anyway, and while doing
> that
> > >>>> he is
> > >>>> > very
> > >>>> > > >> > > likely
> > >>>> > > >> > > >>>>>>> to
> > >>>> > > >> > > >>>>>>> > > discover
> > >>>> > > >> > > >>>>>>> > > >> the decorator that he needs anyway in the
> > >>>> docs.
> > >>>> > > >> > > >>>>>>> > > >>
> > >>>> > > >> > > >>>>>>> > > >> The advantages are:
> > >>>> > > >> > > >>>>>>> > > >> - clean and easy to implement base
> interface
> > >>>> > > >> > > >>>>>>> > > >> - all of the methods from a single feature
> > are
> > >>>> > grouped
> > >>>> > > >> in
> > >>>> > > >> > a
> > >>>> > > >> > > >>>>>>> single
> > >>>> > > >> > > >>>>>>> > > >> decorator interface, together with their
> > >>>> dedicated
> > >>>> > java
> > >>>> > > >> > doc.
> > >>>> > > >> > > >>>>>>> It's also
> > >>>> > > >> > > >>>>>>> > > >> easier to google search for help using the
> > >>>> > decorator
> > >>>> > > >> name
> > >>>> > > >> > > >>>>>>> > > >> - if an optional feature requires two
> > methods
> > >>>> to be
> > >>>> > > >> > > >>>>>>> implemented at
> > >>>> > > >> > > >>>>>>> > once,
> > >>>> > > >> > > >>>>>>> > > >> decorator can guarantee that
> > >>>> > > >> > > >>>>>>> > > >>
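The decorator argument in the list above can be illustrated with a small sketch. All names here (`Reader`, `SupportsPausing`, `tryPause`) are hypothetical and chosen only for this example; they are not taken from Flink. The point it tries to show is that the compiler forces both methods of the feature to be implemented together, while the base interface stays minimal.

```java
import java.util.HashSet;
import java.util.Set;

public class DecoratorSketch {

    /** Hypothetical minimal base interface: nothing optional for a new user to wonder about. */
    interface Reader {
        String poll();
    }

    /** Hypothetical decorative interface: both methods of the feature live together. */
    interface SupportsPausing {
        void pause(String splitId);

        void resume(String splitId);
    }

    /** A reader that opts out simply by not implementing the decorator. */
    static final class PlainReader implements Reader {
        @Override
        public String poll() {
            return "record";
        }
    }

    /** A reader that opts in; omitting pause() or resume() would not compile. */
    static final class PausableReader implements Reader, SupportsPausing {
        final Set<String> paused = new HashSet<>();

        @Override
        public String poll() {
            return "record";
        }

        @Override
        public void pause(String splitId) {
            paused.add(splitId);
        }

        @Override
        public void resume(String splitId) {
            paused.remove(splitId);
        }
    }

    /** Framework side: discover the capability with a type check. */
    static boolean tryPause(Reader reader, String splitId) {
        if (reader instanceof SupportsPausing) {
            ((SupportsPausing) reader).pause(splitId);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(tryPause(new PlainReader(), "split-1"));
        System.out.println(tryPause(new PausableReader(), "split-1"));
    }
}
```

The trade-off raised in the thread still applies: the type check needs a reader instance, so a component that only holds the source cannot use it.
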
> > >>>> > > >> > > >>>>>>> > > >> Here I would strongly lean towards making
> > life
> > >>>> > easier
> > >>>> > > >> for
> > >>>> > > >> > > new
> > >>>> > > >> > > >>>>>>> users,
> > >>>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo)
> > >>>> slight
> > >>>> > expense
> > >>>> > > >> > for
> > >>>> > > >> > > >>>>>>> the power
> > >>>> > > >> > > >>>>>>> > > >> users.
> > >>>> > > >> > > >>>>>>> > > >>
> > >>>> > > >> > > >>>>>>> > > >> Best,
> > >>>> > > >> > > >>>>>>> > > >> Piotrek
> > >>>> > > >> > > >>>>>>> > > >>
> > >>>> > > >> > > >>>>>>> > > >>
> > >>>> > > >> > > >>>>>>> > > >> On Tue, 26 Apr 2022 at 15:32, Becket Qin <
> > >>>> > > >> becket.qin@gmail.com
> > >>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > wrote:
> > >>>> > > >> > > >>>>>>> > > >>
> > >>>> > > >> > > >>>>>>> > > >> > Thanks for the reply Sebastian and
> Dawid,
> > >>>> > > >> > > >>>>>>> > > >> >
> > >>>> > > >> > > >>>>>>> > > >> > I think Sebastian has a good summary.
> This
> > >>>> is a
> > >>>> > > >> really
> > >>>> > > >> > > >>>>>>> helpful
> > >>>> > > >> > > >>>>>>> > > >> discussion.
> > >>>> > > >> > > >>>>>>> > > >> >
> > >>>> > > >> > > >>>>>>> > > >> > Thinking a bit more, I feel that it
> might
> > >>>> still
> > >>>> > be
> > >>>> > > >> > better
> > >>>> > > >> > > >>>>>>> to add the
> > >>>> > > >> > > >>>>>>> > > >> > supportsXXX() method in the Source
> rather
> > >>>> than
> > >>>> > > >> > > SourceReader.
> > >>>> > > >> > > >>>>>>> > > >> >
> > >>>> > > >> > > >>>>>>> > > >> > Generally speaking, what we are trying
> to
> > do
> > >>>> > here is
> > >>>> > > >> to
> > >>>> > > >> > > let
> > >>>> > > >> > > >>>>>>> the
> > >>>> > > >> > > >>>>>>> > Flink
> > >>>> > > >> > > >>>>>>> > > >> > framework know what the Source is
> capable
> > >>>> of. In
> > >>>> > this
> > >>>> > > >> > > FLIP,
> > >>>> > > >> > > >>>>>>> it
> > >>>> > > >> > > >>>>>>> > happens
> > >>>> > > >> > > >>>>>>> > > >> to
> > >>>> > > >> > > >>>>>>> > > >> > be the capability that only involves
> > >>>> > SourceReader.
> > >>>> > > >> But
> > >>>> > > >> > in
> > >>>> > > >> > > >>>>>>> the
> > >>>> > > >> > > >>>>>>> > future,
> > >>>> > > >> > > >>>>>>> > > >> it is
> > >>>> > > >> > > >>>>>>> > > >> > possible that another functionality
> > involves
> > >>>> > both the
> > >>>> > > >> > > >>>>>>> > SplitEnumerator
> > >>>> > > >> > > >>>>>>> > > >> and
> > >>>> > > >> > > >>>>>>> > > >> > SourceReader. In that case, following
> the
> > >>>> current
> > >>>> > > >> > > approach,
> > >>>> > > >> > > >>>>>>> we
> > >>>> > > >> > > >>>>>>> > should
> > >>>> > > >> > > >>>>>>> > > >> put
> > >>>> > > >> > > >>>>>>> > > >> > the "supportsXXX()" method in both
> > >>>> > SplitEnumerator
> > >>>> > > >> and
> > >>>> > > >> > > >>>>>>> SourceReader.
> > >>>> > > >> > > >>>>>>> > > >> > Because if we only put this in the
> > >>>> SourceReader,
> > >>>> > then
> > >>>> > > >> > the
> > >>>> > > >> > > >>>>>>> JM would
> > >>>> > > >> > > >>>>>>> > > have
> > >>>> > > >> > > >>>>>>> > > >> to
> > >>>> > > >> > > >>>>>>> > > >> > create a SourceReader in order to know
> > >>>> whether
> > >>>> > this
> > >>>> > > >> > > feature
> > >>>> > > >> > > >>>>>>> is
> > >>>> > > >> > > >>>>>>> > > >> supported,
> > >>>> > > >> > > >>>>>>> > > >> > which is a little ugly. But if we put
> the
> > >>>> > > >> > "supportsXXX()"
> > >>>> > > >> > > >>>>>>> method in
> > >>>> > > >> > > >>>>>>> > > the
> > >>>> > > >> > > >>>>>>> > > >> > Source, we will break the "symmetric"
> > design
> > >>>> > because
> > >>>> > > >> > this
> > >>>> > > >> > > >>>>>>> FLIP
> > >>>> > > >> > > >>>>>>> > chose a
> > >>>> > > >> > > >>>>>>> > > >> > different way.
> > >>>> > > >> > > >>>>>>> > > >> >
> > >>>> > > >> > > >>>>>>> > > >> > This is also why I think supportsXXX()
> > >>>> method
> > >>>> > seems a
> > >>>> > > >> > good
> > >>>> > > >> > > >>>>>>> thing to
> > >>>> > > >> > > >>>>>>> > > >> have,
> > >>>> > > >> > > >>>>>>> > > >> > because when there are a few interfaces
> /
> > >>>> methods
> > >>>> > > >> that
> > >>>> > > >> > are
> > >>>> > > >> > > >>>>>>> expected
> > >>>> > > >> > > >>>>>>> > to
> > >>>> > > >> > > >>>>>>> > > >> be
> > >>>> > > >> > > >>>>>>> > > >> > implemented at the same time in order to
> > >>>> deliver
> > >>>> > a
> > >>>> > > >> > > feature,
> > >>>> > > >> > > >>>>>>> it is
> > >>>> > > >> > > >>>>>>> > > always
> > >>>> > > >> > > >>>>>>> > > >> > good to have a single source of truth to
> > >>>> tell the
> > >>>> > > >> > > framework
> > >>>> > > >> > > >>>>>>> what to
> > >>>> > > >> > > >>>>>>> > > do,
> > >>>> > > >> > > >>>>>>> > > >> so
> > >>>> > > >> > > >>>>>>> > > >> > the framework can do consistent things
> in
> > >>>> > different
> > >>>> > > >> > parts.
> > >>>> > > >> > > >>>>>>> > > >> >
> > >>>> > > >> > > >>>>>>> > > >> > @Sebastian Mattheis <
> > >>>> sebastian@ververica.com>
> > >>>> > > >> > > >>>>>>> > > >> >
> > >>>> > > >> > > >>>>>>> > > >> > Regarding interface flavor b), i.e.
> > >>>> > > >> AlignedSourceReader
> > >>>> > > >> > +
> > >>>> > > >> > > >>>>>>> > > >> > AlignedSplitReader, what I feel awkward
> > >>>> about is
> > >>>> > > >> that we
> > >>>> > > >> > > are
> > >>>> > > >> > > >>>>>>> > > essentially
> > >>>> > > >> > > >>>>>>> > > >> > expecting almost all the SourceReader
> > >>>> > > >> implementations to
> > >>>> > > >> > > >>>>>>> extend
> > >>>> > > >> > > >>>>>>> > > >> > SourceReaderBase, which effectively
> makes
> > >>>> the
> > >>>> > > >> > SourceReader
> > >>>> > > >> > > >>>>>>> interface
> > >>>> > > >> > > >>>>>>> > > >> > without the pausing support useless. So
> > this
> > >>>> > > >> indicates
> > >>>> > > >> > > that
> > >>>> > > >> > > >>>>>>> public
> > >>>> > > >> > > >>>>>>> > > >> > decorative interfaces (or sub-interfaces
> > >>>> for the
> > >>>> > same
> > >>>> > > >> > > >>>>>>> purpose) only
> > >>>> > > >> > > >>>>>>> > > >> > make sense if the original interface is
> > also
> > >>>> > > >> expected to
> > >>>> > > >> > > be
> > >>>> > > >> > > >>>>>>> used.
> > >>>> > > >> > > >>>>>>> > > >> > Otherwise, it seems to make more sense to
> > add
> > >>>> the
> > >>>> > > >> method
> > >>>> > > >> > to
> > >>>> > > >> > > >>>>>>> the
> > >>>> > > >> > > >>>>>>> > original
> > >>>> > > >> > > >>>>>>> > > >> > interface itself.
> > >>>> > > >> > > >>>>>>> > > >> >
> > >>>> > > >> > > >>>>>>> > > >> > Cheers,
> > >>>> > > >> > > >>>>>>> > > >> >
> > >>>> > > >> > > >>>>>>> > > >> > Jiangjie (Becket) Qin
> > >>>> > > >> > > >>>>>>> > > >> >
> > >>>> > > >> > > >>>>>>> > > >> >
> > >>>> > > >> > > >>>>>>> > > >> >
> > >>>> > > >> > > >>>>>>> > > >> >
> > >>>> > > >> > > >>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid
> > >>>> Wysakowicz
> > >>>> > <
> > >>>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org>
> > >>>> > > >> > > >>>>>>> > > >> > wrote:
> > >>>> > > >> > > >>>>>>> > > >> >
> > >>>> > > >> > > >>>>>>> > > >> > > Thanks @Sebastian for the nice
> summary.
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > I think most of your points aligned
> with
> > >>>> the
> > >>>> > > >> > suggestions
> > >>>> > > >> > > >>>>>>> I made to
> > >>>> > > >> > > >>>>>>> > > the
> > >>>> > > >> > > >>>>>>> > > >> > > FLIP, while you were writing your
> reply
> > (I
> > >>>> > believe
> > >>>> > > >> we
> > >>>> > > >> > > hit
> > >>>> > > >> > > >>>>>>> enter
> > >>>> > > >> > > >>>>>>> > > >> nearly at
> > >>>> > > >> > > >>>>>>> > > >> > > the same time ;) )
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > Two points after we synced offline
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > 1. I changed also the
> > >>>> > > >> supportsWatermarksSplitAlignment
> > >>>> > > >> > > to
> > >>>> > > >> > > >>>>>>> > > >> > > supportsPausingSplits to express the
> > >>>> general
> > >>>> > > >> > capability
> > >>>> > > >> > > of
> > >>>> > > >> > > >>>>>>> > pausing.
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > 2. As for if we should add
> > >>>> > > >> > > >>>>>>> PausingSourceReader/PausingSplitReader
> > >>>> > > >> > > >>>>>>> > > (option
> > >>>> > > >> > > >>>>>>> > > >> b)
> > >>>> > > >> > > >>>>>>> > > >> > > or if we should just add the methods
> > >>>> (option
> > >>>> > c), I
> > >>>> > > >> > > >>>>>>> suggest to
> > >>>> > > >> > > >>>>>>> > simply
> > >>>> > > >> > > >>>>>>> > > >> add
> > >>>> > > >> > > >>>>>>> > > >> > > the two methods as I felt this is the
> > >>>> > preferred
> > >>>> > > >> > > approach
> > >>>> > > >> > > >>>>>>> of Becket,
> > >>>> > > >> > > >>>>>>> > > >> to which
> > >>>> > > >> > > >>>>>>> > > >> > > others do not object. Unless there is
> an
> > >>>> > opposition
> > >>>> > > >> > > let's
> > >>>> > > >> > > >>>>>>> go with
> > >>>> > > >> > > >>>>>>> > > this
> > >>>> > > >> > > >>>>>>> > > >> > > option c.
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > Best,
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > Dawid
> > >>>> > > >> > > >>>>>>> > > >> > > On 26/04/2022 10:06, Sebastian
> Mattheis
> > >>>> wrote:
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > Hi folks,
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > Sorry for being a bit silent. Many
> > thanks
> > >>>> for
> > >>>> > all
> > >>>> > > >> the
> > >>>> > > >> > > >>>>>>> input and
> > >>>> > > >> > > >>>>>>> > > >> > > suggestions. As I'm a bit new, I
> needed
> > >>>> some
> > >>>> > time
> > >>>> > > >> to
> > >>>> > > >> > > >>>>>>> catch up and
> > >>>> > > >> > > >>>>>>> > > >> > structure
> > >>>> > > >> > > >>>>>>> > > >> > > (for myself) the discussion and I
> wanted
> > >>>> to
> > >>>> > find a
> > >>>> > > >> way
> > >>>> > > >> > > to
> > >>>> > > >> > > >>>>>>> > structure
> > >>>> > > >> > > >>>>>>> > > >> the
> > >>>> > > >> > > >>>>>>> > > >> > > conclusions. (Also because I had the
> > >>>> feeling
> > >>>> > that
> > >>>> > > >> some
> > >>>> > > >> > > >>>>>>> concerns
> > >>>> > > >> > > >>>>>>> > got
> > >>>> > > >> > > >>>>>>> > > >> lost
> > >>>> > > >> > > >>>>>>> > > >> > in
> > >>>> > > >> > > >>>>>>> > > >> > > the discussion.) This is my attempt
> and
> > >>>> please
> > >>>> > > >> correct
> > >>>> > > >> > > me
> > >>>> > > >> > > >>>>>>> if
> > >>>> > > >> > > >>>>>>> > > >> something is
> > >>>> > > >> > > >>>>>>> > > >> > > wrong or misunderstood. I tried to
> > >>>> collect and
> > >>>> > > >> > assemble
> > >>>> > > >> > > >>>>>>> the
> > >>>> > > >> > > >>>>>>> > > opinions,
> > >>>> > > >> > > >>>>>>> > > >> > > suggestions, and conclusions (to the
> > best
> > >>>> of my
> > >>>> > > >> > > >>>>>>> knowledge):
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > # Top A: Should split alignment
> > >>>> (pause/resume
> > >>>> > > >> > behavior)
> > >>>> > > >> > > >>>>>>> be a
> > >>>> > > >> > > >>>>>>> > general
> > >>>> > > >> > > >>>>>>> > > >> > > capability?
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > I personally don't see any reason not
> to
> > >>>> have
> > >>>> > it a
> > >>>> > > >> > > general
> > >>>> > > >> > > >>>>>>> > capability
> > >>>> > > >> > > >>>>>>> > > >> > > because for the alignSplit method it
> is
> > >>>> > actually
> > >>>> > > >> > > >>>>>>> independent of
> > >>>> > > >> > > >>>>>>> > the
> > >>>> > > >> > > >>>>>>> > > >> > > watermarks. If we agree here to have
> it
> > a
> > >>>> > general
> > >>>> > > >> > > >>>>>>> capability, we
> > >>>> > > >> > > >>>>>>> > > >> should
> > >>>> > > >> > > >>>>>>> > > >> > > also agree on the right wording. Does
> > >>>> > > >> > > >>>>>>> "alignSplits(splitsToResume,
> > >>>> > > >> > > >>>>>>> > > >> > > splitsToPause)" refer to what is then
> > >>>> actually
> > >>>> > > >> meant?
> > >>>> > > >> > (I
> > >>>> > > >> > > >>>>>>> see it as
> > >>>> > > >> > > >>>>>>> > > >> okay.
> > >>>> > > >> > > >>>>>>> > > >> > I
> > >>>> > > >> > > >>>>>>> > > >> > > don't have any better idea whilst
> Arvid
> > >>>> > suggested
> > >>>> > > >> > > >>>>>>> > > >> "pauseOrResumeSplits".)
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > # Top B: Should it be possible to
> > >>>> > enable/disable
> > >>>> > > >> split
> > >>>> > > >> > > >>>>>>> alignment?
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > I would personally not disable the
> split
> > >>>> > alignment
> > >>>> > > >> on
> > >>>> > > >> > > the
> > >>>> > > >> > > >>>>>>> source
> > >>>> > > >> > > >>>>>>> > > >> reader
> > >>>> > > >> > > >>>>>>> > > >> > > side because if split alignment is
> used
> > >>>> for
> > >>>> > some
> > >>>> > > >> other
> > >>>> > > >> > > >>>>>>> use case
> > >>>> > > >> > > >>>>>>> > (see
> > >>>> > > >> > > >>>>>>> > > >> A)
> > >>>> > > >> > > >>>>>>> > > >> > it
> > >>>> > > >> > > >>>>>>> > > >> > > could have nasty side effects on
> > >>>> other/future
> > >>>> > use
> > >>>> > > >> > cases.
> > >>>> > > >> > > >>>>>>> Instead,
> > >>>> > > >> > > >>>>>>> > I
> > >>>> > > >> > > >>>>>>> > > >> would
> > >>>> > > >> > > >>>>>>> > > >> > > disable "watermark split alignment"
> > where
> > >>>> I
> > >>>> > think
> > >>>> > > >> it
> > >>>> > > >> > > >>>>>>> should
> > >>>> > > >> > > >>>>>>> > disable
> > >>>> > > >> > > >>>>>>> > > >> the
> > >>>> > > >> > > >>>>>>> > > >> > > watermark-dependent trigger for split
> > >>>> > alignment.
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > # Top C: Should we add a supportsX
> > method?
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > I find it difficult to define the
> scope
> > >>>> of a
> > >>>> > > >> supportsX
> > >>>> > > >> > > >>>>>>> method
> > >>>> > > >> > > >>>>>>> > w.r.t.
> > >>>> > > >> > > >>>>>>> > > >> > > the following questions: a) Where is
> it
> > >>>> used?
> > >>>> > and
> > >>>> > > >> b)
> > >>>> > > >> > > What
> > >>>> > > >> > > >>>>>>> is the
> > >>>> > > >> > > >>>>>>> > > >> expected
> > >>>> > > >> > > >>>>>>> > > >> > > output? To b), it's not
> straight-forward
> > >>>> to
> > >>>> > > >> provide a
> > >>>> > > >> > > >>>>>>> meaningful
> > >>>> > > >> > > >>>>>>> > > >> output,
> > >>>> > > >> > > >>>>>>> > > >> > > e.g., if SourceReader supports split
> > >>>> alignment
> > >>>> > but
> > >>>> > > >> > > >>>>>>> SplitReader
> > >>>> > > >> > > >>>>>>> > not.
> > >>>> > > >> > > >>>>>>> > > >> This
> > >>>> > > >> > > >>>>>>> > > >> > is
> > >>>> > > >> > > >>>>>>> > > >> > > because with the current
> implementation,
> > >>>> we can
> > >>>> > > >> > > determine
> > >>>> > > >> > > >>>>>>> whether
> > >>>> > > >> > > >>>>>>> > > >> split
> > >>>> > > >> > > >>>>>>> > > >> > > alignment is fully supported only
> during
> > >>>> > runtime
> > >>>> > > >> and
> > >>>> > > >> > > >>>>>>> specifically
> > >>>> > > >> > > >>>>>>> > > >> > actually
> > >>>> > > >> > > >>>>>>> > > >> > > only when calling alignSplits down the
> > >>>> call
> > >>>> > > >> hierarchy
> > >>>> > > >> > up
> > >>>> > > >> > > >>>>>>> to the
> > >>>> > > >> > > >>>>>>> > > actual
> > >>>> > > >> > > >>>>>>> > > >> > > SplitReaders.
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > Therefore, I would suggest to either
> > >>>> raise an
> > >>>> > > >> error or
> > >>>> > > >> > > >>>>>>> warning if
> > >>>> > > >> > > >>>>>>> > > the
> > >>>> > > >> > > >>>>>>> > > >> > > alignment is called but not supported
> at
> > >>>> some
> > >>>> > > >> point. I
> > >>>> > > >> > > >>>>>>> know we
> > >>>> > > >> > > >>>>>>> > > should
> > >>>> > > >> > > >>>>>>> > > >> > > carefully think about when this could
> be
> > >>>> the
> > >>>> > case
> > >>>> > > >> > > because
> > >>>> > > >> > > >>>>>>> we don't
> > >>>> > > >> > > >>>>>>> > > >> want
> > >>>> > > >> > > >>>>>>> > > >> > to
> > >>>> > > >> > > >>>>>>> > > >> > > flood anybody with such warnings.
> > However,
> > >>>> > warnings
> > >>>> > > >> > > could
> > >>>> > > >> > > >>>>>>> be an
> > >>>> > > >> > > >>>>>>> > > >> indicator
> > >>>> > > >> > > >>>>>>> > > >> > > for the user that for watermark split
> > >>>> > alignment use
> > >>>> > > >> > case
> > >>>> > > >> > > >>>>>>> split
> > >>>> > > >> > > >>>>>>> > > >> reading is
> > >>>> > > >> > > >>>>>>> > > >> > > imbalanced with the conclusion to
> either
> > >>>> > disable
> > >>>> > > >> the
> > >>>> > > >> > > >>>>>>> trigger for
> > >>>> > > >> > > >>>>>>> > > >> > watermark
> > >>>> > > >> > > >>>>>>> > > >> > > split alignment (see Top B) or to
> > >>>> > use/implement a
> > >>>> > > >> > source
> > >>>> > > >> > > >>>>>>> and
> > >>>> > > >> > > >>>>>>> > reader
> > >>>> > > >> > > >>>>>>> > > >> that
> > >>>> > > >> > > >>>>>>> > > >> > > fully supports split alignment.
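
The warning-instead-of-supportsX idea from Top C can be sketched as follows. This is only an illustration under stated assumptions: PlainSplitReader, PausableSplitReader and AlignmentDispatcher are hypothetical names, not Flink classes, and the runtime check via instanceof is one possible way to detect missing support.

```java
import java.util.Collection;

/** Hypothetical base reader without any alignment support (not a Flink interface). */
interface PlainSplitReader {}

/** Hypothetical reader that can pause and resume individual splits. */
interface PausableSplitReader extends PlainSplitReader {
    void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

/** Forwards alignment requests and warns only once if the reader cannot honor them. */
final class AlignmentDispatcher {
    private boolean warned = false;
    private int warningCount = 0;

    /** Returns true if the request was forwarded, false if it was ignored. */
    boolean align(
            PlainSplitReader reader,
            Collection<String> splitsToPause,
            Collection<String> splitsToResume) {
        if (reader instanceof PausableSplitReader) {
            ((PausableSplitReader) reader).alignSplits(splitsToPause, splitsToResume);
            return true;
        }
        // Support is only known at runtime, so warn here, but only the first time
        // to avoid flooding the logs.
        if (!warned) {
            warned = true;
            warningCount++;
            System.err.println(
                    "WARN: split alignment requested, but the reader does not support it");
        }
        return false;
    }

    int warningCount() {
        return warningCount;
    }
}
```

Whether an unsupported call should warn or fail is exactly the open question of Top C; the sketch picks the warning variant only to show where the check would sit.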
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > # Top D: How to design interfaces?
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > Thanks for structuring the discussion
> > >>>> with the
> > >>>> > > >> > > various
> > >>>> > > >> > > >>>>>>> > > >> possibilities
> > >>>> > > >> > > >>>>>>> > > >> > > (a-d). From the discussion and
> emails, I
> > >>>> would
> > >>>> > > >> like to
> > >>>> > > >> > > >>>>>>> summarize
> > >>>> > > >> > > >>>>>>> > the
> > >>>> > > >> > > >>>>>>> > > >> > > following requirements:
> > >>>> > > >> > > >>>>>>> > > >> > > - Interfaces should be consistent
> > >>>> > ("symmetric"),
> > >>>> > > >> i.e.,
> > >>>> > > >> > > >>>>>>> similar
> > >>>> > > >> > > >>>>>>> > > >> semantics
> > >>>> > > >> > > >>>>>>> > > >> > > should have similar interfaces with
> > >>>> similar
> > >>>> > usage.
> > >>>> > > >> > > >>>>>>> > > >> > > - Make explicit which implementations
> > >>>> implement
> > >>>> > > >> > > >>>>>>> interfaces/support
> > >>>> > > >> > > >>>>>>> > > >> > > behavior.
> > >>>> > > >> > > >>>>>>> > > >> > > - Make clear what are default
> > >>>> implementations
> > >>>> > and
> > >>>> > > >> how
> > >>>> > > >> > to
> > >>>> > > >> > > >>>>>>> implement
> > >>>> > > >> > > >>>>>>> > > >> > > interfaces with desired behavior.
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > This is a simplified view of the
> > relations
> > >>>> > between
> > >>>> > > >> > > >>>>>>> relevant
> > >>>> > > >> > > >>>>>>> > classes
> > >>>> > > >> > > >>>>>>> > > of
> > >>>> > > >> > > >>>>>>> > > >> > the
> > >>>> > > >> > > >>>>>>> > > >> > > PoC implementation:
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > SourceReader (Public) <|--
> > >>>> SourceReaderBase
> > >>>> > > >> (Internal)
> > >>>> > > >> > > >>>>>>> <|-- ..
> > >>>> > > >> > > >>>>>>> > <|--
> > >>>> > > >> > > >>>>>>> > > >> > > MySourceReader
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > MySourceReader <>--
> SplitFetcherManager
> > >>>> > (Internal)
> > >>>> > > >> > <>--
> > >>>> > > >> > > >>>>>>> > SplitFetcher
> > >>>> > > >> > > >>>>>>> > > >> > > (Internal) <>-- SplitReader (Public)
> > <|--
> > >>>> > > >> > MySplitReader
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > (A <|-- B: B inherits from A; A <>--
> B:
> > A
> > >>>> "has
> > >>>> > a"
> > >>>> > > >> B)
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > Note that SourceReaderBase and
> > >>>> > SplitFetcherManager
> > >>>> > > >> > > >>>>>>> implement most
> > >>>> > > >> > > >>>>>>> > of
> > >>>> > > >> > > >>>>>>> > > >> the
> > >>>> > > >> > > >>>>>>> > > >> > > "logic" for split alignment just
> because
> > >>>> we
> > >>>> > wanted
> > >>>> > > >> to
> > >>>> > > >> > > >>>>>>> implement
> > >>>> > > >> > > >>>>>>> > > split
> > >>>> > > >> > > >>>>>>> > > >> > > alignment and wanted it to be
> available
> > as
> > >>>> > kind of
> > >>>> > > >> a
> > >>>> > > >> > > >>>>>>> default. As a
> > >>>> > > >> > > >>>>>>> > > >> > > consequence, we have a "default
> > >>>> > implementation" for
> > >>>> > > >> > > >>>>>>> SourceReader
> > >>>> > > >> > > >>>>>>> > > that
> > >>>> > > >> > > >>>>>>> > > >> > > implements the actual logic for split
> > >>>> > alignment.
> > >>>> > > >> For
> > >>>> > > >> > > that
> > >>>> > > >> > > >>>>>>> reason,
> > >>>> > > >> > > >>>>>>> > I
> > >>>> > > >> > > >>>>>>> > > >> find
> > >>>> > > >> > > >>>>>>> > > >> > it
> > >>>> > > >> > > >>>>>>> > > >> > > very confusing to have a NOOP default
> > >>>> > > >> implementation
> > >>>> > > >> > in
> > >>>> > > >> > > >>>>>>> the
> > >>>> > > >> > > >>>>>>> > > interface
> > >>>> > > >> > > >>>>>>> > > >> for
> > >>>> > > >> > > >>>>>>> > > >> > > the SourceReader. As a consequence,
> > >>>> interface
> > >>>> > > >> strategy
> > >>>> > > >> > > c)
> > >>>> > > >> > > >>>>>>> is
> > >>>> > > >> > > >>>>>>> > > difficult
> > >>>> > > >> > > >>>>>>> > > >> > > because this would require NOOP
> default
> > >>>> > > >> > implementations
> > >>>> > > >> > > >>>>>>> in the
> > >>>> > > >> > > >>>>>>> > > public
> > >>>> > > >> > > >>>>>>> > > >> > > interfaces of SourceReader and
> > >>>> SplitReader.
> > >>>> > This is
> > >>>> > > >> > the
> > >>>> > > >> > > >>>>>>> same for
> > >>>> > > >> > > >>>>>>> > > >> strategy
> > >>>> > > >> > > >>>>>>> > > >> > > d) because it would require NOOP
> default
> > >>>> > > >> > implementation
> > >>>> > > >> > > >>>>>>> in the
> > >>>> > > >> > > >>>>>>> > > >> > > SourceReader. Further, as Dawid
> > described
> > >>>> > method
> > >>>> > > >> > > >>>>>>> signatures of
> > >>>> > > >> > > >>>>>>> > > >> alignSplit
> > >>>> > > >> > > >>>>>>> > > >> > > for SourceReader and SplitReader
> differ
> > >>>> and it
> > >>>> > > >> would
> > >>>> > > >> > be
> > >>>> > > >> > > >>>>>>> extremely
> > >>>> > > >> > > >>>>>>> > > >> > difficult
> > >>>> > > >> > > >>>>>>> > > >> > > to make the signatures the same (with
> > even
> > >>>> > > >> potential
> > >>>> > > >> > > >>>>>>> performance
> > >>>> > > >> > > >>>>>>> > > >> impact
> > >>>> > > >> > > >>>>>>> > > >> > > because of additional look-ups of
> split
> > >>>> ids).
> > >>>> > > >> > Therefore,
> > >>>> > > >> > > >>>>>>> having a
> > >>>> > > >> > > >>>>>>> > > >> > symmetric
> > >>>> > > >> > > >>>>>>> > > >> > > decorative interface as of strategy a)
> > is
> > >>>> > actually
> > >>>> > > >> not
> > >>>> > > >> > > >>>>>>> possible
> > >>>> > > >> > > >>>>>>> > and
> > >>>> > > >> > > >>>>>>> > > >> > having
> > >>>> > > >> > > >>>>>>> > > >> > > two decorative interfaces with
> different
> > >>>> method
> > >>>> > > >> > > >>>>>>> signatures is
> > >>>> > > >> > > >>>>>>> > > >> confusing.
> > >>>> > > >> > > >>>>>>> > > >> > My
> > >>>> > > >> > > >>>>>>> > > >> > > conclusion is that we are best with
> > >>>> strategy b)
> > >>>> > > >> which
> > >>>> > > >> > > >>>>>>> means to
> > >>>> > > >> > > >>>>>>> > have
> > >>>> > > >> > > >>>>>>> > > >> > > specializing sub-interfaces that
> inherit
> > >>>> from
> > >>>> > the
> > >>>> > > >> > parent
> > >>>> > > >> > > >>>>>>> > interface:
> > >>>> > > >> > > >>>>>>> > > >> > > SourceReader <|-- AlignedSourceReader,
> > >>>> > SplitReader
> > >>>> > > >> > <|--
> > >>>> > > >> > > >>>>>>> > > >> > AlignedSplitReader
> > >>>> > > >> > > >>>>>>> > > >> > > With this option, I'm not 100% sure
> what
> > >>>> the
> > >>>> > > >> > > implications
> > >>>> > > >> > > >>>>>>> are and
> > >>>> > > >> > > >>>>>>> > if
> > >>>> > > >> > > >>>>>>> > > >> this
> > >>>> > > >> > > >>>>>>> > > >> > > could get nasty. I would suggest that
> > >>>> Dawid
> > >>>> > and I
> > >>>> > > >> just
> > >>>> > > >> > > >>>>>>> try to
> > >>>> > > >> > > >>>>>>> > > >> implement
> > >>>> > > >> > > >>>>>>> > > >> > and
> > >>>> > > >> > > >>>>>>> > > >> > > see if we like it. :)
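
A minimal sketch of strategy b), assuming simplified stand-in interfaces: every Sketch* type below is hypothetical and much smaller than the real SourceReader and SplitReader. It also assumes that the source reader addresses splits by id while the split reader receives split objects, which is one way to read the remark about differing signatures and extra look-ups of split ids.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-ins for the public base interfaces (hypothetical, not Flink's).
interface SketchSourceReader { void start(); }
interface SketchSplitReader<SplitT> { void handleSplit(SplitT split); }

// Strategy b): specializing sub-interfaces that inherit from the parent interface.
interface SketchAlignedSourceReader extends SketchSourceReader {
    void alignSplits(Collection<String> splitIdsToPause, Collection<String> splitIdsToResume);
}
interface SketchAlignedSplitReader<SplitT> extends SketchSplitReader<SplitT> {
    void alignSplits(Collection<SplitT> splitsToPause, Collection<SplitT> splitsToResume);
}

final class SketchSplit {
    final String id;
    SketchSplit(String id) { this.id = id; }
}

final class SketchMySplitReader implements SketchAlignedSplitReader<SketchSplit> {
    final Set<String> paused = new HashSet<>();

    @Override public void handleSplit(SketchSplit split) { /* reading omitted */ }

    @Override public void alignSplits(
            Collection<SketchSplit> splitsToPause, Collection<SketchSplit> splitsToResume) {
        for (SketchSplit s : splitsToPause) { paused.add(s.id); }
        for (SketchSplit s : splitsToResume) { paused.remove(s.id); }
    }
}

final class SketchMySourceReader implements SketchAlignedSourceReader {
    private final Map<String, SketchSplit> assigned = new HashMap<>();
    final SketchMySplitReader splitReader = new SketchMySplitReader();

    void addSplit(SketchSplit split) {
        assigned.put(split.id, split);
        splitReader.handleSplit(split);
    }

    @Override public void start() { /* nothing to do in this sketch */ }

    @Override public void alignSplits(
            Collection<String> splitIdsToPause, Collection<String> splitIdsToResume) {
        // The id-to-split look-up is the extra step caused by the differing signatures.
        splitReader.alignSplits(resolve(splitIdsToPause), resolve(splitIdsToResume));
    }

    private List<SketchSplit> resolve(Collection<String> ids) {
        List<SketchSplit> result = new ArrayList<>();
        for (String id : ids) {
            SketchSplit split = assigned.get(id);
            if (split != null) { result.add(split); }
        }
        return result;
    }
}
```

With this shape, a source opts in explicitly by implementing the Aligned* variants, and the base interfaces need no NOOP default methods.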
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > # Summary
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > In conclusion, please let me know your
> > >>>> > > >> perspectives.
> > >>>> > > >> > > >>>>>>> Please
> > >>>> > > >> > > >>>>>>> > correct
> > >>>> > > >> > > >>>>>>> > > >> me,
> > >>>> > > >> > > >>>>>>> > > >> > if
> > >>>> > > >> > > >>>>>>> > > >> > > something is wrong or if I
> misunderstood
> > >>>> > > >> something. My
> > >>>> > > >> > > >>>>>>> perspective
> > >>>> > > >> > > >>>>>>> > > >> would
> > >>>> > > >> > > >>>>>>> > > >> > be:
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > Top A: Yes
> > >>>> > > >> > > >>>>>>> > > >> > > Top B: Yes (but disable watermark
> > trigger
> > >>>> for
> > >>>> > split
> > >>>> > > >> > > >>>>>>> alignment)
> > >>>> > > >> > > >>>>>>> > > >> > > Top C: No
> > >>>> > > >> > > >>>>>>> > > >> > > Top D: b)
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > Best,
> > >>>> > > >> > > >>>>>>> > > >> > > Sebastian
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid
> > >>>> > Wysakowicz <
> > >>>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > > wrote:
> > >>>> > > >> > > >>>>>>> > > >> > >
> > >>>> > > >> > > >>>>>>> > > >> > >> @Arvid:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's capability
> > >>>> > approach, I
> > >>>> > > >> > fear
> > >>>> > > >> > > >>>>>>> that it
> > >>>> > > >> > > >>>>>>> > > >> doesn't
> > >>>> > > >> > > >>>>>>> > > >> > work
> > >>>> > > >> > > >>>>>>> > > >> > >> for this particular use case: Sources
> > can
> > >>>> > always
> > >>>> > > >> be
> > >>>> > > >> > > >>>>>>> aligned
> > >>>> > > >> > > >>>>>>> > > >> cross-task
> > >>>> > > >> > > >>>>>>> > > >> > and
> > >>>> > > >> > > >>>>>>> > > >> > >> this is just about intra-task
> > alignment.
> > >>>> So
> > >>>> > it's
> > >>>> > > >> > > >>>>>>> plausible to put
> > >>>> > > >> > > >>>>>>> > > >> > sources
> > >>>> > > >> > > >>>>>>> > > >> > >> into an alignment group even though
> > they
> > >>>> do
> > >>>> > not
> > >>>> > > >> use
> > >>>> > > >> > any
> > >>>> > > >> > > >>>>>>> of the
> > >>>> > > >> > > >>>>>>> > > >> presented
> > >>>> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They should just
> > issue a
> > >>>> > > >> warning, if
> > >>>> > > >> > > >>>>>>> they handle
> > >>>> > > >> > > >>>>>>> > > >> > multiple
> > >>>> > > >> > > >>>>>>> > > >> > >> splits (see motivation section).
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Yes, but the "supportXXX" method
> would
> > >>>> be for
> > >>>> > > >> telling
> > >>>> > > >> > > if
> > >>>> > > >> > > >>>>>>> it
> > >>>> > > >> > > >>>>>>> > > supports
> > >>>> > > >> > > >>>>>>> > > >> > that
> > >>>> > > >> > > >>>>>>> > > >> > >> intra-task alignment. Cross-task
> > >>>> alignment
> > >>>> > would
> > >>>> > > >> > always
> > >>>> > > >> > > >>>>>>> be
> > >>>> > > >> > > >>>>>>> > > supported.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> I updated interfaces to what I
> believe
> > >>>> to be
> > >>>> > > >> closest
> > >>>> > > >> > > to a
> > >>>> > > >> > > >>>>>>> > consensus
> > >>>> > > >> > > >>>>>>> > > >> > >> between all participants. Do you mind
> > >>>> taking a
> > >>>> > > >> look?
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> @Sebastian Do you mind addressing the
> > >>>> nits?
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise
> wrote:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for pushing this effort.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> I'd actually be in favor of 1b). I
> > fully
> > >>>> agree
> > >>>> > > >> that
> > >>>> > > >> > > >>>>>>> decorator
> > >>>> > > >> > > >>>>>>> > > >> interfaces
> > >>>> > > >> > > >>>>>>> > > >> > >> should be avoided but I'm also not a
> > big
> > >>>> fan
> > >>>> > of
> > >>>> > > >> > > >>>>>>> overloading the
> > >>>> > > >> > > >>>>>>> > > base
> > >>>> > > >> > > >>>>>>> > > >> > >> interfaces (they are hard to
> implement
> > as
> > >>>> > is). The
> > >>>> > > >> > > usual
> > >>>> > > >> > > >>>>>>> feedback
> > >>>> > > >> > > >>>>>>> > > to
> > >>>> > > >> > > >>>>>>> > > >> > >> Source-related interfaces is always
> > that
> > >>>> > they are
> > >>>> > > >> > > >>>>>>> overwhelming
> > >>>> > > >> > > >>>>>>> > and
> > >>>> > > >> > > >>>>>>> > > >> too
> > >>>> > > >> > > >>>>>>> > > >> > >> hard to implement. However, I'd also
> > not
> > >>>> > oppose
> > >>>> > > >> 1c)
> > >>>> > > >> > as
> > >>>> > > >> > > >>>>>>> scattered
> > >>>> > > >> > > >>>>>>> > > >> > interfaces
> > >>>> > > >> > > >>>>>>> > > >> > >> also have drawbacks. I'd just dislike
> > >>>> 1a) and
> > >>>> > 1d).
> > >>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's capability
> > >>>> > approach, I
> > >>>> > > >> > fear
> > >>>> > > >> > > >>>>>>> that it
> > >>>> > > >> > > >>>>>>> > > >> doesn't
> > >>>> > > >> > > >>>>>>> > > >> > work
> > >>>> > > >> > > >>>>>>> > > >> > >> for this particular use case: Sources
> > can
> > >>>> > always
> > >>>> > > >> be
> > >>>> > > >> > > >>>>>>> aligned
> > >>>> > > >> > > >>>>>>> > > >> cross-task
> > >>>> > > >> > > >>>>>>> > > >> > and
> > >>>> > > >> > > >>>>>>> > > >> > >> this is just about intra-task
> > alignment.
> > >>>> So
> > >>>> > it's
> > >>>> > > >> > > >>>>>>> plausible to put
> > >>>> > > >> > > >>>>>>> > > >> > sources
> > >>>> > > >> > > >>>>>>> > > >> > >> into an alignment group even though
> > they
> > >>>> do
> > >>>> > not
> > >>>> > > >> use
> > >>>> > > >> > any
> > >>>> > > >> > > >>>>>>> of the
> > >>>> > > >> > > >>>>>>> > > >> presented
> > >>>> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They should just
> > issue a
> > >>>> > > >> warning, if
> > >>>> > > >> > > >>>>>>> they handle
> > >>>> > > >> > > >>>>>>> > > >> > multiple
> > >>>> > > >> > > >>>>>>> > > >> > >> splits (see motivation section).
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> I think renaming alignSplits to
> > >>>> facilitate
> > >>>> > future
> > >>>> > > >> use
> > >>>> > > >> > > >>>>>>> cases makes
> > >>>> > > >> > > >>>>>>> > > >> sense
> > >>>> > > >> > > >>>>>>> > > >> > but
> > >>>> > > >> > > >>>>>>> > > >> > >> then all interfaces (if 1c) is
> chosen)
> > >>>> should
> > >>>> > be
> > >>>> > > >> > > adjusted
> > >>>> > > >> > > >>>>>>> > > >> accordingly.
> > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSourceReader could be
> > >>>> > PausingSourceReader
> > >>>> > > >> and
> > >>>> > > >> > > I'd
> > >>>> > > >> > > >>>>>>> go for
> > >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits (Becket's
> proposal
> > >>>> > afaik). We
> > >>>> > > >> > could
> > >>>> > > >> > > >>>>>>> also
> > >>>> > > >> > > >>>>>>> > split
> > >>>> > > >> > > >>>>>>> > > it
> > >>>> > > >> > > >>>>>>> > > >> > into
> > >>>> > > >> > > >>>>>>> > > >> > >> pauseSplit and resumeSplit. While
> > >>>> > > >> pauseOrResumeSplits
> > >>>> > > >> > > >>>>>>> may allow
> > >>>> > > >> > > >>>>>>> > > >> Sources
> > >>>> > > >> > > >>>>>>> > > >> > to
> > >>>> > > >> > > >>>>>>> > > >> > >> just use 1 instead of 2 library calls
> > (as
> > >>>> > written
> > >>>> > > >> in
> > >>>> > > >> > > the
> > >>>> > > >> > > >>>>>>> > Javadoc),
> > >>>> > > >> > > >>>>>>> > > >> both
> > >>>> > > >> > > >>>>>>> > > >> > >> Kafka and Pulsar can't use it and I'm
> > not
> > >>>> > sure if
> > >>>> > > >> > there
> > >>>> > > >> > > >>>>>>> is a
> > >>>> > > >> > > >>>>>>> > system
> > >>>> > > >> > > >>>>>>> > > >> that
> > >>>> > > >> > > >>>>>>> > > >> > >> can.
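
To illustrate the point about library calls, here is a hedged sketch with a hypothetical client (TwoCallClient is not a real Kafka or Pulsar class). It assumes the client exposes separate pause and resume operations, so a combined pauseOrResumeSplits still ends up issuing two calls.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical client API with separate pause and resume calls. */
interface TwoCallClient {
    void pause(Collection<String> partitions);
    void resume(Collection<String> partitions);
}

/** Records each call so the number of client invocations is visible. */
final class RecordingClient implements TwoCallClient {
    final List<String> calls = new ArrayList<>();

    @Override public void pause(Collection<String> partitions) {
        calls.add("pause" + partitions);
    }

    @Override public void resume(Collection<String> partitions) {
        calls.add("resume" + partitions);
    }
}

/** Hypothetical reader offering both the combined and the split method variants. */
final class PausingSplitReaderSketch {
    private final TwoCallClient client;

    PausingSplitReaderSketch(TwoCallClient client) {
        this.client = client;
    }

    /** Combined variant: still two client calls when both collections are non-empty. */
    void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        if (!toResume.isEmpty()) { client.resume(toResume); }
        if (!toPause.isEmpty()) { client.pause(toPause); }
    }

    /** Split variants: each maps one-to-one onto a client call. */
    void pauseSplits(Collection<String> toPause) { client.pause(toPause); }
    void resumeSplits(Collection<String> toResume) { client.resume(toResume); }
}
```

Under this assumption the combined method saves nothing for such clients, which is the argument for considering separate pauseSplit and resumeSplit methods.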
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Some nits for the FLIP:
> > >>>> > > >> > > >>>>>>> > > >> > >> - Please replace "stop" with "pause".
> > >>>> > > >> > > >>>>>>> > > >> > >> - Not sure if it's worth it in the
> > >>>> capability
> > >>>> > > >> > section:
> > >>>> > > >> > > >>>>>>> Sources
> > >>>> > > >> > > >>>>>>> > that
> > >>>> > > >> > > >>>>>>> > > >> > adopt
> > >>>> > > >> > > >>>>>>> > > >> > >> this interface cannot be used in
> > earlier
> > >>>> > > >> versions. So
> > >>>> > > >> > > it
> > >>>> > > >> > > >>>>>>> feels
> > >>>> > > >> > > >>>>>>> > like
> > >>>> > > >> > > >>>>>>> > > >> we
> > >>>> > > >> > > >>>>>>> > > >> > are
> > >>>> > > >> > > >>>>>>> > > >> > >> only forward compatible (old sources
> > can
> > >>>> be
> > >>>> > used
> > >>>> > > >> > after
> > >>>> > > >> > > >>>>>>> the
> > >>>> > > >> > > >>>>>>> > change);
> > >>>> > > >> > > >>>>>>> > > >> but
> > >>>> > > >> > > >>>>>>> > > >> > I
> > >>>> > > >> > > >>>>>>> > > >> > >> guess this holds for any API
> addition.
> > >>>> > > >> > > >>>>>>> > > >> > >> - You might want to add what happens
> > >>>> when all
> > >>>> > > >> splits
> > >>>> > > >> > > are
> > >>>> > > >> > > >>>>>>> paused.
> > >>>> > > >> > > >>>>>>> > > >> > >> - You may want to describe how the 3
> > >>>> flavors
> > >>>> > of
> > >>>> > > >> > > >>>>>>> SourceReaderBase
> > >>>> > > >> > > >>>>>>> > > >> > interact
> > >>>> > > >> > > >>>>>>> > > >> > >> with the interface.
> > >>>> > > >> > > >>>>>>> > > >> > >> - I'm not sure if it makes sense to
> > >>>> include
> > >>>> > Kafka
> > >>>> > > >> and
> > >>>> > > >> > > >>>>>>> Pulsar in
> > >>>> > > >> > > >>>>>>> > the
> > >>>> > > >> > > >>>>>>> > > >> > FLIP.
> > >>>> > > >> > > >>>>>>> > > >> > >> For me, this is rather immediate
> > >>>> follow-up
> > >>>> > work.
> > >>>> > > >> > (could
> > >>>> > > >> > > >>>>>>> be in the
> > >>>> > > >> > > >>>>>>> > > >> same
> > >>>> > > >> > > >>>>>>> > > >> > >> umbrella ticket)
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Arvid
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz
> > >>>> > > >> > > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements
> > >>>> SourceReader,
> > >>>> > > >> > > >>>>>>> WithSplitsAlignment",
> > >>>> > > >> > > >>>>>>> > > >> along
> > >>>> > > >> > > >>>>>>> > > >> > >> with "MySplitReader implements
> > >>>> SplitReader,
> > >>>> > > >> > > >>>>>>> WithSplitsAlignment",
> > >>>> > > >> > > >>>>>>> > > or
> > >>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements
> > >>>> > AlignedSourceReader"
> > >>>> > > >> > and
> > >>>> > > >> > > >>>>>>> > > "MySplitReader
> > >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
> > >>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements
> > >>>> SourceReader"
> > >>>> > and
> > >>>> > > >> > > >>>>>>> "MySplitReader
> > >>>> > > >> > > >>>>>>> > > >> > implements
> > >>>> > > >> > > >>>>>>> > > >> > >> SplitReader".
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according
> > to
> > >>>> Dawid
> > >>>> > > >> would
> > >>>> > > >> > > be:
> > >>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements
> > >>>> SourceReader"
> > >>>> > and
> > >>>> > > >> > > >>>>>>> "MySplitReader
> > >>>> > > >> > > >>>>>>> > > >> > implements
> > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
> > >>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although
> > >>>> personally
> > >>>> > > >> > speaking I
> > >>>> > > >> > > >>>>>>> think it
> > >>>> > > >> > > >>>>>>> > is
> > >>>> > > >> > > >>>>>>> > > >> > simpler
> > >>>> > > >> > > >>>>>>> > > >> > >> to just add a new method to the split
> > >>>> reader
> > >>>> > with
> > >>>> > > >> > > >>>>>>> default impl.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> I think that is a good idea to have
> it
> > >>>> > aligned as
> > >>>> > > >> > much
> > >>>> > > >> > > as
> > >>>> > > >> > > >>>>>>> > possible.
> > >>>> > > >> > > >>>>>>> > > >> I'd
> > >>>> > > >> > > >>>>>>> > > >> > be
> > >>>> > > >> > > >>>>>>> > > >> > >> +1 for your option c). We can merge
> > >>>> > > >> > AlignedSplitReader
> > >>>> > > >> > > >>>>>>> with
> > >>>> > > >> > > >>>>>>> > > >> > SplitReader. We
> > >>>> > > >> > > >>>>>>> > > >> > >> will update the FLIP shortly.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin
> wrote:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the comment, Jark.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align
> > other
> > >>>> > things in
> > >>>> > > >> > the
> > >>>> > > >> > > >>>>>>> future?
> > >>>> > > >> > > >>>>>>> > For
> > >>>> > > >> > > >>>>>>> > > >> > example,
> > >>>> > > >> > > >>>>>>> > > >> > >> align read speed, I have
> > >>>> > > >> > > >>>>>>> > > >> > >> seen users requesting global rate
> > >>>> limits. This
> > >>>> > > >> > feature
> > >>>> > > >> > > >>>>>>> may also
> > >>>> > > >> > > >>>>>>> > > need
> > >>>> > > >> > > >>>>>>> > > >> an
> > >>>> > > >> > > >>>>>>> > > >> > >> interface like this.
> > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this
> > >>>> interface to
> > >>>> > > >> support
> > >>>> > > >> > > >>>>>>> aligning other
> > >>>> > > >> > > >>>>>>> > > >> > things, I
> > >>>> > > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
> > >>>> > > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
> > >>>> > > >> > > >>>>>>> `alignWatermarksForSplits`
> > >>>> > > >> > > >>>>>>> > > >> instead
> > >>>> > > >> > > >>>>>>> > > >> > of
> > >>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> This is a good point. Naming wise, it
> > >>>> would
> > >>>> > > >> usually
> > >>>> > > >> > be
> > >>>> > > >> > > >>>>>>> more
> > >>>> > > >> > > >>>>>>> > > >> extensible
> > >>>> > > >> > > >>>>>>> > > >> > to
> > >>>> > > >> > > >>>>>>> > > >> > >> just describe what the method
> actually
> > >>>> does,
> > >>>> > > >> instead
> > >>>> > > >> > of
> > >>>> > > >> > > >>>>>>> assuming
> > >>>> > > >> > > >>>>>>> > > the
> > >>>> > > >> > > >>>>>>> > > >> > >> purpose of doing this. For example,
> in
> > >>>> this
> > >>>> > case,
> > >>>> > > >> > > >>>>>>> > > >> pauseOrResumeSplits()
> > >>>> > > >> > > >>>>>>> > > >> > >> would be more extensible because this
> > >>>> can be
> > >>>> > used
> > >>>> > > >> for
> > >>>> > > >> > > >>>>>>> any kind of
> > >>>> > > >> > > >>>>>>> > > >> flow
> > >>>> > > >> > > >>>>>>> > > >> > >> control, be it watermark alignment or
> > >>>> simple
> > >>>> > rate
> > >>>> > > >> > > >>>>>>> limiting.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on
> this.
> > I
> > >>>> think
> > >>>> > > >> they
> > >>>> > > >> > > have
> > >>>> > > >> > > >>>>>>> their
> > >>>> > > >> > > >>>>>>> > own
> > >>>> > > >> > > >>>>>>> > > >> > >> advantages.
> > >>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use
> Interfaces
> > >>>> for
> > >>>> > > >> extending
> > >>>> > > >> > > >>>>>>> abilities
> > >>>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for
> > TableSource/TableSink,
> > >>>> > > >> > > >>>>>>> > > >> > >> and I prefer Interfaces rather than
> > >>>> methods in
> > >>>> > > >> this
> > >>>> > > >> > > >>>>>>> case. When
> > >>>> > > >> > > >>>>>>> > you
> > >>>> > > >> > > >>>>>>> > > >> have
> > >>>> > > >> > > >>>>>>> > > >> > a
> > >>>> > > >> > > >>>>>>> > > >> > >> bunch of abilities and each ability
> > >>>> > > >> > > >>>>>>> > > >> > >> has more than one method, Interfaces
> > can
> > >>>> help
> > >>>> > to
> > >>>> > > >> > > >>>>>>> organize them
> > >>>> > > >> > > >>>>>>> > and
> > >>>> > > >> > > >>>>>>> > > >> make
> > >>>> > > >> > > >>>>>>> > > >> > >> it clear to users which methods
> > >>>> > > >> > > >>>>>>> > > >> > >> you need to implement when you want to
> have
> > >>>> an
> > >>>> > > >> ability.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> I am OK with decorative interfaces if
> > >>>> this is
> > >>>> > a
> > >>>> > > >> > general
> > >>>> > > >> > > >>>>>>> design
> > >>>> > > >> > > >>>>>>> > > >> pattern
> > >>>> > > >> > > >>>>>>> > > >> > in
> > >>>> > > >> > > >>>>>>> > > >> > >> the other components in Flink. But it
> > >>>> looks
> > >>>> > like
> > >>>> > > >> the
> > >>>> > > >> > > >>>>>>> current API
> > >>>> > > >> > > >>>>>>> > > >> > proposal
> > >>>> > > >> > > >>>>>>> > > >> > >> is not symmetric.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> The current proposal is essentially
> > >>>> > > >> "MySourceReader
> > >>>> > > >> > > >>>>>>> implements
> > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader, WithSplitsAlignment",
> > >>>> along with
> > >>>> > > >> > > >>>>>>> "MySplitReader
> > >>>> > > >> > > >>>>>>> > > >> implements
> > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Should we make the API symmetric? I'd
> > >>>> > consider any
> > >>>> > > >> > one
> > >>>> > > >> > > >>>>>>> of the
> > >>>> > > >> > > >>>>>>> > > >> following
> > >>>> > > >> > > >>>>>>> > > >> > as
> > >>>> > > >> > > >>>>>>> > > >> > >> symmetric.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements
> > >>>> SourceReader,
> > >>>> > > >> > > >>>>>>> WithSplitsAlignment",
> > >>>> > > >> > > >>>>>>> > > >> along
> > >>>> > > >> > > >>>>>>> > > >> > >> with "MySplitReader implements
> > >>>> SplitReader,
> > >>>> > > >> > > >>>>>>> WithSplitsAlignment",
> > >>>> > > >> > > >>>>>>> > > or
> > >>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements
> > >>>> > AlignedSourceReader"
> > >>>> > > >> > and
> > >>>> > > >> > > >>>>>>> > > "MySplitReader
> > >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
> > >>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements
> > >>>> SourceReader"
> > >>>> > and
> > >>>> > > >> > > >>>>>>> "MySplitReader
> > >>>> > > >> > > >>>>>>> > > >> > implements
> > >>>> > > >> > > >>>>>>> > > >> > >> SplitReader".
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according
> > to
> > >>>> Dawid
> > >>>> > > >> would
> > >>>> > > >> > > be:
> > >>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements
> > >>>> SourceReader"
> > >>>> > and
> > >>>> > > >> > > >>>>>>> "MySplitReader
> > >>>> > > >> > > >>>>>>> > > >> > implements
> > >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
> > >>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although
> > >>>> personally
> > >>>> > > >> > speaking I
> > >>>> > > >> > > >>>>>>> think it
> > >>>> > > >> > > >>>>>>> > is
> > >>>> > > >> > > >>>>>>> > > >> > simpler
> > >>>> > > >> > > >>>>>>> > > >> > >> to just add a new method to the split
> > >>>> reader
> > >>>> > with
> > >>>> > > >> > > >>>>>>> default impl.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> @Dawid Wysakowicz <dwysakowicz@apache.org>,
> > >>>> > > >> > > >>>>>>> > > >> > >> thanks for the reply.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a
> > >>>> preference
> > >>>> > > >> and I
> > >>>> > > >> > > >>>>>>> agree most
> > >>>> > > >> > > >>>>>>> > of
> > >>>> > > >> > > >>>>>>> > > >> the
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> sources will support the alignment I
> am
> > >>>> fine
> > >>>> > > >> > following
> > >>>> > > >> > > >>>>>>> your
> > >>>> > > >> > > >>>>>>> > > >> suggestion
> > >>>> > > >> > > >>>>>>> > > >> > to
> > >>>> > > >> > > >>>>>>> > > >> > >> have the SourceReader extending from
> > >>>> > > >> > > >>>>>>> > WithWatermarksSplitsAlignment,
> > >>>> > > >> > > >>>>>>> > > >> but
> > >>>> > > >> > > >>>>>>> > > >> > >> would put the "supportsXXX" there,
> not
> > >>>> in the
> > >>>> > > >> Source
> > >>>> > > >> > to
> > >>>> > > >> > > >>>>>>> keep the
> > >>>> > > >> > > >>>>>>> > > two
> > >>>> > > >> > > >>>>>>> > > >> > >> methods together.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> One benefit of having the
> "supportsXXX"
> > >>>> in
> > >>>> > Source
> > >>>> > > >> is
> > >>>> > > >> > > >>>>>>> that this
> > >>>> > > >> > > >>>>>>> > > allows
> > >>>> > > >> > > >>>>>>> > > >> > some
> > >>>> > > >> > > >>>>>>> > > >> > >> compile time check. For example, if a
> > >>>> user
> > >>>> > enabled
> > >>>> > > >> > > >>>>>>> watermark
> > >>>> > > >> > > >>>>>>> > > >> alignment
> > >>>> > > >> > > >>>>>>> > > >> > >> while it is not supported by the
> > Source,
> > >>>> an
> > >>>> > > >> exception
> > >>>> > > >> > > >>>>>>> can be
> > >>>> > > >> > > >>>>>>> > thrown
> > >>>> > > >> > > >>>>>>> > > >> at
> > >>>> > > >> > > >>>>>>> > > >> > >> compile time. It seems in general
> > >>>> useful. That
> > >>>> > > >> said,
> > >>>> > > >> > I
> > >>>> > > >> > > >>>>>>> agree that
> > >>>> > > >> > > >>>>>>> > > API
> > >>>> > > >> > > >>>>>>> > > >> > >> cleanliness wise it is better to put
> > the
> > >>>> two
> > >>>> > > >> methods
> > >>>> > > >> > > >>>>>>> together.
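
The early check that a "supportsXXX" method on the Source would enable could look roughly like this. It is a sketch under assumptions: CapabilitySource, supportsSplitsAlignment() and AlignmentValidator are hypothetical names chosen for illustration, and "compile time" is read here as the moment the job is built, before anything runs.

```java
/** Hypothetical source interface with a capability flag that defaults to "unsupported". */
interface CapabilitySource {
    default boolean supportsSplitsAlignment() {
        return false;
    }
}

/** Hypothetical validation step executed while the job is being built. */
final class AlignmentValidator {
    private AlignmentValidator() {}

    /** Fails fast if alignment is enabled for a source that does not declare support. */
    static void validate(CapabilitySource source, boolean alignmentEnabled) {
        if (alignmentEnabled && !source.supportsSplitsAlignment()) {
            throw new IllegalStateException(
                    "Watermark alignment is enabled, but the source does not support "
                            + "split alignment");
        }
    }
}
```

Placing the flag on the reader instead, next to the alignment method, would keep the two methods together but would move this check from job construction to runtime.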
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu
> > >>>> > > >> > > >>>>>>> > > >> > >> <imjark@gmail.com> wrote:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Thanks, Dawid, for the reminder about FLIP-182. Sorry, I missed it.
> > >>>> > > >> > > >>>>>>> > > >> > >> I don't have any other concerns then.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > >>>> > > >> > > >>>>>>> > > >> > >> Jark
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz <dwysakowicz@apache.org>
> > >>>> > > >> > > >>>>>>> > > >> > >> wrote:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> @Jark:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with watermarks when the source
> > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface?
> > >>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment even if Kafka
> > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface, and this will affect the throughput somehow.
> > >>>> > > >> > > >>>>>>> > > >> > >> I agree with Becket we may need a `supportSplitsAlignment()` method for
> > >>>> > > >> > > >>>>>>> > > >> > >> users to configure the source to enable/disable the alignment.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
> > >>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly affect
> > >>>> > > >> > > >>>>>>> > > >> > >> throughput if the reader is constantly switching between pause and
> > >>>> > > >> > > >>>>>>> > > >> > >> resume. Can users configure the alignment offset?
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> This is covered in the previous FLIP [1], which has already been
> > >>>> > > >> > > >>>>>>> > > >> > >> implemented in 1.15. In short, it must be enabled with the watermark
> > >>>> > > >> > > >>>>>>> > > >> > >> strategy, which also configures the drift and the update interval.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to support aligning other
> > >>>> > > >> > > >>>>>>> > > >> > >> things, I suggest explicitly declaring the purpose of the methods, such
> > >>>> > > >> > > >>>>>>> > > >> > >> as `alignWatermarksForSplits` instead of `alignSplits`.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Sure let's rename it.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> @Becket:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> I understand your point. On the other hand, putting all methods in the
> > >>>> > > >> > > >>>>>>> > > >> > >> base interface, even with "supportsXXX" methods for enabling certain
> > >>>> > > >> > > >>>>>>> > > >> > >> features, raises the entry threshold for writing a new source. Instead
> > >>>> > > >> > > >>>>>>> > > >> > >> of focusing on the basic and required properties of the Source, the
> > >>>> > > >> > > >>>>>>> > > >> > >> person implementing a source has to figure out what all of the extra
> > >>>> > > >> > > >>>>>>> > > >> > >> features are about and how to deal with them. It also makes it harder
> > >>>> > > >> > > >>>>>>> > > >> > >> to organize methods in coupled groups, as Jark said.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a preference and I agree that most of
> > >>>> > > >> > > >>>>>>> > > >> > >> the sources will support the alignment, I am fine with following your
> > >>>> > > >> > > >>>>>>> > > >> > >> suggestion to have the SourceReader extend
> > >>>> > > >> > > >>>>>>> > > >> > >> WithWatermarksSplitsAlignment, but I would put the "supportsXXX" there,
> > >>>> > > >> > > >>>>>>> > > >> > >> not in the Source, to keep the two methods together.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Lastly, I agree it is really unfortunate that the "alignSplits" methods
> > >>>> > > >> > > >>>>>>> > > >> > >> differ slightly for SourceReader and SplitReader. The reason is that
> > >>>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase deals only with SplitIds, whereas SplitReader needs the
> > >>>> > > >> > > >>>>>>> > > >> > >> actual splits to pause them. I found the discrepancy acceptable for the
> > >>>> > > >> > > >>>>>>> > > >> > >> sake of simplifying the changes significantly, especially as unifying
> > >>>> > > >> > > >>>>>>> > > >> > >> the signatures would very likely impact performance, because we would
> > >>>> > > >> > > >>>>>>> > > >> > >> have to perform additional lookups. Moreover, the SplitReader is a
> > >>>> > > >> > > >>>>>>> > > >> > >> secondary interface.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the effort, Dawid and Sebastian!
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> I just have some minor questions (maybe I missed something).
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with watermarks when the source
> > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface?
> > >>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment even if Kafka
> > >>>> > > >> > > >>>>>>> > > >> > >> implements the interface, and this will affect the throughput somehow.
> > >>>> > > >> > > >>>>>>> > > >> > >> I agree with Becket we may need a `supportSplitsAlignment()` method for
> > >>>> > > >> > > >>>>>>> > > >> > >> users to configure the source to enable/disable the alignment.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
> > >>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly affect
> > >>>> > > >> > > >>>>>>> > > >> > >> throughput if the reader is constantly switching between pause and
> > >>>> > > >> > > >>>>>>> > > >> > >> resume. Can users configure the alignment offset?
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > >>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align other things in the future? For
> > >>>> > > >> > > >>>>>>> > > >> > >> example, to align read speed: I have seen users requesting global rate
> > >>>> > > >> > > >>>>>>> > > >> > >> limits. This feature may also need an interface like this.
> > >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to support aligning other
> > >>>> > > >> > > >>>>>>> > > >> > >> things, I suggest explicitly declaring the purpose of the methods, such
> > >>>> > > >> > > >>>>>>> > > >> > >> as `alignWatermarksForSplits` instead of `alignSplits`.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > >>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think they have their own
> > >>>> > > >> > > >>>>>>> > > >> > >> advantages.
> > >>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending abilities
> > >>>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink, and I prefer Interfaces rather
> > >>>> > > >> > > >>>>>>> > > >> > >> than methods in this case. When you have a bunch of abilities and each
> > >>>> > > >> > > >>>>>>> > > >> > >> ability has more than one method, Interfaces can help to organize them
> > >>>> > > >> > > >>>>>>> > > >> > >> and make it clear to users which methods they need to implement when
> > >>>> > > >> > > >>>>>>> > > >> > >> they want to have an ability.
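
The "ability interface" style described in point 4 can be sketched roughly as follows. This is only an illustration under stated assumptions: SimpleReader, SupportsSplitPausing, PausableReader, PlainReader and Framework are hypothetical names made up for this sketch, not Flink APIs. The point is that the two coupled methods live in one optional interface, and the framework discovers the ability with an instanceof check.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical base interface: the minimum every reader must implement.
interface SimpleReader {
    String pollNext();
}

// Hypothetical decorative ("ability") interface.
// Both methods belong together, so they are grouped in one interface.
interface SupportsSplitPausing {
    void pauseSplits(Collection<String> splitIds);

    void resumeSplits(Collection<String> splitIds);
}

// A reader that opts in to the ability by implementing the extra interface.
final class PausableReader implements SimpleReader, SupportsSplitPausing {
    final List<String> paused = new ArrayList<>();

    @Override
    public String pollNext() {
        return "record";
    }

    @Override
    public void pauseSplits(Collection<String> splitIds) {
        paused.addAll(splitIds);
    }

    @Override
    public void resumeSplits(Collection<String> splitIds) {
        paused.removeAll(splitIds);
    }
}

// A reader that implements only the base interface.
final class PlainReader implements SimpleReader {
    @Override
    public String pollNext() {
        return "record";
    }
}

final class Framework {
    // Returns true if the reader has the ability and the splits were paused.
    static boolean tryPause(SimpleReader reader, Collection<String> splitIds) {
        if (reader instanceof SupportsSplitPausing) {
            ((SupportsSplitPausing) reader).pauseSplits(splitIds);
            return true;
        }
        return false;
    }
}
```

With this shape, a new implementer of SimpleReader sees a single required method, and the compiler forces anyone who declares SupportsSplitPausing to provide both pause and resume.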
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > >>>> > > >> > > >>>>>>> > > >> > >> Jark
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <becket.qin@gmail.com> wrote:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Hi Dawid,
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the explanation. Apologies that I somehow misread a bunch of
> > >>>> > > >> > > >>>>>>> > > >> > >> "align" and thought they were "assign".
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Regarding 1, by default implementation, I was thinking of the default
> > >>>> > > >> > > >>>>>>> > > >> > >> no-op implementation. I am a little worried about the proliferation of
> > >>>> > > >> > > >>>>>>> > > >> > >> decorative interfaces. I think the most important thing about interfaces
> > >>>> > > >> > > >>>>>>> > > >> > >> is that they are easy to understand. In this case, I prefer adding a new
> > >>>> > > >> > > >>>>>>> > > >> > >> method to the existing interface for the following reasons:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> a) I feel the biggest drawback of decorative interfaces is which
> > >>>> > > >> > > >>>>>>> > > >> > >> interface they can decorate and which combinations of multiple
> > >>>> > > >> > > >>>>>>> > > >> > >> decorative interfaces are valid. In the current FLIP, the
> > >>>> > > >> > > >>>>>>> > > >> > >> withSplitsAlignment interface is only applicable to the SourceReader
> > >>>> > > >> > > >>>>>>> > > >> > >> which means it can't decorate any other interface. From an interface
> > >>>> > > >> > > >>>>>>> > > >> > >> design perspective, a natural question is why not let
> > >>>> > > >> > > >>>>>>> > > >> > >> "AlignedSplitReader" extend "withSplitsAlignment"? And it is also
> > >>>> > > >> > > >>>>>>> > > >> > >> natural to assume that a split reader implementing both SplitReader and
> > >>>> > > >> > > >>>>>>> > > >> > >> WithSplitAlignment would work, because a source reader implementing
> > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader and withSplitsAlignment works. So why isn't there an
> > >>>> > > >> > > >>>>>>> > > >> > >> interface of AlignedSourceReader? In the future, if there is a new
> > >>>> > > >> > > >>>>>>> > > >> > >> feature added (e.g. sorted or pre-partitioned data aware), are we going
> > >>>> > > >> > > >>>>>>> > > >> > >> to create another interface of SplitReader such as SortedSplitReader or
> > >>>> > > >> > > >>>>>>> > > >> > >> PrePartitionedAware? Can they be combined? So I think the additional
> > >>>> > > >> > > >>>>>>> > > >> > >> decorative interface like withSplitsAlignment actually increases the
> > >>>> > > >> > > >>>>>>> > > >> > >> understanding cost of users because they have to know what decorative
> > >>>> > > >> > > >>>>>>> > > >> > >> interfaces are there, which interface they can decorate and which
> > >>>> > > >> > > >>>>>>> > > >> > >> combinations of the decorative interfaces are valid and which are not.
> > >>>> > > >> > > >>>>>>> > > >> > >> Ideally we want to avoid that. To be clear, I am not opposing having an
> > >>>> > > >> > > >>>>>>> > > >> > >> interface of withSplitsAlignment, it is completely OK to have it as an
> > >>>> > > >> > > >>>>>>> > > >> > >> internal interface and let SourceReader and SplitReader both extend it.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> b) Adding a new method to the SourceReader with a default implementation
> > >>>> > > >> > > >>>>>>> > > >> > >> of no-op would help avoid logic branching in the source logic,
> > >>>> > > >> > > >>>>>>> > > >> > >> especially given that we agree that the vast majority of the
> > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader implementations, if not all, would just extend from the
> > >>>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase. That means adding a new method to the interface would
> > >>>> > > >> > > >>>>>>> > > >> > >> effectively give the same user experience, but simpler.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> c) A related design principle that may be worth discussing is how we
> > >>>> > > >> > > >>>>>>> > > >> > >> let the Source implementations tell Flink which capabilities are
> > >>>> > > >> > > >>>>>>> > > >> > >> supported and which are not. Personally, I feel the most intuitive place
> > >>>> > > >> > > >>>>>>> > > >> > >> is the Source itself, because that is the entry point of the entire
> > >>>> > > >> > > >>>>>>> > > >> > >> Source connector logic.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Based on the above thoughts, I am wondering if the following interface
> > >>>> > > >> > > >>>>>>> > > >> > >> would be easier for users to understand.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> - Change "withSplitsAlignment" to internal interface, let both
> > >>>> > > >> > > >>>>>>> > > >> > >> SourceReader and SplitReader extend from it, with a default no-op
> > >>>> > > >> > > >>>>>>> > > >> > >> implementation.
> > >>>> > > >> > > >>>>>>> > > >> > >> - Add a new method "boolean supportSplitsAlignment()" to the Source
> > >>>> > > >> > > >>>>>>> > > >> > >> interface, with a default implementation returning false. Sources that
> > >>>> > > >> > > >>>>>>> > > >> > >> have implemented the alignment logic can change this to return true, and
> > >>>> > > >> > > >>>>>>> > > >> > >> override the alignSplits() methods in the SourceReader / SplitReader if
> > >>>> > > >> > > >>>>>>> > > >> > >> needed.
> > >>>> > > >> > > >>>>>>> > > >> > >> - In the future, if a new optional feature is going to be added to the
> > >>>> > > >> > > >>>>>>> > > >> > >> Source, and that feature requires the awareness from Flink, we can add
> > >>>> > > >> > > >>>>>>> > > >> > >> more such methods to the Source.
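
The proposal in the list above can be sketched in miniature as follows. This is a rough illustration, not the FLIP's actual API: SketchSource, SketchReader, AligningReader, BasicSource, AligningSource and AlignmentValidator are hypothetical names, and the two-collection parameter list of alignSplits is an assumption made for the sketch. Only the ideas are taken from the thread: a capability method on the source that returns false by default, a no-op default for the alignment hook, and an early failure when alignment is enabled on a source that does not support it.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical reader interface with a default no-op alignment hook.
interface SketchReader {
    // Parameter list is an assumption for this sketch.
    default void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        // no-op unless a reader overrides it
    }
}

// Hypothetical source interface carrying the capability flag.
interface SketchSource {
    SketchReader createReader();

    // Sources opt in by overriding this to return true.
    default boolean supportSplitsAlignment() {
        return false;
    }
}

// Reader that actually tracks which splits are paused.
final class AligningReader implements SketchReader {
    final Set<String> pausedSplits = new HashSet<>();

    @Override
    public void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        pausedSplits.removeAll(splitsToResume);
        pausedSplits.addAll(splitsToPause);
    }
}

// Source that keeps both defaults: no alignment support.
final class BasicSource implements SketchSource {
    @Override
    public SketchReader createReader() {
        return new SketchReader() {};
    }
}

// Source that declares support and returns an aligning reader.
final class AligningSource implements SketchSource {
    @Override
    public SketchReader createReader() {
        return new AligningReader();
    }

    @Override
    public boolean supportSplitsAlignment() {
        return true;
    }
}

final class AlignmentValidator {
    // Fails early when alignment is requested but the source does not declare support.
    static void validate(SketchSource source, boolean alignmentEnabled) {
        if (alignmentEnabled && !source.supportSplitsAlignment()) {
            throw new IllegalStateException(
                    "Watermark alignment is enabled but the source does not support it");
        }
    }
}
```

The trade-off discussed in the thread is visible here: the flag and the hook sit in two different interfaces, so nothing forces an implementer who returns true to also override alignSplits.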
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> What do you think?
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dwysakowicz@apache.org>
> > >>>> > > >> > > >>>>>>> > > >> > >> wrote:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> @Konstantin:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> > >>>> > > >> > > >>>>>>> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Correct. As far as I know, though, those are the only sources that
> > >>>> > > >> > > >>>>>>> > > >> > >> consume concurrently from multiple splits, and thus the only ones to
> > >>>> > > >> > > >>>>>>> > > >> > >> which alignment applies.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> @Thomas:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase and then
> > >>>> > > >> > > >>>>>>> > > >> > >> doing nothing if the split reader does not implement AlignedSplitReader
> > >>>> > > >> > > >>>>>>> > > >> > >> could be misleading? Perhaps WithSplitsAlignment can instead be added to
> > >>>> > > >> > > >>>>>>> > > >> > >> the specific source reader (i.e. KafkaSourceReader) to make it explicit
> > >>>> > > >> > > >>>>>>> > > >> > >> that the source actually supports it.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> I understand your concern. Hmm, I think we could actually do that. Given
> > >>>> > > >> > > >>>>>>> > > >> > >> that the actual implementation of SourceReaderBase#alignSplits is rather
> > >>>> > > >> > > >>>>>>> > > >> > >> short (just a forward to the corresponding method of SplitFetcher), we
> > >>>> > > >> > > >>>>>>> > > >> > >> could reimplement it in the actual source implementations. This solution
> > >>>> > > >> > > >>>>>>> > > >> > >> has a downside, though. Authors of new sources would have to do two
> > >>>> > > >> > > >>>>>>> > > >> > >> things: extend from AlignedSplitReader and implement WithSplitsAlignment,
> > >>>> > > >> > > >>>>>>> > > >> > >> instead of just extending AlignedSplitReader. I would be fine with such
> > >>>> > > >> > > >>>>>>> > > >> > >> a tradeoff, though. What do others think?
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> @Steven:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> For this part from the motivation
> > >>>> section, is
> > >>>> > it
> > >>>> > > >> > > >>>>>>> accurate? Let's
> > >>>> > > >> > > >>>>>>> > > >> assume
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> one
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> source task consumes from 3
> partitions
> > >>>> and
> > >>>> > one of
> > >>>> > > >> the
> > >>>> > > >> > > >>>>>>> partitions
> > >>>> > > >> > > >>>>>>> > is
> > >>>> > > >> > > >>>>>>> > > >> > >> significantly slower. In this
> > situation,
> > >>>> > watermark
> > >>>> > > >> > for
> > >>>> > > >> > > >>>>>>> this
> > >>>> > > >> > > >>>>>>> > source
> > >>>> > > >> > > >>>>>>> > > >> task
> > >>>> > > >> > > >>>>>>> > > >> > >> won't hold back as it is reading
> recent
> > >>>> data
> > >>>> > from
> > >>>> > > >> > other
> > >>>> > > >> > > >>>>>>> two Kafka
> > >>>> > > >> > > >>>>>>> > > >> > >> partitions. As a result, it won't
> hold
> > >>>> back
> > >>>> > the
> > >>>> > > >> > overall
> > >>>> > > >> > > >>>>>>> > watermark.
> > >>>> > > >> > > >>>>>>> > > I
> > >>>> > > >> > > >>>>>>> > > >> > >> thought the problem is that we may
> have
> > >>>> late
> > >>>> > data
> > >>>> > > >> for
> > >>>> > > >> > > >>>>>>> this slow
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> partition.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> It will hold back the watermark.
> > >>>> Watermark of
> > >>>> > an
> > >>>> > > >> > > >>>>>>> operator is the
> > >>>> > > >> > > >>>>>>> > > >> minimum
> > >>>> > > >> > > >>>>>>> > > >> > >> of watermarks of all splits[1]
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> I have another question about the
> > >>>> restart. Say
> > >>>> > > >> split
> > >>>> > > >> > > >>>>>>> alignment is
> > >>>> > > >> > > >>>>>>> > > >> > >> triggered. checkpoint is completed.
> job
> > >>>> > failed and
> > >>>> > > >> > > >>>>>>> restored from
> > >>>> > > >> > > >>>>>>> > > the
> > >>>> > > >> > > >>>>>>> > > >> > last
> > >>>> > > >> > > >>>>>>> > > >> > >> checkpoint. because alignment
> decision
> > >>>> is not
> > >>>> > > >> > > >>>>>>> checkpointed,
> > >>>> > > >> > > >>>>>>> > > initially
> > >>>> > > >> > > >>>>>>> > > >> > >> alignment won't be enforced until we
> > get
> > >>>> a
> > >>>> > cycle
> > >>>> > > >> of
> > >>>> > > >> > > >>>>>>> watermark
> > >>>> > > >> > > >>>>>>> > > >> > aggregation
> > >>>> > > >> > > >>>>>>> > > >> > >> and propagation, right? Not saying
> this
> > >>>> > corner is
> > >>>> > > >> a
> > >>>> > > >> > > >>>>>>> problem. Just
> > >>>> > > >> > > >>>>>>> > > >> want
> > >>>> > > >> > > >>>>>>> > > >> > to
> > >>>> > > >> > > >>>>>>> > > >> > >> understand it more.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Your understanding is correct.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> @Becket:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is
> sort
> > >>>> of a
> > >>>> > > >> general
> > >>>> > > >> > use
> > >>>> > > >> > > >>>>>>> case, so
> > >>>> > > >> > > >>>>>>> > > >> should
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> we
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> just add the related methods to
> > >>>> SourceReader
> > >>>> > > >> directly
> > >>>> > > >> > > >>>>>>> instead of
> > >>>> > > >> > > >>>>>>> > > >> > >> introducing the new interface of
> > >>>> > > >> WithSplitAssignment?
> > >>>> > > >> > > We
> > >>>> > > >> > > >>>>>>> can
> > >>>> > > >> > > >>>>>>> > > provide
> > >>>> > > >> > > >>>>>>> > > >> > >> default implementations, so backwards
> > >>>> > > >> compatibility
> > >>>> > > >> > > >>>>>>> won't be an
> > >>>> > > >> > > >>>>>>> > > >> issue.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> I don't think we can provide a
> default
> > >>>> > > >> > implementation.
> > >>>> > > >> > > >>>>>>> How would
> > >>>> > > >> > > >>>>>>> > we
> > >>>> > > >> > > >>>>>>> > > >> do
> > >>>> > > >> > > >>>>>>> > > >> > >> that? Would it be just a no-op? Is it
> > >>>> better
> > >>>> > than
> > >>>> > > >> > > having
> > >>>> > > >> > > >>>>>>> an
> > >>>> > > >> > > >>>>>>> > opt-in
> > >>>> > > >> > > >>>>>>> > > >> > >> interface? The default implementation
> > >>>> would
> > >>>> > have
> > >>>> > > >> to
> > >>>> > > >> > be
> > >>>> > > >> > > >>>>>>> added
> > >>>> > > >> > > >>>>>>> > > >> exclusively
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> in
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> a *Public* SourceReader interface. By
> > >>>> the way
> > >>>> > > >> notice
> > >>>> > > >> > > >>>>>>> > > SourceReaderBase
> > >>>> > > >> > > >>>>>>> > > >> > >> does extend from WithSplitsAlignment,
> > so
> > >>>> > > >> effectively
> > >>>> > > >> > > all
> > >>>> > > >> > > >>>>>>> > > >> implementations
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> do
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> handle the alignment case. To be
> > honest I
> > >>>> > think
> > >>>> > > >> it is
> > >>>> > > >> > > >>>>>>> impossible
> > >>>> > > >> > > >>>>>>> > to
> > >>>> > > >> > > >>>>>>> > > >> > >> implement the SourceReader interface
> > >>>> directly
> > >>>> > by
> > >>>> > > >> end
> > >>>> > > >> > > >>>>>>> users.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader
> > >>>> interface
> > >>>> > > >> > probably
> > >>>> > > >> > > >>>>>>> also
> > >>>> > > >> > > >>>>>>> > needs
> > >>>> > > >> > > >>>>>>> > > >> some
> > >>>> > > >> > > >>>>>>> > > >> > >> change to support throttling at the
> > split
> > >>>> > > >> > granularity.
> > >>>> > > >> > > >>>>>>> Can you
> > >>>> > > >> > > >>>>>>> > add
> > >>>> > > >> > > >>>>>>> > > >> that
> > >>>> > > >> > > >>>>>>> > > >> > >> interface change into the public
> > >>>> interface
> > >>>> > > >> section as
> > >>>> > > >> > > >>>>>>> well?
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> It has been added from the beginning.
> > See
> > >>>> > > >> > > >>>>>>> *AlignedSplitReader.*
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method
> > >>>> name
> > >>>> > > >> > assignSplits
> > >>>> > > >> > > >>>>>>> here,
> > >>>> > > >> > > >>>>>>> > given
> > >>>> > > >> > > >>>>>>> > > >> that
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> it
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> is not actually changing the split
> > >>>> > assignments? It
> > >>>> > > >> > > seems
> > >>>> > > >> > > >>>>>>> > something
> > >>>> > > >> > > >>>>>>> > > >> like
> > >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or
> > >>>> > > >> adjustSplitsThrottling() is
> > >>>> > > >> > > >>>>>>> more
> > >>>> > > >> > > >>>>>>> > > accurate.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> The method's called *alignSplits*,
> not
> > >>>> > assign. Do
> > >>>> > > >> you
> > >>>> > > >> > > >>>>>>> still
> > >>>> > > >> > > >>>>>>> > prefer
> > >>>> > > >> > > >>>>>>> > > a
> > >>>> > > >> > > >>>>>>> > > >> > >> different name for that? Personally,
> I
> > am
> > >>>> > open for
> > >>>> > > >> > > >>>>>>> suggestions
> > >>>> > > >> > > >>>>>>> > > here.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> [1]
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> >
> > >>>> > > >> > > >>>>>>> > > >>
> > >>>> > > >> > > >>>>>>> > >
> > >>>> > > >> > > >>>>>>> >
> > >>>> > > >> > > >>>>>>>
> > >>>> > > >> > >
> > >>>> > > >> >
> > >>>> > > >>
> > >>>> >
> > >>>>
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin
> wrote:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for driving the effort,
> > >>>> Sebastian. I
> > >>>> > think
> > >>>> > > >> the
> > >>>> > > >> > > >>>>>>> motivation
> > >>>> > > >> > > >>>>>>> > > >> makes a
> > >>>> > > >> > > >>>>>>> > > >> > >> lot of sense. Just a few suggestions
> /
> > >>>> > questions.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is
> sort
> > >>>> of a
> > >>>> > > >> general
> > >>>> > > >> > use
> > >>>> > > >> > > >>>>>>> case, so
> > >>>> > > >> > > >>>>>>> > > >> should
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> we
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> just add the related methods to
> > >>>> SourceReader
> > >>>> > > >> directly
> > >>>> > > >> > > >>>>>>> instead of
> > >>>> > > >> > > >>>>>>> > > >> > >> introducing the new interface of
> > >>>> > > >> WithSplitAssignment?
> > >>>> > > >> > > We
> > >>>> > > >> > > >>>>>>> can
> > >>>> > > >> > > >>>>>>> > > provide
> > >>>> > > >> > > >>>>>>> > > >> > >> default implementations, so backwards
> > >>>> > > >> compatibility
> > >>>> > > >> > > >>>>>>> won't be an
> > >>>> > > >> > > >>>>>>> > > >> issue.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader
> > >>>> interface
> > >>>> > > >> > probably
> > >>>> > > >> > > >>>>>>> also
> > >>>> > > >> > > >>>>>>> > needs
> > >>>> > > >> > > >>>>>>> > > >> some
> > >>>> > > >> > > >>>>>>> > > >> > >> change to support throttling at the
> > split
> > >>>> > > >> > granularity.
> > >>>> > > >> > > >>>>>>> Can you
> > >>>> > > >> > > >>>>>>> > add
> > >>>> > > >> > > >>>>>>> > > >> that
> > >>>> > > >> > > >>>>>>> > > >> > >> interface change into the public
> > >>>> interface
> > >>>> > > >> section as
> > >>>> > > >> > > >>>>>>> well?
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method
> > >>>> name
> > >>>> > > >> > assignSplits
> > >>>> > > >> > > >>>>>>> here,
> > >>>> > > >> > > >>>>>>> > given
> > >>>> > > >> > > >>>>>>> > > >> that
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> it
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> is not actually changing the split
> > >>>> > assignments? It
> > >>>> > > >> > > seems
> > >>>> > > >> > > >>>>>>> > something
> > >>>> > > >> > > >>>>>>> > > >> like
> > >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or
> > >>>> > > >> adjustSplitsThrottling() is
> > >>>> > > >> > > >>>>>>> more
> > >>>> > > >> > > >>>>>>> > > accurate.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM
> Steven
> > >>>> Wu <
> > >>>> > > >> > > >>>>>>> stevenz3wu@gmail.com
> > >>>> > > >> > > >>>>>>> > >
> > >>>> > > >> > > >>>>>>> > > <
> > >>>> > > >> > > >>>>>>> > > >> > stevenz3wu@gmail.com> <
> > stevenz3wu@gmail.com>
> > >>>> <
> > >>>> > > >> > > >>>>>>> stevenz3wu@gmail.com>
> > >>>> > > >> > > >>>>>>> > <
> > >>>> > > >> > > >>>>>>> > > >> > >> stevenz3wu@gmail.com> <
> > >>>> > > >> > > >>>>>>> > > >> > >> stevenz3wu@gmail.com> wrote:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> However, a single source operator may
> > >>>> read
> > >>>> > data
> > >>>> > > >> from
> > >>>> > > >> > > >>>>>>> multiple
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> splits/partitions, e.g., multiple
> Kafka
> > >>>> > > >> partitions,
> > >>>> > > >> > > such
> > >>>> > > >> > > >>>>>>> that
> > >>>> > > >> > > >>>>>>> > even
> > >>>> > > >> > > >>>>>>> > > >> with
> > >>>> > > >> > > >>>>>>> > > >> > >> watermark alignment the source
> operator
> > >>>> may
> > >>>> > need
> > >>>> > > >> to
> > >>>> > > >> > > >>>>>>> buffer
> > >>>> > > >> > > >>>>>>> > > excessive
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> amount
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> of data if one split emits data
> faster
> > >>>> than
> > >>>> > > >> another.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> For this part from the motivation
> > >>>> section, is
> > >>>> > it
> > >>>> > > >> > > >>>>>>> accurate? Let's
> > >>>> > > >> > > >>>>>>> > > >> assume
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> one
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> source task consumes from 3
> partitions
> > >>>> and
> > >>>> > one of
> > >>>> > > >> the
> > >>>> > > >> > > >>>>>>> partitions
> > >>>> > > >> > > >>>>>>> > is
> > >>>> > > >> > > >>>>>>> > > >> > >> significantly slower. In this
> > situation,
> > >>>> > watermark
> > >>>> > > >> > for
> > >>>> > > >> > > >>>>>>> this
> > >>>> > > >> > > >>>>>>> > source
> > >>>> > > >> > > >>>>>>> > > >> task
> > >>>> > > >> > > >>>>>>> > > >> > >> won't hold back as it is reading
> recent
> > >>>> data
> > >>>> > from
> > >>>> > > >> > other
> > >>>> > > >> > > >>>>>>> two Kafka
> > >>>> > > >> > > >>>>>>> > > >> > >> partitions. As a result, it won't
> hold
> > >>>> back
> > >>>> > the
> > >>>> > > >> > overall
> > >>>> > > >> > > >>>>>>> > watermark.
> > >>>> > > >> > > >>>>>>> > > I
> > >>>> > > >> > > >>>>>>> > > >> > >> thought the problem is that we may
> have
> > >>>> late
> > >>>> > data
> > >>>> > > >> for
> > >>>> > > >> > > >>>>>>> this slow
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> partition.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> I have another question about the
> > >>>> restart. Say
> > >>>> > > >> split
> > >>>> > > >> > > >>>>>>> alignment is
> > >>>> > > >> > > >>>>>>> > > >> > >> triggered. checkpoint is completed.
> job
> > >>>> > failed and
> > >>>> > > >> > > >>>>>>> restored from
> > >>>> > > >> > > >>>>>>> > > the
> > >>>> > > >> > > >>>>>>> > > >> > last
> > >>>> > > >> > > >>>>>>> > > >> > >> checkpoint. because alignment
> decision
> > >>>> is not
> > >>>> > > >> > > >>>>>>> checkpointed,
> > >>>> > > >> > > >>>>>>> > > initially
> > >>>> > > >> > > >>>>>>> > > >> > >> alignment won't be enforced until we
> > get
> > >>>> a
> > >>>> > cycle
> > >>>> > > >> of
> > >>>> > > >> > > >>>>>>> watermark
> > >>>> > > >> > > >>>>>>> > > >> > aggregation
> > >>>> > > >> > > >>>>>>> > > >> > >> and propagation, right? Not saying
> this
> > >>>> > corner is
> > >>>> > > >> a
> > >>>> > > >> > > >>>>>>> problem. Just
> > >>>> > > >> > > >>>>>>> > > >> want
> > >>>> > > >> > > >>>>>>> > > >> > to
> > >>>> > > >> > > >>>>>>> > > >> > >> understand it more.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM
> Thomas
> > >>>> Weise <
> > >>>> > > >> > > >>>>>>> thw@apache.org> <
> > >>>> > > >> > > >>>>>>> > > >> > thw@apache.org> <th...@apache.org> <
> > >>>> thw@apache.org>
> > >>>> > <
> > >>>> > > >> > > >>>>>>> > > >> > >> thw@apache.org> <
> > >>>> > > >> > > >>>>>>> > > >> > >> thw@apache.org> wrote:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Thanks for working on this!
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split
> > alignment
> > >>>> in
> > >>>> > > >> > > >>>>>>> SourceReaderBase and
> > >>>> > > >> > > >>>>>>> > > then
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> doing
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> nothing if the split reader does not
> > >>>> implement
> > >>>> > > >> > > >>>>>>> AlignedSplitReader
> > >>>> > > >> > > >>>>>>> > > >> could
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> be
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> misleading? Perhaps
> WithSplitsAlignment
> > >>>> can
> > >>>> > > >> instead
> > >>>> > > >> > be
> > >>>> > > >> > > >>>>>>> added to
> > >>>> > > >> > > >>>>>>> > the
> > >>>> > > >> > > >>>>>>> > > >> > >> specific source reader (i.e.
> > >>>> > KafkaSourceReader) to
> > >>>> > > >> > make
> > >>>> > > >> > > >>>>>>> it
> > >>>> > > >> > > >>>>>>> > explicit
> > >>>> > > >> > > >>>>>>> > > >> that
> > >>>> > > >> > > >>>>>>> > > >> > >> the source actually supports it.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> > >>>> > > >> > > >>>>>>> > > >> > >> Thomas
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM
> > >>>> Konstantin
> > >>>> > Knauf <
> > >>>> > > >> > > >>>>>>> > > knaufk@apache.org>
> > >>>> > > >> > > >>>>>>> > > >> <
> > >>>> > > >> > > >>>>>>> > > >> > knaufk@apache.org> <kn...@apache.org>
> <
> > >>>> > > >> > knaufk@apache.org
> > >>>> > > >> > > >
> > >>>> > > >> > > >>>>>>> <
> > >>>> > > >> > > >>>>>>> > > >> > >> knaufk@apache.org> <
> > >>>> > > >> > > >>>>>>> > > >> > >> knaufk@apache.org>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> wrote:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the
> > >>>> `AlignedSplitReader`
> > >>>> > > >> > > interface
> > >>>> > > >> > > >>>>>>> (aka the
> > >>>> > > >> > > >>>>>>> > > >> stop
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> &
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> resume behavior) will be implemented
> > for
> > >>>> > Kafka and
> > >>>> > > >> > > >>>>>>> Pulsar only,
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> correct?
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> +1 in general. I believe it is
> valuable
> > >>>> to
> > >>>> > > >> complete
> > >>>> > > >> > the
> > >>>> > > >> > > >>>>>>> watermark
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> aligned
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> story with this FLIP.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Cheers,
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Konstantin
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM
> Dawid
> > >>>> > Wysakowicz
> > >>>> > > >> <
> > >>>> > > >> > > >>>>>>> > > >> > dwysakowicz@apache.org> <
> > >>>> dwysakowicz@apache.org>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> wrote:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> To be explicit, having worked on it,
> I
> > >>>> > support it
> > >>>> > > >> ;)
> > >>>> > > >> > I
> > >>>> > > >> > > >>>>>>> think we
> > >>>> > > >> > > >>>>>>> > can
> > >>>> > > >> > > >>>>>>> > > >> > >> start a vote thread soonish, as there
> > >>>> are no
> > >>>> > > >> concerns
> > >>>> > > >> > > so
> > >>>> > > >> > > >>>>>>> far.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian
> Mattheis
> > >>>> wrote:
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Dear Flink developers,
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> I would like to open a discussion on
> > >>>> FLIP 217
> > >>>> > [1]
> > >>>> > > >> for
> > >>>> > > >> > > an
> > >>>> > > >> > > >>>>>>> > extension
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> of
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Watermark Alignment to perform
> > alignment
> > >>>> also
> > >>>> > in
> > >>>> > > >> > > >>>>>>> SplitReaders. To
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> do
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> so,
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> SplitReaders must be able to suspend
> > and
> > >>>> > resume
> > >>>> > > >> > reading
> > >>>> > > >> > > >>>>>>> from
> > >>>> > > >> > > >>>>>>> > split
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> sources
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> where the SourceOperator coordinates
> > and
> > >>>> > controls
> > >>>> > > >> > > >>>>>>> suspend and
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> resume.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> To
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> gather information about current
> > >>>> watermarks
> > >>>> > of the
> > >>>> > > >> > > >>>>>>> SplitReaders,
> > >>>> > > >> > > >>>>>>> > we
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> extend
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> the internal
> WatermarkOutputMultiplexer
> > >>>> and
> > >>>> > report
> > >>>> > > >> > > >>>>>>> watermarks to
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> the
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> SourceOperator.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> There is a PoC for this FLIP [2],
> > >>>> prototyped
> > >>>> > by
> > >>>> > > >> Arvid
> > >>>> > > >> > > >>>>>>> Heise and
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> revised
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> and
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> reworked by Dawid Wysakowicz (He did
> > >>>> most of
> > >>>> > the
> > >>>> > > >> > work.)
> > >>>> > > >> > > >>>>>>> and me.
> > >>>> > > >> > > >>>>>>> > The
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> changes
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> are backwards compatible in a way
> that
> > if
> > >>>> > affected
> > >>>> > > >> > > >>>>>>> components do
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> not
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> support split alignment the behavior
> is
> > >>>> as
> > >>>> > before.
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Best,
> > >>>> > > >> > > >>>>>>> > > >> > >> Sebastian
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> [1]
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> >
> > >>>> > > >> > > >>>>>>> > > >>
> > >>>> > > >> > > >>>>>>> > >
> > >>>> > > >> > > >>>>>>> >
> > >>>> > > >> > > >>>>>>>
> > >>>> > > >> > >
> > >>>> > > >> >
> > >>>> > > >>
> > >>>> >
> > >>>>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> [2]
> > >>>> > > >> > > >>>>>>>
> > >>>> https://github.com/dawidwys/flink/tree/aligned-splits
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> --
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >> Konstantin Knauf
> > >>>> > > >> > > >>>>>>> > > >> > >> https://twitter.com/snntrable
> > >>>> > > >> > > >>>>>>> > > >> > >> https://github.com/knaufk
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> > >>
> > >>>> > > >> > > >>>>>>> > > >> >
> > >>>> > > >> > > >>>>>>> > > >>
> > >>>> > > >> > > >>>>>>> > > >
> > >>>> > > >> > > >>>>>>> > >
> > >>>> > > >> > > >>>>>>> >
> > >>>> > > >> > > >>>>>>>
> > >>>> > > >> > > >>>>>>
> > >>>> > > >> > >
> > >>>> > > >> >
> > >>>> > > >>
> > >>>> > > >
> > >>>> >
> > >>>>
> > >>>
> >
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Piotr Nowojski <pn...@apache.org>.
Hi Becket,

> My main concern of having a supportsPausingSplits() knob

What problem do you see with `supportsPausingSplits()`? Do you want to
remove it?

Also I don't understand your proposal for Solution 1. How do you want to
decide whether to throw an exception? For that we would need to have
`supportsPausingSplits()`, right?

If so, I would prefer solution 2, to provide a graceful migration path for
any users that are already using FLIP-182 with multiple splits per
operator. I don't think there are many of those, but such a flag seems easy
to implement while making migration easier. Having said that, technically
we could change the behaviour and start throwing an exception always in
such a case, as this feature is marked as Experimental.
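For illustration only, here is a minimal sketch of how the two solutions discussed in this thread could be combined into one decision. It is not Flink code: `SplitPauseSketch`, `Action`, `onPauseRequest` and all parameter names are hypothetical, `readerCanPauseSplits` stands in for whatever `supportsPausingSplits()` would report, and `allowCoarseGrained` stands in for the opt-in flag of solution 2.

```java
/** Hypothetical sketch; none of these names are part of the Flink API. */
final class SplitPauseSketch {

    /** What the reader does when it is asked to pause some of its splits. */
    enum Action { PAUSE_SPLITS, STOP_POLLING, IGNORE_REQUEST }

    /**
     * @param readerCanPauseSplits assumed result of a supportsPausingSplits()-style check
     * @param assignedSplits       number of splits currently assigned to the reader
     * @param allowCoarseGrained   assumed opt-in flag (solution 2), false by default
     */
    static Action onPauseRequest(
            boolean readerCanPauseSplits, int assignedSplits, boolean allowCoarseGrained) {
        if (readerCanPauseSplits) {
            // Split-level alignment: pause exactly the splits that are too far ahead.
            return Action.PAUSE_SPLITS;
        }
        if (assignedSplits <= 1) {
            // Solution 1, single split: not polling is equivalent to pausing the split.
            return Action.STOP_POLLING;
        }
        if (allowCoarseGrained) {
            // Solution 2: keep the existing FLIP-182 behaviour and accept its side effects.
            return Action.IGNORE_REQUEST;
        }
        // Solution 1, multiple splits: fail loudly instead of aligning incorrectly.
        throw new UnsupportedOperationException(
                "The unpausable SplitReader only works with watermark alignment "
                        + "when assigned a single split.");
    }
}
```

Note that the first branch needs to know whether the reader can pause splits at all, which is the information a `supportsPausingSplits()`-style method would provide.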

Best,
Piotrek

On Wed, Jun 29, 2022 at 02:54, Becket Qin <be...@gmail.com> wrote:

> Hi Sebastian,
>
> Regarding the question,
>
> >
> > @Becket: I'm not sure about the intention of solution 1. Can you explain
> > that a bit more? In particular, I don't understand: "The reasoning behind
> > this solution is that existing users should only use the
> > coarse watermark alignment when a source reader only reads from a single
> > split." Why should a user not use coarse watermark alignment when source
> > reader reads from multiple splits? The split alignment uses the "coarse
> > watermark", i.e., maxDesiredWatermark, as described in the FLIP for
> > alignment.
>
>
> Imagine you have a source reader reading from two splits, and the
> watermarks look like the following:
> 1. Watermark of Split 1: 10:00 AM Jun 29
> 2. Watermark of Split 2: 11:00 AM Jun 29
> 3. maxDesiredWatermark: 10:30 AM Jun 29
>
> At this point, the source reader's watermark is 10:00 AM which is lower
> than the maxDesiredWatermark, so the source reader won't be paused from
> reading. However, because the source reader cannot specify which split to
> read from, if it continues to read, the watermark gap between the two
> splits may become even bigger. This essentially defeats the main purpose of
> watermark alignment, which is to reduce the number of records buffered in
> state. This does not necessarily happen, but FLIP-182 was not designed for
> this case to begin with. So I'd rather avoid extending the feature
> to that case.
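The numbers in this example can be checked with a small sketch. It is an illustration only and not Flink code: `CoarseAlignmentExample` and its methods are hypothetical, and watermarks are modelled as plain `LocalTime` values on the same day.

```java
import java.time.LocalTime;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the example above; not part of the Flink API. */
final class CoarseAlignmentExample {

    /** The reader's watermark is the minimum over the watermarks of its splits. */
    static LocalTime readerWatermark(List<LocalTime> splitWatermarks) {
        return Collections.min(splitWatermarks);
    }

    /** Coarse-grained check: the whole reader is paused only if it is ahead of the limit. */
    static boolean readerIsPaused(List<LocalTime> splitWatermarks, LocalTime maxDesiredWatermark) {
        return readerWatermark(splitWatermarks).isAfter(maxDesiredWatermark);
    }

    /** Split-level check: a single split is paused if it alone is ahead of the limit. */
    static boolean splitShouldPause(LocalTime splitWatermark, LocalTime maxDesiredWatermark) {
        return splitWatermark.isAfter(maxDesiredWatermark);
    }
}
```

With split watermarks of 10:00 and 11:00 and a maxDesiredWatermark of 10:30, `readerIsPaused` is false because the minimum is 10:00, while `splitShouldPause` is true for the 11:00 split. That is the gap between coarse-grained and split-level alignment described above.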
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Jun 28, 2022 at 6:53 PM Sebastian Mattheis <
> sebastian@ververica.com>
> wrote:
>
> > (Sorry I didn't send to the list but only to Becket. My bad and thanks
> > Piotr. Next attempt:)
> >
> > Hi all,
> >
> > Thanks for pushing the FLIP. I will drive it and would be happy to get
> > back to you, @Thomas, for reviews. (Sorry for the low responsiveness; there
> > were several high-priority efforts on my side ...) As a next step, I
> > will revise the FLIP to bring the discussion to a conclusion.
> >
> > However, as Becket mentioned I feel that some things are still not clear
> > yet:
> >
> > Re: Thomas
> >>
> >> However, from a user perspective, should the split level alignment be
> >>> an opt-in feature, at least for a few releases? If yes, then we would
> >>> require a knob similar to supportsPausingSplits(), which I understand
> >>> won't be part of the revised FLIP. Such control may be beneficial:
> >>
> >>
> >>> * Compare runtime behavior with split level alignment on/off
> >>> * Allow use of sources that don't implement pausing splits yet
> >>
> >>
> >>> The second point would, from the user's perspective, be necessary for
> >>> backward compatibility? While the interface aspect and source
> >>> compatibility has been discussed in great detail, I don't think it
> >>> would be desirable if an application that already uses alignment fails
> >>> after upgrading to the new Flink version, forcing users to lock step
> >>> modify sources for the new non-optional split level alignment.
> >>
> >>
> >>> So I think clarification of the compatibility aspect on the FLIP page
> >>> would be necessary.
> >>
> >>
> >> That is a good point. Currently FLIP-182 is marked as experimental. So
> >> technically speaking it could change. That said, I agree that it would
> >> be good to not break the existing sources.
> >>
> >> My main concern of having a supportsPausingSplits() knob is that this
> >> might be something requiring a code change on future deprecation. I am
> >> thinking of two potential solutions:
> >>
> >> Solution 1:
> >> In the SourceReaderBase, when pauseOrResumeSplits() is invoked, if the
> >> source reader only has one split assigned, the source reader simply
> >> stops polling and just returns NOTHING_AVAILABLE. If more than one
> >> split is assigned, it throws an exception with a message such as "The
> >> unpausable SplitReader CLASS_NAME only works with watermark alignment
> >> when assigned a single split. There is more than one split assigned to
> >> the SplitReader".
> >> The reasoning behind this solution is that existing users should only
> >> use the coarse watermark alignment when a source reader only reads from
> >> a single split. Reading from more than one split might have unwanted
> >> side effects, so we might as well throw an exception in this case.
> >>
> >> Solution 2:
> >> Having a configuration "enable.coarse-grained.watermark.alignment", the
> >> default value is false. Once it is set to true, we will allow
> >> coarse-grained watermark alignment if a SplitReader is pausable.
> >> This solution allows users to keep the current FLIP-182 behavior, with
> >> the risk of side effects.
> >>
> >> Personally speaking, I feel solution 1 seems better because
> >> coarse-grained watermark alignment could be frustrating to the users
> >> when more than one split is assigned. So we might as well not support
> >> it at all. And also there is nothing to deprecate in the future with this
> >> solution.
> >>
> >> What do you think?
> >>
> >
> > @Thomas: My understanding is that you intend a simple switch/knob to test
> > with and without (split) watermark alignment, right? Isn't switching
> > coarse-grained watermark alignment on and off sufficient for that? Or do
> > you think that switching watermark alignment explicitly at the split level
> > is required?
> >
> > @Becket: I'm not sure about the intention of solution 1. Can you explain
> > that a bit more? In particular, I don't understand: "The reasoning behind
> > this solution is that existing users should only use the coarse watermark
> > alignment when a source reader only reads from a single split." Why
> > should a user not use coarse watermark alignment when a source reader
> > reads from multiple splits? The split alignment uses the "coarse
> > watermark", i.e., maxDesiredWatermark, as described in the FLIP for
> > alignment.
> >
> > Could you please clarify?
> >
> > Regards,
> > Sebastian
> >
> > On Wed, Jun 22, 2022 at 4:23 AM Becket Qin <be...@gmail.com> wrote:
> >
> >> Thanks for the feedback, Thomas and Steve. And thanks Piotr for the
> >> patient and detailed discussion.
> >>
> >> Let's move forward with option 1 then.
> >>
> >> Re: Thomas
> >>
> >> However, from a user perspective, should the split level alignment be
> >>> an opt-in feature, at least for a few releases? If yes, then we would
> >>> require a knob similar to supportsPausingSplits(), which I understand
> >>> won't be part of the revised FLIP. Such control may be beneficial:
> >>
> >>
> >>> * Compare runtime behavior with split level alignment on/off
> >>> * Allow use of sources that don't implement pausing splits yet
> >>
> >>
> >>> The second point would, from the user's perspective, be necessary for
> >>> backward compatibility? While the interface aspect and source
> >>> compatibility has been discussed in great detail, I don't think it
> >>> would be desirable if an application that already uses alignment fails
> >>> after upgrading to the new Flink version, forcing users to lock step
> >>> modify sources for the new non-optional split level alignment.
> >>
> >>
> >>> So I think clarification of the compatibility aspect on the FLIP page
> >>> would be necessary.
> >>
> >>
> >> That is a good point. Currently FLIP-182 is marked as experimental. So
> >> technically speaking it could change. That said, I agree that it would
> >> be good to not break the existing sources.
> >>
> >> My main concern with having a supportsPausingSplits() knob is that it
> >> might require code changes when the knob is deprecated later. I am
> >> thinking of two potential solutions:
> >>
> >> Solution 1:
> >> In the SourceReaderBase, when pauseOrResumeSplits() is invoked, if the
> >> source reader only has one split assigned, the source reader simply
> >> stops polling and just returns NOTHING_AVAILABLE. If more than one split
> >> is assigned, it throws an exception with a message such as "The
> >> unpausable SplitReader CLASS_NAME only works with watermark alignment
> >> when assigned a single split. There is more than one split assigned to
> >> the SplitReader".
> >> The reasoning behind this solution is that existing users should only
> >> use the coarse watermark alignment when a source reader only reads from
> >> a single split. Reading from more than one split might have unwanted
> >> side effects, so we might as well throw an exception in this case.
> >>
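
A minimal sketch of how solution 1 could behave, for illustration only. The class UnpausableReader, the Status enum, and the single boolean pauseOrResume() parameter are hypothetical stand-ins and not the actual SourceReaderBase or InputStatus code; the exception type is also an assumption, since the thread only says "throws an exception".

```java
import java.util.ArrayList;
import java.util.List;

public class CoarseAlignmentSketch {

    // Stand-in for Flink's InputStatus; only the two values needed here.
    enum Status { MORE_AVAILABLE, NOTHING_AVAILABLE }

    // Hypothetical reader whose SplitReader cannot pause individual splits.
    static final class UnpausableReader {
        private final List<String> assignedSplits = new ArrayList<>();
        private boolean paused;

        void addSplit(String splitId) {
            assignedSplits.add(splitId);
        }

        // Called by the framework when alignment wants to pause or resume.
        void pauseOrResume(boolean pause) {
            if (assignedSplits.size() > 1) {
                // Coarse pausing would also stall the splits that are not ahead.
                throw new UnsupportedOperationException(
                        "The unpausable SplitReader only works with watermark alignment "
                                + "when assigned a single split. There is more than one "
                                + "split assigned to the SplitReader.");
            }
            paused = pause;
        }

        // Simplification: an unpaused reader always reports more data.
        Status pollNext() {
            return paused ? Status.NOTHING_AVAILABLE : Status.MORE_AVAILABLE;
        }
    }

    public static void main(String[] args) {
        UnpausableReader reader = new UnpausableReader();
        reader.addSplit("split-0");
        reader.pauseOrResume(true);
        System.out.println(reader.pollNext());

        reader.addSplit("split-1");
        try {
            reader.pauseOrResume(true);
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a single split, pausing the whole reader is equivalent to pausing that split, which is why the coarse behavior is safe there and rejected otherwise.
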
> >> Solution 2:
> >> Add a configuration "enable.coarse-grained.watermark.alignment" with a
> >> default value of false. Once it is set to true, we will allow
> >> coarse-grained watermark alignment if a SplitReader is pausable.
> >> This solution allows users to keep the current FLIP-182 behavior, with
> >> the risk of side effects.
> >>
> >> Personally speaking, I feel solution 1 seems better because
> >> coarse-grained watermark alignment could be frustrating to the users
> >> when more than one split is assigned. So we might as well not support
> >> it at all. Also, there is nothing to deprecate in the future with this
> >> solution.
> >>
> >> What do you think?
> >>
> >> Thanks,
> >>
> >> Jiangjie (Becket) Qin
> >>
> >>
> >> On Tue, Jun 21, 2022 at 8:48 PM Piotr Nowojski <pn...@apache.org>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> It looks like option 1 wins overall? So let's go with that.
> >>>
> >>> Best,
> >>> Piotrek
> >>>
> >>> On Wed, Jun 15, 2022 at 4:13 AM Steven Wu <st...@gmail.com> wrote:
> >>>
> >>>> Both option 1 (default impl in base interface) and option 2
> >>>> (decorative interface) are pretty common patterns. I would also be
> >>>> fine with either. The important thing is that an exception is thrown
> >>>> if a source doesn't support the alignment capability.
> >>>>
> >>>> The other point is that we can validate the source capability if
> >>>> alignment is enabled in WatermarkStrategy. I believe either option can
> >>>> achieve this goal too.
> >>>>
> >>>> public interface WatermarkStrategy<T> {
> >>>>     WatermarkStrategy<T> withWatermarkAlignment(
> >>>>             String watermarkGroup, Duration maxAllowedWatermarkDrift);
> >>>> }
> >>>>
> >>>> If I have to pick one, I am slightly favoring option 1 (base
> >>>> interface). As watermark is already an essential concept of source,
> >>>> maybe watermark alignment capability can also be a property of the
> >>>> base source/reader interface.
> >>>>
> >>>> On Tue, Jun 14, 2022 at 3:44 PM Thomas Weise <th...@apache.org> wrote:
> >>>>
> >>>> > Hi everyone,
> >>>> >
> >>>> > Thank you for all the effort that went into this discussion. The
> >>>> > split level watermark alignment will be an important feature for
> >>>> > Flink that will address operational problems for various use cases.
> >>>> > From reading through this thread it appears that not too much
> >>>> > remains to bring this FLIP to acceptance and allow development to
> >>>> > move forward. I would like to contribute if possible.
> >>>> >
> >>>> > Regarding option 1 vs. option 2: I don't have a strong preference,
> >>>> > perhaps slightly leaning towards option 1.
> >>>> >
> >>>> > However, from a user perspective, should the split level alignment
> >>>> > be an opt-in feature, at least for a few releases? If yes, then we
> >>>> > would require a knob similar to supportsPausingSplits(), which I
> >>>> > understand won't be part of the revised FLIP. Such control may be
> >>>> > beneficial:
> >>>> >
> >>>> > * Compare runtime behavior with split level alignment on/off
> >>>> > * Allow use of sources that don't implement pausing splits yet
> >>>> >
> >>>> > The second point would, from the user's perspective, be necessary
> >>>> > for backward compatibility? While the interface aspect and source
> >>>> > compatibility has been discussed in great detail, I don't think it
> >>>> > would be desirable if an application that already uses alignment
> >>>> > fails after upgrading to the new Flink version, forcing users to
> >>>> > lock step modify sources for the new non-optional split level
> >>>> > alignment.
> >>>> >
> >>>> > So I think clarification of the compatibility aspect on the FLIP
> >>>> > page would be necessary.
> >>>> >
> >>>> > Thanks,
> >>>> > Thomas
> >>>> >
> >>>> > On Mon, May 30, 2022 at 3:29 AM Piotr Nowojski <piotr.nowojski@gmail.com> wrote:
> >>>> > >
> >>>> > > Hi Becket,
> >>>> > >
> >>>> > > Thanks for summing this up. Just one correction:
> >>>> > >
> >>>> > > > Piotr prefers option 2, his opinions are:
> >>>> > > >   e) It is OK that the code itself in option 2 indicates to the
> >>>> > > > developers that a feature is optional. We will rely on the
> >>>> > > > documentation to correct that and clarify that the feature is
> >>>> > > > actually obligatory.
> >>>> > >
> >>>> > > I would say that, based on a) and b), the feature would still be
> >>>> > > optional. So both the implementation and the documentation would be
> >>>> > > saying that. We could add a mention to the docs and release notes
> >>>> > > that this feature will be obligatory in the next major release and
> >>>> > > plan such a release accordingly.
> >>>> > >
> >>>> > > Re option 1, as you mentioned:
> >>>> > > > As for option 1: For developers, the feature is still optional
> >>>> > > > due to the default implementation in the interface, regardless of
> >>>> > > > what the default implementation does, because the code compiles
> >>>> > > > without overriding these methods
> >>>> > >
> >>>> > > Also importantly, the code will work in most cases.
> >>>> > >
> >>>> > > > Obligatory: Jobs may fail if these methods are not implemented
> >>>> > > > properly, e.g. SourceReader#pauseOrResumeSplits(). This is a common
> >>>> > > > pattern in Java, e.g. Iterator.remove() by default throws
> >>>> > > > "UnsupportedOperationException", informing the implementation that
> >>>> > > > things may go wrong if this method is not implemented.
> >>>> > >
> >>>> > > For me `Iterator#remove()` is an optional feature. Personally, I
> >>>> > > don't remember if I have ever implemented it.
> >>>> > >
> >>>> > > Best,
> >>>> > > Piotrek
> >>>> > >
> >>>> > > On Fri, May 27, 2022 at 10:14 AM Becket Qin <be...@gmail.com> wrote:
> >>>> > >
> >>>> > > > I had an offline discussion with Piotr and here is the summary.
> >>>> > > > Please correct me if I missed something, Piotr.
> >>>> > > >
> >>>> > > > There are two things on which we would like to seek more opinions
> >>>> > > > from the community, so we can make progress on this FLIP.
> >>>> > > >
> >>>> > > > 1. The general pattern to add obligatory features to existing
> >>>> > > > interfaces.
> >>>> > > >
> >>>> > > > ***********************************************************************************
> >>>> > > > For interfaces exposed to the developers for implementation, they
> >>>> > > > are either intended to be *optional* or *obligatory*. While it is
> >>>> > > > quite clear how to convey that intention when creating the
> >>>> > > > interfaces, there is less agreement on how to do so when adding new
> >>>> > > > features to an existing interface. In general, Flink uses decorative
> >>>> > > > interfaces when adding optional features to existing interfaces.
> >>>> > > > Both Piotr and I agree that this looks good.
> >>>> > > >
> >>>> > > > Different opinions are mainly about how to add obligatory features
> >>>> > > > to the existing interfaces, probably due to different
> >>>> > > > understandings of "obligatory".
> >>>> > > >
> >>>> > > > We have discussed four options:
> >>>> > > >
> >>>> > > > *Option 1:*
> >>>> > > >
> >>>> > > >    - Just add a new method to the existing interface.
> >>>> > > >    - For backwards compatibility, the method would have a default
> >>>> > > >    implementation throwing "UnsupportedOperationException".
> >>>> > > >    - In the next major version, remove the default implementation.
> >>>> > > >    - For the developers, any method with a default implementation
> >>>> > > >    throwing an "UnsupportedOperationException" should be taken as
> >>>> > > >    obligatory.
> >>>> > > >
> >>>> > > > *Option 2:*
> >>>> > > >
> >>>> > > >    - Always make the features optional by adding a decorative
> >>>> > > >    interface, just like ordinary optional features.
> >>>> > > >    - Inform the developers via documentation that this feature is
> >>>> > > >    obligatory, although it looks optional from the code.
> >>>> > > >    - In case the developers did not implement the decorative
> >>>> > > >    interface, throw an exception.
> >>>> > > >    - In the next major version, move the methods in the decorative
> >>>> > > >    interface to the base interface, and deprecate the decorative
> >>>> > > >    interface.
> >>>> > > >
> >>>> > > > *Option 3:*
> >>>> > > >
> >>>> > > >    - Always bump the major version when a new obligatory feature is
> >>>> > > >    added, even if we may have to do it frequently.
> >>>> > > >
> >>>> > > > *Option 4:*
> >>>> > > >
> >>>> > > >    - Add a V2, V3... of the interface affected by the new obligatory
> >>>> > > >    feature.
> >>>> > > >    - In the next major versions, deprecate old versions of the
> >>>> > > >    interfaces.
> >>>> > > >
> >>>> > > > Both Piotr and I agreed that option 3 and option 4 have big side
> >>>> > > > effects and should be avoided. We have different preferences between
> >>>> > > > option 1 and option 2.
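
To make the difference between option 1 and option 2 concrete, here is a rough sketch under stated assumptions. All names (ReaderOption1, ReaderOption2, SupportsPausingSplits, LegacyReader1, LegacyReader2) are hypothetical and heavily simplified; they are not the real SourceReader or SplitReader interfaces.

```java
import java.util.Collection;
import java.util.List;

public class InterfaceEvolutionSketch {

    // Option 1: the new method lives in the base interface with a throwing default.
    interface ReaderOption1 {
        String poll();

        default void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume) {
            throw new UnsupportedOperationException(
                    getClass().getSimpleName() + " does not support pausing splits");
        }
    }

    // Option 2: the base interface stays untouched ...
    interface ReaderOption2 {
        String poll();
    }

    // ... and the capability is expressed by a separate decorative interface.
    interface SupportsPausingSplits {
        void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume);
    }

    // Existing implementations written before the feature existed still compile.
    static final class LegacyReader1 implements ReaderOption1 {
        @Override
        public String poll() {
            return "record";
        }
    }

    static final class LegacyReader2 implements ReaderOption2 {
        @Override
        public String poll() {
            return "record";
        }
    }

    // Framework side, option 1: always call the method, the default may throw.
    static boolean tryPauseOption1(ReaderOption1 reader) {
        try {
            reader.pauseOrResumeSplits(List.of("split-0"), List.of());
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    // Framework side, option 2: special handling via an instanceof check.
    static boolean tryPauseOption2(ReaderOption2 reader) {
        if (reader instanceof SupportsPausingSplits) {
            ((SupportsPausingSplits) reader)
                    .pauseOrResumeSplits(List.of("split-0"), List.of());
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(tryPauseOption1(new LegacyReader1()));
        System.out.println(tryPauseOption2(new LegacyReader2()));
    }
}
```

In both variants a legacy reader compiles unchanged and the missing capability only shows up at runtime; the difference is whether the framework relies on the method always existing (option 1) or has to check for the decorative interface (option 2).
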
> >>>> > > >
> >>>> > > > Personally I prefer option 1, the reasons are:
> >>>> > > >   a) simple and intuitive. Java 8 introduced the default impl in
> >>>> > > > interfaces exactly for interface evolution, and this is a common
> >>>> > > > pattern in many projects.
> >>>> > > >   b) prominent to the developers that the feature is expected to be
> >>>> > > > implemented, because it explicitly throws an exception in the
> >>>> > > > default impl.
> >>>> > > >   c) low maintenance overhead - the Flink framework can always
> >>>> > > > assume the method exists, so no special handling logic is needed.
> >>>> > > >   d) communicates a clear semantic boundary between optional and
> >>>> > > > obligatory features in Flink to the developers.
> >>>> > > >       - Optional: Jobs still run without exception if these methods
> >>>> > > > are not implemented, e.g. all the SupportsXXXPushDown interfaces.
> >>>> > > >       - Obligatory: Jobs may fail if these methods are not
> >>>> > > > implemented properly, e.g. SourceReader#pauseOrResumeSplits(). This
> >>>> > > > is a common pattern in Java, e.g. Iterator.remove() by default
> >>>> > > > throws "UnsupportedOperationException", informing the implementation
> >>>> > > > that things may go wrong if this method is not implemented.
> >>>> > > >
> >>>> > > > As for option 2, although the API itself sounds clean, it misleads
> >>>> > > > people into thinking of an obligatory feature as optional - from
> >>>> > > > the code the feature is optional, but the documents say it is
> >>>> > > > obligatory. We probably should avoid such code-doc inconsistency, as
> >>>> > > > people will be confused. And I would actually be bewildered that
> >>>> > > > sometimes not implementing an "optional" feature is fine, but
> >>>> > > > sometimes it causes the jobs to fail.
> >>>> > > >
> >>>> > > > In response to the argument that the method with a default
> >>>> > > > implementation is always optional, if that is true, it actually
> >>>> > > > means all the interfaces should be immutable once they are created.
> >>>> > > > If we want to add a method to an existing interface, for backwards
> >>>> > > > compatibility, we will have to provide a default implementation. And
> >>>> > > > the fact it has a default implementation indicates the method is
> >>>> > > > optional. If that method is optional, it should reside in a separate
> >>>> > > > decorative interface, otherwise it clogs that existing interface.
> >>>> > > > Therefore, people should never add a method to an existing
> >>>> > > > interface. I find this conclusion a bit extreme.
> >>>> > > >
> >>>> > > > Piotr prefers option 2, his opinions are:
> >>>> > > >     a) Obligatory methods are the methods that fail the code
> >>>> > > > compilation if not implemented.
> >>>> > > >     b) All obligatory methods should reside in the base interface,
> >>>> > > > without a default implementation. And all the optional methods
> >>>> > > > should be in decorative interfaces. This is a clean API.
> >>>> > > >     c) due to b), there isn't a viable solution to add an obligatory
> >>>> > > > method to an existing interface in a backwards compatible way.
> >>>> > > > Unless we are OK with breaking backwards compatibility, all the
> >>>> > > > interfaces should be treated as immutable. As a compromise, we might
> >>>> > > > as well just treat all the features added later as optional
> >>>> > > > features. This way we keep the API clean.
> >>>> > > >     d) based on b) and c), option 2 has a clean API, while option 1
> >>>> > > > does not.
> >>>> > > >     e) It is OK that the code itself in option 2 indicates to the
> >>>> > > > developers that a feature is optional. We will rely on the
> >>>> > > > documentation to correct that and clarify that the feature is
> >>>> > > > actually obligatory.
> >>>> > > >     f) Regarding the effectiveness of making people aware that the
> >>>> > > > feature is obligatory, Option 1 and Option 2 are similar. People who
> >>>> > > > do not read the release notes / documentation will mistake the
> >>>> > > > feature for an optional one anyway.
> >>>> > > >
> >>>> > > > As for option 1: For developers, the feature is still optional due
> >>>> > > > to the default implementation in the interface, regardless of what
> >>>> > > > the default implementation does, because the code compiles without
> >>>> > > > overriding these methods. Also, another problem of this option is
> >>>> > > > that users who do not know the history of the interface may be
> >>>> > > > confused by the default implementation throwing an exception.
> >>>> > > >
> >>>> > > >
> >>>> > > > 2. For this particular FLIP, should it be optional or not?
> >>>> > > >
> >>>> > > > ***********************************************************************************
> >>>> > > > As mentioned in the previous email, I feel this FLIP should be
> >>>> > > > obligatory, for the following reasons:
> >>>> > > > 1. The Flink framework exposes the watermark alignment API to the
> >>>> > > > end users. From the end users' perspective, the feature should be
> >>>> > > > available regardless of the implementation details in the
> >>>> > > > pluggables. This is true for any other methods exposed as the Flink
> >>>> > > > API.
> >>>> > > > 2. If a Source is not pausable, the end user should receive an
> >>>> > > > exception when enabling the watermark alignment (both Piotr and I
> >>>> > > > agree on this). In that case, it meets my criteria of an obligatory
> >>>> > > > feature because not implementing the feature causes a framework API
> >>>> > > > to throw an exception and fail the job.
> >>>> > > >
> >>>> > > > On the other hand, Piotr does not have a strong opinion regarding
> >>>> > > > whether this feature should be optional or not.
> >>>> > > >
> >>>> > > >
> >>>> > > > Thanks for reading through this long email. So basically in order
> >>>> > > > to make progress on this FLIP, we want to see how people feel about
> >>>> > > > the above two topics.
> >>>> > > >
> >>>> > > > Thanks,
> >>>> > > >
> >>>> > > > Jiangjie (Becket) Qin
> >>>> > > >
> >>>> > > >
> >>>> > > > On Thu, May 26, 2022 at 3:06 PM Piotr Nowojski <pnowojski@apache.org> wrote:
> >>>> > > >
> >>>> > > >> Hi Becket,
> >>>> > > >>
> >>>> > > >> I still stand by what I wrote before:
> >>>> > > >> > I think I would still vote soft -1 on this option, but I
> >>>> > > >> > wouldn't block it in case I am out-voted.
> >>>> > > >>
> >>>> > > >> > I think it might be helpful to agree on the definition of
> >>>> > > >> > optional in our case.
> >>>> > > >>
> >>>> > > >> For me it doesn't matter whether we call a default method throwing
> >>>> > > >> an exception optional or non-optional. As long as we keep it this
> >>>> > > >> way, the effect is the same. It's effectively a method that a user
> >>>> > > >> doesn't have to implement. If the interface/system allows some
> >>>> > > >> methods to be left unimplemented, some users will do just that,
> >>>> > > >> regardless of whether we call and document it as non-optional. And
> >>>> > > >> at the same time it's clogging the base interface.
> >>>> > > >>
> >>>> > > >> By the way, just the need for a java-doc/documentation explaining
> >>>> > > >> the existence of some construct is a bad smell (code should be
> >>>> > > >> self-documenting, and a default method throwing an
> >>>> > > >> UnsupportedOperationException is not).
> >>>> > > >>
> >>>> > > >> > Please note that so far we do not assume whether the feature
> >>>> is in
> >>>> > > >> > the original API or it is added later. A newly added feature
> >>>> can
> >>>> > also be
> >>>> > > >> > non-optional, although it might take some time for all the
> >>>> pluggable
> >>>> > > >> > developers to catch up, and they should still work if the new
> >>>> > feature is
> >>>> > > >> > not used until they catch up. In contrast, we may never
> expect
> >>>> an
> >>>> > > >> optional
> >>>> > > >> > feature to catch up, because leaving it unimplemented is also
> >>>> > blessed.
> >>>> > > >> >
> >>>> > > >> > Let's take the checkpointing as an example. Imagine Flink did
> >>>> not
> >>>> > > >> support
> >>>> > > >> > checkpointing before release 1.16. And now we are trying to
> add
> >>>> > > >> > checkpointing to Flink. So we exposed the checkpoint
> >>>> configuration
> >>>> > to
> >>>> > > >> the
> >>>> > > >> > end users. In the meantime, will we tell the pluggable (e.g.
> >>>> > > >> > operators, connectors) developers that methods like
> >>>> > > >> > "snapshotState()" are optional? If
> >>>> > > >> > we do that, the availability of checkpointing in Flink would
> be
> >>>> > severely
> >>>> > > >> > weakened. But apparently we should still allow the existing
> >>>> > > >> implementations
> >>>> > > >> > to work without checkpointing. It looks to me that adding the
> >>>> > method to
> >>>> > > >> the
> >>>> > > >> > pluggable interfaces with a default implementation throwing
> >>>> > > >> > "UnsupportedOperationException" would be the solution here.
> >>>> Please
> >>>> > note
> >>>> > > >> > that in this case, having the default implementation does not
> >>>> mean
> >>>> > this
> >>>> > > >> is
> >>>> > > >> > optional. It is just the technique to support backwards
> >>>> > compatibility in
> >>>> > > >> > the feature evolution. The fact that this method is in the
> base
> >>>> > > >> interface
> >>>> > > >> > suggests it is not optional, so the developers SHOULD
> >>>> implement it.
> >>>> > > >>
> >>>> > > >> I would soft vote -1 for having the default method throwing
> >>>> > > >> UnsupportedOperationException as a one-off thing for these
> >>>> > > >> (FLIP-217) special circumstances.
> >>>> > > >>
> >>>> > > >> At the moment, without thinking this over too much, I would vote
> >>>> > > >> a harder -1 for having this as a general rule when adding new
> >>>> > > >> features. If we ever end up with an API that is littered with
> >>>> > > >> default methods throwing UnsupportedOperationException that are
> >>>> > > >> documented as "non optional", it would IMO be a big design
> >>>> > > >> failure. I would be ok-ish with that only if it was a temporary
> >>>> > > >> thing and we had an aggressive plan to release new major Flink
> >>>> > > >> versions (2.x.y, 3.x.y, ...) more often, breaking API
> >>>> > > >> compatibility, that would get rid of those default methods. Adding
> >>>> > > >> checkpointing and methods like "snapshotState()" would IMO easily
> >>>> > > >> justify a new major Flink release. In that case we could add those
> >>>> > > >> methods with a default implementation for some transition period,
> >>>> > > >> one or two minor releases, followed by a clean-up in a major
> >>>> > > >> release. However, I would still argue that it would be
> >>>> > > >> cleaner/better to add a decorative interface like a
> >>>> > > >> `CheckpointedOperator` interface instead of adding those default
> >>>> > > >> methods to the base `Operator` interface.
> >>>> > > >>
> >>>> > > >> I think I can sum up our disagreement like this: I would like to
> >>>> > > >> keep the interfaces simpler, with only obligatory methods/features
> >>>> > > >> on one side and clearly optional features on the other, while you
> >>>> > > >> would like to add an extra third state in between those two?
> >>>> > > >>
> >>>> > > >> Best,
> >>>> > > >> Piotrek
> >>>> > > >>
> >>>> > > >>
> >>>> > > >>
> >>>> > > >> On Thu, May 12, 2022 at 4:25 AM Becket Qin <be...@gmail.com> wrote:
> >>>> > > >>
> >>>> > > >> > Thanks for the clarification, Piotr and Sebastian.
> >>>> > > >> >
> >>>> > > >> > It looks like the key problem is still whether the
> >>>> > > >> > implementation of pausable splits in the Sources should be
> >>>> > > >> > optional or not.
> >>>> > > >> >
> >>>> > > >> > I think it might be helpful to agree on the definition of
> >>>> > > >> > optional in our case. To me:
> >>>> > > >> > Optional = "You CAN leave the method unimplemented, and that is
> >>>> > > >> > fine."
> >>>> > > >> > Non-Optional = "You CAN leave the method unimplemented, but you
> >>>> > > >> > SHOULD NOT, because people assume this works."
> >>>> > > >> >
> >>>> > > >> > I think one sufficient condition of a Non-Optional feature is
> >>>> > > >> > that if the feature is exposed through the framework API, Flink
> >>>> > > >> > should expect the pluggables to support this feature by default.
> >>>> > > >> > Otherwise the availability of that feature becomes undefined.
> >>>> > > >> >
> >>>> > > >> > Please note that so far we do not assume whether the feature is
> >>>> > > >> > in the original API or is added later. A newly added feature can
> >>>> > > >> > also be non-optional, although it might take some time for all
> >>>> > > >> > the pluggable developers to catch up, and they should still work
> >>>> > > >> > if the new feature is not used until they catch up. In contrast,
> >>>> > > >> > we may never expect an optional feature to catch up, because
> >>>> > > >> > leaving it unimplemented is also blessed.
> >>>> > > >> >
> >>>> > > >> > Let's take checkpointing as an example. Imagine Flink did not
> >>>> > > >> > support checkpointing before release 1.16. And now we are trying
> >>>> > > >> > to add checkpointing to Flink. So we exposed the checkpoint
> >>>> > > >> > configuration to the end users. In the meantime, will we tell the
> >>>> > > >> > pluggable (e.g. operators, connectors) developers that methods
> >>>> > > >> > like "snapshotState()" are optional? If we do that, the
> >>>> > > >> > availability of checkpointing in Flink would be severely
> >>>> > > >> > weakened. But apparently we should still allow the existing
> >>>> > > >> > implementations to work without checkpointing. It looks to me
> >>>> > > >> > that adding the method to the pluggable interfaces with a default
> >>>> > > >> > implementation throwing "UnsupportedOperationException" would be
> >>>> > > >> > the solution here. Please note that in this case, having the
> >>>> > > >> > default implementation does not mean this is optional. It is just
> >>>> > > >> > the technique to support backwards compatibility in the feature
> >>>> > > >> > evolution. The fact that this method is in the base interface
> >>>> > > >> > suggests it is not optional, so the developers SHOULD implement
> >>>> > > >> > it.
> >>>> > > >> >
> >>>> > > >> > When it comes to this FLIP, I think it meets the criteria of
> >>>> > > >> > non-optional features, so we should just use the evolution path
> >>>> > > >> > of non-optional features.
> >>>> > > >> >
> >>>> > > >> > Thanks,
> >>>> > > >> >
> >>>> > > >> > Jiangjie (Becket) Qin
> >>>> > > >> >
> >>>> > > >> >
> >>>> > > >> >
> >>>> > > >> > On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski <pnowojski@apache.org> wrote:
> >>>> > > >> >
> >>>> > > >> > > Hi,
> >>>> > > >> > >
> >>>> > > >> > > Actually, previously I thought about having a decorative
> >>>> > > >> > > interface and, whenever watermark alignment is enabled, checking
> >>>> > > >> > > that the source implements the decorative interface. If not,
> >>>> > > >> > > throwing an exception.
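
The check described above could look roughly like the following sketch. The Source stand-in, the SupportsSplitPausing interface, and the validate() helper are hypothetical names chosen for illustration; where such a check would actually live in Flink is not specified in this thread.

```java
public class AlignmentValidationSketch {

    // Simplified stand-in for a source; not Flink's Source interface.
    interface Source {
        String name();
    }

    // Hypothetical decorative interface marking sources that can pause splits.
    interface SupportsSplitPausing {}

    static final class PlainSource implements Source {
        @Override
        public String name() {
            return "plain";
        }
    }

    static final class PausableSource implements Source, SupportsSplitPausing {
        @Override
        public String name() {
            return "pausable";
        }
    }

    // Fail fast at job setup instead of when the first pause is attempted.
    static void validate(Source source, boolean alignmentEnabled) {
        if (alignmentEnabled && !(source instanceof SupportsSplitPausing)) {
            throw new IllegalStateException(
                    "Watermark alignment is enabled, but source '"
                            + source.name()
                            + "' does not support pausing splits");
        }
    }

    public static void main(String[] args) {
        validate(new PausableSource(), true); // passes
        validate(new PlainSource(), false);   // passes, alignment is off
        try {
            validate(new PlainSource(), true);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The design point is that the exception surfaces when the job is built, so a user who enables alignment on an unsupported source learns about it before any data is read.
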
> >>>> > > >> > >
> >>>> > > >> > > The option with default methods in the source interfaces
> >>>> > > >> > > throwing `UnsupportedOperationException` I think still suffers
> >>>> > > >> > > from the same problems I mentioned before. It's still an
> >>>> > > >> > > optional implementation and at the same time it's clogging the
> >>>> > > >> > > base interface. I think I would still vote soft -1 on this
> >>>> > > >> > > option, but I wouldn't block it in case I am out-voted.
> >>>> > > >> > >
> >>>> > > >> > > Best,
> >>>> > > >> > > Piotrek
> >>>> > > >> > >
> >>>> > > >> > > On Wed, May 11, 2022 at 2:22 PM Sebastian Mattheis <sebastian@ververica.com> wrote:
> >>>> > > >> > >
> >>>> > > >> > > > Hi Becket,
> >>>> > > >> > > >
> >>>> > > >> > > > Thanks a lot for your fast and detailed response. For me, it
> >>>> > > >> > > > converges, and dropping the supportsX method sounds very
> >>>> > > >> > > > reasonable. (Side note: I think we misunderstood each other
> >>>> > > >> > > > about "pausable splits" being enabled as "default". As you
> >>>> > > >> > > > describe it now, I understand "default" to mean that it should
> >>>> > > >> > > > be the new recommended way of implementation, and I think that
> >>>> > > >> > > > is fully valid. Before, I understood "default" here as the
> >>>> > > >> > > > default implementation, i.e., throwing
> >>>> > > >> > > > UnsupportedOperationException, which is the exact opposite. :) )
> >>>> > > >> > > >
> >>>> > > >> > > > Nevertheless: As mentioned, an open question for me is whether
> >>>> > > >> > > > watermark alignment should enforce pausable splits. For
> >>>> > > >> > > > clarification, the current documentation [1] says:
> >>>> > > >> > > >
> >>>> > > >> > > >> *Note:* As of 1.15, Flink supports aligning across tasks of
> >>>> > > >> > > >> the same source and/or different sources. It does not support
> >>>> > > >> > > >> aligning splits/partitions/shards in the same task.
> >>>> > > >> > > >>
> >>>> > > >> > > >> In a case where there are e.g. two Kafka partitions that
> >>>> > > >> > > >> produce watermarks at different pace, that get assigned to the
> >>>> > > >> > > >> same task watermark might not behave as expected. Fortunately,
> >>>> > > >> > > >> worst case it should not perform worse than without alignment.
> >>>> > > >> > > >>
> >>>> > > >> > > >> Given the limitation above, we suggest applying watermark
> >>>> > > >> > > >> alignment in two situations:
> >>>> > > >> > > >>
> >>>> > > >> > > >>    1. You have two different sources (e.g. Kafka and File) that
> >>>> > > >> > > >>    produce watermarks at different speeds
> >>>> > > >> > > >>    2. You run your source with parallelism equal to the number
> >>>> > > >> > > >>    of splits/shards/partitions, which results in every subtask
> >>>> > > >> > > >>    being assigned a single unit of work.
> >>>> > > >> > > >
> >>>> > > >> > > > I personally see no issue with, and no reason against,
> >>>> > > >> > > > implementing this dependency between watermark alignment and
> >>>> > > >> > > > pausable splits. (I think this would even be a good path
> >>>> > > >> > > > towards shaping watermark alignment in 1.16.) However, "I don't
> >>>> > > >> > > > see" means that I would be happy to hear Dawid's and Piotrek's
> >>>> > > >> > > > opinions, as they implemented watermark alignment based on
> >>>> > > >> > > > FLIP-182 [2] and I don't want to miss relevant
> >>>> > > >> > > > rationale/background info from their side.
> >>>> > > >> > > >
> >>>> > > >> > > > *@Piotrek* *@Dawid* What do you think?
> >>>> > > >> > > >
> >>>> > > >> > > > Regards,
> >>>> > > >> > > > Sebastian
> >>>> > > >> > > >
> >>>> > > >> > > > [1]
> >>>> > > >> > > >
> >>>> > > >> > >
> >>>> > > >> >
> >>>> > > >>
> >>>> >
> >>>>
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
> >>>> > > >> > > > [2]
> >>>> > > >> > > >
> >>>> > > >> > >
> >>>> > > >> >
> >>>> > > >>
> >>>> >
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
> >>>> > > >> > > >
> >>>> > > >> > > > On Wed, May 11, 2022 at 1:30 PM Becket Qin <
> >>>> > becket.qin@gmail.com>
> >>>> > > >> > wrote:
> >>>> > > >> > > >
> >>>> > > >> > > >> +dev
> >>>> > > >> > > >>
> >>>> > > >> > > >> Hi Sebastian,
> >>>> > > >> > > >>
> >>>> > > >> > > >> Thank you for the summary. Please see the detailed
> replies
> >>>> > inline.
> >>>> > > >> As
> >>>> > > >> > a
> >>>> > > >> > > >> recap of my suggestions.
> >>>> > > >> > > >>
> >>>> > > >> > > >> 1. Pausable splits API.
> >>>> > > >> > > >>   a) Add default implementations to methods
> >>>> > "pauseOrResumeSplits"
> >>>> > > >> in
> >>>> > > >> > > both
> >>>> > > >> > > >> SourceReader and SplitReader where both default
> >>>> implementations
> >>>> > > >> throw
> >>>> > > >> > > >>  UnsupportedOperationException.
> >>>> > > >> > > >>
> >>>> > > >> > > >> 2. User story.
> >>>> > > >> > > >>     a) We tell users to enable the watermark alignment
> as
> >>>> they
> >>>> > > >> like.
> >>>> > > >> > > This
> >>>> > > >> > > >> is exactly what the current Flink API is.
> >>>> > > >> > > >>     b) We tell the source developers, please implement
> >>>> pausable
> >>>> > > >> > splits,
> >>>> > > >> > > >> otherwise bad things may happen. Think of it like you
> are
> >>>> > expected
> >>>> > > >> to
> >>>> > > >> > > >> implement SourceReader#snapshotState() properly,
> otherwise
> >>>> > > >> exceptions
> >>>> > > >> > > will
> >>>> > > >> > > >> be thrown when users enable checkpointing.
> >>>> > > >> > > >>
> >>>> > > >> > > >> Thanks,
> >>>> > > >> > > >>
> >>>> > > >> > > >> Jiangjie (Becket) Qin
> >>>> > > >> > > >>
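A minimal Java sketch of the recap above. The interface is a simplified stand-in, not Flink's actual SourceReader. Only the method name pauseOrResumeSplits and the UnsupportedOperationException default come from the thread; the class names, the String split IDs and the alignSplits hook are assumptions made for illustration.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PausableSplitsSketch {

    /** Hypothetical, simplified reader interface (not the real Flink SourceReader). */
    interface SketchReader {
        // The default exists only so that older implementations still compile.
        default void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume) {
            throw new UnsupportedOperationException(
                    getClass().getSimpleName() + " does not support pausing or resuming splits");
        }
    }

    /** An "old" reader that never overrode the default. */
    static final class LegacyReader implements SketchReader {}

    /** A reader that implements pausable splits by tracking paused split IDs. */
    static final class PausableReader implements SketchReader {
        final Set<String> paused = new HashSet<>();

        @Override
        public void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume) {
            paused.addAll(splitsToPause);
            paused.removeAll(splitsToResume);
        }
    }

    /** Hypothetical runtime hook: the reader is only called when alignment is enabled. */
    static void alignSplits(
            SketchReader reader, boolean alignmentEnabled, List<String> aheadSplits) {
        if (alignmentEnabled) {
            reader.pauseOrResumeSplits(aheadSplits, List.of());
        }
    }

    public static void main(String[] args) {
        // Alignment disabled: the legacy reader is never asked to pause anything.
        alignSplits(new LegacyReader(), false, List.of("split-0"));

        PausableReader reader = new PausableReader();
        alignSplits(reader, true, List.of("split-0"));
        System.out.println("paused = " + reader.paused);

        // Alignment enabled on a legacy reader: the error handling path.
        try {
            alignSplits(new LegacyReader(), true, List.of("split-0"));
        } catch (UnsupportedOperationException e) {
            System.out.println("error path: " + e.getMessage());
        }
    }
}
```

In this sketch a legacy source keeps compiling and keeps working as long as alignment is off. The exception only surfaces once a user enables alignment on it, which matches the "error handling path" described above.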
> >>>> > > >> > > >> On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis <
> >>>> > > >> > > >> sebastian@ververica.com> wrote:
> >>>> > > >> > > >>
> >>>> > > >> > > >>> Hi Becket, Hi everybody,
> >>>> > > >> > > >>>
> >>>> > > >> > > >>> I'm sorry if I misread the messages but I could not
> >>>> derive an
> >>>> > > >> > agreement
> >>>> > > >> > > >>> from the mailing list. Nevertheless, if I understand
> you
> >>>> > right the
> >>>> > > >> > > >>> suggestion is:
> >>>> > > >> > > >>>
> >>>> > > >> > > >>> * Add default implementations to methods
> >>>> > "pauseOrResumeSplits" in
> >>>> > > >> > both
> >>>> > > >> > > >>> SourceReader and SplitReader where both default
> >>>> > implementations
> >>>> > > >> throw
> >>>> > > >> > > >>> UnsupportedOperationException.
> >>>> > > >> > > >>>
> >>>> > > >> > > >> Yes.
> >>>> > > >> > > >>
> >>>> > > >> > > >> * Add "supportsPauseOrResumeSplits" to the Source
> >>>> interface.
> >>>> > (In
> >>>> > > >> the
> >>>> > > >> > > >>> following, I refer to supporting this as "pausable
> >>>> splits".)
> >>>> > > >> > > >>>
> >>>> > > >> > > >> We may no longer need this if pausable splits are
> >>>> expected to
> >>>> > be
> >>>> > > >> > > >> implemented by the source developers, i.e. non-optional.
> >>>> Having
> >>>> > > >> this
> >>>> > > >> > > method
> >>>> > > >> > > >> would then be somewhat misleading as it looks like the
> >>>> sources
> >>>> > > >> that do
> >>>> > > >> > > not
> >>>> > > >> > > >> support pausable splits are also acceptable in the long
> >>>> term.
> >>>> > So
> >>>> > > >> API
> >>>> > > >> > > wise,
> >>>> > > >> > > >> I'd say maybe we should remove this for this FLIP,
> >>>> although I
> >>>> > > >> believe
> >>>> > > >> > > this
> >>>> > > >> > > >> supportXXX pattern itself is still attractive for
> optional
> >>>> > > >> features.
> >>>> > > >> > > >>
> >>>> > > >> > > >>
> >>>> > > >> > > >>>
> >>>> > > >> > > >>> To make the conclusions explicit:
> >>>> > > >> > > >>>
> >>>> > > >> > > >>> 1. The implementation of pauseOrResumeSplits in both
> >>>> > interfaces
> >>>> > > >> > > >>> SourceReader and SplitReader is optional where the
> >>>> default is
> >>>> > > >> that
> >>>> > > >> > it
> >>>> > > >> > > >>> doesn't support it. (--> This means that the
> >>>> implementation is
> >>>> > > >> still
> >>>> > > >> > > >>> optional for the source developer.)
> >>>> > > >> > > >>>
> >>>> > > >> > > >> It is optional for backwards compatibility with existing
> >>>> > sources,
> >>>> > > >> as
> >>>> > > >> > > they
> >>>> > > >> > > >> may still compile without code change. But starting from
> >>>> this
> >>>> > FLIP,
> >>>> > > >> > > Flink
> >>>> > > >> > > >> will always optimistically assume that all the sources
> >>>> support
> >>>> > > >> > pausable
> >>>> > > >> > > >> splits. If a source does not support pausable splits, it
> >>>> goes
> >>>> > to an
> >>>> > > >> > > error
> >>>> > > >> > > >> handling path when watermark alignment is enabled on it.
> >>>> This
> >>>> > is
> >>>> > > >> > > different
> >>>> > > >> > > >> from a usual optional feature, where no error is
> expected.
> >>>> > > >> > > >>
> >>>> > > >> > > >>
> >>>> > > >> > > >>> 2. If watermark alignment is enabled in the application
> >>>> code
> >>>> > by
> >>>> > > >> > adding
> >>>> > > >> > > >>> withWatermarkAlignment to the WatermarkStrategy while
> >>>> > > >> SourceReader or
> >>>> > > >> > > >>> SplitReader do not support pausableSplits, we throw an
> >>>> > > >> > > >>> UnsupportedOperationException.
> >>>> > > >> > > >>>
> >>>> > > >> > > >> Yes.
> >>>> > > >> > > >>
> >>>> > > >> > > >>
> >>>> > > >> > > >>> 3. With regard to your statement:
> >>>> > > >> > > >>>
> >>>> > > >> > > >>>> [...] basically means watermark alignment is a
> >>>> non-optional
> >>>> > > >> feature
> >>>> > > >> > > to
> >>>> > > >> > > >>>> the end users.
> >>>> > > >> > > >>>
> >>>> > > >> > > >>> You actually mean that "pausable splits" are
> >>>> non-optional for
> >>>> > the
> >>>> > > >> app
> >>>> > > >> > > >>> developer if watermark alignment is enabled. However,
> >>>> > watermark
> >>>> > > >> > > alignment
> >>>> > > >> > > >>> is optional and can be enabled/disabled.
> >>>> > > >> > > >>>
> >>>> > > >> > > >> Yes, watermark alignment can be enabled/disabled in
> >>>> individual
> >>>> > > >> sources
> >>>> > > >> > > in
> >>>> > > >> > > >> Flink jobs, which basically means the code supporting
> >>>> watermark
> >>>> > > >> > > alignment
> >>>> > > >> > > >> has to already be there. That again means the Source
> >>>> > developers are
> >>>> > > >> > also
> >>>> > > >> > > >> expected to support pausable splits by default. So this
> >>>> way we
> >>>> > > >> > > essentially
> >>>> > > >> > > >> tell the end users that you may enable / disable this
> >>>> feature
> >>>> > as
> >>>> > > >> you
> >>>> > > >> > > wish,
> >>>> > > >> > > >> and tell the source developers that you SHOULD implement
> >>>> this
> >>>> > > >> because
> >>>> > > >> > > the
> >>>> > > >> > > >> end users may turn it on/off at will. And if the source
> >>>> does
> >>>> > not
> >>>> > > >> > support
> >>>> > > >> > > >> pausable splits, that goes to an error handling path
> when
> >>>> > watermark
> >>>> > > >> > > >> alignment is enabled on it. So users know they have to
> >>>> > explicitly
> >>>> > > >> > > exclude
> >>>> > > >> > > >> this source.
> >>>> > > >> > > >>
> >>>> > > >> > > >>
> >>>> > > >> > > >>>
> >>>> > > >> > > >>> So far it's totally clear to me and I hope this is what
> >>>> you
> >>>> > mean.
> >>>> > > >> I
> >>>> > > >> > > also
> >>>> > > >> > > >>> agree with both statements:
> >>>> > > >> > > >>>
> >>>> > > >> > > >>> So making that expectation aligned with the source
> >>>> developers
> >>>> > > >> seems
> >>>> > > >> > > >>>> reasonable.
> >>>> > > >> > > >>>>
> >>>> > > >> > > >>>
> >>>> > > >> > > >>> I think this is a simple and clean solution from both
> >>>> the end
> >>>> > user
> >>>> > > >> > and
> >>>> > > >> > > >>>> source developers' standpoint.
> >>>> > > >> > > >>>>
> >>>> > > >> > > >>>
> >>>> > > >> > > >>> However, a last conclusion derives from 3. and is an
> open
> >>>> > question
> >>>> > > >> > for
> >>>> > > >> > > >>> me:
> >>>> > > >> > > >>>
> >>>> > > >> > > >>> 4. The feature of "pausable splits" is now tightly
> bound
> >>>> to
> >>>> > > >> watermark
> >>>> > > >> > > >>> alignment, i.e., if sources do not support "pausable
> >>>> splits"
> >>>> > one
> >>>> > > >> can
> >>>> > > >> > > not
> >>>> > > >> > > >>> enable watermark alignment for these sources. This
> >>>> dependency
> >>>> > is
> >>>> > > >> not
> >>>> > > >> > > the
> >>>> > > >> > > >>> current status of watermark alignment implementation
> >>>> because
> >>>> > it
> >>>> > > >> > is/was
> >>>> > > >> > > >>> implemented without pausable splits. Do we want to
> >>>> introduce
> >>>> > this
> >>>> > > >> > > >>> dependency? (This is an open question. I cannot judge
> >>>> that.)
> >>>> > > >> > > >>>
> >>>> > > >> > > >> The watermark alignment basically relies on the pausable
> >>>> > splits,
> >>>> > > >> > right?
> >>>> > > >> > > >> So personally I found it quite reasonable that if the
> >>>> source
> >>>> > does
> >>>> > > >> not
> >>>> > > >> > > >> support pausable splits, end users cannot enable
> watermark
> >>>> > > >> alignment
> >>>> > > >> > on
> >>>> > > >> > > it.
> >>>> > > >> > > >>
> >>>> > > >> > > >>
> >>>> > > >> > > >>> If something is wrong, please correct me.
> >>>> > > >> > > >>>
> >>>> > > >> > > >>> Regards,
> >>>> > > >> > > >>> Sebastian
> >>>> > > >> > > >>>
> >>>> > > >> > > >>> On Wed, May 11, 2022 at 9:05 AM Becket Qin <
> >>>> > becket.qin@gmail.com>
> >>>> > > >> > > wrote:
> >>>> > > >> > > >>>
> >>>> > > >> > > >>>> Hi Sebastian,
> >>>> > > >> > > >>>>
> >>>> > > >> > > >>>> Thanks for the reply and patient discussion. I agree
> >>>> this is
> >>>> > a
> >>>> > > >> > tricky
> >>>> > > >> > > >>>> decision.
> >>>> > > >> > > >>>>
> >>>> > > >> > > >>>>
> >>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about Option
> c)
> >>>> > which I
> >>>> > > >> see
> >>>> > > >> > as
> >>>> > > >> > > >>>>> follows:
> >>>> > > >> > > >>>>> (1) An interface with default NOOP implementation
> >>>> makes the
> >>>> > > >> > > >>>>> implementation optional. And in my opinion, a default
> >>>> > > >> > implementation
> >>>> > > >> > > is and
> >>>> > > >> > > >>>>> will remain a way of making implementation optional
> >>>> because
> >>>> > > >> even in
> >>>> > > >> > > future
> >>>> > > >> > > >>>>> a developer can decide to implement the "old flavor"
> >>>> without
> >>>> > > >> > support
> >>>> > > >> > > for
> >>>> > > >> > > >>>>> pausable splits.
> >>>> > > >> > > >>>>> (2) It may not be too critical but I also find it
> >>>> suboptimal
> >>>> > > >> that
> >>>> > > >> > > with
> >>>> > > >> > > >>>>> a NOOP default implementation there is no way to
> check
> >>>> at
> >>>> > > >> runtime
> >>>> > > >> > if
> >>>> > > >> > > >>>>> SourceReader or SplitReader actually support pausing.
> >>>> (To
> >>>> > do so,
> >>>> > > >> > one
> >>>> > > >> > > would
> >>>> > > >> > > >>>>> need a supportsX method which makes it again more
> >>>> > complicated.)
> >>>> > > >> > > >>>>
> >>>> > > >> > > >>>>
> >>>> > > >> > > >>>> Based on the last few messages in the mailing list,
> >>>> Piotr
> >>>> > and I
> >>>> > > >> > > agreed
> >>>> > > >> > > >>>> that the default implementation should just throw an
> >>>> > > >> > > >>>> UnsupportedOperationException if the source is
> >>>> unpausable. So
> >>>> > > >> this
> >>>> > > >> > > >>>> basically tells the Source developers that this
> feature
> >>>> is
> >>>> > > >> expected
> >>>> > > >> > > to be
> >>>> > > >> > > >>>> supported. Because we cannot prevent end users from
> >>>> putting
> >>>> > an
> >>>> > > >> > > unpausable
> >>>> > > >> > > >>>> source into the watermark alignment group, that
> >>>> basically
> >>>> > means
> >>>> > > >> > > watermark
> >>>> > > >> > > >>>> alignment is a non-optional feature to the end users.
> >>>> So
> >>>> > making
> >>>> > > >> > that
> >>>> > > >> > > >>>> expectation aligned with the source developers seems
> >>>> > reasonable.
> >>>> > > >> > And
> >>>> > > >> > > if a
> >>>> > > >> > > >>>> source does not support this feature, the end users
> >>>> should
> >>>> > > >> > explicitly
> >>>> > > >> > > >>>> remove that source from the watermark alignment group.
> >>>> > > >> > > >>>>
> >>>> > > >> > > >>>> Personally speaking I think this is a simple and clean
> >>>> > solution
> >>>> > > >> from
> >>>> > > >> > > >>>> both the end user and source developers' standpoint.
> >>>> > > >> > > >>>>
> >>>> > > >> > > >>>> Does this address your concerns?
> >>>> > > >> > > >>>>
> >>>> > > >> > > >>>> Thanks,
> >>>> > > >> > > >>>>
> >>>> > > >> > > >>>> Jiangjie (Becket) Qin
> >>>> > > >> > > >>>>
> >>>> > > >> > > >>>> On Wed, May 11, 2022 at 2:52 PM Sebastian Mattheis <
> >>>> > > >> > > >>>> sebastian@ververica.com> wrote:
> >>>> > > >> > > >>>>
> >>>> > > >> > > >>>>> Hi Piotr, Hi Becket, Hi everybody,
> >>>> > > >> > > >>>>>
> >>>> > > >> > > >>>>> we, Dawid and I, discussed the various
> >>>> suggestions/options
> >>>> > and
> >>>> > > >> we
> >>>> > > >> > > >>>>> would be okay either way because we find neither
> >>>> solution is
> >>>> > > >> > perfect
> >>>> > > >> > > just
> >>>> > > >> > > >>>>> because of the already present complexity.
> >>>> > > >> > > >>>>>
> >>>> > > >> > > >>>>> Option c) Adding methods to the interfaces of
> >>>> SourceReader
> >>>> > and
> >>>> > > >> > > >>>>> SplitReader
> >>>> > > >> > > >>>>> Option a) Adding decorative interfaces to be used by
> >>>> > > >> SourceReader
> >>>> > > >> > and
> >>>> > > >> > > >>>>> SplitReader
> >>>> > > >> > > >>>>>
> >>>> > > >> > > >>>>> As of the current status (v. 12) of the FLIP [1], it
> is
> >>>> > based on
> >>>> > > >> > > >>>>> Option c) which we find acceptable because the
> >>>> complexity
> >>>> > added
> >>>> > > >> is
> >>>> > > >> > > only a
> >>>> > > >> > > >>>>> single method.
> >>>> > > >> > > >>>>>
> >>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about Option
> c)
> >>>> > which I
> >>>> > > >> see
> >>>> > > >> > as
> >>>> > > >> > > >>>>> follows:
> >>>> > > >> > > >>>>> (1) An interface with default NOOP implementation
> >>>> makes the
> >>>> > > >> > > >>>>> implementation optional. And in my opinion, a default
> >>>> > > >> > implementation
> >>>> > > >> > > is and
> >>>> > > >> > > >>>>> will remain a way of making implementation optional
> >>>> because
> >>>> > > >> even in
> >>>> > > >> > > future
> >>>> > > >> > > >>>>> a developer can decide to implement the "old flavor"
> >>>> without
> >>>> > > >> > support
> >>>> > > >> > > for
> >>>> > > >> > > >>>>> pausable splits.
> >>>> > > >> > > >>>>> (2) It may not be too critical but I also find it
> >>>> suboptimal
> >>>> > > >> that
> >>>> > > >> > > with
> >>>> > > >> > > >>>>> a NOOP default implementation there is no way to
> check
> >>>> at
> >>>> > > >> runtime
> >>>> > > >> > if
> >>>> > > >> > > >>>>> SourceReader or SplitReader actually support pausing.
> >>>> (To
> >>>> > do so,
> >>>> > > >> > one
> >>>> > > >> > > would
> >>>> > > >> > > >>>>> need a supportsX method which makes it again more
> >>>> > complicated.)
> >>>> > > >> > > >>>>>
> >>>> > > >> > > >>>>> However, we haven't changed it because Option a) is
> >>>> also not
> >>>> > > >> > optimal
> >>>> > > >> > > >>>>> or straight-forward:
> >>>> > > >> > > >>>>> (1) We need to add two distinct yet similar
> decorative
> >>>> > > >> interfaces
> >>>> > > >> > > >>>>> since, as mentioned, the signatures of the methods
> are
> >>>> > > >> different.
> >>>> > > >> > For
> >>>> > > >> > > >>>>> example, we would need decorative interfaces like
> >>>> > > >> > > >>>>> `SplitReaderWithPausableSplits` and
> >>>> > > >> > `SourceReaderWithPausableSplits`.
> >>>> > > >> > > >>>>> (2) As a consequence, we would need to somehow
> document
> >>>> > > >> how/where
> >>>> > > >> > to
> >>>> > > >> > > >>>>> implement both interfaces and how this relates to
> each
> >>>> > other.
> >>>> > > >> This
> >>>> > > >> > > we could
> >>>> > > >> > > >>>>> solve by adding a note in the interface of
> >>>> SourceReader and
> >>>> > > >> > > SplitReader and
> >>>> > > >> > > >>>>> reference to the decorative interfaces but it still
> >>>> > increases
> >>>> > > >> > > complexity
> >>>> > > >> > > >>>>> too.
> >>>> > > >> > > >>>>>
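For comparison, a minimal Java sketch of Option a). The decorative interface name SourceReaderWithPausableSplits is the example name used in the message above. The base interface, the reader classes and the helper methods are hypothetical and do not reflect Flink's actual API.

```java
import java.util.Collection;
import java.util.List;

final class DecorativeInterfaceSketch {

    /** Hypothetical base interface; it deliberately has no pause/resume method. */
    interface SketchSourceReader {
        String name();
    }

    /** Decorative interface: a reader opts in by implementing it in addition to the base. */
    interface SourceReaderWithPausableSplits {
        void pauseOrResumeSplits(
                Collection<String> splitIdsToPause, Collection<String> splitIdsToResume);
    }

    static final class PlainReader implements SketchSourceReader {
        @Override
        public String name() {
            return "plain";
        }
    }

    static final class PausableReader
            implements SketchSourceReader, SourceReaderWithPausableSplits {
        int calls = 0;

        @Override
        public String name() {
            return "pausable";
        }

        @Override
        public void pauseOrResumeSplits(
                Collection<String> splitIdsToPause, Collection<String> splitIdsToResume) {
            calls++; // a real reader would stop or restart fetching here
        }
    }

    /** The runtime check that a default method in the base interface cannot offer. */
    static boolean supportsPausing(SketchSourceReader reader) {
        return reader instanceof SourceReaderWithPausableSplits;
    }

    /** Pauses the given splits, or fails loudly if the reader did not opt in. */
    static void pause(SketchSourceReader reader, List<String> splitIds) {
        if (!(reader instanceof SourceReaderWithPausableSplits)) {
            throw new UnsupportedOperationException(
                    reader.name() + " does not implement SourceReaderWithPausableSplits");
        }
        ((SourceReaderWithPausableSplits) reader).pauseOrResumeSplits(splitIds, List.of());
    }

    public static void main(String[] args) {
        System.out.println(supportsPausing(new PlainReader()));
        System.out.println(supportsPausing(new PausableReader()));
    }
}
```

The instanceof check gives the runtime a way to detect support without a separate supportsX method. The cost is the one named above: a second, similar interface would be needed for SplitReader, and the relation between the two has to be documented.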
> >>>> > > >> > > >>>>> In summary, we see both as acceptable and preferred
> >>>> over
> >>>> > other
> >>>> > > >> > > >>>>> options. The question is if we can find a solution or
> >>>> > compromise
> >>>> > > >> > > that is
> >>>> > > >> > > >>>>> acceptable for everybody to reach consensus.
> >>>> > > >> > > >>>>>
> >>>> > > >> > > >>>>> Please let us know what you think because we would be
> >>>> happy
> >>>> > if
> >>>> > > >> we
> >>>> > > >> > can
> >>>> > > >> > > >>>>> conclude the discussion to avoid dropping the
> >>>> initiative on
> >>>> > this
> >>>> > > >> > > FLIP.
> >>>> > > >> > > >>>>>
> >>>> > > >> > > >>>>> Regards,
> >>>> > > >> > > >>>>> Sebastian
> >>>> > > >> > > >>>>>
> >>>> > > >> > > >>>>> [1]
> >>>> > > >> > > >>>>>
> >>>> > > >> > >
> >>>> > > >> >
> >>>> > > >>
> >>>> >
> >>>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
> >>>> > > >> > > >>>>> (v. 12)
> >>>> > > >> > > >>>>>
> >>>> > > >> > > >>>>> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski <
> >>>> > > >> > pnowojski@apache.org
> >>>> > > >> > > >
> >>>> > > >> > > >>>>> wrote:
> >>>> > > >> > > >>>>>
> >>>> > > >> > > >>>>>> Hi Guowei,
> >>>> > > >> > > >>>>>>
> >>>> > > >> > > >>>>>> as Dawid wrote a couple of messages back:
> >>>> > > >> > > >>>>>>
> >>>> > > >> > > >>>>>> > This is covered in the previous FLIP[1] which has
> >>>> been
> >>>> > > >> already
> >>>> > > >> > > >>>>>> implemented in 1.15. In short, it must be enabled
> >>>> with the
> >>>> > > >> > watermark
> >>>> > > >> > > >>>>>> strategy which also configures drift and update
> >>>> interval
> >>>> > > >> > > >>>>>>
> >>>> > > >> > > >>>>>> So by default watermark alignment is disabled,
> >>>> regardless
> >>>> > if a
> >>>> > > >> > > source
> >>>> > > >> > > >>>>>> supports it or not.
> >>>> > > >> > > >>>>>>
> >>>> > > >> > > >>>>>> Best,
> >>>> > > >> > > >>>>>> Piotrek
> >>>> > > >> > > >>>>>>
> >>>> > > >> > > >>>>>> czw., 5 maj 2022 o 09:56 Guowei Ma <
> >>>> guowei.mgw@gmail.com>
> >>>> > > >> > > napisał(a):
> >>>> > > >> > > >>>>>>
> >>>> > > >> > > >>>>>>> Hi,
> >>>> > > >> > > >>>>>>>
> >>>> > > >> > > >>>>>>> We know that in the case of Bounded input Flink
> >>>> supports
> >>>> > the
> >>>> > > >> > Batch
> >>>> > > >> > > >>>>>>> execution mode. Currently in Batch execution mode,
> >>>> Flink
> >>>> > is
> >>>> > > >> > > executed
> >>>> > > >> > > >>>>>>> on a
> >>>> > > >> > > >>>>>>> stage-by-stage basis. In this way, perhaps
> watermark
> >>>> > alignment
> >>>> > > >> > > might
> >>>> > > >> > > >>>>>>> not
> >>>> > > >> > > >>>>>>> gain much.
> >>>> > > >> > > >>>>>>>
> >>>> > > >> > > >>>>>>> So my question is: Is watermark alignment the
> default
> >>>> > > >> > behavior (for
> >>>> > > >> > > >>>>>>> implemented source only)? If so, have you
> considered
> >>>> > > >> evaluating
> >>>> > > >> > the
> >>>> > > >> > > >>>>>>> impact
> >>>> > > >> > > >>>>>>> of this behavior on the Batch execution mode? Or
> >>>> do you think
> >>>> > it is
> >>>> > > >> not
> >>>> > > >> > > >>>>>>> necessary.
> >>>> > > >> > > >>>>>>>
> >>>> > > >> > > >>>>>>> Correct me if I miss something.
> >>>> > > >> > > >>>>>>>
> >>>> > > >> > > >>>>>>> Best,
> >>>> > > >> > > >>>>>>> Guowei
> >>>> > > >> > > >>>>>>>
> >>>> > > >> > > >>>>>>>
> >>>> > > >> > > >>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <
> >>>> > > >> > > >>>>>>> piotr.nowojski@gmail.com>
> >>>> > > >> > > >>>>>>> wrote:
> >>>> > > >> > > >>>>>>>
> >>>> > > >> > > >>>>>>> > Hi Becket and Dawid,
> >>>> > > >> > > >>>>>>> >
> >>>> > > >> > > >>>>>>> > > I feel that no matter which option we choose
> >>>> this can
> >>>> > not
> >>>> > > >> be
> >>>> > > >> > > >>>>>>> solved
> >>>> > > >> > > >>>>>>> > entirely in either of the options, because of the
> >>>> point
> >>>> > > >> above
> >>>> > > >> > and
> >>>> > > >> > > >>>>>>> because
> >>>> > > >> > > >>>>>>> > the signature of SplitReader#pauseOrResumeSplits
> >>>> and
> >>>> > > >> > > >>>>>>> > SourceReader#pauseOrResumeSplits are slightly
> >>>> different
> >>>> > (one
> >>>> > > >> > > >>>>>>> identifies
> >>>> > > >> > > >>>>>>> > splits with splitId the other one passes the
> splits
> >>>> > > >> directly).
> >>>> > > >> > > >>>>>>> >
> >>>> > > >> > > >>>>>>> > Yes, that's a good point in this case and for
> >>>> features
> >>>> > that
> >>>> > > >> > need
> >>>> > > >> > > >>>>>>> to be
> >>>> > > >> > > >>>>>>> > implemented in more than one place.
> >>>> > > >> > > >>>>>>> >
> >>>> > > >> > > >>>>>>> > > Is there any reason for pausing reading from a
> >>>> split
> >>>> > to be an
> >>>> > > >> > > optional
> >>>> > > >> > > >>>>>>> feature,
> >>>> > > >> > > >>>>>>> > > other than that this was not included in the
> >>>> original
> >>>> > > >> > > interface?
> >>>> > > >> > > >>>>>>> >
> >>>> > > >> > > >>>>>>> > An additional argument in favor of making it
> >>>> optional
> >>>> > is to
> >>>> > > >> > > >>>>>>> simplify source
> >>>> > > >> > > >>>>>>> > implementation. But on its own I'm not sure if
> that
> >>>> > would be
> >>>> > > >> > > >>>>>>> enough to
> >>>> > > >> > > >>>>>>> > justify making this feature optional. Maybe.
> >>>> > > >> > > >>>>>>> >
> >>>> > > >> > > >>>>>>> > > I think it would be way simpler and clearer to
> >>>> just
> >>>> > let
> >>>> > > >> end
> >>>> > > >> > > >>>>>>> users and
> >>>> > > >> > > >>>>>>> > Flink
> >>>> > > >> > > >>>>>>> > > assume all the connectors will implement this
> >>>> feature.
> >>>> > > >> > > >>>>>>> >
> >>>> > > >> > > >>>>>>> > As I wrote above that would be an interesting
> >>>> choice to
> >>>> > make
> >>>> > > >> > > (ease
> >>>> > > >> > > >>>>>>> of
> >>>> > > >> > > >>>>>>> > implementation for new users, vs system
> >>>> consistency).
> >>>> > > >> > Regardless
> >>>> > > >> > > >>>>>>> of that,
> >>>> > > >> > > >>>>>>> > yes, for me the main argument is the API backward
> >>>> > > >> > compatibility.
> >>>> > > >> > > >>>>>>> But let's
> >>>> > > >> > > >>>>>>> > clear a couple of points:
> >>>> > > >> > > >>>>>>> > - The current proposal adding methods to the base
> >>>> > interface
> >>>> > > >> > with
> >>>> > > >> > > >>>>>>> default
> >>>> > > >> > > >>>>>>> > implementations is an OPTIONAL feature. Same as
> the
> >>>> > > >> decorative
> >>>> > > >> > > >>>>>>> version
> >>>> > > >> > > >>>>>>> > would be.
> >>>> > > >> > > >>>>>>> > - Decorative version could implement "throw
> >>>> > > >> > > >>>>>>> UnsupportedOperationException"
> >>>> > > >> > > >>>>>>> > if user enabled watermark alignment just as well
> >>>> and I
> >>>> > agree
> >>>> > > >> > > >>>>>>> that's a
> >>>> > > >> > > >>>>>>> > better option compared to logging a warning.
> >>>> > > >> > > >>>>>>> >
> >>>> > > >> > > >>>>>>> > Best,
> >>>> > > >> > > >>>>>>> > Piotrek
> >>>> > > >> > > >>>>>>> >
> >>>> > > >> > > >>>>>>> >
> >>>> > > >> > > >>>>>>> > śr., 4 maj 2022 o 15:40 Becket Qin <
> >>>> > becket.qin@gmail.com>
> >>>> > > >> > > >>>>>>> napisał(a):
> >>>> > > >> > > >>>>>>> >
> >>>> > > >> > > >>>>>>> > > Thanks for the reply and patient discussion,
> >>>> Piotr and
> >>>> > > >> Dawid.
> >>>> > > >> > > >>>>>>> > >
> >>>> > > >> > > >>>>>>> > > Is there any reason for pausing reading from a
> >>>> split
> >>>> > to be an
> >>>> > > >> > > optional
> >>>> > > >> > > >>>>>>> feature,
> >>>> > > >> > > >>>>>>> > > other than that this was not included in the
> >>>> original
> >>>> > > >> > > interface?
> >>>> > > >> > > >>>>>>> > >
> >>>> > > >> > > >>>>>>> > > To be honest I am really worried about the
> >>>> complexity
> >>>> > of
> >>>> > > >> the
> >>>> > > >> > > >>>>>>> user story
> >>>> > > >> > > >>>>>>> > > here. Optional features like this have a high
> >>>> > overhead.
> >>>> > > >> > Imagine
> >>>> > > >> > > >>>>>>> this
> >>>> > > >> > > >>>>>>> > > feature is optional, now a user enabled
> watermark
> >>>> > > >> alignment
> >>>> > > >> > and
> >>>> > > >> > > >>>>>>> defined a
> >>>> > > >> > > >>>>>>> > > few watermark groups. Would it work? Hmm, that
> >>>> > depends on
> >>>> > > >> > > >>>>>>> whether the
> >>>> > > >> > > >>>>>>> > > involved Source has implemented this feature.
> >>>> If the
> >>>> > > >> Sources
> >>>> > > >> > > >>>>>>> are well
> >>>> > > >> > > >>>>>>> > > documented, good luck. Otherwise end users may
> >>>> have to
> >>>> > > >> look
> >>>> > > >> > > into
> >>>> > > >> > > >>>>>>> the code
> >>>> > > >> > > >>>>>>> > > of the Source to see whether the feature is
> >>>> supported.
> >>>> > > >> Which
> >>>> > > >> > is
> >>>> > > >> > > >>>>>>> something
> >>>> > > >> > > >>>>>>> > > they shouldn't have to do.
> >>>> > > >> > > >>>>>>> > >
> >>>> > > >> > > >>>>>>> > > I think it would be way simpler and clearer to
> >>>> just
> >>>> > let
> >>>> > > >> end
> >>>> > > >> > > >>>>>>> users and
> >>>> > > >> > > >>>>>>> > Flink
> >>>> > > >> > > >>>>>>> > > assume all the connectors will implement this
> >>>> feature.
> >>>> > > >> After
> >>>> > > >> > > all
> >>>> > > >> > > >>>>>>> the
> >>>> > > >> > > >>>>>>> > > watermark group is not optional to the end
> >>>> users. If
> >>>> > in
> >>>> > > >> some
> >>>> > > >> > > >>>>>>> rare cases,
> >>>> > > >> > > >>>>>>> > > the feature cannot be supported, a clear
> >>>> > > >> > > >>>>>>> UnsupportedOperationException
> >>>> > > >> > > >>>>>>> > will
> >>>> > > >> > > >>>>>>> > > be thrown to tell users to explicitly remove
> this
> >>>> > Source
> >>>> > > >> from
> >>>> > > >> > > the
> >>>> > > >> > > >>>>>>> > watermark
> >>>> > > >> > > >>>>>>> > > group. I don't think we should have a warning
> >>>> message
> >>>> > > >> here,
> >>>> > > >> > as
> >>>> > > >> > > >>>>>>> they tend
> >>>> > > >> > > >>>>>>> > to
> >>>> > > >> > > >>>>>>> > > be ignored in many cases. If we do this, we
> >>>> don't even
> >>>> > > >> need
> >>>> > > >> > the
> >>>> > > >> > > >>>>>>> > supportXXX
> >>>> > > >> > > >>>>>>> > > method in the Source for this feature. In fact
> >>>> this is
> >>>> > > >> > exactly
> >>>> > > >> > > >>>>>>> how many
> >>>> > > >> > > >>>>>>> > > interfaces work today. For example,
> >>>> > > >> > > >>>>>>> SplitEnumerator#addSplitsBack() is
> >>>> > > >> > > >>>>>>> > not
> >>>> > > >> > > >>>>>>> > > supported by Pravega source because it does not
> >>>> > support
> >>>> > > >> > partial
> >>>> > > >> > > >>>>>>> failover.
> >>>> > > >> > > >>>>>>> > > In that case, it simply throws an exception to
> >>>> > trigger a
> >>>> > > >> > global
> >>>> > > >> > > >>>>>>> recovery.
> >>>> > > >> > > >>>>>>> > >
> >>>> > > >> > > >>>>>>> > > The reason we add a default implementation in
> >>>> this
> >>>> > case
> >>>> > > >> would
> >>>> > > >> > > >>>>>>> just for
> >>>> > > >> > > >>>>>>> > the
> >>>> > > >> > > >>>>>>> > > sake of backwards compatibility so the old
> >>>> source can
> >>>> > > >> still
> >>>> > > >> > > >>>>>>> compile.
> >>>> > > >> > > >>>>>>> > Sure,
> >>>> > > >> > > >>>>>>> > > in short term, this feature might not be
> >>>> supported by
> >>>> > many
> >>>> > > >> > > >>>>>>> existing
> >>>> > > >> > > >>>>>>> > > sources. That is OK, and it is quite visible to
> >>>> the
> >>>> > source
> >>>> > > >> > > >>>>>>> developers
> >>>> > > >> > > >>>>>>> > that
> >>>> > > >> > > >>>>>>> > > they did not override the default impl which
> >>>> throws an
> >>>> > > >> > > >>>>>>> > > UnsupportedOperationException.
> >>>> > > >> > > >>>>>>> > >
> >>>> > > >> > > >>>>>>> > > @Dawid,
> >>>> > > >> > > >>>>>>> > >
> >>>> > > >> > > >>>>>>> > > the Java doc of the SupportXXX() method in the
> >>>> Source
> >>>> > > >> would
> >>>> > > >> > be
> >>>> > > >> > > >>>>>>> the single
> >>>> > > >> > > >>>>>>> > > >> source of truth regarding how to implement
> >>>> this
> >>>> > > >> feature.
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > >
> >>>> > > >> > > >>>>>>> > > I also don't find it entirely true. Half of the
> >>>> > classes
> >>>> > > >> are
> >>>> > > >> > > >>>>>>> theoretically
> >>>> > > >> > > >>>>>>> > > > optional and are utility classes from the
> >>>> point of
> >>>> > view
> >>>> > > >> how
> >>>> > > >> > > the
> >>>> > > >> > > >>>>>>> > > interfaces
> >>>> > > >> > > >>>>>>> > > > are organized. Theoretically users do not
> need
> >>>> to
> >>>> > use
> >>>> > > >> any
> >>>> > > >> > of
> >>>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be
> weird
> >>>> to
> >>>> > list
> >>>> > > >> > their
> >>>> > > >> > > >>>>>>> methods in
> >>>> > > >> > > >>>>>>> > > the
> >>>> > > >> > > >>>>>>> > > > Source interface.
> >>>> > > >> > > >>>>>>> > >
> >>>> > > >> > > >>>>>>> > > I think the ultimate goal of java docs is to
> >>>> guide
> >>>> > users
> >>>> > > >> to
> >>>> > > >> > > >>>>>>> implement the
> >>>> > > >> > > >>>>>>> > > Source. If SourceReaderBase is the preferred
> way
> >>>> to
> >>>> > > >> > implement a
> >>>> > > >> > > >>>>>>> > > SourceReader, it seems worth mentioning that.
> >>>> Even the
> >>>> > > >> Java
> >>>> > > >> > > >>>>>>> language
> >>>> > > >> > > >>>>>>> > > documentation for interfaces lists the known
> >>>> > implementations
> >>>> > > >> [1]
> >>>> > > >> > so
> >>>> > > >> > > >>>>>>> people
> >>>> > > >> > > >>>>>>> > can
> >>>> > > >> > > >>>>>>> > > leverage them. But for this particular case, if
> >>>> we
> >>>> > make
> >>>> > > >> the
> >>>> > > >> > > >>>>>>> feature
> >>>> > > >> > > >>>>>>> > > non-optional, we don't even need the
> supportXXX()
> >>>> > method
> >>>> > > >> for
> >>>> > > >> > > now.
> >>>> > > >> > > >>>>>>> > >
> >>>> > > >> > > >>>>>>> > > Thanks,
> >>>> > > >> > > >>>>>>> > >
> >>>> > > >> > > >>>>>>> > > Jiangjie (Becket) Qin
> >>>> > > >> > > >>>>>>> > >
> >>>> > > >> > > >>>>>>> > >
> >>>> > > >> > > >>>>>>> > >
> >>>> > > >> > > >>>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid
> Wysakowicz <
> >>>> > > >> > > >>>>>>> dwysakowicz@apache.org>
> >>>> > > >> > > >>>>>>> > > wrote:
> >>>> > > >> > > >>>>>>> > >
> >>>> > > >> > > >>>>>>> > > > Hey Piotr and Becket,
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > First of all, let me say I am happy with
> >>>> whichever
> >>>> > > >> option
> >>>> > > >> > is
> >>>> > > >> > > >>>>>>> agreed in
> >>>> > > >> > > >>>>>>> > > the
> >>>> > > >> > > >>>>>>> > > > discussion.
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > I wanted to clarify a few points from the
> >>>> discussion
> >>>> > > >> > though:
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > @Becket:
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > The main argument for adding the methods to
> the
> >>>> > > >> > SourceReader
> >>>> > > >> > > >>>>>>> is that
> >>>> > > >> > > >>>>>>> > > these
> >>>> > > >> > > >>>>>>> > > > methods are effectively NON-OPTIONAL to
> >>>> SourceReader
> >>>> > > >> impl,
> >>>> > > >> > > i.e.
> >>>> > > >> > > >>>>>>> > starting
> >>>> > > >> > > >>>>>>> > > > from this FLIP, all the SourceReaders impl
> are
> >>>> > expected
> >>>> > > >> to
> >>>> > > >> > > >>>>>>> support this
> >>>> > > >> > > >>>>>>> > > > method, although some old implementations may
> >>>> not
> >>>> > have
> >>>> > > >> > > >>>>>>> implemented this
> >>>> > > >> > > >>>>>>> > > > feature. I think we should distinguish the
> new
> >>>> > features
> >>>> > > >> > from
> >>>> > > >> > > >>>>>>> the
> >>>> > > >> > > >>>>>>> > optional
> >>>> > > >> > > >>>>>>> > > > features. While the public decorative
> >>>> interface is a
> >>>> > > >> > solution
> >>>> > > >> > > >>>>>>> to the
> >>>> > > >> > > >>>>>>> > > > optional features. We should not use it for
> the
> >>>> > features
> >>>> > > >> > that
> >>>> > > >> > > >>>>>>> are
> >>>> > > >> > > >>>>>>> > > > non-optional.
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > I don't think that this feature is
> >>>> NON-OPTIONAL.
> >>>> > Even
> >>>> > > >> > though
> >>>> > > >> > > >>>>>>> > preferred, I
> >>>> > > >> > > >>>>>>> > > > still think it can be simply optional.
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > the Java doc of the SupportXXX() method in
> the
> >>>> > Source
> >>>> > > >> would
> >>>> > > >> > > be
> >>>> > > >> > > >>>>>>> the
> >>>> > > >> > > >>>>>>> > single
> >>>> > > >> > > >>>>>>> > > > source of truth regarding how to implement
> this
> >>>> > feature.
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > I also don't find it entirely true. Half of
> the
> >>>> > classes
> >>>> > > >> are
> >>>> > > >> > > >>>>>>> > theoretically
> >>>> > > >> > > >>>>>>> > > > optional and are utility classes from the
> >>>> point of
> >>>> > view
> >>>> > > >> how
> >>>> > > >> > > the
> >>>> > > >> > > >>>>>>> > > interfaces
> >>>> > > >> > > >>>>>>> > > > are organized. Theoretically users do not
> need
> >>>> to
> >>>> > use
> >>>> > > >> any
> >>>> > > >> > of
> >>>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be
> weird
> >>>> to
> >>>> > list
> >>>> > > >> > their
> >>>> > > >> > > >>>>>>> methods in
> >>>> > > >> > > >>>>>>> > > the
> >>>> > > >> > > >>>>>>> > > > Source interface.
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > @Piotr
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > If we have all of the methods with default
> >>>> > > >> implementation
> >>>> > > >> > in
> >>>> > > >> > > >>>>>>> the base
> >>>> > > >> > > >>>>>>> > > > interface, the API doesn't give any clue to
> >>>> the user
> >>>> > > >> which
> >>>> > > >> > > set
> >>>> > > >> > > >>>>>>> of
> >>>> > > >> > > >>>>>>> > methods
> >>>> > > >> > > >>>>>>> > > > are required to be implemented at the same
> >>>> time.
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > I feel that no matter which option we choose, this cannot be solved
> >>>> > > >> > > >>>>>>> > > > entirely, because of the point above and because the signatures of
> >>>> > > >> > > >>>>>>> > > > SplitReader#pauseOrResumeSplits and SourceReader#pauseOrResumeSplits
> >>>> > > >> > > >>>>>>> > > > are slightly different (one identifies splits by splitId, the other
> >>>> > > >> > > >>>>>>> > > > passes the splits directly).
> >>>> > > >> > > >>>>>>> > > > Best,
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > Dawid
> >>>> > > >> > > >>>>>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > Hi Piotr,
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > Thanks for the comment.
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > Just to clarify, I am not against the
> >>>> decorative
> >>>> > > >> > interfaces,
> >>>> > > >> > > >>>>>>> but I do
> >>>> > > >> > > >>>>>>> > > > think we should use them with caution. The
> main
> >>>> > argument
> >>>> > > >> > for
> >>>> > > >> > > >>>>>>> adding the
> >>>> > > >> > > >>>>>>> > > > methods to the SourceReader is that these
> >>>> methods
> >>>> > are
> >>>> > > >> > > >>>>>>> > > > effectively NON-OPTIONAL to SourceReader
> impl,
> >>>> i.e.
> >>>> > > >> > starting
> >>>> > > >> > > >>>>>>> from this
> >>>> > > >> > > >>>>>>> > > > FLIP, all the SourceReaders impl are expected
> >>>> to
> >>>> > support
> >>>> > > >> > this
> >>>> > > >> > > >>>>>>> > > > method, although some old implementations may
> >>>> not
> >>>> > have
> >>>> > > >> > > >>>>>>> implemented this
> >>>> > > >> > > >>>>>>> > > > feature. I think we should distinguish the
> new
> >>>> > features
> >>>> > > >> > from
> >>>> > > >> > > >>>>>>> the
> >>>> > > >> > > >>>>>>> > optional
> >>>> > > >> > > >>>>>>> > > > features. While the public decorative
> >>>> interface is a
> >>>> > > >> > solution
> >>>> > > >> > > >>>>>>> to the
> >>>> > > >> > > >>>>>>> > > > optional features. We should not use it for
> the
> >>>> > features
> >>>> > > >> > that
> >>>> > > >> > > >>>>>>> are
> >>>> > > >> > > >>>>>>> > > > non-optional.
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > That said, this feature is optional for
> >>>> > SplitReaders.
> >>>> > > >> > > Arguably
> >>>> > > >> > > >>>>>>> we can
> >>>> > > >> > > >>>>>>> > > have
> >>>> > > >> > > >>>>>>> > > > a decorative interface for that, but for
> >>>> simplicity
> >>>> > and
> >>>> > > >> > > >>>>>>> symmetry of the
> >>>> > > >> > > >>>>>>> > > > interface, personally I prefer just adding a
> >>>> new
> >>>> > method.
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > Regarding the advantages you mentioned about
> >>>> the
> >>>> > > >> decorative
> >>>> > > >> > > >>>>>>> interfaces,
> >>>> > > >> > > >>>>>>> > > > they would make sense if:
> >>>> > > >> > > >>>>>>> > > > 1. The feature is optional.
> >>>> > > >> > > >>>>>>> > > > 2. There is only one decorative interface
> >>>> involved
> >>>> > for a
> >>>> > > >> > > >>>>>>> feature.
> >>>> > > >> > > >>>>>>> > > > Otherwise the argument that all the methods
> are
> >>>> > grouped
> >>>> > > >> > > >>>>>>> together will
> >>>> > > >> > > >>>>>>> > not
> >>>> > > >> > > >>>>>>> > > > stand.
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > Compared with that, I think the current
> >>>> solution
> >>>> > works
> >>>> > > >> fine
> >>>> > > >> > > in
> >>>> > > >> > > >>>>>>> all
> >>>> > > >> > > >>>>>>> > cases,
> >>>> > > >> > > >>>>>>> > > > i.e. "having supportXXX() method in Source,
> and
> >>>> > default
> >>>> > > >> > > >>>>>>> methods /
> >>>> > > >> > > >>>>>>> > > > decorative interfaces in base interfaces.".
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > The advantages are:
> >>>> > > >> > > >>>>>>> > > >> - clean and easy to implement base interface
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > In the current approach, the Java doc of the
> >>>> > > >> SupportXXX()
> >>>> > > >> > > >>>>>>> method in the
> >>>> > > >> > > >>>>>>> > > > Source would be the single source of truth
> >>>> regarding
> >>>> > > >> how to
> >>>> > > >> > > >>>>>>> implement
> >>>> > > >> > > >>>>>>> > > this
> >>>> > > >> > > >>>>>>> > > > feature. It lists the method that has to be
> >>>> > implemented
> >>>> > > >> to
> >>>> > > >> > > >>>>>>> support this
> >>>> > > >> > > >>>>>>> > > > feature, regardless of how many classes /
> >>>> > interfaces are
> >>>> > > >> > > >>>>>>> involved.
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > When implementing the base interface, users do not need to implement a
> >>>> > > >> > > >>>>>>> > > > method that has a default implementation. If they are curious what the
> >>>> > > >> > > >>>>>>> > > > method is for, the Java doc of that method simply points them to the
> >>>> > > >> > > >>>>>>> > > > SupportXXX() method in the Source. This does not seem to add work for
> >>>> > > >> > > >>>>>>> > > > users compared with decorative interfaces, but it gives much better
> >>>> > > >> > > >>>>>>> > > > discoverability.
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > - all of the methods from a single feature
> are
> >>>> > grouped
> >>>> > > >> in a
> >>>> > > >> > > >>>>>>> single
> >>>> > > >> > > >>>>>>> > > >> decorator interface, together with their
> >>>> dedicated
> >>>> > java
> >>>> > > >> > doc.
> >>>> > > >> > > >>>>>>> It's also
> >>>> > > >> > > >>>>>>> > > >> easier to google search for help using the
> >>>> > decorator
> >>>> > > >> name
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > - if an optional feature requires two methods
> >>>> to be
> >>>> > > >> > > >>>>>>> implemented at
> >>>> > > >> > > >>>>>>> > once,
> >>>> > > >> > > >>>>>>> > > >> decorator can guarantee that
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > These two points are not true when multiple
> >>>> > components
> >>>> > > >> and
> >>>> > > >> > > >>>>>>> classes are
> >>>> > > >> > > >>>>>>> > > > involved collaboratively to provide a
> feature.
> >>>> In
> >>>> > our
> >>>> > > >> case,
> >>>> > > >> > > we
> >>>> > > >> > > >>>>>>> have
> >>>> > > >> > > >>>>>>> > both
> >>>> > > >> > > >>>>>>> > > > SourceReader and SplitReader involved. And
> >>>> there
> >>>> > might
> >>>> > > >> be
> >>>> > > >> > > other
> >>>> > > >> > > >>>>>>> > > interfaces
> >>>> > > >> > > >>>>>>> > > > on the JM side involved for some future
> >>>> features.
> >>>> > So the
> >>>> > > >> > > >>>>>>> relevant
> >>>> > > >> > > >>>>>>> > methods
> >>>> > > >> > > >>>>>>> > > > can actually be scattered over the places.
> That
> >>>> > said, we
> >>>> > > >> > may
> >>>> > > >> > > >>>>>>> still use
> >>>> > > >> > > >>>>>>> > > > decorative interfaces for each component, if
> >>>> the
> >>>> > > >> feature is
> >>>> > > >> > > >>>>>>> optional,
> >>>> > > >> > > >>>>>>> > > given
> >>>> > > >> > > >>>>>>> > > > there is a single source of truth for the
> >>>> feature.
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > Here I would strongly lean towards making
> life
> >>>> > easier
> >>>> > > >> for
> >>>> > > >> > new
> >>>> > > >> > > >>>>>>> users,
> >>>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo)
> >>>> slight
> >>>> > expense
> >>>> > > >> > for
> >>>> > > >> > > >>>>>>> the power
> >>>> > > >> > > >>>>>>> > > >> users.
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > I actually think the current approach is simpler, more extensible and
> >>>> > > >> > > >>>>>>> > > > more general for all the users. Can you elaborate a bit more on which
> >>>> > > >> > > >>>>>>> > > > part you think makes it harder for users to understand?
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > There is another benefit of the decorative
> >>>> > interfaces
> >>>> > > >> which
> >>>> > > >> > > is
> >>>> > > >> > > >>>>>>> not
> >>>> > > >> > > >>>>>>> > > > mentioned, but might be worth considering
> here.
> >>>> > Usually
> >>>> > > >> the
> >>>> > > >> > > >>>>>>> decorative
> >>>> > > >> > > >>>>>>> > > > interfaces give slightly better backwards
> >>>> > compatibility
> >>>> > > >> > than
> >>>> > > >> > > >>>>>>> the new
> >>>> > > >> > > >>>>>>> > > > default method in the interfaces. That is
> when
> >>>> > users are
> >>>> > > >> > > using
> >>>> > > >> > > >>>>>>> a jar
> >>>> > > >> > > >>>>>>> > that
> >>>> > > >> > > >>>>>>> > > > was compiled with an older version of Flink
> >>>> which
> >>>> > does
> >>>> > > >> not
> >>>> > > >> > > >>>>>>> have the
> >>>> > > >> > > >>>>>>> > > default
> >>>> > > >> > > >>>>>>> > > > method in the interfaces in question. A
> >>>> decorative
> >>>> > > >> > interface
> >>>> > > >> > > >>>>>>> may still
> >>>> > > >> > > >>>>>>> > > > provide backwards compatibility in that case,
> >>>> while
> >>>> > > >> default
> >>>> > > >> > > >>>>>>> method impl
> >>>> > > >> > > >>>>>>> > > > cannot.
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > I think in Flink we in general do not
> guarantee
> >>>> > custom
> >>>> > > >> > > >>>>>>> components
> >>>> > > >> > > >>>>>>> > > compiled
> >>>> > > >> > > >>>>>>> > > > with an older version can run with a newer
> >>>> version
> >>>> > of
> >>>> > > >> > Flink.
> >>>> > > >> > > A
> >>>> > > >> > > >>>>>>> > recompile
> >>>> > > >> > > >>>>>>> > > > with a newer version would be required. That
> >>>> said,
> >>>> > if
> >>>> > > >> we do
> >>>> > > >> > > >>>>>>> care about
> >>>> > > >> > > >>>>>>> > > > this, we can just change the "supportXXX()"
> >>>> method
> >>>> > in
> >>>> > > >> the
> >>>> > > >> > > >>>>>>> Source
> >>>> > > >> > > >>>>>>> > > interface
> >>>> > > >> > > >>>>>>> > > > to use decorative interfaces, and leave the
> >>>> other
> >>>> > parts
> >>>> > > >> > > >>>>>>> unchanged.
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > Thanks,
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > Jiangjie (Becket) Qin
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr
> Nowojski <
> >>>> > > >> > > >>>>>>> pnowojski@apache.org>
> >>>> > > >> > > >>>>>>> > > > wrote:
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > > >> Hi,
> >>>> > > >> > > >>>>>>> > > >>
> >>>> > > >> > > >>>>>>> > > >> Sorry for chipping in so late, but I was OoO for the last two weeks.
> >>>> > > >> > > >>>>>>> > > >> Regarding the interfaces, I would be actually against adding those methods
> >>>> > > >> > > >>>>>>> > > >> to the base interfaces for the reasons mentioned above. Clogging the base
> >>>> > > >> > > >>>>>>> > > >> interface for new users with tons of methods that they do not need, do not
> >>>> > > >> > > >>>>>>> > > >> understand and do not know what to do with them. Moreover, such decorative
> >>>> > > >> > > >>>>>>> > > >> interfaces are solving a problem if a feature requires two or more methods
> >>>> > > >> > > >>>>>>> > > >> to be implemented at the same time. If we have all of the methods with
> >>>> > > >> > > >>>>>>> > > >> default implementation in the base interface, the API doesn't give any clue
> >>>> > > >> > > >>>>>>> > > >> to the user which set of methods are required to be implemented at the same
> >>>> > > >> > > >>>>>>> > > >> time.
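For illustration, the trade-off argued above can be sketched in Java. This is a minimal sketch, not Flink's API: SimpleReader, ReaderWithDefaults, PausableReader and the pauseSplits/resumeSplits methods are hypothetical names chosen for this example. With default methods, an implementation that overrides only one method of the pair still compiles; with a decorative interface, the compiler requires both.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified base interface (not Flink's real SourceReader).
interface SimpleReader {
    String poll();
}

// Variant 1: the optional feature lives in the base interface as default methods.
interface ReaderWithDefaults extends SimpleReader {
    default void pauseSplits(Collection<String> splitIds) {
        throw new UnsupportedOperationException("pauseSplits is not implemented");
    }

    default void resumeSplits(Collection<String> splitIds) {
        throw new UnsupportedOperationException("resumeSplits is not implemented");
    }
}

// Variant 2: the optional feature is a separate decorative interface.
interface PausableReader {
    void pauseSplits(Collection<String> splitIds);

    void resumeSplits(Collection<String> splitIds);
}

// Compiles although only half of the feature is implemented.
final class HalfDoneReader implements ReaderWithDefaults {
    final List<String> paused = new ArrayList<>();

    @Override
    public String poll() {
        return "record";
    }

    @Override
    public void pauseSplits(Collection<String> splitIds) {
        paused.addAll(splitIds);
    }
}

// Would not compile if either pauseSplits or resumeSplits were missing.
final class FullReader implements SimpleReader, PausableReader {
    final List<String> paused = new ArrayList<>();

    @Override
    public String poll() {
        return "record";
    }

    @Override
    public void pauseSplits(Collection<String> splitIds) {
        paused.addAll(splitIds);
    }

    @Override
    public void resumeSplits(Collection<String> splitIds) {
        paused.removeAll(splitIds);
    }
}

final class DecoratorSketch {
    // Returns false if the reader falls back to the throwing default method.
    static boolean resumeWorks(ReaderWithDefaults reader) {
        try {
            reader.resumeSplits(Collections.singletonList("split-0"));
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("half-done reader can resume: " + resumeWorks(new HalfDoneReader()));
        FullReader full = new FullReader();
        full.pauseSplits(Collections.singletonList("split-0"));
        full.resumeSplits(Collections.singletonList("split-0"));
        System.out.println("paused splits left: " + full.paused);
    }
}
```

In this sketch the incomplete implementation is only detected at runtime in variant 1, while variant 2 moves the same check to compile time.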
> >>>> > > >> > > >>>>>>> > > >>
> >>>> > > >> > > >>>>>>> > > >> > a) I feel the biggest drawback of
> decorative
> >>>> > > >> interfaces
> >>>> > > >> > is
> >>>> > > >> > > >>>>>>> which
> >>>> > > >> > > >>>>>>> > > >> interface
> >>>> > > >> > > >>>>>>> > > >> > they can decorate and which combinations
> of
> >>>> > multiple
> >>>> > > >> > > >>>>>>> decorative
> >>>> > > >> > > >>>>>>> > > >> interfaces
> >>>> > > >> > > >>>>>>> > > >> > are valid. (...)
> >>>> > > >> > > >>>>>>> > > >> > In the future, if there is a new feature
> >>>> added
> >>>> > > >> > > >>>>>>> > > >> > (e.g. sorted or pre-partitioned data
> >>>> aware), are
> >>>> > we
> >>>> > > >> > going
> >>>> > > >> > > >>>>>>> to create
> >>>> > > >> > > >>>>>>> > > >> another
> >>>> > > >> > > >>>>>>> > > >> > interface of SplitReader such as
> >>>> > SortedSplitReader or
> >>>> > > >> > > >>>>>>> > > >> PrePartitionedAware?
> >>>> > > >> > > >>>>>>> > > >> > Can they be combined? So I think the
> >>>> additional
> >>>> > > >> > decorative
> >>>> > > >> > > >>>>>>> interface
> >>>> > > >> > > >>>>>>> > > >> like
> >>>> > > >> > > >>>>>>> > > >> > withSplitsAlignment actually increases the
> >>>> > > >> understanding
> >>>> > > >> > > >>>>>>> cost of
> >>>> > > >> > > >>>>>>> > users
> >>>> > > >> > > >>>>>>> > > >> > because they have to know what decorative
> >>>> > interfaces
> >>>> > > >> are
> >>>> > > >> > > >>>>>>> there,
> >>>> > > >> > > >>>>>>> > which
> >>>> > > >> > > >>>>>>> > > >> > interface they can decorate and which
> >>>> > combinations of
> >>>> > > >> > the
> >>>> > > >> > > >>>>>>> decorative
> >>>> > > >> > > >>>>>>> > > >> > interfaces are valid and which are not.
> >>>> Ideally
> >>>> > we
> >>>> > > >> want
> >>>> > > >> > to
> >>>> > > >> > > >>>>>>> avoid
> >>>> > > >> > > >>>>>>> > that.
> >>>> > > >> > > >>>>>>> > > >>
> >>>> > > >> > > >>>>>>> > > >> I'm not sure if I understand how embedding default methods in the base
> >>>> > > >> > > >>>>>>> > > >> interface is solving the problem: what can be combined or not? If there are
> >>>> > > >> > > >>>>>>> > > >> two conflicting features, having decorative interfaces that can not be
> >>>> > > >> > > >>>>>>> > > >> mixed together actually makes much more sense to me rather than having them
> >>>> > > >> > > >>>>>>> > > >> all in one base class. How would you allow users to implement only one of
> >>>> > > >> > > >>>>>>> > > >> those two features?
> >>>> > > >> > > >>>>>>> > > >>
> >>>> > > >> > > >>>>>>> > > >> To reiterate on the issue. Yes, there are
> >>>> > drawbacks:
> >>>> > > >> > > >>>>>>> > > >> - how can a user discover what decorators
> are
> >>>> > there?
> >>>> > > >> > > >>>>>>> > > >> - how can a user know where the decorator
> can
> >>>> be
> >>>> > > >> applied
> >>>> > > >> > to?
> >>>> > > >> > > >>>>>>> > > >>
> >>>> > > >> > > >>>>>>> > > >> However those are drawbacks for more power
> >>>> users,
> >>>> > that
> >>>> > > >> can
> >>>> > > >> > > be
> >>>> > > >> > > >>>>>>> > mitigated
> >>>> > > >> > > >>>>>>> > > by
> >>>> > > >> > > >>>>>>> > > >> the documentation. For example listing all
> of
> >>>> the
> >>>> > > >> > decorators
> >>>> > > >> > > >>>>>>> with
> >>>> > > >> > > >>>>>>> > > >> detailed explanation both in the docs and in
> >>>> the
> >>>> > java
> >>>> > > >> > docs.
> >>>> > > >> > > >>>>>>> More
> >>>> > > >> > > >>>>>>> > > >> experienced users will be able to deal with
> >>>> those
> >>>> > > >> issues
> >>>> > > >> > > >>>>>>> easier, as
> >>>> > > >> > > >>>>>>> > they
> >>>> > > >> > > >>>>>>> > > >> will already have some basic understanding
> of
> >>>> > Flink.
> >>>> > > >> Also
> >>>> > > >> > if
> >>>> > > >> > > >>>>>>> user has
> >>>> > > >> > > >>>>>>> > a
> >>>> > > >> > > >>>>>>> > > >> problem that he wants to solve, he will
> google
> >>>> > search a
> >>>> > > >> > > >>>>>>> potential
> >>>> > > >> > > >>>>>>> > > solution
> >>>> > > >> > > >>>>>>> > > >> to his problem anyway, and while doing that
> >>>> he is
> >>>> > very
> >>>> > > >> > > likely
> >>>> > > >> > > >>>>>>> to
> >>>> > > >> > > >>>>>>> > > discover
> >>>> > > >> > > >>>>>>> > > >> the decorator that he needs anyway in the
> >>>> docs.
> >>>> > > >> > > >>>>>>> > > >>
> >>>> > > >> > > >>>>>>> > > >> The advantages are:
> >>>> > > >> > > >>>>>>> > > >> - clean and easy to implement base interface
> >>>> > > >> > > >>>>>>> > > >> - all of the methods from a single feature
> are
> >>>> > grouped
> >>>> > > >> in
> >>>> > > >> > a
> >>>> > > >> > > >>>>>>> single
> >>>> > > >> > > >>>>>>> > > >> decorator interface, together with their
> >>>> dedicated
> >>>> > java
> >>>> > > >> > doc.
> >>>> > > >> > > >>>>>>> It's also
> >>>> > > >> > > >>>>>>> > > >> easier to google search for help using the
> >>>> > decorator
> >>>> > > >> name
> >>>> > > >> > > >>>>>>> > > >> - if an optional feature requires two
> methods
> >>>> to be
> >>>> > > >> > > >>>>>>> implemented at
> >>>> > > >> > > >>>>>>> > once,
> >>>> > > >> > > >>>>>>> > > >> decorator can guarantee that
> >>>> > > >> > > >>>>>>> > > >>
> >>>> > > >> > > >>>>>>> > > >> Here I would strongly lean towards making
> life
> >>>> > easier
> >>>> > > >> for
> >>>> > > >> > > new
> >>>> > > >> > > >>>>>>> users,
> >>>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo)
> >>>> slight
> >>>> > expense
> >>>> > > >> > for
> >>>> > > >> > > >>>>>>> the power
> >>>> > > >> > > >>>>>>> > > >> users.
> >>>> > > >> > > >>>>>>> > > >>
> >>>> > > >> > > >>>>>>> > > >> Best,
> >>>> > > >> > > >>>>>>> > > >> Piotrek
> >>>> > > >> > > >>>>>>> > > >>
> >>>> > > >> > > >>>>>>> > > >>
> >>>> > > >> > > >>>>>>> > > >> wt., 26 kwi 2022 o 15:32 Becket Qin <
> >>>> > > >> becket.qin@gmail.com
> >>>> > > >> > >
> >>>> > > >> > > >>>>>>> > napisał(a):
> >>>> > > >> > > >>>>>>> > > >>
> >>>> > > >> > > >>>>>>> > > >> > Thanks for the reply Sebastian and Dawid,
> >>>> > > >> > > >>>>>>> > > >> >
> >>>> > > >> > > >>>>>>> > > >> > I think Sebastian has a good summary. This is a really helpful discussion.
> >>>> > > >> > > >>>>>>> > > >> >
> >>>> > > >> > > >>>>>>> > > >> > Thinking a bit more, I feel that it might
> >>>> still
> >>>> > be
> >>>> > > >> > better
> >>>> > > >> > > >>>>>>> to add the
> >>>> > > >> > > >>>>>>> > > >> > supportsXXX() method in the Source rather
> >>>> than
> >>>> > > >> > > SourceReader.
> >>>> > > >> > > >>>>>>> > > >> >
> >>>> > > >> > > >>>>>>> > > >> > Generally speaking, what we are trying to
> do
> >>>> > here is
> >>>> > > >> to
> >>>> > > >> > > let
> >>>> > > >> > > >>>>>>> the
> >>>> > > >> > > >>>>>>> > Flink
> >>>> > > >> > > >>>>>>> > > >> > framework know what the Source is capable
> >>>> of. In
> >>>> > this
> >>>> > > >> > > FLIP,
> >>>> > > >> > > >>>>>>> it
> >>>> > > >> > > >>>>>>> > happens
> >>>> > > >> > > >>>>>>> > > >> to
> >>>> > > >> > > >>>>>>> > > >> > be the capability that only involves
> >>>> > SourceReader.
> >>>> > > >> But
> >>>> > > >> > in
> >>>> > > >> > > >>>>>>> the
> >>>> > > >> > > >>>>>>> > future,
> >>>> > > >> > > >>>>>>> > > >> it is
> >>>> > > >> > > >>>>>>> > > >> > possible that another functionality
> involves
> >>>> > both the
> >>>> > > >> > > >>>>>>> > SplitEnumerator
> >>>> > > >> > > >>>>>>> > > >> and
> >>>> > > >> > > >>>>>>> > > >> > SourceReader. In that case, following the
> >>>> current
> >>>> > > >> > > approach,
> >>>> > > >> > > >>>>>>> we
> >>>> > > >> > > >>>>>>> > should
> >>>> > > >> > > >>>>>>> > > >> put
> >>>> > > >> > > >>>>>>> > > >> > the "supportsXXX()" method in both
> >>>> > SplitEnumerator
> >>>> > > >> and
> >>>> > > >> > > >>>>>>> SourceReader.
> >>>> > > >> > > >>>>>>> > > >> > Because if we only put this in the
> >>>> SourceReader,
> >>>> > then
> >>>> > > >> > the
> >>>> > > >> > > >>>>>>> JM would
> >>>> > > >> > > >>>>>>> > > have
> >>>> > > >> > > >>>>>>> > > >> to
> >>>> > > >> > > >>>>>>> > > >> > create a SourceReader in order to know
> >>>> whether
> >>>> > this
> >>>> > > >> > > feature
> >>>> > > >> > > >>>>>>> is
> >>>> > > >> > > >>>>>>> > > >> supported,
> >>>> > > >> > > >>>>>>> > > >> > which is a little ugly. But if we put the
> >>>> > > >> > "supportsXXX()"
> >>>> > > >> > > >>>>>>> method in
> >>>> > > >> > > >>>>>>> > > the
> >>>> > > >> > > >>>>>>> > > >> > Source, we will break the "symmetric"
> design
> >>>> > because
> >>>> > > >> > this
> >>>> > > >> > > >>>>>>> FLIP
> >>>> > > >> > > >>>>>>> > chose a
> >>>> > > >> > > >>>>>>> > > >> > different way.
> >>>> > > >> > > >>>>>>> > > >> >
> >>>> > > >> > > >>>>>>> > > >> > This is also why I think a supportsXXX() method is a good thing to have:
> >>>> > > >> > > >>>>>>> > > >> > when there are a few interfaces / methods that are expected to be
> >>>> > > >> > > >>>>>>> > > >> > implemented at the same time in order to deliver a feature, it is always
> >>>> > > >> > > >>>>>>> > > >> > good to have a single source of truth that tells the framework what to do,
> >>>> > > >> > > >>>>>>> > > >> > so the framework can behave consistently in different parts.
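The "single source of truth" idea can be sketched as follows. This is a hedged illustration, not Flink's actual Source interface: SketchSource, SketchReader, CountingSource and canAlignSplits are hypothetical names, and supportsPausingSplits is borrowed from the naming discussed in this thread. The point shown is that the framework can query the capability on the source without instantiating a reader.

```java
// Hypothetical stand-ins for illustration; not Flink's real Source / SourceReader.
interface SketchReader {
    String poll();
}

interface SketchSource {
    // Capability declared once, on the source itself. Defaults to "not supported".
    default boolean supportsPausingSplits() {
        return false;
    }

    SketchReader createReader();
}

// A source that counts how many readers were created from it.
final class CountingSource implements SketchSource {
    int readersCreated = 0;
    private final boolean pausing;

    CountingSource(boolean pausing) {
        this.pausing = pausing;
    }

    @Override
    public boolean supportsPausingSplits() {
        return pausing;
    }

    @Override
    public SketchReader createReader() {
        readersCreated++;
        return () -> "record";
    }
}

final class CapabilityCheckSketch {
    // The "framework" side decides from the source alone; no reader is instantiated.
    static boolean canAlignSplits(SketchSource source) {
        return source.supportsPausingSplits();
    }

    public static void main(String[] args) {
        CountingSource source = new CountingSource(true);
        System.out.println("can align splits: " + canAlignSplits(source));
        System.out.println("readers created for the check: " + source.readersCreated);
    }
}
```

If the flag lived only on the reader, the same check would first have to call createReader(), which is the awkwardness described above for the JM side.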
> >>>> > > >> > > >>>>>>> > > >> >
> >>>> > > >> > > >>>>>>> > > >> > @Sebastian Mattheis <
> >>>> sebastian@ververica.com>
> >>>> > > >> > > >>>>>>> > > >> >
> >>>> > > >> > > >>>>>>> > > >> > Regarding interface flavor b), i.e.
> >>>> > > >> AlignedSourceReader
> >>>> > > >> > +
> >>>> > > >> > > >>>>>>> > > >> > AlignedSplitReader, what I feel awkward
> >>>> about is
> >>>> > > >> that we
> >>>> > > >> > > are
> >>>> > > >> > > >>>>>>> > > essentially
> >>>> > > >> > > >>>>>>> > > >> > expecting almost all the SourceReader
> >>>> > > >> implementations to
> >>>> > > >> > > >>>>>>> extend
> >>>> > > >> > > >>>>>>> > > >> > SourceReaderBase, which effectively makes
> >>>> the
> >>>> > > >> > SourceReader
> >>>> > > >> > > >>>>>>> interface
> >>>> > > >> > > >>>>>>> > > >> > without the pausing support useless. So
> this
> >>>> > > >> indicates
> >>>> > > >> > > that
> >>>> > > >> > > >>>>>>> public
> >>>> > > >> > > >>>>>>> > > >> > decorative interfaces (or sub-interfaces
> >>>> for the
> >>>> > same
> >>>> > > >> > > >>>>>>> purpose) only
> >>>> > > >> > > >>>>>>> > > >> > make sense if the original interface is
> also
> >>>> > > >> expected to
> >>>> > > >> > > be
> >>>> > > >> > > >>>>>>> used.
> >>>> > > >> > > >>>>>>> > > >> > Otherwise, it seems to make more sense to add the method to the original
> >>>> > > >> > > >>>>>>> > > >> > interface itself.
> >>>> > > >> > > >>>>>>> > > >> >
> >>>> > > >> > > >>>>>>> > > >> > Cheers,
> >>>> > > >> > > >>>>>>> > > >> >
> >>>> > > >> > > >>>>>>> > > >> > Jiangjie (Becket) Qin
> >>>> > > >> > > >>>>>>> > > >> >
> >>>> > > >> > > >>>>>>> > > >> >
> >>>> > > >> > > >>>>>>> > > >> >
> >>>> > > >> > > >>>>>>> > > >> >
> >>>> > > >> > > >>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid
> >>>> Wysakowicz
> >>>> > <
> >>>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org>
> >>>> > > >> > > >>>>>>> > > >> > wrote:
> >>>> > > >> > > >>>>>>> > > >> >
> >>>> > > >> > > >>>>>>> > > >> > > Thanks @Sebastian for the nice summary.
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > I think most of your points aligned with
> >>>> the
> >>>> > > >> > suggestions
> >>>> > > >> > > >>>>>>> I made to
> >>>> > > >> > > >>>>>>> > > the
> >>>> > > >> > > >>>>>>> > > >> > > FLIP, while you were writing your reply
> (I
> >>>> > believe
> >>>> > > >> we
> >>>> > > >> > > hit
> >>>> > > >> > > >>>>>>> enter
> >>>> > > >> > > >>>>>>> > > >> nearly at
> >>>> > > >> > > >>>>>>> > > >> > > the same time ;) )
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > Two points after we synced offline
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > 1. I changed also the
> >>>> > > >> supportsWatermarksSplitAlignment
> >>>> > > >> > > to
> >>>> > > >> > > >>>>>>> > > >> > > supportsPausingSplits to express the
> >>>> general
> >>>> > > >> > capability
> >>>> > > >> > > of
> >>>> > > >> > > >>>>>>> > pausing.
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > 2. As for whether we should add PausingSourceReader/PausingSplitReader
> >>>> > > >> > > >>>>>>> > > >> > > (option b) or just add the methods (option c), I suggest simply adding
> >>>> > > >> > > >>>>>>> > > >> > > the two methods, as I felt this is the approach much preferred by Becket,
> >>>> > > >> > > >>>>>>> > > >> > > and others do not object to it. Unless there is opposition, let's go with
> >>>> > > >> > > >>>>>>> > > >> > > option c.
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > Best,
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > Dawid
> >>>> > > >> > > >>>>>>> > > >> > > On 26/04/2022 10:06, Sebastian Mattheis
> >>>> wrote:
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > Hi folks,
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > Sorry for being a bit silent. Many
> thanks
> >>>> for
> >>>> > all
> >>>> > > >> the
> >>>> > > >> > > >>>>>>> input and
> >>>> > > >> > > >>>>>>> > > >> > > suggestions. As I'm a bit new, I needed
> >>>> some
> >>>> > time
> >>>> > > >> to
> >>>> > > >> > > >>>>>>> catch up and
> >>>> > > >> > > >>>>>>> > > >> > structure
> >>>> > > >> > > >>>>>>> > > >> > > (for myself) the discussion and I wanted
> >>>> to
> >>>> > find a
> >>>> > > >> way
> >>>> > > >> > > to
> >>>> > > >> > > >>>>>>> > structure
> >>>> > > >> > > >>>>>>> > > >> the
> >>>> > > >> > > >>>>>>> > > >> > > conclusions. (Also because I had the
> >>>> feeling
> >>>> > that
> >>>> > > >> some
> >>>> > > >> > > >>>>>>> concerns
> >>>> > > >> > > >>>>>>> > got
> >>>> > > >> > > >>>>>>> > > >> lost
> >>>> > > >> > > >>>>>>> > > >> > in
> >>>> > > >> > > >>>>>>> > > >> > > the discussion.) This is my attempt and
> >>>> please
> >>>> > > >> correct
> >>>> > > >> > > me
> >>>> > > >> > > >>>>>>> if
> >>>> > > >> > > >>>>>>> > > >> something is
> >>>> > > >> > > >>>>>>> > > >> > > wrong or misunderstood. I tried to
> >>>> collect and
> >>>> > > >> > assemble
> >>>> > > >> > > >>>>>>> the
> >>>> > > >> > > >>>>>>> > > opinions,
> >>>> > > >> > > >>>>>>> > > >> > > suggestions, and conclusions (to the
> best
> >>>> of my
> >>>> > > >> > > >>>>>>> knowledge):
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > # Top A: Should split alignment
> >>>> (pause/resume
> >>>> > > >> > behavior)
> >>>> > > >> > > >>>>>>> be a
> >>>> > > >> > > >>>>>>> > general
> >>>> > > >> > > >>>>>>> > > >> > > capability?
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > I personally don't see any reason not to make it a general capability,
> >>>> > > >> > > >>>>>>> > > >> > > because the alignSplit method is actually independent of the watermarks.
> >>>> > > >> > > >>>>>>> > > >> > > If we agree to make it a general capability, we should also agree on the
> >>>> > > >> > > >>>>>>> > > >> > > right wording. Does "alignSplits(splitsToResume, splitsToPause)" convey
> >>>> > > >> > > >>>>>>> > > >> > > what is actually meant? (I see it as okay. I don't have any better idea,
> >>>> > > >> > > >>>>>>> > > >> > > whilst Arvid suggested "pauseOrResumeSplits".)
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > # Top B: Should it be possible to
> >>>> > enable/disable
> >>>> > > >> split
> >>>> > > >> > > >>>>>>> alignment?
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > I would personally not disable the split
> >>>> > alignment
> >>>> > > >> on
> >>>> > > >> > > the
> >>>> > > >> > > >>>>>>> source
> >>>> > > >> > > >>>>>>> > > >> reader
> >>>> > > >> > > >>>>>>> > > >> > > side because if split alignment is used
> >>>> for
> >>>> > some
> >>>> > > >> other
> >>>> > > >> > > >>>>>>> use case
> >>>> > > >> > > >>>>>>> > (see
> >>>> > > >> > > >>>>>>> > > >> A)
> >>>> > > >> > > >>>>>>> > > >> > it
> >>>> > > >> > > >>>>>>> > > >> > > could have nasty side effects on
> >>>> other/future
> >>>> > use
> >>>> > > >> > cases.
> >>>> > > >> > > >>>>>>> Instead,
> >>>> > > >> > > >>>>>>> > I
> >>>> > > >> > > >>>>>>> > > >> would
> >>>> > > >> > > >>>>>>> > > >> > > disable "watermark split alignment"
> where
> >>>> I
> >>>> > think
> >>>> > > >> it
> >>>> > > >> > > >>>>>>> should
> >>>> > > >> > > >>>>>>> > disable
> >>>> > > >> > > >>>>>>> > > >> the
> >>>> > > >> > > >>>>>>> > > >> > > watermark-dependent trigger for split
> >>>> > alignment.
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > # Top C: Should we add a supportsX
> method?
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > I find it difficult to define the scope
> >>>> of a
> >>>> > > >> supportsX
> >>>> > > >> > > >>>>>>> method
> >>>> > > >> > > >>>>>>> > w.r.t.
> >>>> > > >> > > >>>>>>> > > >> > > the following questions: a) Where is it
> >>>> used?
> >>>> > and
> >>>> > > >> b)
> >>>> > > >> > > What
> >>>> > > >> > > >>>>>>> is the
> >>>> > > >> > > >>>>>>> > > >> expected
> >>>> > > >> > > >>>>>>> > > >> > > output? Regarding b), it's not straightforward
> >>>> to
> >>>> > > >> provide a
> >>>> > > >> > > >>>>>>> meaningful
> >>>> > > >> > > >>>>>>> > > >> output,
> >>>> > > >> > > >>>>>>> > > >> > > e.g., if SourceReader supports split
> >>>> alignment
> >>>> > but
> >>>> > > >> > > >>>>>>> SplitReader
> >>>> > > >> > > >>>>>>> > not.
> >>>> > > >> > > >>>>>>> > > >> This
> >>>> > > >> > > >>>>>>> > > >> > is
> >>>> > > >> > > >>>>>>> > > >> > > because with the current implementation,
> >>>> we can
> >>>> > > >> > > determine
> >>>> > > >> > > >>>>>>> whether
> >>>> > > >> > > >>>>>>> > > >> split
> >>>> > > >> > > >>>>>>> > > >> > > alignment is fully supported only during
> >>>> > runtime
> >>>> > > >> and
> >>>> > > >> > > >>>>>>> specifically
> >>>> > > >> > > >>>>>>> > > >> > actually
> >>>> > > >> > > >>>>>>> > > >> > > only when calling alignSplits down the
> >>>> call
> >>>> > > >> hierarchy
> >>>> > > >> > up
> >>>> > > >> > > >>>>>>> to the
> >>>> > > >> > > >>>>>>> > > actual
> >>>> > > >> > > >>>>>>> > > >> > > SplitReaders.
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > Therefore, I would suggest to either
> >>>> raise an
> >>>> > > >> error or
> >>>> > > >> > > >>>>>>> warning if
> >>>> > > >> > > >>>>>>> > > the
> >>>> > > >> > > >>>>>>> > > >> > > alignment is called but not supported at
> >>>> some
> >>>> > > >> point. I
> >>>> > > >> > > >>>>>>> know we
> >>>> > > >> > > >>>>>>> > > should
> >>>> > > >> > > >>>>>>> > > >> > > carefully think about when this could be
> >>>> the
> >>>> > case
> >>>> > > >> > > because
> >>>> > > >> > > >>>>>>> we don't
> >>>> > > >> > > >>>>>>> > > >> want
> >>>> > > >> > > >>>>>>> > > >> > to
> >>>> > > >> > > >>>>>>> > > >> > > flood anybody with such warnings.
> However,
> >>>> > warnings
> >>>> > > >> > > could
> >>>> > > >> > > >>>>>>> be an
> >>>> > > >> > > >>>>>>> > > >> indicator
> >>>> > > >> > > >>>>>>> > > >> > > for the user that for watermark split
> >>>> > alignment use
> >>>> > > >> > case
> >>>> > > >> > > >>>>>>> split
> >>>> > > >> > > >>>>>>> > > >> reading is
> >>>> > > >> > > >>>>>>> > > >> > > imbalanced with the conclusion to either
> >>>> > disable
> >>>> > > >> the
> >>>> > > >> > > >>>>>>> trigger for
> >>>> > > >> > > >>>>>>> > > >> > watermark
> >>>> > > >> > > >>>>>>> > > >> > > split alignment (see Top B) or to
> >>>> > use/implement a
> >>>> > > >> > source
> >>>> > > >> > > >>>>>>> and
> >>>> > > >> > > >>>>>>> > reader
> >>>> > > >> > > >>>>>>> > > >> that
> >>>> > > >> > > >>>>>>> > > >> > > fully supports split alignment.
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > # Top D: How to design interfaces?
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > Thanks for structuring the discussion
> >>>> with the
> >>>> > > >> > > various
> >>>> > > >> > > >>>>>>> > > >> possibilities
> >>>> > > >> > > >>>>>>> > > >> > > (a-d). From the discussion and emails, I
> >>>> would
> >>>> > > >> like to
> >>>> > > >> > > >>>>>>> summarize
> >>>> > > >> > > >>>>>>> > the
> >>>> > > >> > > >>>>>>> > > >> > > following requirements:
> >>>> > > >> > > >>>>>>> > > >> > > - Interfaces should be consistent
> >>>> > ("symmetric"),
> >>>> > > >> i.e.,
> >>>> > > >> > > >>>>>>> similar
> >>>> > > >> > > >>>>>>> > > >> semantics
> >>>> > > >> > > >>>>>>> > > >> > > should have similar interfaces with
> >>>> similar
> >>>> > usage.
> >>>> > > >> > > >>>>>>> > > >> > > - Make explicit which implementations
> >>>> implement
> >>>> > > >> > > >>>>>>> interfaces/support
> >>>> > > >> > > >>>>>>> > > >> > > behavior.
> >>>> > > >> > > >>>>>>> > > >> > > - Make clear what are default
> >>>> implementations
> >>>> > and
> >>>> > > >> how
> >>>> > > >> > to
> >>>> > > >> > > >>>>>>> implement
> >>>> > > >> > > >>>>>>> > > >> > > interfaces with desired behavior.
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > This is a simplified view of the
> relations
> >>>> > between
> >>>> > > >> > > >>>>>>> relevant
> >>>> > > >> > > >>>>>>> > classes
> >>>> > > >> > > >>>>>>> > > of
> >>>> > > >> > > >>>>>>> > > >> > the
> >>>> > > >> > > >>>>>>> > > >> > > PoC implementation:
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > SourceReader (Public) <|--
> >>>> SourceReaderBase
> >>>> > > >> (Internal)
> >>>> > > >> > > >>>>>>> <|-- ..
> >>>> > > >> > > >>>>>>> > <|--
> >>>> > > >> > > >>>>>>> > > >> > > MySourceReader
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > MySourceReader <>-- SplitFetcherManager
> >>>> > (Internal)
> >>>> > > >> > <>--
> >>>> > > >> > > >>>>>>> > SplitFetcher
> >>>> > > >> > > >>>>>>> > > >> > > (Internal) <>-- SplitReader (Public)
> <|--
> >>>> > > >> > MySplitReader
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > (A <|-- B: B inherits from A; A <>-- B:
> A
> >>>> "has
> >>>> > a"
> >>>> > > >> B)
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > Note that SourceReaderBase and
> >>>> > SplitFetcherManager
> >>>> > > >> > > >>>>>>> implement most
> >>>> > > >> > > >>>>>>> > of
> >>>> > > >> > > >>>>>>> > > >> the
> >>>> > > >> > > >>>>>>> > > >> > > "logic" for split alignment just because
> >>>> we
> >>>> > wanted
> >>>> > > >> to
> >>>> > > >> > > >>>>>>> implement
> >>>> > > >> > > >>>>>>> > > split
> >>>> > > >> > > >>>>>>> > > >> > > alignment and wanted it to be available
> as
> >>>> > kind of
> >>>> > > >> a
> >>>> > > >> > > >>>>>>> default. As a
> >>>> > > >> > > >>>>>>> > > >> > > consequence, we have a "default
> >>>> > implementation" for
> >>>> > > >> > > >>>>>>> SourceReader
> >>>> > > >> > > >>>>>>> > > that
> >>>> > > >> > > >>>>>>> > > >> > > implements the actual logic for split
> >>>> > alignment.
> >>>> > > >> For
> >>>> > > >> > > that
> >>>> > > >> > > >>>>>>> reason,
> >>>> > > >> > > >>>>>>> > I
> >>>> > > >> > > >>>>>>> > > >> find
> >>>> > > >> > > >>>>>>> > > >> > it
> >>>> > > >> > > >>>>>>> > > >> > > very confusing to have a NOOP default
> >>>> > > >> implementation
> >>>> > > >> > in
> >>>> > > >> > > >>>>>>> the
> >>>> > > >> > > >>>>>>> > > interface
> >>>> > > >> > > >>>>>>> > > >> for
> >>>> > > >> > > >>>>>>> > > >> > > the SourceReader. As a consequence,
> >>>> interface
> >>>> > > >> strategy
> >>>> > > >> > > c)
> >>>> > > >> > > >>>>>>> is
> >>>> > > >> > > >>>>>>> > > difficult
> >>>> > > >> > > >>>>>>> > > >> > > because this would require NOOP default
> >>>> > > >> > implementations
> >>>> > > >> > > >>>>>>> in the
> >>>> > > >> > > >>>>>>> > > public
> >>>> > > >> > > >>>>>>> > > >> > > interfaces of SourceReader and
> >>>> SplitReader.
> >>>> > This is
> >>>> > > >> > the
> >>>> > > >> > > >>>>>>> same for
> >>>> > > >> > > >>>>>>> > > >> strategy
> >>>> > > >> > > >>>>>>> > > >> > > d) because it would require NOOP default
> >>>> > > >> > implementation
> >>>> > > >> > > >>>>>>> in the
> >>>> > > >> > > >>>>>>> > > >> > > SourceReader. Further, as Dawid
> described
> >>>> > method
> >>>> > > >> > > >>>>>>> signatures of
> >>>> > > >> > > >>>>>>> > > >> alignSplit
> >>>> > > >> > > >>>>>>> > > >> > > for SourceReader and SplitReader differ
> >>>> and it
> >>>> > > >> would
> >>>> > > >> > be
> >>>> > > >> > > >>>>>>> extremely
> >>>> > > >> > > >>>>>>> > > >> > difficult
> >>>> > > >> > > >>>>>>> > > >> > > to make the signatures the same (with
> even
> >>>> > > >> potential
> >>>> > > >> > > >>>>>>> performance
> >>>> > > >> > > >>>>>>> > > >> impact
> >>>> > > >> > > >>>>>>> > > >> > > because of additional look-ups of split
> >>>> ids).
> >>>> > > >> > Therefore,
> >>>> > > >> > > >>>>>>> having a
> >>>> > > >> > > >>>>>>> > > >> > symmetric
> >>>> > > >> > > >>>>>>> > > >> > > decorative interface as of strategy a)
> is
> >>>> > actually
> >>>> > > >> not
> >>>> > > >> > > >>>>>>> possible
> >>>> > > >> > > >>>>>>> > and
> >>>> > > >> > > >>>>>>> > > >> > having
> >>>> > > >> > > >>>>>>> > > >> > > two decorative interfaces with different
> >>>> method
> >>>> > > >> > > >>>>>>> signatures is
> >>>> > > >> > > >>>>>>> > > >> confusing.
> >>>> > > >> > > >>>>>>> > > >> > My
> >>>> > > >> > > >>>>>>> > > >> > > conclusion is that we are best with
> >>>> strategy b)
> >>>> > > >> which
> >>>> > > >> > > >>>>>>> means to
> >>>> > > >> > > >>>>>>> > have
> >>>> > > >> > > >>>>>>> > > >> > > specializing sub-interfaces that inherit
> >>>> from
> >>>> > the
> >>>> > > >> > parent
> >>>> > > >> > > >>>>>>> > interface:
> >>>> > > >> > > >>>>>>> > > >> > > SourceReader <|-- AlignedSourceReader,
> >>>> > SplitReader
> >>>> > > >> > <|--
> >>>> > > >> > > >>>>>>> > > >> > AlignedSplitReader
> >>>> > > >> > > >>>>>>> > > >> > > With this option, I'm not 100% sure what
> >>>> the
> >>>> > > >> > > implications
> >>>> > > >> > > >>>>>>> are and
> >>>> > > >> > > >>>>>>> > if
> >>>> > > >> > > >>>>>>> > > >> this
> >>>> > > >> > > >>>>>>> > > >> > > could get nasty. I would suggest that
> >>>> Dawid
> >>>> > and I
> >>>> > > >> just
> >>>> > > >> > > >>>>>>> try to
> >>>> > > >> > > >>>>>>> > > >> implement
> >>>> > > >> > > >>>>>>> > > >> > and
> >>>> > > >> > > >>>>>>> > > >> > > see if we like it. :)
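To make strategy b) from the quoted Top D concrete, here is a minimal Java sketch of specializing sub-interfaces. It is only an illustration under assumptions: SourceReader and SplitReader are reduced to one-method stand-ins, splits are identified by plain String ids, both alignSplits signatures are kept identical for brevity (the PoC signatures differ, as noted above), and the internal SourceReaderBase / SplitFetcherManager / SplitFetcher layers are left out.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class AlignedInterfacesSketch {

    // Simplified stand-ins for the public base interfaces (assumption: the real ones have more methods).
    interface SourceReader {
        void addSplits(List<String> splitIds);
    }

    interface SplitReader {
        List<String> fetch();
    }

    // Strategy b): the capability lives in sub-interfaces that inherit from the parent interface.
    interface AlignedSourceReader extends SourceReader {
        void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause);
    }

    interface AlignedSplitReader extends SplitReader {
        void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause);
    }

    // Hypothetical split reader that only "reads" from splits that are not paused.
    static final class MySplitReader implements AlignedSplitReader {
        private final Set<String> assigned = new TreeSet<>();
        private final Set<String> paused = new TreeSet<>();

        void assign(Collection<String> splitIds) {
            assigned.addAll(splitIds);
        }

        @Override
        public void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause) {
            paused.removeAll(splitsToResume);
            paused.addAll(splitsToPause);
        }

        @Override
        public List<String> fetch() {
            // Return the ids of the splits that would currently be read.
            List<String> active = new ArrayList<>(assigned);
            active.removeAll(paused);
            return active;
        }
    }

    // Hypothetical source reader that simply delegates to its split reader.
    static final class MySourceReader implements AlignedSourceReader {
        private final MySplitReader splitReader = new MySplitReader();

        @Override
        public void addSplits(List<String> splitIds) {
            splitReader.assign(splitIds);
        }

        @Override
        public void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause) {
            splitReader.alignSplits(splitsToResume, splitsToPause);
        }

        List<String> activeSplits() {
            return splitReader.fetch();
        }
    }

    static List<String> demo() {
        MySourceReader reader = new MySourceReader();
        reader.addSplits(List.of("split-0", "split-1", "split-2"));
        reader.alignSplits(List.of(), List.of("split-1")); // pause split-1, resume nothing
        return reader.activeSplits();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With this shape, the type of a reader says whether it supports alignment, so no NOOP default is needed in the base interfaces; the cost is one extra interface per level.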
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > # Summary
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > In conclusion, please let me know your
> >>>> > > >> perspectives.
> >>>> > > >> > > >>>>>>> Please
> >>>> > > >> > > >>>>>>> > correct
> >>>> > > >> > > >>>>>>> > > >> me,
> >>>> > > >> > > >>>>>>> > > >> > if
> >>>> > > >> > > >>>>>>> > > >> > > something is wrong or if I misunderstood
> >>>> > > >> something. My
> >>>> > > >> > > >>>>>>> perspective
> >>>> > > >> > > >>>>>>> > > >> would
> >>>> > > >> > > >>>>>>> > > >> > be:
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > Top A: Yes
> >>>> > > >> > > >>>>>>> > > >> > > Top B: Yes (but disable watermark
> trigger
> >>>> for
> >>>> > split
> >>>> > > >> > > >>>>>>> alignment)
> >>>> > > >> > > >>>>>>> > > >> > > Top C: No
> >>>> > > >> > > >>>>>>> > > >> > > Top D: b)
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > Best,
> >>>> > > >> > > >>>>>>> > > >> > > Sebastian
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid
> >>>> > Wysakowicz <
> >>>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > > wrote:
> >>>> > > >> > > >>>>>>> > > >> > >
> >>>> > > >> > > >>>>>>> > > >> > >> @Arvid:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's capability
> >>>> > approach, I
> >>>> > > >> > fear
> >>>> > > >> > > >>>>>>> that it
> >>>> > > >> > > >>>>>>> > > >> doesn't
> >>>> > > >> > > >>>>>>> > > >> > work
> >>>> > > >> > > >>>>>>> > > >> > >> for this particular use case: Sources
> can
> >>>> > always
> >>>> > > >> be
> >>>> > > >> > > >>>>>>> aligned
> >>>> > > >> > > >>>>>>> > > >> cross-task
> >>>> > > >> > > >>>>>>> > > >> > and
> >>>> > > >> > > >>>>>>> > > >> > >> this is just about intra-task
> alignment.
> >>>> So
> >>>> > it's
> >>>> > > >> > > >>>>>>> plausible to put
> >>>> > > >> > > >>>>>>> > > >> > sources
> >>>> > > >> > > >>>>>>> > > >> > >> into an alignment group even though
> they
> >>>> do
> >>>> > not
> >>>> > > >> use
> >>>> > > >> > any
> >>>> > > >> > > >>>>>>> of the
> >>>> > > >> > > >>>>>>> > > >> presented
> >>>> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They should just
> issue a
> >>>> > > >> warning, if
> >>>> > > >> > > >>>>>>> they handle
> >>>> > > >> > > >>>>>>> > > >> > multiple
> >>>> > > >> > > >>>>>>> > > >> > >> splits (see motivation section).
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Yes, but the "supportXXX" method would
> >>>> be for
> >>>> > > >> telling
> >>>> > > >> > > if
> >>>> > > >> > > >>>>>>> it
> >>>> > > >> > > >>>>>>> > > supports
> >>>> > > >> > > >>>>>>> > > >> > that
> >>>> > > >> > > >>>>>>> > > >> > >> intra-task alignment. Cross-task
> >>>> alignment
> >>>> > would
> >>>> > > >> > always
> >>>> > > >> > > >>>>>>> be
> >>>> > > >> > > >>>>>>> > > supported.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> I updated interfaces to what I believe
> >>>> to be
> >>>> > > >> closest
> >>>> > > >> > > to a
> >>>> > > >> > > >>>>>>> > consensus
> >>>> > > >> > > >>>>>>> > > >> > >> between all participants. Do you mind
> >>>> taking a
> >>>> > > >> look?
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> @Sebastian Do you mind addressing the
> >>>> nits?
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Best,
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Thanks for pushing this effort.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> I'd actually be in favor of 1b). I
> fully
> >>>> agree
> >>>> > > >> that
> >>>> > > >> > > >>>>>>> decorator
> >>>> > > >> > > >>>>>>> > > >> interfaces
> >>>> > > >> > > >>>>>>> > > >> > >> should be avoided but I'm also not a
> big
> >>>> fan
> >>>> > of
> >>>> > > >> > > >>>>>>> overloading the
> >>>> > > >> > > >>>>>>> > > base
> >>>> > > >> > > >>>>>>> > > >> > >> interfaces (they are hard to implement
> as
> >>>> > is). The
> >>>> > > >> > > usual
> >>>> > > >> > > >>>>>>> feedback
> >>>> > > >> > > >>>>>>> > > to
> >>>> > > >> > > >>>>>>> > > >> > >> Source-related interfaces is always
> that
> >>>> > they are
> >>>> > > >> > > >>>>>>> overwhelming
> >>>> > > >> > > >>>>>>> > and
> >>>> > > >> > > >>>>>>> > > >> too
> >>>> > > >> > > >>>>>>> > > >> > >> hard to implement. However, I'd also
> not
> >>>> > oppose
> >>>> > > >> 1c)
> >>>> > > >> > as
> >>>> > > >> > > >>>>>>> scattered
> >>>> > > >> > > >>>>>>> > > >> > interfaces
> >>>> > > >> > > >>>>>>> > > >> > >> also have drawbacks. I'd just dislike
> >>>> 1a) and
> >>>> > 1d).
> >>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's capability
> >>>> > approach, I
> >>>> > > >> > fear
> >>>> > > >> > > >>>>>>> that it
> >>>> > > >> > > >>>>>>> > > >> doesn't
> >>>> > > >> > > >>>>>>> > > >> > work
> >>>> > > >> > > >>>>>>> > > >> > >> for this particular use case: Sources
> can
> >>>> > always
> >>>> > > >> be
> >>>> > > >> > > >>>>>>> aligned
> >>>> > > >> > > >>>>>>> > > >> cross-task
> >>>> > > >> > > >>>>>>> > > >> > and
> >>>> > > >> > > >>>>>>> > > >> > >> this is just about intra-task
> alignment.
> >>>> So
> >>>> > it's
> >>>> > > >> > > >>>>>>> plausible to put
> >>>> > > >> > > >>>>>>> > > >> > sources
> >>>> > > >> > > >>>>>>> > > >> > >> into an alignment group even though
> they
> >>>> do
> >>>> > not
> >>>> > > >> use
> >>>> > > >> > any
> >>>> > > >> > > >>>>>>> of the
> >>>> > > >> > > >>>>>>> > > >> presented
> >>>> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They should just
> issue a
> >>>> > > >> warning, if
> >>>> > > >> > > >>>>>>> they handle
> >>>> > > >> > > >>>>>>> > > >> > multiple
> >>>> > > >> > > >>>>>>> > > >> > >> splits (see motivation section).
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> I think renaming alignSplits to
> >>>> facilitate
> >>>> > future
> >>>> > > >> use
> >>>> > > >> > > >>>>>>> cases makes
> >>>> > > >> > > >>>>>>> > > >> sense
> >>>> > > >> > > >>>>>>> > > >> > but
> >>>> > > >> > > >>>>>>> > > >> > >> then all interfaces (if 1c) is chosen)
> >>>> should
> >>>> > be
> >>>> > > >> > > adjusted
> >>>> > > >> > > >>>>>>> > > >> accordingly.
> >>>> > > >> > > >>>>>>> > > >> > >> AlignedSourceReader could be
> >>>> > PausingSourceReader
> >>>> > > >> and
> >>>> > > >> > > I'd
> >>>> > > >> > > >>>>>>> go for
> >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits (Becket's proposal
> >>>> > afaik). We
> >>>> > > >> > could
> >>>> > > >> > > >>>>>>> also
> >>>> > > >> > > >>>>>>> > split
> >>>> > > >> > > >>>>>>> > > it
> >>>> > > >> > > >>>>>>> > > >> > into
> >>>> > > >> > > >>>>>>> > > >> > >> pauseSplit and resumeSplit. While
> >>>> > > >> pauseOrResumeSplits
> >>>> > > >> > > >>>>>>> may allow
> >>>> > > >> > > >>>>>>> > > >> Sources
> >>>> > > >> > > >>>>>>> > > >> > to
> >>>> > > >> > > >>>>>>> > > >> > >> just use 1 instead of 2 library calls
> (as
> >>>> > written
> >>>> > > >> in
> >>>> > > >> > > the
> >>>> > > >> > > >>>>>>> > Javadoc),
> >>>> > > >> > > >>>>>>> > > >> both
> >>>> > > >> > > >>>>>>> > > >> > >> Kafka and Pulsar can't use it and I'm
> not
> >>>> > sure if
> >>>> > > >> > there
> >>>> > > >> > > >>>>>>> is a
> >>>> > > >> > > >>>>>>> > system
> >>>> > > >> > > >>>>>>> > > >> that
> >>>> > > >> > > >>>>>>> > > >> > >> can.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Some nits for the FLIP:
> >>>> > > >> > > >>>>>>> > > >> > >> - Please replace "stop" with "pause".
> >>>> > > >> > > >>>>>>> > > >> > >> - Not sure if it's worth it in the
> >>>> capability
> >>>> > > >> > section:
> >>>> > > >> > > >>>>>>> Sources
> >>>> > > >> > > >>>>>>> > that
> >>>> > > >> > > >>>>>>> > > >> > adopt
> >>>> > > >> > > >>>>>>> > > >> > >> this interface cannot be used in
> earlier
> >>>> > > >> versions. So
> >>>> > > >> > > it
> >>>> > > >> > > >>>>>>> feels
> >>>> > > >> > > >>>>>>> > like
> >>>> > > >> > > >>>>>>> > > >> we
> >>>> > > >> > > >>>>>>> > > >> > are
> >>>> > > >> > > >>>>>>> > > >> > >> only forward compatible (old sources
> can
> >>>> be
> >>>> > used
> >>>> > > >> > after
> >>>> > > >> > > >>>>>>> the
> >>>> > > >> > > >>>>>>> > change);
> >>>> > > >> > > >>>>>>> > > >> but
> >>>> > > >> > > >>>>>>> > > >> > I
> >>>> > > >> > > >>>>>>> > > >> > >> guess this holds for any API addition.
> >>>> > > >> > > >>>>>>> > > >> > >> - You might want to add what happens
> >>>> when all
> >>>> > > >> splits
> >>>> > > >> > > are
> >>>> > > >> > > >>>>>>> paused.
> >>>> > > >> > > >>>>>>> > > >> > >> - You may want to describe how the 3
> >>>> flavors
> >>>> > of
> >>>> > > >> > > >>>>>>> SourceReaderBase
> >>>> > > >> > > >>>>>>> > > >> > interact
> >>>> > > >> > > >>>>>>> > > >> > >> with the interface.
> >>>> > > >> > > >>>>>>> > > >> > >> - I'm not sure if it makes sense to
> >>>> include
> >>>> > Kafka
> >>>> > > >> and
> >>>> > > >> > > >>>>>>> Pulsar in
> >>>> > > >> > > >>>>>>> > the
> >>>> > > >> > > >>>>>>> > > >> > FLIP.
> >>>> > > >> > > >>>>>>> > > >> > >> For me, this is rather immediate
> >>>> follow-up
> >>>> > work.
> >>>> > > >> > (could
> >>>> > > >> > > >>>>>>> be in the
> >>>> > > >> > > >>>>>>> > > >> same
> >>>> > > >> > > >>>>>>> > > >> > >> umbrella ticket)
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Best,
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Arvid
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid
> >>>> > Wysakowicz
> >>>> > > >> <
> >>>> > > >> > > >>>>>>> > > >> > dwysakowicz@apache.org>
> >>>> > > >> > > >>>>>>> > > >> > >> wrote:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements
> >>>> SourceReader,
> >>>> > > >> > > >>>>>>> WithSplitsAlignment",
> >>>> > > >> > > >>>>>>> > > >> along
> >>>> > > >> > > >>>>>>> > > >> > >> with "MySplitReader implements
> >>>> SplitReader,
> >>>> > > >> > > >>>>>>> WithSplitsAlignment",
> >>>> > > >> > > >>>>>>> > > or
> >>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements
> >>>> > AlignedSourceReader"
> >>>> > > >> > and
> >>>> > > >> > > >>>>>>> > > "MySplitReader
> >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
> >>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements
> >>>> SourceReader"
> >>>> > and
> >>>> > > >> > > >>>>>>> "MySplitReader
> >>>> > > >> > > >>>>>>> > > >> > implements
> >>>> > > >> > > >>>>>>> > > >> > >> SplitReader".
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according
> to
> >>>> Dawid
> >>>> > > >> would
> >>>> > > >> > > be:
> >>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements
> >>>> SourceReader"
> >>>> > and
> >>>> > > >> > > >>>>>>> "MySplitReader
> >>>> > > >> > > >>>>>>> > > >> > implements
> >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
> >>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although
> >>>> personally
> >>>> > > >> > speaking I
> >>>> > > >> > > >>>>>>> think it
> >>>> > > >> > > >>>>>>> > is
> >>>> > > >> > > >>>>>>> > > >> > simpler
> >>>> > > >> > > >>>>>>> > > >> > >> to just add a new method to the split
> >>>> reader
> >>>> > with
> >>>> > > >> > > >>>>>>> default impl.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> I think that is a good idea to have it
> >>>> > aligned as
> >>>> > > >> > much
> >>>> > > >> > > as
> >>>> > > >> > > >>>>>>> > possible.
> >>>> > > >> > > >>>>>>> > > >> I'd
> >>>> > > >> > > >>>>>>> > > >> > be
> >>>> > > >> > > >>>>>>> > > >> > >> +1 for your option c). We can merge
> >>>> > > >> > AlignedSplitReader
> >>>> > > >> > > >>>>>>> with
> >>>> > > >> > > >>>>>>> > > >> > SplitReader. We
> >>>> > > >> > > >>>>>>> > > >> > >> will update the FLIP shortly.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Best,
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the comment, Jark.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align
> other
> >>>> > things in
> >>>> > > >> > the
> >>>> > > >> > > >>>>>>> future?
> >>>> > > >> > > >>>>>>> > For
> >>>> > > >> > > >>>>>>> > > >> > example,
> >>>> > > >> > > >>>>>>> > > >> > >> align read speed, I have
> >>>> > > >> > > >>>>>>> > > >> > >> seen users requesting global rate
> >>>> limits. This
> >>>> > > >> > feature
> >>>> > > >> > > >>>>>>> may also
> >>>> > > >> > > >>>>>>> > > need
> >>>> > > >> > > >>>>>>> > > >> an
> >>>> > > >> > > >>>>>>> > > >> > >> interface like this.
> >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this
> >>>> interface to
> >>>> > > >> support
> >>>> > > >> > > >>>>>>> aligning other
> >>>> > > >> > > >>>>>>> > > >> > things, I
> >>>> > > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
> >>>> > > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
> >>>> > > >> > > >>>>>>> `alignWatermarksForSplits`
> >>>> > > >> > > >>>>>>> > > >> instead
> >>>> > > >> > > >>>>>>> > > >> > of
> >>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> This is a good point. Naming wise, it
> >>>> would
> >>>> > > >> usually
> >>>> > > >> > be
> >>>> > > >> > > >>>>>>> more
> >>>> > > >> > > >>>>>>> > > >> extensible
> >>>> > > >> > > >>>>>>> > > >> > to
> >>>> > > >> > > >>>>>>> > > >> > >> just describe what the method actually
> >>>> does,
> >>>> > > >> instead
> >>>> > > >> > of
> >>>> > > >> > > >>>>>>> assuming
> >>>> > > >> > > >>>>>>> > > the
> >>>> > > >> > > >>>>>>> > > >> > >> purpose of doing this. For example, in
> >>>> this
> >>>> > case,
> >>>> > > >> > > >>>>>>> > > >> pauseOrResumeSplits()
> >>>> > > >> > > >>>>>>> > > >> > >> would be more extensible because this
> >>>> can be
> >>>> > used
> >>>> > > >> for
> >>>> > > >> > > >>>>>>> any kind of
> >>>> > > >> > > >>>>>>> > > >> flow
> >>>> > > >> > > >>>>>>> > > >> > >> control, be it watermark alignment or
> >>>> simple
> >>>> > rate
> >>>> > > >> > > >>>>>>> limiting.
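The naming argument quoted above (describe what the method does, not why it is called) can be sketched in Java as follows. This is a hedged illustration, not the FLIP's API: PausableReader, pauseOverThreshold and the String split ids are hypothetical names, and the only point is that one pauseOrResumeSplits method can be driven by two unrelated triggers.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PauseOrResumeSketch {

    // Hypothetical reader exposing a purpose-neutral flow-control method.
    static final class PausableReader {
        private final Set<String> paused = new TreeSet<>();

        void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
            paused.addAll(splitsToPause);
            paused.removeAll(splitsToResume);
        }

        Set<String> pausedSplits() {
            return paused;
        }
    }

    // Generic trigger: pause every split whose metric exceeds the threshold, resume the rest.
    // The metric can be a watermark or a record count; the reader does not need to know which.
    static void pauseOverThreshold(PausableReader reader, Map<String, Long> metricPerSplit, long threshold) {
        List<String> toPause = new ArrayList<>();
        List<String> toResume = new ArrayList<>();
        for (Map.Entry<String, Long> entry : metricPerSplit.entrySet()) {
            if (entry.getValue() > threshold) {
                toPause.add(entry.getKey());
            } else {
                toResume.add(entry.getKey());
            }
        }
        reader.pauseOrResumeSplits(toPause, toResume);
    }

    // Trigger 1: watermark alignment (pause splits that are too far ahead in event time).
    static Set<String> pausedByWatermark() {
        PausableReader reader = new PausableReader();
        Map<String, Long> watermarks = new HashMap<>();
        watermarks.put("split-0", 100L);
        watermarks.put("split-1", 500L);
        pauseOverThreshold(reader, watermarks, 200L);
        return reader.pausedSplits();
    }

    // Trigger 2: simple rate limiting (pause splits that exceeded a record budget).
    static Set<String> pausedByRate() {
        PausableReader reader = new PausableReader();
        Map<String, Long> recordsThisSecond = new HashMap<>();
        recordsThisSecond.put("split-0", 900L);
        recordsThisSecond.put("split-1", 10L);
        pauseOverThreshold(reader, recordsThisSecond, 100L);
        return reader.pausedSplits();
    }

    public static void main(String[] args) {
        System.out.println(pausedByWatermark());
        System.out.println(pausedByRate());
    }
}
```

The two triggers use separate readers here; if both were active on the same reader, their pause and resume decisions would have to be reconciled, which this sketch does not attempt.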
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this.
> I
> >>>> think
> >>>> > > >> they
> >>>> > > >> > > have
> >>>> > > >> > > >>>>>>> their
> >>>> > > >> > > >>>>>>> > own
> >>>> > > >> > > >>>>>>> > > >> > >> advantages.
> >>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces
> >>>> for
> >>>> > > >> extending
> >>>> > > >> > > >>>>>>> abilities
> >>>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for
> TableSource/TableSink,
> >>>> > > >> > > >>>>>>> > > >> > >> and I prefer Interfaces rather than
> >>>> methods in
> >>>> > > >> this
> >>>> > > >> > > >>>>>>> case. When
> >>>> > > >> > > >>>>>>> > you
> >>>> > > >> > > >>>>>>> > > >> have
> >>>> > > >> > > >>>>>>> > > >> > a
> >>>> > > >> > > >>>>>>> > > >> > >> bunch of abilities and each ability
> >>>> > > >> > > >>>>>>> > > >> > >> has more than one method, Interfaces
> can
> >>>> help
> >>>> > to
> >>>> > > >> > > >>>>>>> organize them
> >>>> > > >> > > >>>>>>> > and
> >>>> > > >> > > >>>>>>> > > >> make
> >>>> > > >> > > >>>>>>> > > >> > >> it clear to users which methods they
> >>>> > > >> > > >>>>>>> > > >> > >> need to implement when they want to have
> >>>> an
> >>>> > > >> ability.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> I am OK with decorative interfaces if
> >>>> this is
> >>>> > a
> >>>> > > >> > general
> >>>> > > >> > > >>>>>>> design
> >>>> > > >> > > >>>>>>> > > >> pattern
> >>>> > > >> > > >>>>>>> > > >> > in
> >>>> > > >> > > >>>>>>> > > >> > >> the other components in Flink. But it
> >>>> looks
> >>>> > like
> >>>> > > >> the
> >>>> > > >> > > >>>>>>> current API
> >>>> > > >> > > >>>>>>> > > >> > proposal
> >>>> > > >> > > >>>>>>> > > >> > >> is not symmetric.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> The current proposal is essentially
> >>>> > > >> "MySourceReader
> >>>> > > >> > > >>>>>>> implements
> >>>> > > >> > > >>>>>>> > > >> > >> SourceReader, WithSplitsAlignment",
> >>>> along with
> >>>> > > >> > > >>>>>>> "MySplitReader
> >>>> > > >> > > >>>>>>> > > >> implements
> >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitsReader".
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Should we make the API symmetric? I'd
> >>>> > consider any
> >>>> > > >> > one
> >>>> > > >> > > >>>>>>> of the
> >>>> > > >> > > >>>>>>> > > >> following
> >>>> > > >> > > >>>>>>> > > >> > as
> >>>> > > >> > > >>>>>>> > > >> > >> symmetric.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements
> >>>> SourceReader,
> >>>> > > >> > > >>>>>>> WithSplitsAlignment",
> >>>> > > >> > > >>>>>>> > > >> along
> >>>> > > >> > > >>>>>>> > > >> > >> with "MySplitReader implements
> >>>> SplitReader,
> >>>> > > >> > > >>>>>>> WithSplitsAlignment",
> >>>> > > >> > > >>>>>>> > > or
> >>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements
> >>>> > AlignedSourceReader"
> >>>> > > >> > and
> >>>> > > >> > > >>>>>>> > > "MySplitReader
> >>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
> >>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements
> >>>> SourceReader"
> >>>> > and
> >>>> > > >> > > >>>>>>> "MySplitReader
> >>>> > > >> > > >>>>>>> > > >> > implements
> >>>> > > >> > > >>>>>>> > > >> > >> SplitReader".
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according
> to
> >>>> Dawid
> >>>> > > >> would
> >>>> > > >> > > be:
> >>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements
> >>>> SourceReader"
> >>>> > and
> >>>> > > >> > > >>>>>>> "MySplitReader
> >>>> > > >> > > >>>>>>> > > >> > implements
> >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
> >>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although
> >>>> personally
> >>>> > > >> > speaking I
> >>>> > > >> > > >>>>>>> think it
> >>>> > > >> > > >>>>>>> > is
> >>>> > > >> > > >>>>>>> > > >> > simpler
> >>>> > > >> > > >>>>>>> > > >> > >> to just add a new method to the split
> >>>> reader
> >>>> > with
> >>>> > > >> > > >>>>>>> default impl.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> @Dawid Wysakowicz <dwysakowicz@apache.org>,
> >>>> > > >> > > >>>>>>> > > >> > >> thanks for the reply.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a
> >>>> > > >> > > >>>>>>> > > >> > >> preference and I agree most of the sources
> >>>> > > >> > > >>>>>>> > > >> > >> will support the alignment I am fine
> >>>> > > >> > > >>>>>>> > > >> > >> following your suggestion to have the
> >>>> > > >> > > >>>>>>> > > >> > >> SourceReader extending from
> >>>> > > >> > > >>>>>>> > > >> > >> WithWatermarksSplitsAlignment, but would put
> >>>> > > >> > > >>>>>>> > > >> > >> the "supportsXXX" there, not in the Source
> >>>> > > >> > > >>>>>>> > > >> > >> to keep the two methods together.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> One benefit of having the "supportsXXX" in
> >>>> > > >> > > >>>>>>> > > >> > >> Source is that this allows some compile time
> >>>> > > >> > > >>>>>>> > > >> > >> check. For example, if a user enabled
> >>>> > > >> > > >>>>>>> > > >> > >> watermark alignment while it is not
> >>>> > > >> > > >>>>>>> > > >> > >> supported by the Source, an exception can be
> >>>> > > >> > > >>>>>>> > > >> > >> thrown at compile time. It seems in general
> >>>> > > >> > > >>>>>>> > > >> > >> useful. That said, I agree that API
> >>>> > > >> > > >>>>>>> > > >> > >> cleanliness wise it is better to put the two
> >>>> > > >> > > >>>>>>> > > >> > >> methods together.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu
> >>>> > > >> > > >>>>>>> > > >> > >> <imjark@gmail.com> wrote:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Thanks, Dawid, for the reminder on FLIP-182.
> >>>> > > >> > > >>>>>>> > > >> > >> Sorry, I did miss it.
> >>>> > > >> > > >>>>>>> > > >> > >> I don't have other concerns then.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Best,
> >>>> > > >> > > >>>>>>> > > >> > >> Jark
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz
> >>>> > > >> > > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> @Jark:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with
> >>>> > > >> > > >>>>>>> > > >> > >> watermarks when the source implements the
> >>>> > > >> > > >>>>>>> > > >> > >> interface?
> >>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark
> >>>> > > >> > > >>>>>>> > > >> > >> alignment even if Kafka implements the
> >>>> > > >> > > >>>>>>> > > >> > >> interface, and this will affect the
> >>>> > > >> > > >>>>>>> > > >> > >> throughput somehow. I agree with Becket we
> >>>> > > >> > > >>>>>>> > > >> > >> may need a `supportSplitsAlignment()` method
> >>>> > > >> > > >>>>>>> > > >> > >> for users to configure the source to
> >>>> > > >> > > >>>>>>> > > >> > >> enable/disable the alignment.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate
> >>>> > > >> > > >>>>>>> > > >> > >> maxDesiredWatermark?
> >>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark
> >>>> > > >> > > >>>>>>> > > >> > >> will greatly affect throughput if the reader
> >>>> > > >> > > >>>>>>> > > >> > >> is constantly switching between pause and
> >>>> > > >> > > >>>>>>> > > >> > >> resume. Can users configure the alignment
> >>>> > > >> > > >>>>>>> > > >> > >> offset?
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> This is covered in the previous FLIP[1],
> >>>> > > >> > > >>>>>>> > > >> > >> which has already been implemented in 1.15.
> >>>> > > >> > > >>>>>>> > > >> > >> In short, it must be enabled with the
> >>>> > > >> > > >>>>>>> > > >> > >> watermark strategy, which also configures
> >>>> > > >> > > >>>>>>> > > >> > >> the drift and the update interval.
> >>>> > > >> > > >>>>>>> > > >> > >>
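To make the drift setting mentioned above concrete, here is a minimal sketch of how an alignment decision could be derived from reported watermarks. It assumes the rule is "slowest reported watermark plus the configured drift"; that rule, the class `AlignmentSketch`, and both method names are illustrative assumptions and not Flink APIs. The update interval would only control how often such a computation is repeated.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch only; class and method names are hypothetical, not Flink APIs.
final class AlignmentSketch {

    // Assumed rule: no reader may advance past the slowest reported
    // watermark plus the configured (non-negative) drift.
    static long maxAllowedWatermark(Map<String, Long> watermarks, long maxDriftMillis) {
        if (watermarks.isEmpty()) {
            return Long.MAX_VALUE; // nothing reported yet, nothing to hold back
        }
        long min = watermarks.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .getAsLong();
        // Guard against overflow when the minimum is already near Long.MAX_VALUE.
        return min > Long.MAX_VALUE - maxDriftMillis ? Long.MAX_VALUE : min + maxDriftMillis;
    }

    // Readers whose watermark is ahead of the allowed maximum should pause.
    static List<String> readersToPause(Map<String, Long> watermarks, long maxDriftMillis) {
        long limit = maxAllowedWatermark(watermarks, maxDriftMillis);
        return watermarks.entrySet().stream()
                .filter(e -> e.getValue() > limit)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

With watermarks a=100, b=150, c=500 and a drift of 100, only c is ahead of the limit and would be paused until the others catch up.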
> >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to
> >>>> > > >> > > >>>>>>> > > >> > >> support aligning other things, I suggest
> >>>> > > >> > > >>>>>>> > > >> > >> explicitly declaring the purpose of the
> >>>> > > >> > > >>>>>>> > > >> > >> methods, such as `alignWatermarksForSplits`
> >>>> > > >> > > >>>>>>> > > >> > >> instead of `alignSplits`.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Sure, let's rename it.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> @Becket:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> I understand your point. On the other hand,
> >>>> > > >> > > >>>>>>> > > >> > >> putting all methods in the base interface,
> >>>> > > >> > > >>>>>>> > > >> > >> even with "supportsXXX" methods for enabling
> >>>> > > >> > > >>>>>>> > > >> > >> certain features, makes the entry threshold
> >>>> > > >> > > >>>>>>> > > >> > >> for writing a new source higher. Instead of
> >>>> > > >> > > >>>>>>> > > >> > >> focusing on the basic and required
> >>>> > > >> > > >>>>>>> > > >> > >> properties of the Source, the person
> >>>> > > >> > > >>>>>>> > > >> > >> implementing a source must bother with and
> >>>> > > >> > > >>>>>>> > > >> > >> figure out what all of the extra features
> >>>> > > >> > > >>>>>>> > > >> > >> are about and how to deal with them. It also
> >>>> > > >> > > >>>>>>> > > >> > >> makes it harder to organize methods in
> >>>> > > >> > > >>>>>>> > > >> > >> coupled groups as Jark said.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a
> >>>> > > >> > > >>>>>>> > > >> > >> preference and I agree most of the sources
> >>>> > > >> > > >>>>>>> > > >> > >> will support the alignment I am fine
> >>>> > > >> > > >>>>>>> > > >> > >> following your suggestion to have the
> >>>> > > >> > > >>>>>>> > > >> > >> SourceReader extending from
> >>>> > > >> > > >>>>>>> > > >> > >> WithWatermarksSplitsAlignment, but would put
> >>>> > > >> > > >>>>>>> > > >> > >> the "supportsXXX" there, not in the Source
> >>>> > > >> > > >>>>>>> > > >> > >> to keep the two methods together.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Lastly, I agree it is really unfortunate the
> >>>> > > >> > > >>>>>>> > > >> > >> "alignSplits" methods differ slightly for
> >>>> > > >> > > >>>>>>> > > >> > >> SourceReader and SplitReader. The reason for
> >>>> > > >> > > >>>>>>> > > >> > >> that is SourceReaderBase deals only with
> >>>> > > >> > > >>>>>>> > > >> > >> SplitIds, whereas SplitReader needs the
> >>>> > > >> > > >>>>>>> > > >> > >> actual splits to pause them. I found the
> >>>> > > >> > > >>>>>>> > > >> > >> discrepancy acceptable for the sake of
> >>>> > > >> > > >>>>>>> > > >> > >> simplifying changes significantly,
> >>>> > > >> > > >>>>>>> > > >> > >> especially as they would highly likely
> >>>> > > >> > > >>>>>>> > > >> > >> impact performance as we would have to
> >>>> > > >> > > >>>>>>> > > >> > >> perform additional lookups. Moreover the
> >>>> > > >> > > >>>>>>> > > >> > >> SplitReader is a secondary interface.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Best,
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the effort, Dawid and Sebastian!
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> I just have some minor questions (maybe I
> >>>> > > >> > > >>>>>>> > > >> > >> missed something).
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with
> >>>> > > >> > > >>>>>>> > > >> > >> watermarks when the source implements the
> >>>> > > >> > > >>>>>>> > > >> > >> interface?
> >>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark
> >>>> > > >> > > >>>>>>> > > >> > >> alignment even if Kafka implements the
> >>>> > > >> > > >>>>>>> > > >> > >> interface, and this will affect the
> >>>> > > >> > > >>>>>>> > > >> > >> throughput somehow. I agree with Becket we
> >>>> > > >> > > >>>>>>> > > >> > >> may need a `supportSplitsAlignment()` method
> >>>> > > >> > > >>>>>>> > > >> > >> for users to configure the source to
> >>>> > > >> > > >>>>>>> > > >> > >> enable/disable the alignment.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate
> >>>> > > >> > > >>>>>>> > > >> > >> maxDesiredWatermark?
> >>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark
> >>>> > > >> > > >>>>>>> > > >> > >> will greatly affect throughput if the reader
> >>>> > > >> > > >>>>>>> > > >> > >> is constantly switching between pause and
> >>>> > > >> > > >>>>>>> > > >> > >> resume. Can users configure the alignment
> >>>> > > >> > > >>>>>>> > > >> > >> offset?
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> >>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align other
> >>>> > > >> > > >>>>>>> > > >> > >> things in the future? For example, to align
> >>>> > > >> > > >>>>>>> > > >> > >> read speed: I have seen users requesting
> >>>> > > >> > > >>>>>>> > > >> > >> global rate limits. This feature may also
> >>>> > > >> > > >>>>>>> > > >> > >> need an interface like this.
> >>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to
> >>>> > > >> > > >>>>>>> > > >> > >> support aligning other things, I suggest
> >>>> > > >> > > >>>>>>> > > >> > >> explicitly declaring the purpose of the
> >>>> > > >> > > >>>>>>> > > >> > >> methods, such as `alignWatermarksForSplits`
> >>>> > > >> > > >>>>>>> > > >> > >> instead of `alignSplits`.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> >>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I
> >>>> > > >> > > >>>>>>> > > >> > >> think they have their own advantages.
> >>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for
> >>>> > > >> > > >>>>>>> > > >> > >> extending abilities (SupportsXxxx) for
> >>>> > > >> > > >>>>>>> > > >> > >> TableSource/TableSink, and I prefer
> >>>> > > >> > > >>>>>>> > > >> > >> Interfaces rather than methods in this case.
> >>>> > > >> > > >>>>>>> > > >> > >> When you have a bunch of abilities and each
> >>>> > > >> > > >>>>>>> > > >> > >> ability has more than one method, Interfaces
> >>>> > > >> > > >>>>>>> > > >> > >> can help to organize them and make it clear
> >>>> > > >> > > >>>>>>> > > >> > >> to users which methods they need to
> >>>> > > >> > > >>>>>>> > > >> > >> implement when they want to have an ability.
> >>>> > > >> > > >>>>>>> > > >> > >>
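As an illustration of the "ability interface" style described in point 4, the sketch below groups two methods that only make sense together into one decorative interface. All types here are simplified, hypothetical stand-ins (including the `pauseSplits`/`resumeSplits` method names); they are not the real Flink `SplitReader` or `AlignedSplitReader` signatures.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for a base reader interface.
interface SplitReader {
    String fetch();
}

// Decorative "ability" interface: both methods must be implemented together,
// and the type system makes that requirement visible to the connector author.
interface AlignedSplitReader extends SplitReader {
    void pauseSplits(Collection<String> splitIds);
    void resumeSplits(Collection<String> splitIds);
}

// A reader without the ability only implements the base contract.
final class PlainReader implements SplitReader {
    public String fetch() { return "record"; }
}

// A reader with the ability implements the whole group of methods.
final class PausableReader implements AlignedSplitReader {
    private final Set<String> paused = new HashSet<>();

    public String fetch() { return "record"; }
    public void pauseSplits(Collection<String> splitIds) { paused.addAll(splitIds); }
    public void resumeSplits(Collection<String> splitIds) { paused.removeAll(splitIds); }
    boolean isPaused(String splitId) { return paused.contains(splitId); }
}

final class Framework {
    // The framework discovers the ability from the implemented interface.
    static boolean supportsAlignment(SplitReader reader) {
        return reader instanceof AlignedSplitReader;
    }
}
```

The trade-off raised elsewhere in the thread still applies: the framework has to branch on the implemented interface, and users must know which ability interfaces exist and which base types they can decorate.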
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Best,
> >>>> > > >> > > >>>>>>> > > >> > >> Jark
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin
> >>>> > > >> > > >>>>>>> > > >> > >> <becket.qin@gmail.com> wrote:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Hi Dawid,
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Thanks for the explanation. Apologies that I
> >>>> > > >> > > >>>>>>> > > >> > >> somehow misread a bunch of "align" and
> >>>> > > >> > > >>>>>>> > > >> > >> thought they were "assign".
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Regarding 1, by default implementation, I
> >>>> > > >> > > >>>>>>> > > >> > >> was thinking of the default no-op
> >>>> > > >> > > >>>>>>> > > >> > >> implementation. I am a little worried about
> >>>> > > >> > > >>>>>>> > > >> > >> the proliferation of decorative interfaces.
> >>>> > > >> > > >>>>>>> > > >> > >> I think the most important thing about
> >>>> > > >> > > >>>>>>> > > >> > >> interfaces is that they are easy to
> >>>> > > >> > > >>>>>>> > > >> > >> understand. In this case, I prefer adding a
> >>>> > > >> > > >>>>>>> > > >> > >> new method to the existing interface for the
> >>>> > > >> > > >>>>>>> > > >> > >> following reasons:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> a) I feel the biggest drawback of decorative
> >>>> > > >> > > >>>>>>> > > >> > >> interfaces is which interface they can
> >>>> > > >> > > >>>>>>> > > >> > >> decorate and which combinations of multiple
> >>>> > > >> > > >>>>>>> > > >> > >> decorative interfaces are valid. In the
> >>>> > > >> > > >>>>>>> > > >> > >> current FLIP, the withSplitsAlignment
> >>>> > > >> > > >>>>>>> > > >> > >> interface is only applicable to the
> >>>> > > >> > > >>>>>>> > > >> > >> SourceReader which means it can't decorate
> >>>> > > >> > > >>>>>>> > > >> > >> any other interface. From an interface
> >>>> > > >> > > >>>>>>> > > >> > >> design perspective, a natural question is
> >>>> > > >> > > >>>>>>> > > >> > >> why not let "AlignedSplitReader" extend
> >>>> > > >> > > >>>>>>> > > >> > >> "withSplitsAlignment"? And it is also
> >>>> > > >> > > >>>>>>> > > >> > >> natural to assume that a split reader
> >>>> > > >> > > >>>>>>> > > >> > >> implementing both SplitReader and
> >>>> > > >> > > >>>>>>> > > >> > >> withSplitsAlignment would work, because a
> >>>> > > >> > > >>>>>>> > > >> > >> source reader implementing SourceReader and
> >>>> > > >> > > >>>>>>> > > >> > >> withSplitsAlignment works. So why isn't
> >>>> > > >> > > >>>>>>> > > >> > >> there an interface of AlignedSourceReader?
> >>>> > > >> > > >>>>>>> > > >> > >> In the future, if there is a new feature
> >>>> > > >> > > >>>>>>> > > >> > >> added (e.g. sorted or pre-partitioned data
> >>>> > > >> > > >>>>>>> > > >> > >> aware), are we going to create another
> >>>> > > >> > > >>>>>>> > > >> > >> interface of SplitReader such as
> >>>> > > >> > > >>>>>>> > > >> > >> SortedSplitReader or PrePartitionedAware?
> >>>> > > >> > > >>>>>>> > > >> > >> Can they be combined? So I think the
> >>>> > > >> > > >>>>>>> > > >> > >> additional decorative interface like
> >>>> > > >> > > >>>>>>> > > >> > >> withSplitsAlignment actually increases the
> >>>> > > >> > > >>>>>>> > > >> > >> understanding cost of users because they
> >>>> > > >> > > >>>>>>> > > >> > >> have to know what decorative interfaces are
> >>>> > > >> > > >>>>>>> > > >> > >> there, which interface they can decorate and
> >>>> > > >> > > >>>>>>> > > >> > >> which combinations of the decorative
> >>>> > > >> > > >>>>>>> > > >> > >> interfaces are valid and which are not.
> >>>> > > >> > > >>>>>>> > > >> > >> Ideally we want to avoid that. To be clear,
> >>>> > > >> > > >>>>>>> > > >> > >> I am not opposing having an interface of
> >>>> > > >> > > >>>>>>> > > >> > >> withSplitsAlignment; it is completely OK to
> >>>> > > >> > > >>>>>>> > > >> > >> have it as an internal interface and let
> >>>> > > >> > > >>>>>>> > > >> > >> SourceReader and SplitReader both extend it.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> b) Adding a new method to the SourceReader
> >>>> > > >> > > >>>>>>> > > >> > >> with a default implementation of no-op would
> >>>> > > >> > > >>>>>>> > > >> > >> help avoid logic branching in the source
> >>>> > > >> > > >>>>>>> > > >> > >> logic, especially given that we agree that
> >>>> > > >> > > >>>>>>> > > >> > >> the vast majority of the SourceReader
> >>>> > > >> > > >>>>>>> > > >> > >> implementations, if not all, would just
> >>>> > > >> > > >>>>>>> > > >> > >> extend from the SourceReaderBase. That means
> >>>> > > >> > > >>>>>>> > > >> > >> adding a new method to the interface would
> >>>> > > >> > > >>>>>>> > > >> > >> effectively give the same user experience,
> >>>> > > >> > > >>>>>>> > > >> > >> but simpler.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> c) A related design principle that may be
> >>>> > > >> > > >>>>>>> > > >> > >> worth discussing is how we let the Source
> >>>> > > >> > > >>>>>>> > > >> > >> implementations tell Flink what capability
> >>>> > > >> > > >>>>>>> > > >> > >> is supported and what is not. Personally
> >>>> > > >> > > >>>>>>> > > >> > >> speaking, I feel the most intuitive place is
> >>>> > > >> > > >>>>>>> > > >> > >> the Source itself, because that is the
> >>>> > > >> > > >>>>>>> > > >> > >> entrance of the entire Source connector
> >>>> > > >> > > >>>>>>> > > >> > >> logic.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Based on the above thoughts, I am wondering
> >>>> > > >> > > >>>>>>> > > >> > >> if the following interface would be easier
> >>>> > > >> > > >>>>>>> > > >> > >> to understand by the users.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> - Change "withSplitsAlignment" to internal
> >>>> > > >> > > >>>>>>> > > >> > >> interface, let both SourceReader and
> >>>> > > >> > > >>>>>>> > > >> > >> SplitReader extend from it, with a default
> >>>> > > >> > > >>>>>>> > > >> > >> no-op implementation.
> >>>> > > >> > > >>>>>>> > > >> > >> - Add a new method "boolean
> >>>> > > >> > > >>>>>>> > > >> > >> supportSplitsAlignment()" to the Source
> >>>> > > >> > > >>>>>>> > > >> > >> interface, with a default implementation
> >>>> > > >> > > >>>>>>> > > >> > >> returning false. Sources that have
> >>>> > > >> > > >>>>>>> > > >> > >> implemented the alignment logic can change
> >>>> > > >> > > >>>>>>> > > >> > >> this to return true, and override the
> >>>> > > >> > > >>>>>>> > > >> > >> alignSplits() methods in the SourceReader /
> >>>> > > >> > > >>>>>>> > > >> > >> SplitReader if needed.
> >>>> > > >> > > >>>>>>> > > >> > >> - In the future, if a new optional feature
> >>>> > > >> > > >>>>>>> > > >> > >> is going to be added to the Source, and that
> >>>> > > >> > > >>>>>>> > > >> > >> feature requires the awareness from Flink,
> >>>> > > >> > > >>>>>>> > > >> > >> we can add more such methods to the Source.
> >>>> > > >> > > >>>>>>> > > >> > >>
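The three bullets above could look roughly like the following sketch. It uses simplified, hypothetical stand-ins for `Source`, `SourceReader` and the alignment interface (split ids as plain strings, a made-up `JobValidation` helper); it only illustrates the shape of the proposal and is not the FLIP's final API.

```java
import java.util.Collection;

// Hypothetical stand-in for the internal alignment interface.
interface WithSplitsAlignment {
    // Default no-op: readers that cannot pause splits simply inherit this.
    default void alignSplits(Collection<String> splitsToPause,
                             Collection<String> splitsToResume) {}
}

// Hypothetical stand-in: the base reader extends the internal interface.
interface SourceReader extends WithSplitsAlignment {}

interface Source {
    SourceReader createReader();

    // Capability flag on the Source; false unless a connector opts in.
    default boolean supportSplitsAlignment() {
        return false;
    }
}

// A connector that ignores alignment needs no extra code.
final class BasicSource implements Source {
    public SourceReader createReader() {
        return new SourceReader() {};
    }
}

// A connector that supports alignment flips the flag and overrides alignSplits.
final class AligningSource implements Source {
    public SourceReader createReader() {
        return new SourceReader() {
            @Override
            public void alignSplits(Collection<String> splitsToPause,
                                    Collection<String> splitsToResume) {
                // A real reader would pause and resume its splits here.
            }
        };
    }

    @Override
    public boolean supportSplitsAlignment() {
        return true;
    }
}

final class JobValidation {
    // Fail early if alignment is requested for a source that did not opt in.
    static void check(Source source, boolean alignmentEnabled) {
        if (alignmentEnabled && !source.supportSplitsAlignment()) {
            throw new IllegalStateException("Source does not support split alignment");
        }
    }
}
```

The early check in `JobValidation` corresponds to the benefit mentioned in the thread of rejecting an unsupported configuration before the job runs.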
> >>>> > > >> > > >>>>>>> > > >> > >> What do you think?
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz
> >>>> > > >> > > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> @Konstantin:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the
> >>>> > > >> > > >>>>>>> > > >> > >> `AlignedSplitReader` interface (aka the stop
> >>>> > > >> > > >>>>>>> > > >> > >> & resume behavior) will be implemented for
> >>>> > > >> > > >>>>>>> > > >> > >> Kafka and Pulsar only, correct?
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Correct, as far as I know though, those are
> >>>> > > >> > > >>>>>>> > > >> > >> the only sources which consume concurrently
> >>>> > > >> > > >>>>>>> > > >> > >> from multiple splits and thus alignment
> >>>> > > >> > > >>>>>>> > > >> > >> applies.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> @Thomas:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in
> >>>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase and then doing nothing if
> >>>> > > >> > > >>>>>>> > > >> > >> the split reader does not implement
> >>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader could be misleading?
> >>>> > > >> > > >>>>>>> > > >> > >> Perhaps WithSplitsAlignment can instead be
> >>>> > > >> > > >>>>>>> > > >> > >> added to the specific source reader (i.e.
> >>>> > > >> > > >>>>>>> > > >> > >> KafkaSourceReader) to make it explicit that
> >>>> > > >> > > >>>>>>> > > >> > >> the source actually supports it.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> I understand your concern. Hmm, I think we
> >>>> > > >> > > >>>>>>> > > >> > >> could actually do that. Given the actual
> >>>> > > >> > > >>>>>>> > > >> > >> implementation of the
> >>>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase#alignSplits is rather short
> >>>> > > >> > > >>>>>>> > > >> > >> (just a forward to the corresponding method
> >>>> > > >> > > >>>>>>> > > >> > >> of SplitFetcher), we could reimplement it in
> >>>> > > >> > > >>>>>>> > > >> > >> the actual source implementations. This
> >>>> > > >> > > >>>>>>> > > >> > >> solution has a downside though. Authors of
> >>>> > > >> > > >>>>>>> > > >> > >> new sources would have to do two things:
> >>>> > > >> > > >>>>>>> > > >> > >> extend from AlignedSplitReader and implement
> >>>> > > >> > > >>>>>>> > > >> > >> WithSplitsAlignment, instead of just
> >>>> > > >> > > >>>>>>> > > >> > >> extending AlignedSplitReader. I would be
> >>>> > > >> > > >>>>>>> > > >> > >> fine with such a tradeoff though. What do
> >>>> > > >> > > >>>>>>> > > >> > >> others think?
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> @Steven:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> For this part from the motivation section,
> >>>> > > >> > > >>>>>>> > > >> > >> is it accurate? Let's assume one source task
> >>>> > > >> > > >>>>>>> > > >> > >> consumes from 3 partitions and one of the
> >>>> > > >> > > >>>>>>> > > >> > >> partitions is significantly slower. In this
> >>>> > > >> > > >>>>>>> > > >> > >> situation, the watermark for this source
> >>>> > > >> > > >>>>>>> > > >> > >> task won't hold back as it is reading recent
> >>>> > > >> > > >>>>>>> > > >> > >> data from the other two Kafka partitions. As
> >>>> > > >> > > >>>>>>> > > >> > >> a result, it won't hold back the overall
> >>>> > > >> > > >>>>>>> > > >> > >> watermark. I thought the problem is that we
> >>>> > > >> > > >>>>>>> > > >> > >> may have late data for this slow partition.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> It will hold back the watermark.
> >>>> Watermark of
> >>>> > an
> >>>> > > >> > > >>>>>>> operator is the
> >>>> > > >> > > >>>>>>> > > >> minimum
> >>>> > > >> > > >>>>>>> > > >> > >> of watermarks of all splits[1]
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> I have another question about the
> >>>> restart. Say
> >>>> > > >> split
> >>>> > > >> > > >>>>>>> alignment is
> >>>> > > >> > > >>>>>>> > > >> > >> triggered. checkpoint is completed. job
> >>>> > failed and
> >>>> > > >> > > >>>>>>> restored from
> >>>> > > >> > > >>>>>>> > > the
> >>>> > > >> > > >>>>>>> > > >> > last
> >>>> > > >> > > >>>>>>> > > >> > >> checkpoint. because alignment decision
> >>>> is not
> >>>> > > >> > > >>>>>>> checkpointed,
> >>>> > > >> > > >>>>>>> > > initially
> >>>> > > >> > > >>>>>>> > > >> > >> alignment won't be enforced until we
> get
> >>>> a
> >>>> > cycle
> >>>> > > >> of
> >>>> > > >> > > >>>>>>> watermark
> >>>> > > >> > > >>>>>>> > > >> > aggregation
> >>>> > > >> > > >>>>>>> > > >> > >> and propagation, right? Not saying this
> >>>> > corner is
> >>>> > > >> a
> >>>> > > >> > > >>>>>>> problem. Just
> >>>> > > >> > > >>>>>>> > > >> want
> >>>> > > >> > > >>>>>>> > > >> > to
> >>>> > > >> > > >>>>>>> > > >> > >> understand it more.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Your understanding is correct.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> @Becket:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort
> >>>> of a
> >>>> > > >> general
> >>>> > > >> > use
> >>>> > > >> > > >>>>>>> case, so
> >>>> > > >> > > >>>>>>> > > >> should
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> we
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> just add the related methods to
> >>>> SourceReader
> >>>> > > >> directly
> >>>> > > >> > > >>>>>>> instead of
> >>>> > > >> > > >>>>>>> > > >> > >> introducing the new interface of
> >>>> > > >> WithSplitAssignment?
> >>>> > > >> > > We
> >>>> > > >> > > >>>>>>> can
> >>>> > > >> > > >>>>>>> > > provide
> >>>> > > >> > > >>>>>>> > > >> > >> default implementations, so backwards
> >>>> > > >> compatibility
> >>>> > > >> > > >>>>>>> won't be an
> >>>> > > >> > > >>>>>>> > > >> issue.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> I don't think we can provide a default
> >>>> > > >> > implementation.
> >>>> > > >> > > >>>>>>> How would
> >>>> > > >> > > >>>>>>> > we
> >>>> > > >> > > >>>>>>> > > >> do
> >>>> > > >> > > >>>>>>> > > >> > >> that? Would it be just a no-op? Is it
> >>>> better
> >>>> > than
> >>>> > > >> > > having
> >>>> > > >> > > >>>>>>> an
> >>>> > > >> > > >>>>>>> > opt-in
> >>>> > > >> > > >>>>>>> > > >> > >> interface? The default implementation
> >>>> would
> >>>> > have
> >>>> > > >> to
> >>>> > > >> > be
> >>>> > > >> > > >>>>>>> added
> >>>> > > >> > > >>>>>>> > > >> exclusively
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> in
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> a *Public* SourceReader interface. By
> >>>> the way
> >>>> > > >> notice
> >>>> > > >> > > >>>>>>> > > SourceReaderBase
> >>>> > > >> > > >>>>>>> > > >> > >> does extend from WithSplitsAlignment,
> so
> >>>> > > >> effectively
> >>>> > > >> > > all
> >>>> > > >> > > >>>>>>> > > >> implementations
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> do
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> handle the alignment case. To be
> honest I
> >>>> > think
> >>>> > > >> it is
> >>>> > > >> > > >>>>>>> impossible
> >>>> > > >> > > >>>>>>> > to
> >>>> > > >> > > >>>>>>> > > >> > >> implement the SourceReader interface
> >>>> directly
> >>>> > by
> >>>> > > >> end
> >>>> > > >> > > >>>>>>> users.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader
> >>>> interface
> >>>> > > >> > probably
> >>>> > > >> > > >>>>>>> also
> >>>> > > >> > > >>>>>>> > needs
> >>>> > > >> > > >>>>>>> > > >> some
> >>>> > > >> > > >>>>>>> > > >> > >> change to support throttling at the
> split
> >>>> > > >> > granularity.
> >>>> > > >> > > >>>>>>> Can you
> >>>> > > >> > > >>>>>>> > add
> >>>> > > >> > > >>>>>>> > > >> that
> >>>> > > >> > > >>>>>>> > > >> > >> interface change into the public
> >>>> interface
> >>>> > > >> section as
> >>>> > > >> > > >>>>>>> well?
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> It has been added from the beginning.
> See
> >>>> > > >> > > >>>>>>> *AlignedSplitReader.*
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method
> >>>> name
> >>>> > > >> > assignSplits
> >>>> > > >> > > >>>>>>> here,
> >>>> > > >> > > >>>>>>> > given
> >>>> > > >> > > >>>>>>> > > >> that
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> it
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> is not actually changing the split
> >>>> > assignments? It
> >>>> > > >> > > seems
> >>>> > > >> > > >>>>>>> > something
> >>>> > > >> > > >>>>>>> > > >> like
> >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or
> >>>> > > >> adjustSplitsThrottling() is
> >>>> > > >> > > >>>>>>> more
> >>>> > > >> > > >>>>>>> > > accurate.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> The method's called *alignSplits*, not
> >>>> > assign. Do
> >>>> > > >> you
> >>>> > > >> > > >>>>>>> still
> >>>> > > >> > > >>>>>>> > prefer
> >>>> > > >> > > >>>>>>> > > a
> >>>> > > >> > > >>>>>>> > > >> > >> different name for that? Personally, I
> am
> >>>> > open for
> >>>> > > >> > > >>>>>>> suggestions
> >>>> > > >> > > >>>>>>> > > here.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Best,
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> [1]
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> >
> >>>> > > >> > > >>>>>>> > > >>
> >>>> > > >> > > >>>>>>> > >
> >>>> > > >> > > >>>>>>> >
> >>>> > > >> > > >>>>>>>
> >>>> > > >> > >
> >>>> > > >> >
> >>>> > > >>
> >>>> >
> >>>>
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Thanks for driving the effort,
> >>>> Sebastian. I
> >>>> > think
> >>>> > > >> the
> >>>> > > >> > > >>>>>>> motivation
> >>>> > > >> > > >>>>>>> > > >> makes a
> >>>> > > >> > > >>>>>>> > > >> > >> lot of sense. Just a few suggestions /
> >>>> > questions.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort
> >>>> of a
> >>>> > > >> general
> >>>> > > >> > use
> >>>> > > >> > > >>>>>>> case, so
> >>>> > > >> > > >>>>>>> > > >> should
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> we
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> just add the related methods to
> >>>> SourceReader
> >>>> > > >> directly
> >>>> > > >> > > >>>>>>> instead of
> >>>> > > >> > > >>>>>>> > > >> > >> introducing the new interface of
> >>>> > > >> WithSplitAssignment?
> >>>> > > >> > > We
> >>>> > > >> > > >>>>>>> can
> >>>> > > >> > > >>>>>>> > > provide
> >>>> > > >> > > >>>>>>> > > >> > >> default implementations, so backwards
> >>>> > > >> compatibility
> >>>> > > >> > > >>>>>>> won't be an
> >>>> > > >> > > >>>>>>> > > >> issue.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader
> >>>> interface
> >>>> > > >> > probably
> >>>> > > >> > > >>>>>>> also
> >>>> > > >> > > >>>>>>> > needs
> >>>> > > >> > > >>>>>>> > > >> some
> >>>> > > >> > > >>>>>>> > > >> > >> change to support throttling at the
> split
> >>>> > > >> > granularity.
> >>>> > > >> > > >>>>>>> Can you
> >>>> > > >> > > >>>>>>> > add
> >>>> > > >> > > >>>>>>> > > >> that
> >>>> > > >> > > >>>>>>> > > >> > >> interface change into the public
> >>>> interface
> >>>> > > >> section as
> >>>> > > >> > > >>>>>>> well?
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method
> >>>> name
> >>>> > > >> > assignSplits
> >>>> > > >> > > >>>>>>> here,
> >>>> > > >> > > >>>>>>> > given
> >>>> > > >> > > >>>>>>> > > >> that
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> it
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> is not actually changing the split
> >>>> > assignments? It
> >>>> > > >> > > seems
> >>>> > > >> > > >>>>>>> > something
> >>>> > > >> > > >>>>>>> > > >> like
> >>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or
> >>>> > > >> adjustSplitsThrottling() is
> >>>> > > >> > > >>>>>>> more
> >>>> > > >> > > >>>>>>> > > accurate.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven
> >>>> Wu <
> >>>> > > >> > > >>>>>>> stevenz3wu@gmail.com
> >>>> > > >> > > >>>>>>> > >
> >>>> > > >> > > >>>>>>> > > <
> >>>> > > >> > > >>>>>>> > > >> > stevenz3wu@gmail.com> <
> stevenz3wu@gmail.com>
> >>>> <
> >>>> > > >> > > >>>>>>> stevenz3wu@gmail.com>
> >>>> > > >> > > >>>>>>> > <
> >>>> > > >> > > >>>>>>> > > >> > >> stevenz3wu@gmail.com> <
> >>>> > > >> > > >>>>>>> > > >> > >> stevenz3wu@gmail.com> wrote:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> However, a single source operator may
> >>>> read
> >>>> > data
> >>>> > > >> from
> >>>> > > >> > > >>>>>>> multiple
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> splits/partitions, e.g., multiple Kafka
> >>>> > > >> partitions,
> >>>> > > >> > > such
> >>>> > > >> > > >>>>>>> that
> >>>> > > >> > > >>>>>>> > even
> >>>> > > >> > > >>>>>>> > > >> with
> >>>> > > >> > > >>>>>>> > > >> > >> watermark alignment the source operator
> >>>> may
> >>>> > need
> >>>> > > >> to
> >>>> > > >> > > >>>>>>> buffer
> >>>> > > >> > > >>>>>>> > > excessive
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> amount
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> of data if one split emits data faster
> >>>> than
> >>>> > > >> another.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> For this part from the motivation
> >>>> section, is
> >>>> > it
> >>>> > > >> > > >>>>>>> accurate? Let's
> >>>> > > >> > > >>>>>>> > > >> assume
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> one
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> source task consumes from 3 partitions
> >>>> and
> >>>> > one of
> >>>> > > >> the
> >>>> > > >> > > >>>>>>> partition
> >>>> > > >> > > >>>>>>> > is
> >>>> > > >> > > >>>>>>> > > >> > >> significantly slower. In this
> situation,
> >>>> > watermark
> >>>> > > >> > for
> >>>> > > >> > > >>>>>>> this
> >>>> > > >> > > >>>>>>> > source
> >>>> > > >> > > >>>>>>> > > >> task
> >>>> > > >> > > >>>>>>> > > >> > >> won't hold back as it is reading recent
> >>>> data
> >>>> > from
> >>>> > > >> > other
> >>>> > > >> > > >>>>>>> two Kafka
> >>>> > > >> > > >>>>>>> > > >> > >> partitions. As a result, it won't hold
> >>>> back
> >>>> > the
> >>>> > > >> > overall
> >>>> > > >> > > >>>>>>> > watermark.
> >>>> > > >> > > >>>>>>> > > I
> >>>> > > >> > > >>>>>>> > > >> > >> thought the problem is that we may have
> >>>> late
> >>>> > data
> >>>> > > >> for
> >>>> > > >> > > >>>>>>> this slow
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> partition.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> I have another question about the
> >>>> restart. Say
> >>>> > > >> split
> >>>> > > >> > > >>>>>>> alignment is
> >>>> > > >> > > >>>>>>> > > >> > >> triggered. checkpoint is completed. job
> >>>> > failed and
> >>>> > > >> > > >>>>>>> restored from
> >>>> > > >> > > >>>>>>> > > the
> >>>> > > >> > > >>>>>>> > > >> > last
> >>>> > > >> > > >>>>>>> > > >> > >> checkpoint. because alignment decision
> >>>> is not
> >>>> > > >> > > >>>>>>> checkpointed,
> >>>> > > >> > > >>>>>>> > > initially
> >>>> > > >> > > >>>>>>> > > >> > >> alignment won't be enforced until we
> get
> >>>> a
> >>>> > cycle
> >>>> > > >> of
> >>>> > > >> > > >>>>>>> watermark
> >>>> > > >> > > >>>>>>> > > >> > aggregation
> >>>> > > >> > > >>>>>>> > > >> > >> and propagation, right? Not saying this
> >>>> > corner is
> >>>> > > >> a
> >>>> > > >> > > >>>>>>> problem. Just
> >>>> > > >> > > >>>>>>> > > >> want
> >>>> > > >> > > >>>>>>> > > >> > to
> >>>> > > >> > > >>>>>>> > > >> > >> understand it more.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas
> >>>> Weise <
> >>>> > > >> > > >>>>>>> thw@apache.org> <
> >>>> > > >> > > >>>>>>> > > >> > thw@apache.org> <th...@apache.org> <
> >>>> thw@apache.org>
> >>>> > <
> >>>> > > >> > > >>>>>>> > > >> > >> thw@apache.org> <
> >>>> > > >> > > >>>>>>> > > >> > >> thw@apache.org> wrote:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Thanks for working on this!
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split
> alignment
> >>>> in
> >>>> > > >> > > >>>>>>> SourceReaderBase and
> >>>> > > >> > > >>>>>>> > > then
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> doing
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> nothing if the split reader does not
> >>>> implement
> >>>> > > >> > > >>>>>>> AlignedSplitReader
> >>>> > > >> > > >>>>>>> > > >> could
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> be
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment
> >>>> can
> >>>> > > >> instead
> >>>> > > >> > be
> >>>> > > >> > > >>>>>>> added to
> >>>> > > >> > > >>>>>>> > the
> >>>> > > >> > > >>>>>>> > > >> > >> specific source reader (i.e.
> >>>> > KafkaSourceReader) to
> >>>> > > >> > make
> >>>> > > >> > > >>>>>>> it
> >>>> > > >> > > >>>>>>> > explicit
> >>>> > > >> > > >>>>>>> > > >> that
> >>>> > > >> > > >>>>>>> > > >> > >> the source actually supports it.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Thanks,
> >>>> > > >> > > >>>>>>> > > >> > >> Thomas
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM
> >>>> Konstantin
> >>>> > Knauf <
> >>>> > > >> > > >>>>>>> > > knaufk@apache.org>
> >>>> > > >> > > >>>>>>> > > >> <
> >>>> > > >> > > >>>>>>> > > >> > knaufk@apache.org> <kn...@apache.org> <
> >>>> > > >> > knaufk@apache.org
> >>>> > > >> > > >
> >>>> > > >> > > >>>>>>> <
> >>>> > > >> > > >>>>>>> > > >> > >> knaufk@apache.org> <
> >>>> > > >> > > >>>>>>> > > >> > >> knaufk@apache.org>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> wrote:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the
> >>>> `AlignedSplitReader`
> >>>> > > >> > > interface
> >>>> > > >> > > >>>>>>> (aka the
> >>>> > > >> > > >>>>>>> > > >> stop
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> &
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> resume behavior) will be implemented
> for
> >>>> > Kafka and
> >>>> > > >> > > >>>>>>> Pulsar only,
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> correct?
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> +1 in general. I believe it is valuable
> >>>> to
> >>>> > > >> complete
> >>>> > > >> > the
> >>>> > > >> > > >>>>>>> watermark
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> aligned
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> story with this FLIP.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Cheers,
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Konstantin
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid
> >>>> > Wysakowicz
> >>>> > > >> <
> >>>> > > >> > > >>>>>>> > > >> > dwysakowicz@apache.org> <
> >>>> dwysakowicz@apache.org>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> wrote:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> To be explicit, having worked on it, I
> >>>> > support it
> >>>> > > >> ;)
> >>>> > > >> > I
> >>>> > > >> > > >>>>>>> think we
> >>>> > > >> > > >>>>>>> > can
> >>>> > > >> > > >>>>>>> > > >> > >> start a vote thread soonish, as there
> >>>> are no
> >>>> > > >> concerns
> >>>> > > >> > > so
> >>>> > > >> > > >>>>>>> far.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Best,
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Dawid
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis
> >>>> wrote:
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Dear Flink developers,
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> I would like to open a discussion on
> >>>> FLIP 217
> >>>> > [1]
> >>>> > > >> for
> >>>> > > >> > > an
> >>>> > > >> > > >>>>>>> > extension
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> of
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Watermark Alignment to perform
> alignment
> >>>> also
> >>>> > in
> >>>> > > >> > > >>>>>>> SplitReaders. To
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> do
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> so,
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> SplitReaders must be able to suspend
> and
> >>>> > resume
> >>>> > > >> > reading
> >>>> > > >> > > >>>>>>> from
> >>>> > > >> > > >>>>>>> > split
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> sources
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> where the SourceOperator coordinates
> and
> >>>> > controls
> >>>> > > >> > > >>>>>>> suspend and
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> resume.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> To
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> gather information about current
> >>>> watermarks
> >>>> > of the
> >>>> > > >> > > >>>>>>> SplitReaders,
> >>>> > > >> > > >>>>>>> > we
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> extend
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> the internal WatermarkOutputMultiplexer
> >>>> and
> >>>> > report
> >>>> > > >> > > >>>>>>> watermarks to
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> the
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> SourceOperator.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> There is a PoC for this FLIP [2],
> >>>> prototyped
> >>>> > by
> >>>> > > >> Arvid
> >>>> > > >> > > >>>>>>> Heise and
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> revised
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> and
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> reworked by Dawid Wysakowicz (He did
> >>>> most of
> >>>> > the
> >>>> > > >> > work.)
> >>>> > > >> > > >>>>>>> and me.
> >>>> > > >> > > >>>>>>> > The
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> changes
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> are backwards compatible in a way that
> if
> >>>> > affected
> >>>> > > >> > > >>>>>>> components do
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> not
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> support split alignment the behavior is
> >>>> as
> >>>> > before.
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Best,
> >>>> > > >> > > >>>>>>> > > >> > >> Sebastian
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> [1]
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> >
> >>>> > > >> > > >>>>>>> > > >>
> >>>> > > >> > > >>>>>>> > >
> >>>> > > >> > > >>>>>>> >
> >>>> > > >> > > >>>>>>>
> >>>> > > >> > >
> >>>> > > >> >
> >>>> > > >>
> >>>> >
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> [2]
> >>>> > > >> > > >>>>>>>
> >>>> https://github.com/dawidwys/flink/tree/aligned-splits
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> --
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >> Konstantin Knaufhttps://
> >>>> > > >> > > >>>>>>> > > >>
> >>>> twitter.com/snntrablehttps://github.com/knaufk
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> > >>
> >>>> > > >> > > >>>>>>> > > >> >
> >>>> > > >> > > >>>>>>> > > >>
> >>>> > > >> > > >>>>>>> > > >
> >>>> > > >> > > >>>>>>> > >
> >>>> > > >> > > >>>>>>> >
> >>>> > > >> > > >>>>>>>
> >>>> > > >> > > >>>>>>
> >>>> > > >> > >
> >>>> > > >> >
> >>>> > > >>
> >>>> > > >
> >>>> >
> >>>>
> >>>
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Becket Qin <be...@gmail.com>.
Hi Sebastian,

Regarding the question,

>
> @Becket: I'm not sure about the intention of solution 1. Can you explain
> that a bit more? In particular, I don't understand: "The reasoning behind
> this solution is that existing users should only use the
> coarse watermark alignment when a source reader only reads from a single
> split." Why should a user not use coarse watermark alignment when source
> reader reads from multiple splits? The split alignment uses the "coarse
> watermark", i.e., maxDesiredWatermark, as described in the FLIP for
> alignment.


Imagine you have a source reader reading from two splits, and the
watermarks look like the following:
1. Watermark of Split 1: 10:00 AM Jun 29
2. Watermark of Split 2: 11:00 AM Jun 29
3. maxDesiredWatermark: 10:30 AM Jun 29

At this point, the source reader's watermark is 10:00 AM (the minimum
across its splits), which is lower than the maxDesiredWatermark, so the
source reader will not be paused. However, because the source reader cannot
specify which split to read from, if it continues to read, the watermark
gap between the two splits may become even bigger. This essentially defeats
the main purpose of watermark alignment, which is to reduce the number of
records buffered in the state. This does not necessarily happen, but it is
not what FLIP-182 was designed for to begin with. So I'd rather avoid
extending the feature to that case.
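
To make the scenario above concrete, here is a minimal sketch in Java. It is
only an illustration under stated assumptions: the class and method names
(CoarseAlignmentSketch, readerWatermark, readerShouldPause, splitShouldPause)
are hypothetical and are not Flink APIs, and the pause rule "watermark is
ahead of maxDesiredWatermark" is a simplification of the alignment check.

```java
import java.time.LocalTime;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Flink.
public class CoarseAlignmentSketch {

    // The reader-level watermark is the minimum watermark over all its splits.
    static LocalTime readerWatermark(List<LocalTime> splitWatermarks) {
        return Collections.min(splitWatermarks);
    }

    // Coarse (reader-level) check: pause only if the whole reader is ahead
    // of maxDesiredWatermark. Assumed simplification of the real check.
    static boolean readerShouldPause(List<LocalTime> splitWatermarks,
                                     LocalTime maxDesiredWatermark) {
        return readerWatermark(splitWatermarks).isAfter(maxDesiredWatermark);
    }

    // Split-level check: a single split is paused if it alone is ahead.
    static boolean splitShouldPause(LocalTime splitWatermark,
                                    LocalTime maxDesiredWatermark) {
        return splitWatermark.isAfter(maxDesiredWatermark);
    }

    public static void main(String[] args) {
        LocalTime split1 = LocalTime.of(10, 0);      // Split 1: 10:00 AM
        LocalTime split2 = LocalTime.of(11, 0);      // Split 2: 11:00 AM
        LocalTime maxDesired = LocalTime.of(10, 30); // maxDesiredWatermark

        List<LocalTime> splits = List.of(split1, split2);
        System.out.println("reader watermark: " + readerWatermark(splits));
        System.out.println("reader paused:    " + readerShouldPause(splits, maxDesired));
        System.out.println("split 1 paused:   " + splitShouldPause(split1, maxDesired));
        System.out.println("split 2 paused:   " + splitShouldPause(split2, maxDesired));
    }
}
```

With the numbers from the example, the coarse check does not pause the
reader, while a per-split check would pause only Split 2. That difference is
the gap the split-level alignment is meant to close.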

Thanks,

Jiangjie (Becket) Qin

On Tue, Jun 28, 2022 at 6:53 PM Sebastian Mattheis <se...@ververica.com>
wrote:

> (Sorry I didn't send to the list but only to Becket. My bad and thanks
> Piotr. Next attempt:)
>
> Hi all,
>
> Thanks for pushing the FLIP. I would drive it and would be happy to get
> back to you, @Thomas, for reviews. (Sorry for low responsiveness, there
> were several efforts with high priority on my side ...) As a next step, I
> would revise the FLIP to get the discussion concluded.
>
> However, as Becket mentioned I feel that some things are still not clear
> yet:
>
> Re: Thomas
>>
>> However, from a user perspective, should the split level alignment be
>>> an opt-in feature, at least for a few releases? If yes, then we would
>>> require a knob similar to supportsPausingSplits(), which I understand
>>> won't be part of the revised FLIP. Such control may be beneficial:
>>
>>
>>> * Compare runtime behavior with split level alignment on/off
>>> * Allow use of sources that don't implement pausing splits yet
>>
>>
>>> The second point would, from the user's perspective, be necessary for
>>> backward compatibility? While the interface aspect and source
>>> compatibility has been discussed in great detail, I don't think it
>>> would be desirable if an application that already uses alignment fails
>>> after upgrading to the new Flink version, forcing users to lock step
>>> modify sources for the new non-optional split level alignment.
>>
>>
>>> So I think clarification of the compatibility aspect on the FLIP page
>>> would be necessary.
>>
>>
>> That is a good point. Currently FLIP-182 is marked as experimental. So
>> technically speaking it could change. That said, I agree that it would be
>> good to not break the existing sources.
>>
>> My main concern with having a supportsPausingSplits() knob is that this
>> might be something requiring code change on future deprecation. I am
>> thinking of two potential solutions:
>>
>> Solution 1:
>> In the SourceReaderBase, when pauseOrResumeSplits() is invoked, if the
>> source reader only has one split assigned, the source reader simply stops
>> polling and just returns NOTHING_AVAILABLE. If more than one split is
>> assigned, it throws an exception with a message such as "The
>> unpausable SplitReader CLASS_NAME only works with watermark alignment
>> when assigned a single split. There are more than one split assigned to the
>> SplitReader".
>> The reasoning behind this solution is that existing users should only use
>> the coarse watermark alignment when a source reader only reads from a
>> single split. Reading from more than one split might have unwanted side
>> effects, so we might as well throw an exception in this case.
>>
>> Solution 2:
>> Having a configuration "enable.coarse-grained.watermark.alignment", the
>> default value is false. Once it is set to true, we will allow
>> coarse-grained watermark alignment if a SplitReader is pausable.
>> This solution allows users to keep the current FLIP-182 behavior, with
>> the risk of side effects.
>>
>> Personally speaking, I feel solution 1 seems better because
>> coarse-grained watermark alignment could be frustrating to the users
>> when more than one split is assigned. So we might as well not support it at
>> all. And also there is nothing to deprecate in the future with this
>> solution.
>>
>> What do you think?
>>
>
> @Thomas: My understanding is that you intend a simple switch/knob to test
> w/ and w/o (split) watermark alignment, right? Isn't the coarse-grained w/
> vs w/o watermark alignment sufficient for that? Or do you think that
> switching watermark alignment explicitly on split level is required?
>
> @Becket: I'm not sure about the intention of solution 1. Can you explain
> that a bit more? In particular, I don't understand: "The reasoning behind
> this solution is that existing users should only use the coarse watermark
> alignment when a source reader only reads from a single split." Why
> should a user not use coarse watermark alignment when source reader reads
> from multiple splits? The split alignment uses the "coarse watermark",
> i.e., maxDesiredWatermark, as described in the FLIP for alignment.
>
> Could you please clarify?
>
> Regards,
> Sebastian
>
> On Wed, Jun 22, 2022 at 4:23 AM Becket Qin <be...@gmail.com> wrote:
>
>> Thanks for the feedback, Thomas and Steve. And thanks Piotr for the
>> patient and detailed discussion.
>>
>> Let's move forward with option 1 then.
>>
>> Re: Thomas
>>
>> However, from a user perspective, should the split level alignment be
>>> an opt-in feature, at least for a few releases? If yes, then we would
>>> require a knob similar to supportsPausingSplits(), which I understand
>>> won't be part of the revised FLIP. Such control may be beneficial:
>>
>>
>>> * Compare runtime behavior with split level alignment on/off
>>> * Allow use of sources that don't implement pausing splits yet
>>
>>
>>> The second point would, from the user's perspective, be necessary for
>>> backward compatibility? While the interface aspect and source
>>> compatibility has been discussed in great detail, I don't think it
>>> would be desirable if an application that already uses alignment fails
>>> after upgrading to the new Flink version, forcing users to lock step
>>> modify sources for the new non-optional split level alignment.
>>
>>
>>> So I think clarification of the compatibility aspect on the FLIP page
>>> would be necessary.
>>
>>
>> That is a good point. Currently FLIP-182 is marked as experimental. So
>> technically speaking it could change. That said, I agree that it would be
>> good to not break the existing sources.
>>
>> My main concern with having a supportsPausingSplits() knob is that this
>> might be something requiring code change on future deprecation. I am
>> thinking of two potential solutions:
>>
>> Solution 1:
>> In the SourceReaderBase, when pauseOrResumeSplits() is invoked, if the
>> source reader only has one split assigned, the source reader simply stops
>> polling and just returns NOTHING_AVAILABLE. If more than one split is
>> assigned, it throws an exception with a message such as "The
>> unpausable SplitReader CLASS_NAME only works with watermark alignment when
>> assigned a single split. There are more than one split assigned to the
>> SplitReader".
>> The reasoning behind this solution is that existing users should only use
>> the coarse watermark alignment when a source reader only reads from a
>> single split. Reading from more than one split might have unwanted side
>> effects, so we might as well throw an exception in this case.
>>
>> Solution 2:
>> Having a configuration "enable.coarse-grained.watermark.alignment", the
>> default value is false. Once it is set to true, we will allow
>> coarse-grained watermark alignment if a SplitReader is not pausable.
>> This solution allows users to keep the current FLIP-182 behavior, with
>> the risk of side effects.
>>
>> Personally speaking, I feel solution 1 seems better because
>> coarse-grained watermark alignment could be frustrating to the users when
>> more than one split is assigned. So we might as well not support it at all.
>> And also there is nothing to deprecate in the future with this solution.
>>
>> What do you think?
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>>
>> On Tue, Jun 21, 2022 at 8:48 PM Piotr Nowojski <pn...@apache.org>
>> wrote:
>>
>>> Hi,
>>>
>>> It looks like option 1 wins overall? So let's go with that.
>>>
>>> Best,
>>> Piotrek
>>>
>>> On Wed, Jun 15, 2022 at 04:13 Steven Wu <st...@gmail.com> wrote:
>>>
>>>> Both option 1 (default impl in base interface) and option 2 (decorative
>>>> interface) are pretty common patterns. I would also be fine with either.
>>>> The important thing is that an exception is thrown if a source doesn't
>>>> support the alignment capability.
>>>>
>>>> The other point is that we can validate the source capability if
>>>> alignment
>>>> is enabled in WatermarkStrategy. I believe either option can achieve
>>>> this
>>>> goal too.
>>>> public interface WatermarkStrategy<T> {
>>>>     WatermarkStrategy<T> withWatermarkAlignment(String watermarkGroup,
>>>> Duration maxAllowedWatermarkDrift);
>>>> }
>>>>
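A minimal sketch of the validation idea above, assuming the check runs when the job is built. `AlignmentSettings` and `AlignmentValidator` are hypothetical names, and how `sourceSupportsPausingSplits` is determined (default method or decorative interface) is deliberately left open.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical holder for the arguments given to withWatermarkAlignment. */
final class AlignmentSettings {
    final String watermarkGroup;
    final Duration maxAllowedWatermarkDrift;

    AlignmentSettings(String watermarkGroup, Duration maxAllowedWatermarkDrift) {
        this.watermarkGroup = watermarkGroup;
        this.maxAllowedWatermarkDrift = maxAllowedWatermarkDrift;
    }
}

/** Hypothetical check: fail early if alignment is enabled but the source cannot pause splits. */
final class AlignmentValidator {
    static void validate(
            Optional<AlignmentSettings> alignment,
            boolean sourceSupportsPausingSplits,
            String sourceName) {
        // Sources without alignment are never affected, so existing jobs keep working.
        if (alignment.isPresent() && !sourceSupportsPausingSplits) {
            throw new UnsupportedOperationException(
                    "Source " + sourceName + " does not support pausing splits, which watermark"
                            + " alignment group '" + alignment.get().watermarkGroup + "' requires.");
        }
    }
}
```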
>>>> If I have to pick one, I am slightly favoring option 1 (base
>>>> interface). As
>>>> watermark is already an essential concept of source, maybe watermark
>>>> alignment capability can also be a property of the base source/reader
>>>> interface.
>>>>
>>>> On Tue, Jun 14, 2022 at 3:44 PM Thomas Weise <th...@apache.org> wrote:
>>>>
>>>> > Hi everyone,
>>>> >
>>>> > Thank you for all the effort that went into this discussion. The split
>>>> > level watermark alignment will be an important feature for Flink that
>>>> > will address operational problems for various use cases. From reading
>>>> > through this thread it appears that not too much remains to bring this
>>>> > FLIP to acceptance and allow development to move forward. I would like
>>>> > to contribute if possible.
>>>> >
>>>> > Regarding option 1 vs. option 2: I don't have a strong preference,
>>>> > perhaps slightly leaning towards option 1.
>>>> >
>>>> > However, from a user perspective, should the split level alignment be
>>>> > an opt-in feature, at least for a few releases? If yes, then we would
>>>> > require a knob similar to supportsPausingSplits(), which I understand
>>>> > won't be part of the revised FLIP. Such control may be beneficial:
>>>> >
>>>> > * Compare runtime behavior with split level alignment on/off
>>>> > * Allow use of sources that don't implement pausing splits yet
>>>> >
>>>> > The second point would, from the user's perspective, be necessary for
>>>> > backward compatibility? While the interface aspect and source
>>>> > compatibility has been discussed in great detail, I don't think it
>>>> > would be desirable if an application that already uses alignment fails
>>>> > after upgrading to the new Flink version, forcing users to lock step
>>>> > modify sources for the new non-optional split level alignment.
>>>> >
>>>> > So I think clarification of the compatibility aspect on the FLIP page
>>>> > would be necessary.
>>>> >
>>>> > Thanks,
>>>> > Thomas
>>>> >
>>>> > On Mon, May 30, 2022 at 3:29 AM Piotr Nowojski <
>>>> piotr.nowojski@gmail.com>
>>>> > wrote:
>>>> > >
>>>> > > Hi Becket,
>>>> > >
>>>> > > Thanks for summing this up. Just one correction:
>>>> > >
>>>> > > > Piotr prefers option 2, his opinions are:
>>>> > > >   e) It is OK that the code itself in option 2 indicates the
>>>> developers
>>>> > > that a feature is optional. We will rely on the documentation to
>>>> correct
>>>> > > that and clarify that the feature is actually obligatory.
>>>> > >
>>>> > > I would say based on a) and b) that feature would be still
>>>> optional. So
>>>> > > both the implementation and the documentation would be saying that.
>>>> We
>>>> > > could add a mention to the docs and release notes, that this
>>>> feature will
>>>> > > be obligatory in the next major release and plan such a release
>>>> > accordingly.
>>>> > >
>>>> > > Re the option 1., as you mentioned:
>>>> > > > As for option 1: For developers, the feature is still optional
>>>> due to
>>>> > the
>>>> > > default implementation in the interface, regardless of what the
>>>> default
>>>> > > implementation does, because the code compiles without overriding
>>>> these
>>>> > > methods
>>>> > >
>>>> > > Also importantly, the code will work in most cases.
>>>> > >
>>>> > > > Obligatory: Jobs may fail if these methods are not implemented
>>>> > properly.
>>>> > > e.g. SourceReader#pauseOrResumeSplits(). This is a common pattern in
>>>> > Java,
>>>> > > e.g. Iterator.remove() by default throws
>>>> "UnsupportedOperationException",
>>>> > > informing the implementation that things may go wrong if this
>>>> method is
>>>> > not
>>>> > > implemented.
>>>> > >
>>>> > > For me `Iterator#remove()` is an optional feature. Personally, I
>>>> don't
>>>> > > remember if I have ever implemented it.
>>>> > >
>>>> > > Best,
>>>> > > Piotrek
>>>> > >
>>>> > > On Fri, May 27, 2022 at 10:14 Becket Qin <be...@gmail.com>
>>>> wrote:
>>>> > >
>>>> > > > I had an offline discussion with Piotr and here is the summary.
>>>> Please
>>>> > > > correct me if I miss something, Piotr.
>>>> > > >
>>>> > > > There are two things on which we would like to seek more opinions from the
>>>> > > > community, so we can make progress on this FLIP.
>>>> > > >
>>>> > > > 1. The General pattern to add obligatory features to existing
>>>> > interfaces.
>>>> > > >
>>>> > > >
>>>> >
>>>> ***********************************************************************************
>>>> > > > For interfaces exposed to the developers for implementation, they
>>>> are
>>>> > > > either intended to be *optional* or *obligatory*. While it is
>>>> quite
>>>> > clear
>>>> > > > about how to convey that intention when creating the interfaces,
>>>> it is
>>>> > not
>>>> > > > as commonly agreed when we are adding new features to an existing
>>>> > > > interface. In general, Flink uses decorative interfaces when
>>>> adding
>>>> > > > optional features to existing interfaces. Both Piotr and I agree
>>>> that
>>>> > looks
>>>> > > > good.
>>>> > > >
>>>> > > > Different opinions are mainly about how to add obligatory
>>>> features to
>>>> > the
>>>> > > > existing interfaces, probably due to different understandings of
>>>> > > > "obligatory".
>>>> > > >
>>>> > > > We have discussed four options:
>>>> > > >
>>>> > > > *Option 1:*
>>>> > > >
>>>> > > >    - Just add a new method to the existing interface.
>>>> > > >    - For backwards compatibility, the method would have a default
>>>> > > >    implementation throwing "UnsupportedOperationException".
>>>> > > >    - In the next major version, remove the default implementation.
>>>> > > >    - For the developers, any method with a default implementation
>>>> > > >    throwing an "UnsupportedOperationException" should be taken as
>>>> > obligatory.
>>>> > > >
>>>> > > > *Option 2:*
>>>> > > >
>>>> > > >    - Always make the features optional by adding a decorative
>>>> > interface,
>>>> > > >    just like ordinary optional features.
>>>> > > >    - Inform the developers via documentation that this feature is
>>>> > > >    obligatory, although it looks optional from the code.
>>>> > > >    - In case the developers did not implement the decorative
>>>> interface,
>>>> > > >    throw an exception
>>>> > > >    - In the next major version, move the methods in the decorative
>>>> > > >    interface to the base interface, and deprecate the decorative
>>>> > interface.
>>>> > > >
>>>> > > > *Option 3:*
>>>> > > >
>>>> > > >    - Always bump the major version when a new obligatory feature
>>>> is
>>>> > > >    added, even if we may have to do it frequently.
>>>> > > >
>>>> > > > *Option 4:*
>>>> > > >
>>>> > > >    - Add a V2, V3... of the interface affected by the new
>>>> obligatory
>>>> > > >    feature.
>>>> > > >    - In the next major versions, deprecate old versions of the
>>>> > interfaces.
>>>> > > >
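Options 1 and 2 from the list above can be contrasted in a small sketch. All type names here (`ReaderOption1`, `ReaderOption2`, `SupportsPausingSplits`, `AlignmentFramework`) are hypothetical and only model the pattern; they are not the FLIP's actual interfaces.

```java
import java.util.Collection;
import java.util.Collections;

/** Option 1: the new method lives in the base interface with a throwing default. */
interface ReaderOption1 {
    void pollNext();

    default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        throw new UnsupportedOperationException("pauseOrResumeSplits is not implemented");
    }
}

/** Option 2: the base interface stays untouched. */
interface ReaderOption2 {
    void pollNext();
}

/** Option 2: hypothetical decorative interface carrying the new method. */
interface SupportsPausingSplits {
    void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume);
}

/** Hypothetical framework-side call sites for both options. */
final class AlignmentFramework {
    /** Option 1: the framework always calls the method; legacy readers throw from the default. */
    static void pauseOption1(ReaderOption1 reader, Collection<String> toPause) {
        reader.pauseOrResumeSplits(toPause, Collections.emptyList());
    }

    /** Option 2: the framework must check for the decorative interface itself. */
    static void pauseOption2(ReaderOption2 reader, Collection<String> toPause) {
        if (!(reader instanceof SupportsPausingSplits)) {
            throw new UnsupportedOperationException(
                    reader.getClass().getName() + " does not implement SupportsPausingSplits");
        }
        ((SupportsPausingSplits) reader).pauseOrResumeSplits(toPause, Collections.emptyList());
    }
}
```

In both variants an existing reader still compiles and only fails once alignment actually tries to pause a split; the difference is whether the exception comes from the reader's inherited default or from the framework's own check.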
>>>> > > > Both Piotr and I agreed that option 3 and option 4 have a big
>>>> side
>>>> > effect
>>>> > > > and should be avoided. We have different preferences between
>>>> option 1
>>>> > and
>>>> > > > option 2.
>>>> > > >
>>>> > > > Personally I prefer option 1, the reasons are:
>>>> > > >   a) simple and intuitive. Java 8 introduced the default impl in
>>>> > > > interfaces exactly for interface evolving, and this is a common
>>>> > pattern in
>>>> > > > many projects.
>>>> > > >   b) prominent to the developers that the feature is expected to
>>>> be
>>>> > > > implemented, because it explicitly throws an exception in the
>>>> default
>>>> > impl.
>>>> > > >   c) low maintenance overhead - the Flink framework can always
>>>> assume
>>>> > the
>>>> > > > method exists, so no special handling logic is needed.
>>>> > > >   d) communicate a clear semantic boundary between optional and
>>>> > obligatory
>>>> > > > features in the Flink to the developers.
>>>> > > >       - Optional: Jobs still run without exception if these
>>>> methods are
>>>> > > > not implemented. e.g. all the SupportsXXXPushDown interfaces.
>>>> > > >       - Obligatory: Jobs may fail if these methods are not
>>>> implemented
>>>> > > > properly. e.g. SourceReader#pauseOrResumeSplits(). This is a
>>>> common
>>>> > pattern
>>>> > > > in Java, e.g. Iterator.remove() by default throws
>>>> > > > "UnsupportedOperationException", informing the implementation that
>>>> > things
>>>> > > > may go wrong if this method is not implemented.
>>>> > > >
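The Iterator.remove() behaviour referred to above is easy to reproduce. `CountingIterator` is a made-up example class; it implements only hasNext() and next(), so it inherits the JDK's default remove(), which throws UnsupportedOperationException.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Example iterator that does not override remove() and so inherits the throwing default. */
final class CountingIterator implements Iterator<Integer> {
    private final int limit;
    private int next;

    CountingIterator(int limit) {
        this.limit = limit;
    }

    @Override
    public boolean hasNext() {
        return next < limit;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return next++;
    }
}
```

The class compiles and iterates fine; only a caller that invokes remove() sees the exception, which is the behaviour both sides of the argument are describing.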
>>>> > > > As for option 2, although the API itself sounds clean, it misleads
>>>> > people
>>>> > > > to think of an obligatory feature to be optional - from the code
>>>> the
>>>> > > > feature is optional, but the documents say it is obligatory. We
>>>> > probably
>>>> > > > should avoid such code-doc inconsistency, as people will be
>>>> confused.
>>>> > And I
>>>> > > > would actually be bewildered that sometimes not implementing an
>>>> > "optional"
>>>> > > > feature is fine, but sometimes it causes the jobs to fail.
>>>> > > >
>>>> > > > In response to the argument that the method with a default
>>>> > implementation
>>>> > > > is always optional, if that is true, it actually means all the
>>>> > interfaces
>>>> > > > should be immutable once they are created. If we want to add a
>>>> method
>>>> > to an
>>>> > > > existing interface, for backwards compatibility, we will have to
>>>> > provide a
>>>> > > > default implementation. And the fact it has a default
>>>> implementation
>>>> > > > indicates the method is optional. If that method is optional, it
>>>> should
>>>> > > > reside in a separate decorative interface, otherwise it clogs that
>>>> > existing
>>>> > > > interface. Therefore, people should never add a method to an
>>>> existing
>>>> > > > interface. I find this conclusion a bit extreme.
>>>> > > >
>>>> > > > Piotr prefers option 2, his opinions are:
>>>> > > >     a) Obligatory methods are the methods that fail the code
>>>> > compilation
>>>> > > > if not implemented.
>>>> > > >     b) All obligatory methods should reside in the base interface,
>>>> > without
>>>> > > > a default implementation. And all the optional methods should be
>>>> in
>>>> > > > decorative interfaces. This is a clean API.
>>>> > > >     c) due to b), there isn't a viable solution to add an
>>>> obligatory
>>>> > > > method to an existing interface in a backwards compatible way.
>>>> Unless
>>>> > we
>>>> > > > are OK with breaking backwards compatibility, all the interfaces
>>>> > should be
>>>> > > > treated as immutable. As a compromise, we might as well just
>>>> treat all
>>>> > the
>>>> > > > features added later as optional features. This way we keep the
>>>> API
>>>> > clean.
>>>> > > >     d) based on b) and c), option 2 has a clean API, while option
>>>> 1
>>>> > does
>>>> > > > not.
>>>> > > >     e) It is OK that the code itself in option 2 indicates the
>>>> > developers
>>>> > > > that a feature is optional. We will rely on the documentation to
>>>> > correct
>>>> > > > that and clarify that the feature is actually obligatory.
>>>> > > >     f) Regarding the effectiveness of making people aware that the
>>>> > feature
>>>> > > > is obligatory, Option 1 and Option 2 are similar. For people that
>>>> do
>>>> > not
>>>> > > > read the release note / documentation, they will mistake the
>>>> feature
>>>> > to be
>>>> > > > optional anyways.
>>>> > > >
>>>> > > > As for option 1: For developers, the feature is still optional
>>>> due to
>>>> > the
>>>> > > > default implementation in the interface, regardless of what the
>>>> default
>>>> > > > implementation does, because the code compiles without overriding
>>>> these
>>>> > > > methods. Also, another problem of this option is that for users
>>>> that
>>>> > do not
>>>> > > > know about the history of the interface, they may be confused by
>>>> the
>>>> > > > default implementation throwing an exception.
>>>> > > >
>>>> > > >
>>>> > > > 2. For this particular FLIP, should it be optional or not?
>>>> > > >
>>>> > > >
>>>> >
>>>> ***********************************************************************************
>>>> > > > As mentioned in the previous email, I feel this FLIP should be
>>>> > obligatory,
>>>> > > > for the following reasons:
>>>> > > > 1. The Flink framework exposes the watermark alignment API to the
>>>> end
>>>> > > > users. From the end users' perspective, the feature should be
>>>> available
>>>> > > > regardless of the implementation details in the pluggables. This
>>>> is
>>>> > true
>>>> > > > for any other methods exposed as the Flink API.
>>>> > > > 2. If a Source is not pausable, the end user should receive an
>>>> > exception
>>>> > > > when enabling watermark alignment (both Piotr and I agree on
>>>> > this). In
>>>> > > > that case, it meets my criteria of obligatory feature because not
>>>> > > > implementing the feature causes a framework API to throw an
>>>> exception and
>>>> > > > fail the job.
>>>> > > >
>>>> > > > On the other hand, Piotr does not have a strong opinion regarding
>>>> > whether
>>>> > > > this feature should be optional or not.
>>>> > > >
>>>> > > >
>>>> > > > Thanks for reading through this long email. So basically in order
>>>> to
>>>> > make
>>>> > > > progress on this FLIP, we want to see how people feel about
>>>> the
>>>> > above
>>>> > > > two topics.
>>>> > > >
>>>> > > > Thanks,
>>>> > > >
>>>> > > > Jiangjie (Becket) Qin
>>>> > > >
>>>> > > >
>>>> > > > On Thu, May 26, 2022 at 3:06 PM Piotr Nowojski <
>>>> pnowojski@apache.org>
>>>> > > > wrote:
>>>> > > >
>>>> > > >> Hi Becket,
>>>> > > >>
>>>> > > >> I still stand by what I wrote before:
>>>> > > >> > I think I would still vote soft -1 on this option, but I
>>>> wouldn't
>>>> > block
>>>> > > >> it in case I am out-voted.
>>>> > > >>
>>>> > > >> > I think it might be helpful to agree on the definition of
>>>> optional
>>>> > in
>>>> > > >> our
>>>> > > >> case.
>>>> > > >>
>>>> > > >> For me it doesn't matter whether a default method throwing an
>>>> > exception we
>>>> > > >> call optional or non-optional. As long as we keep it this way,
>>>> the
>>>> > effect
>>>> > > >> is the same. It's effectively a method that a user doesn't have
>>>> to
>>>> > > >> implement. If interface/system allows some methods to be not
>>>> > implemented,
>>>> > > >> some users will do just that, regardless if we call it and
>>>> document as
>>>> > > >> non-optional. And at the same time it's clogging the base
>>>> interface.
>>>> > > >>
>>>> > > >> By the way, just the need for a java-doc/documentation
>>>> explaining the
>>>> > > >> existence of some construct is a bad smell (code should be
>>>> > > >> self-documenting
>>>> > > >> and default method throwing an UnsupportedOperationException is
>>>> not).
>>>> > > >>
>>>> > > >> > Please note that so far we do not assume whether the feature
>>>> is in
>>>> > > >> > the original API or it is added later. A newly added feature
>>>> can
>>>> > also be
>>>> > > >> > non-optional, although it might take some time for all the
>>>> pluggable
>>>> > > >> > developers to catch up, and they should still work if the new
>>>> > feature is
>>>> > > >> > not used until they catch up. In contrast, we may never expect
>>>> an
>>>> > > >> optional
>>>> > > >> > feature to catch up, because leaving it unimplemented is also
>>>> > blessed.
>>>> > > >> >
>>>> > > >> > Let's take the checkpointing as an example. Imagine Flink did
>>>> not
>>>> > > >> support
>>>> > > >> > checkpointing before release 1.16. And now we are trying to add
>>>> > > >> > checkpointing to Flink. So we exposed the checkpoint
>>>> configuration
>>>> > to
>>>> > > >> the
>>>> > > >> > end users. In the meantime, will we tell the pluggable (e.g.
>>>> > operators,
>>>> > > >> > connectors) developers that methods like "snapshotState()" are
>>>> > optional?
>>>> > > >> If
>>>> > > >> > we do that, the availability of checkpointing in Flink would be
>>>> > severely
>>>> > > >> > weakened. But apparently we should still allow the existing
>>>> > > >> implementations
>>>> > > >> > to work without checkpointing. It looks to me that adding the
>>>> > method to
>>>> > > >> the
>>>> > > >> > pluggable interfaces with a default implementation throwing
>>>> > > >> > "UnsupportedOperationException" would be the solution here.
>>>> Please
>>>> > note
>>>> > > >> > that in this case, having the default implementation does not
>>>> mean
>>>> > this
>>>> > > >> is
>>>> > > >> > optional. It is just the technique to support backwards
>>>> > compatibility in
>>>> > > >> > the feature evolution. The fact that this method is in the base
>>>> > > >> interface
>>>> > > >> > suggests it is not optional, so the developers SHOULD
>>>> implement it.
>>>> > > >>
>>>> > > >> I would soft vote -1 for having the default method throwing
>>>> > > >> UnsupportedOperationException as a one-off thing for this (FLIP-217)
>>>> > special
>>>> > > >> circumstances.
>>>> > > >>
>>>> > > >> At the moment, without thinking this over too much, I would vote
>>>> > harder -1
>>>> > > >> for having this as a general rule when adding new features. If we
>>>> > ever end
>>>> > > >> up with an API that is littered with default methods throwing
>>>> > > >> UnsupportedOperationException that are documented as "non
>>>> optional" it
>>>> > > >> would be IMO a big design failure. I would be ok-ish with that,
>>>> only
>>>> > if
>>>> > > >> that was a temporary thing and we had an aggressive plan to
>>>> release
>>>> > more
>>>> > > >> often new major Flink versions (2.x.y, 3.x.y, ...) breaking API
>>>> > > >> compatibility, that would get rid of those default methods.
>>>> Adding
>>>> > > >> checkpointing and methods like "snapshotState()" would IMO easily
>>>> > justify
>>>> > > >> a
>>>> > > >> new major Flink release. In that case we could add those methods
>>>> with
>>>> > > >> default implementation for some transition period, a one or two
>>>> minor
>>>> > > >> releases, followed by a clean up in a major release. However I
>>>> would
>>>> > still
>>>> > > >> argue that it would be cleaner/better to add a decorative
>>>> interface
>>>> > like
>>>> > > >> `CheckpointedOperator` interface instead of adding those default
>>>> > methods
>>>> > > >> to
>>>> > > >> the base `Operator` interface.
>>>> > > >>
>>>> > > >> I think I can sum up our disagreement here is that I would like
>>>> to
>>>> > keep
>>>> > > >> the
>>>> > > >> interfaces simpler, with only obligatory methods/features on one
>>>> side
>>>> > and
>>>> > > >> clearly optional features on the other. While you would like to
>>>> add an
>>>> > > >> extra third state in between those two?
>>>> > > >>
>>>> > > >> Best,
>>>> > > >> Piotrek
>>>> > > >>
>>>> > > >>
>>>> > > >>
>>>> > > >> On Thu, May 12, 2022 at 04:25 Becket Qin <be...@gmail.com>
>>>> > wrote:
>>>> > > >>
>>>> > > >> > Thanks for the clarification, Piotr and Sebastian.
>>>> > > >> >
>>>> > > >> > It looks like the key problem is still whether the
>>>> implementation of
>>>> > > >> > pausable splits in the Sources should be optional or not.
>>>> > > >> >
>>>> > > >> > I think it might be helpful to agree on the definition of
>>>> optional
>>>> > in
>>>> > > >> our
>>>> > > >> > case. To me:
>>>> > > >> > Optional = "You CAN leave the method unimplemented, and that is
>>>> > fine."
>>>> > > >> > Non-Optional = "You CAN leave the method unimplemented, but you
>>>> > SHOULD
>>>> > > >> NOT,
>>>> > > >> > because people assume this works."
>>>> > > >> >
>>>> > > >> > I think one sufficient condition of a Non-Optional feature is
>>>> that
>>>> > if
>>>> > > >> the
>>>> > > >> > feature is exposed through the framework API, Flink should
>>>> expect
>>>> > the
>>>> > > >> > pluggables to support this feature by default. Otherwise the
>>>> > > >> availability
>>>> > > >> > of that feature becomes undefined.
>>>> > > >> >
>>>> > > >> > Please note that so far we do not assume whether the feature
>>>> is in
>>>> > > >> > the original API or it is added later. A newly added feature
>>>> can
>>>> > also be
>>>> > > >> > non-optional, although it might take some time for all the
>>>> pluggable
>>>> > > >> > developers to catch up, and they should still work if the new
>>>> > feature is
>>>> > > >> > not used until they catch up. In contrast, we may never expect
>>>> an
>>>> > > >> optional
>>>> > > >> > feature to catch up, because leaving it unimplemented is also
>>>> > blessed.
>>>> > > >> >
>>>> > > >> > Let's take the checkpointing as an example. Imagine Flink did
>>>> not
>>>> > > >> support
>>>> > > >> > checkpointing before release 1.16. And now we are trying to add
>>>> > > >> > checkpointing to Flink. So we exposed the checkpoint
>>>> configuration
>>>> > to
>>>> > > >> the
>>>> > > >> > end users. In the meantime, will we tell the pluggable (e.g.
>>>> > operators,
>>>> > > >> > connectors) developers that methods like "snapshotState()" are
>>>> > optional?
>>>> > > >> If
>>>> > > >> > we do that, the availability of checkpointing in Flink would be
>>>> > severely
>>>> > > >> > weakened. But apparently we should still allow the existing
>>>> > > >> implementations
>>>> > > >> > to work without checkpointing. It looks to me that adding the
>>>> > method to
>>>> > > >> the
>>>> > > >> > pluggable interfaces with a default implementation throwing
>>>> > > >> > "UnsupportedOperationException" would be the solution here.
>>>> Please
>>>> > note
>>>> > > >> > that in this case, having the default implementation does not
>>>> mean
>>>> > this
>>>> > > >> is
>>>> > > >> > optional. It is just the technique to support backwards
>>>> > compatibility in
>>>> > > >> > the feature evolution. The fact that this method is in the base
>>>> > > >> interface
>>>> > > >> > suggests it is not optional, so the developers SHOULD
>>>> implement it.
>>>> > > >> >
>>>> > > >> > When it comes to this FLIP, I think it meets the criteria of
>>>> > > >> non-optional
>>>> > > >> > features, so we should just use the evolution path of
>>>> non-optional
>>>> > > >> > features.
>>>> > > >> >
>>>> > > >> > Thanks,
>>>> > > >> >
>>>> > > >> > Jiangjie (Becket) Qin
>>>> > > >> >
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski <
>>>> > pnowojski@apache.org>
>>>> > > >> > wrote:
>>>> > > >> >
>>>> > > >> > > Hi,
>>>> > > >> > >
>>>> > > >> > > Actually previously I thought about having a decorative
>>>> interface
>>>> > and
>>>> > > >> > > whenever watermark alignment is enabled, checking that the
>>>> source
>>>> > > >> > > implements the decorative interface. If not, throwing an
>>>> > exception.
>>>> > > >> > >
>>>> > > >> > > The option with default methods in the source interfaces
>>>> throwing
>>>> > > >> > > `UnsupportedOperationException` I think still suffers from
>>>> the
>>>> > same
>>>> > > >> > > problems I mentioned before. It's still an optional
>>>> implementation
>>>> > > >> and at
>>>> > > >> > > the same time it's clogging the base interface. I think I
>>>> would
>>>> > still
>>>> > > >> > vote
>>>> > > >> > > soft -1 on this option, but I wouldn't block it in case I am
>>>> > > >> out-voted.
>>>> > > >> > >
>>>> > > >> > > Best,
>>>> > > >> > > Piotrek
>>>> > > >> > >
>>>> > > >> > > On Wed, May 11, 2022 at 14:22 Sebastian Mattheis <
>>>> > sebastian@ververica.com>
>>>> > > >> > > wrote:
>>>> > > >> > >
>>>> > > >> > > > Hi Becket,
>>>> > > >> > > >
>>>> > > >> > > > Thanks a lot for your fast and detailed response. For me,
>>>> it
>>>> > > >> converges
>>>> > > >> > > and
>>>> > > >> > > > dropping the supportsX method sounds very reasonable to me.
>>>> > (Side
>>>> > > >> note:
>>>> > > >> > > > With "pausable splits" enabled as "default" I think we
>>>> > > >> misunderstood.
>>>> > > >> > As
>>>> > > >> > > > you described now "default" I understand as that it should
>>>> be
>>>> > the
>>>> > > >> new
>>>> > > >> > > > recommended way of implementation, and I think that is
>>>> fully
>>>> > valid.
>>>> > > >> > > Before,
>>>> > > >> > > > I understood "default" here as the default implementation,
>>>> i.e.,
>>>> > > >> > throwing
>>>> > > >> > > > UnsupportedOperationException, which is the exact
>>>> opposite. :) )
>>>> > > >> > > >
>>>> > > >> > > > Nevertheless: As mentioned, an open question for me is if
>>>> > watermark
>>>> > > >> > > > alignment should enforce pausable splits. For
>>>> clarification, the
>>>> > > >> > current
>>>> > > >> > > > documentation [1] says:
>>>> > > >> > > >
>>>> > > >> > > > *Note:* As of 1.15, Flink supports aligning across tasks
>>>> of the
>>>> > same
>>>> > > >> > > >> source and/or different sources. It does not support
>>>> aligning
>>>> > > >> > > >> splits/partitions/shards in the same task.
>>>> > > >> > > >>
>>>> > > >> > > >> In a case where there are e.g. two Kafka partitions that
>>>> > produce
>>>> > > >> > > >> watermarks at different pace, that get assigned to the
>>>> same
>>>> > task
>>>> > > >> > > watermark
>>>> > > >> > > >> might not behave as expected. Fortunately, worst case it
>>>> > should not
>>>> > > >> > > perform
>>>> > > >> > > >> worse than without alignment.
>>>> > > >> > > >>
>>>> > > >> > > >> Given the limitation above, we suggest applying watermark
>>>> > > >> alignment in
>>>> > > >> > > >> two situations:
>>>> > > >> > > >>
>>>> > > >> > > >>    1. You have two different sources (e.g. Kafka and
>>>> File) that
>>>> > > >> > produce
>>>> > > >> > > >>    watermarks at different speeds
>>>> > > >> > > >>    2. You run your source with parallelism equal to the
>>>> number
>>>> > of
>>>> > > >> > > >>    splits/shards/partitions, which results in every
>>>> subtask
>>>> > being
>>>> > > >> > > assigned a
>>>> > > >> > > >>    single unit of work.
>>>> > > >> > > >>
>>>> > > >> > > >> I personally see no issue in implementing and I see no
>>>> reason
>>>> > > >> against
>>>> > > >> > > > implementing this dependency of watermark alignment and
>>>> pausable
>>>> > > >> > splits.
>>>> > > >> > > (I
>>>> > > >> > > > think this would even be a good path towards shaping
>>>> watermark
>>>> > > >> > alignment
>>>> > > >> > > in
>>>> > > >> > > > 1.16.) However, "I don't see" means that I would be happy
>>>> to
>>>> > hear
>>>> > > >> > Dawid's
>>>> > > >> > > > and Piotrek's opinions as they implemented watermark
>>>> alignment
>>>> > > >> based on
>>>> > > >> > > > FLIP-182 [2] and I don't want to miss relevant
>>>> > rationale/background
>>>> > > >> > info
>>>> > > >> > > > from their side.
>>>> > > >> > > >
>>>> > > >> > > > *@Piotrek* *@Dawid *What do you think?
>>>> > > >> > > >
>>>> > > >> > > > Regards,
>>>> > > >> > > > Sebastian
>>>> > > >> > > >
>>>> > > >> > > > [1]
>>>> > > >> > > >
>>>> > > >> > >
>>>> > > >> >
>>>> > > >>
>>>> >
>>>> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
>>>> > > >> > > > [2]
>>>> > > >> > > >
>>>> > > >> > >
>>>> > > >> >
>>>> > > >>
>>>> >
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
>>>> > > >> > > >
>>>> > > >> > > > On Wed, May 11, 2022 at 1:30 PM Becket Qin <
>>>> > becket.qin@gmail.com>
>>>> > > >> > wrote:
>>>> > > >> > > >
>>>> > > >> > > >> +dev
>>>> > > >> > > >>
>>>> > > >> > > >> Hi Sebastian,
>>>> > > >> > > >>
>>>> > > >> > > >> Thank you for the summary. Please see the detailed replies
>>>> > inline.
>>>> > > >> As
>>>> > > >> > a
>>>> > > >> > > >> recap of my suggestions.
>>>> > > >> > > >>
>>>> > > >> > > >> 1. Pausable splits API.
>>>> > > >> > > >>   a) Add default implementations to methods
>>>> > "pauseOrResumeSplits"
>>>> > > >> in
>>>> > > >> > > both
>>>> > > >> > > >> SourceReader and SplitReader where both default
>>>> implementations
>>>> > > >> throw
>>>> > > >> > > >>  UnsupportedOperationException.
>>>> > > >> > > >>
>>>> > > >> > > >> 2. User story.
>>>> > > >> > > >>     a) We tell users to enable the watermark alignment as
>>>> they
>>>> > > >> like.
>>>> > > >> > > This
>>>> > > >> > > >> is exactly what the current Flink API is.
>>>> > > >> > > >>     b) We tell the source developers, please implement
>>>> pausable
>>>> > > >> > splits,
>>>> > > >> > > >> otherwise bad things may happen. Think of it like you are
>>>> > expected
>>>> > > >> to
>>>> > > >> > > >> implement SourceReader#snapshotState() properly, otherwise
>>>> > > >> exceptions
>>>> > > >> > > will
>>>> > > >> > > >> be thrown when users enable checkpointing.
>>>> > > >> > > >>
>>>> > > >> > > >> Thanks,
>>>> > > >> > > >>
>>>> > > >> > > >> Jiangjie (Becket) Qin
>>>> > > >> > > >>
>>>> > > >> > > >> On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis <
>>>> > > >> > > >> sebastian@ververica.com> wrote:
>>>> > > >> > > >>
>>>> > > >> > > >>> Hi Becket, Hi everybody,
>>>> > > >> > > >>>
>>>> > > >> > > >>> I'm sorry if I misread the messages but I could not
>>>> derive an
>>>> > > >> > agreement
>>>> > > >> > > >>> from the mailing list. Nevertheless, if I understand you
>>>> > right the
>>>> > > >> > > >>> suggestion is:
>>>> > > >> > > >>>
>>>> > > >> > > >>> * Add default implementations to methods
>>>> > "pauseOrResumeSplits" in
>>>> > > >> > both
>>>> > > >> > > >>> SourceReader and SplitReader where both default
>>>> > implementations
>>>> > > >> throw
>>>> > > >> > > >>> UnsupportedOperationException.
>>>> > > >> > > >>>
>>>> > > >> > > >> Yes.
>>>> > > >> > > >>
>>>> > > >> > > >> * Add "supportsPauseOrResumeSplits" to the Source
>>>> interface.
>>>> > (In
>>>> > > >> the
>>>> > > >> > > >>> following, I refer to supporting this as "pausable
>>>> splits".)
>>>> > > >> > > >>>
>>>> > > >> > > >> We may no longer need this if pausable splits are
>>>> expected to
>>>> > be
>>>> > > >> > > >> implemented by the source developers, i.e. non-optional.
>>>> Having
>>>> > > >> this
>>>> > > >> > > method
>>>> > > >> > > >> would then be somewhat misleading as it looks like the
>>>> sources
>>>> > > >> that do
>>>> > > >> > > not
>>>> > > >> > > >> support pausable splits are also acceptable in the long
>>>> term.
>>>> > So
>>>> > > >> API
>>>> > > >> > > wise,
>>>> > > >> > > >> I'd say maybe we should remove this for this FLIP,
>>>> although I
>>>> > > >> believe
>>>> > > >> > > this
>>>> > > >> > > >> supportXXX pattern itself is still attractive for optional
>>>> > > >> features.
>>>> > > >> > > >>
>>>> > > >> > > >>
>>>> > > >> > > >>>
>>>> > > >> > > >>> To make the conclusions explicit:
>>>> > > >> > > >>>
>>>> > > >> > > >>> 1. The implementation of pauseOrResumeSplits in both
>>>> > interfaces
>>>> > > >> > > >>> SourceReader and SplitReader is optional where the
>>>> default is
>>>> > > >> that
>>>> > > >> > it
>>>> > > >> > > >>> doesn't support it. (--> This means that the
>>>> implementation is
>>>> > > >> still
>>>> > > >> > > >>> optional for the source developer.)
>>>> > > >> > > >>>
>>>> > > >> > > >> It is optional for backwards compatibility with existing
>>>> > sources,
>>>> > > >> as
>>>> > > >> > > they
>>>> > > >> > > >> may still compile without code change. But starting from
>>>> this
>>>> > FLIP,
>>>> > > >> > > Flink
>>>> > > >> > > >> will always optimistically assume that all the sources
>>>> support
>>>> > > >> > pausable
>>>> > > >> > > >> splits. If a source does not support pausable splits, it
>>>> goes
>>>> > to an
>>>> > > >> > > error
>>>> > > >> > > >> handling path when watermark alignment is enabled on it.
>>>> This
>>>> > is
>>>> > > >> > > different
>>>> > > >> > > >> from a usual optional feature, where no error is expected.
>>>> > > >> > > >>
>>>> > > >> > > >>
>>>> > > >> > > >>> 2. If watermark alignment is enabled in the application
>>>> code
>>>> > by
>>>> > > >> > adding
>>>> > > >> > > >>> withWatermarkAlignment to the WatermarkStrategy while
>>>> > > >> SourceReader or
>>>> > > >> > > >>> SplitReader do not support pausableSplits, we throw an
>>>> > > >> > > >>> UnsupportedOperationException.
>>>> > > >> > > >>>
>>>> > > >> > > >> Yes.
>>>> > > >> > > >>
>>>> > > >> > > >>
>>>> > > >> > > >>> 3. With regard to your statement:
>>>> > > >> > > >>>
>>>> > > >> > > >>>> [...] basically means watermark alignment is a
>>>> non-optional
>>>> > > >> feature
>>>> > > >> > > to
>>>> > > >> > > >>>> the end users.
>>>> > > >> > > >>>
>>>> > > >> > > >>> You actually mean that "pausable splits" are
>>>> non-optional for
>>>> > the
>>>> > > >> app
>>>> > > >> > > >>> developer if watermark alignment is enabled. However,
>>>> > watermark
>>>> > > >> > > alignment
>>>> > > >> > > >>> is optional and can be enabled/disabled.
>>>> > > >> > > >>>
>>>> > > >> > > >> Yes, watermark alignment can be enabled/disabled in
>>>> individual
>>>> > > >> sources
>>>> > > >> > > in
>>>> > > >> > > >> Flink jobs, which basically means the code supporting
>>>> watermark
>>>> > > >> > > alignment
>>>> > > >> > > >> has to already be there. That again means the Source
>>>> > developers are
>>>> > > >> > also
>>>> > > >> > > >> expected to support pausable splits by default. So this
>>>> way we
>>>> > > >> > > essentially
>>>> > > >> > > >> tell the end users that you may enable / disable this
>>>> feature
>>>> > as
>>>> > > >> you
>>>> > > >> > > wish,
>>>> > > >> > > >> and tell the source developers that you SHOULD implement
>>>> this
>>>> > > >> because
>>>> > > >> > > the
>>>> > > >> > > >> end users may turn it on/off at will. And if the source
>>>> does
>>>> > not
>>>> > > >> > support
>>>> > > >> > > >> pausable splits, that goes to an error handling path when
>>>> > watermark
>>>> > > >> > > >> alignment is enabled on it. So users know they have to
>>>> > explicitly
>>>> > > >> > > exclude
>>>> > > >> > > >> this source.
>>>> > > >> > > >>
>>>> > > >> > > >>
>>>> > > >> > > >>>
>>>> > > >> > > >>> So far it's totally clear to me and I hope this is what
>>>> you
>>>> > mean.
>>>> > > >> I
>>>> > > >> > > also
>>>> > > >> > > >>> agree with both statements:
>>>> > > >> > > >>>
>>>> > > >> > > >>> So making that expectation aligned with the source
>>>> developers
>>>> > > >> seems
>>>> > > >> > > >>>> reasonable.
>>>> > > >> > > >>>>
>>>> > > >> > > >>>
>>>> > > >> > > >>> I think this is a simple and clean solution from both
>>>> the end
>>>> > user
>>>> > > >> > and
>>>> > > >> > > >>>> source developers' standpoint.
>>>> > > >> > > >>>>
>>>> > > >> > > >>>
>>>> > > >> > > >>> However, a last conclusion derives from 3. and is an open
>>>> > question
>>>> > > >> > for
>>>> > > >> > > >>> me:
>>>> > > >> > > >>>
>>>> > > >> > > >>> 4. The feature of "pausable splits" is now tightly bound
>>>> to
>>>> > > >> watermark
>>>> > > >> > > >>> alignment, i.e., if sources do not support "pausable
>>>> splits"
>>>> > one
>>>> > > >> can
>>>> > > >> > > not
>>>> > > >> > > >>> enable watermark alignment for these sources. This
>>>> dependency
>>>> > is
>>>> > > >> not
>>>> > > >> > > the
>>>> > > >> > > >>> current status of watermark alignment implementation
>>>> because
>>>> > it
>>>> > > >> > is/was
>>>> > > >> > > >>> implemented without pausable splits. Do we want to
>>>> introduce
>>>> > this
>>>> > > >> > > >>> dependency? (This is an open question. I cannot judge
>>>> that.)
>>>> > > >> > > >>>
>>>> > > >> > > >> The watermark alignment basically relies on the pausable
>>>> > splits,
>>>> > > >> > right?
>>>> > > >> > > >> So personally I found it quite reasonable that if the
>>>> source
>>>> > does
>>>> > > >> not
>>>> > > >> > > >> support pausable splits, end users cannot enable watermark
>>>> > > >> alignment
>>>> > > >> > on
>>>> > > >> > > it.
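To make points 1, 2 and 4 concrete, here is a minimal Java sketch of the behaviour agreed above. It is not the actual Flink code: SketchSourceReader, LegacyReader, PausableReader and AlignmentSketch are hypothetical stand-ins, and the parameter names are assumptions. It only illustrates that an old reader keeps compiling, and that it ends up on an error path once watermark alignment is enabled on it.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified stand-in for a source reader; NOT the real Flink interface. */
interface SketchSourceReader {
    /** Default keeps old sources compiling, but signals that pausing is unsupported. */
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException(
                "This source reader does not support pausing or resuming splits.");
    }
}

/** An "old flavor" reader written before the new method existed. */
class LegacyReader implements SketchSourceReader {}

/** A reader that implements the feature (hypothetical bookkeeping only). */
class PausableReader implements SketchSourceReader {
    final Set<String> paused = new HashSet<>();

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.addAll(splitsToPause);
        paused.removeAll(splitsToResume);
    }
}

class AlignmentSketch {
    /** Hypothetical runtime hook: the reader is only asked to pause if alignment is on. */
    static String align(SketchSourceReader reader, boolean alignmentEnabled, List<String> toPause) {
        if (!alignmentEnabled) {
            return "alignment disabled";
        }
        try {
            reader.pauseOrResumeSplits(toPause, List.of());
            return "paused " + toPause;
        } catch (UnsupportedOperationException e) {
            // Error handling path: the user has to exclude this source from alignment.
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(align(new LegacyReader(), false, List.of("split-1")));
        System.out.println(align(new LegacyReader(), true, List.of("split-1")));
        System.out.println(align(new PausableReader(), true, List.of("split-1")));
    }
}
```

In this sketch a source without pausable splits behaves normally as long as alignment stays disabled, which matches the "optional for backwards compatibility, expected going forward" reading above.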
>>>> > > >> > > >>
>>>> > > >> > > >>
>>>> > > >> > > >>> If something is wrong, please correct me.
>>>> > > >> > > >>>
>>>> > > >> > > >>> Regards,
>>>> > > >> > > >>> Sebastian
>>>> > > >> > > >>>
>>>> > > >> > > >>> On Wed, May 11, 2022 at 9:05 AM Becket Qin <
>>>> > becket.qin@gmail.com>
>>>> > > >> > > wrote:
>>>> > > >> > > >>>
>>>> > > >> > > >>>> Hi Sebastian,
>>>> > > >> > > >>>>
>>>> > > >> > > >>>> Thanks for the reply and patient discussion. I agree
>>>> this is
>>>> > a
>>>> > > >> > tricky
>>>> > > >> > > >>>> decision.
>>>> > > >> > > >>>>
>>>> > > >> > > >>>>
>>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about Option c)
>>>> > which I
>>>> > > >> see
>>>> > > >> > as
>>>> > > >> > > >>>>> follows:
>>>> > > >> > > >>>>> (1) An interface with default NOOP implementation
>>>> makes the
>>>> > > >> > > >>>>> implementation optional. And in my opinion, a default
>>>> > > >> > implementation
>>>> > > >> > > is and
>>>> > > >> > > >>>>> will remain a way of making implementation optional
>>>> because
>>>> > > >> even in
>>>> > > >> > > future
>>>> > > >> > > >>>>> a developer can decide to implement the "old flavor"
>>>> without
>>>> > > >> > support
>>>> > > >> > > for
>>>> > > >> > > >>>>> pausable splits.
>>>> > > >> > > >>>>> (2) It may not be too critical but I also find it
>>>> suboptimal
>>>> > > >> that
>>>> > > >> > > with
>>>> > > >> > > >>>>> a NOOP default implementation there is no way to check
>>>> at
>>>> > > >> runtime
>>>> > > >> > if
>>>> > > >> > > >>>>> SourceReader or SplitReader actually support pausing.
>>>> (To
>>>> > do so,
>>>> > > >> > one
>>>> > > >> > > would
>>>> > > >> > > >>>>> need a supportsX method which makes it again more
>>>> > complicated.)
>>>> > > >> > > >>>>
>>>> > > >> > > >>>>
>>>> > > >> > > >>>> Based on the last few messages in the mailing list,
>>>> Piotr
>>>> > and I
>>>> > > >> > > agreed
>>>> > > >> > > >>>> that the default implementation should just throw an
>>>> > > >> > > >>>> UnsupportedOperationException if the source is
>>>> unpausable. So
>>>> > > >> this
>>>> > > >> > > >>>> basically tells the Source developers that this feature
>>>> is
>>>> > > >> expected
>>>> > > >> > > to be
>>>> > > >> > > >>>> supported. Because we cannot prevent end users from
>>>> putting
>>>> > an
>>>> > > >> > > unpausable
>>>> > > >> > > >>>> source into the watermark alignment group, that
>>>> basically
>>>> > means
>>>> > > >> > > watermark
>>>> > > >> > > >>>> alignment is a non-optional feature to the end users.
>>>> So
>>>> > making
>>>> > > >> > that
>>>> > > >> > > >>>> expectation aligned with the source developers seems
>>>> > reasonable.
>>>> > > >> > And
>>>> > > >> > > if a
>>>> > > >> > > >>>> source does not support this feature, the end users
>>>> should
>>>> > > >> > explicitly
>>>> > > >> > > >>>> remove that source from the watermark alignment group.
>>>> > > >> > > >>>>
>>>> > > >> > > >>>> Personally speaking I think this is a simple and clean
>>>> > solution
>>>> > > >> from
>>>> > > >> > > >>>> both the end user and source developers' standpoint.
>>>> > > >> > > >>>>
>>>> > > >> > > >>>> Does this address your concerns?
>>>> > > >> > > >>>>
>>>> > > >> > > >>>> Thanks,
>>>> > > >> > > >>>>
>>>> > > >> > > >>>> Jiangjie (Becket) Qin
>>>> > > >> > > >>>>
>>>> > > >> > > >>>> On Wed, May 11, 2022 at 2:52 PM Sebastian Mattheis <
>>>> > > >> > > >>>> sebastian@ververica.com> wrote:
>>>> > > >> > > >>>>
>>>> > > >> > > >>>>> Hi Piotr, Hi Becket, Hi everybody,
>>>> > > >> > > >>>>>
>>>> > > >> > > >>>>> we, Dawid and I, discussed the various
>>>> suggestions/options
>>>> > and
>>>> > > >> we
>>>> > > >> > > >>>>> would be okay either way because we find neither
>>>> solution is
>>>> > > >> > perfect
>>>> > > >> > > just
>>>> > > >> > > >>>>> because of the already present complexity.
>>>> > > >> > > >>>>>
>>>> > > >> > > >>>>> Option c) Adding methods to the interfaces of
>>>> SourceReader
>>>> > and
>>>> > > >> > > >>>>> SplitReader
>>>> > > >> > > >>>>> Option a) Adding decorative interfaces to be used by
>>>> > > >> SourceReader
>>>> > > >> > and
>>>> > > >> > > >>>>> SplitReader
>>>> > > >> > > >>>>>
>>>> > > >> > > >>>>> As of the current status (v. 12) of the FLIP [1], it is
>>>> > based on
>>>> > > >> > > >>>>> Option c) which we find acceptable because the
>>>> complexity
>>>> > added
>>>> > > >> is
>>>> > > >> > > only a
>>>> > > >> > > >>>>> single method.
>>>> > > >> > > >>>>>
>>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about Option c)
>>>> > which I
>>>> > > >> see
>>>> > > >> > as
>>>> > > >> > > >>>>> follows:
>>>> > > >> > > >>>>> (1) An interface with default NOOP implementation
>>>> makes the
>>>> > > >> > > >>>>> implementation optional. And in my opinion, a default
>>>> > > >> > implementation
>>>> > > >> > > is and
>>>> > > >> > > >>>>> will remain a way of making implementation optional
>>>> because
>>>> > > >> even in
>>>> > > >> > > future
>>>> > > >> > > >>>>> a developer can decide to implement the "old flavor"
>>>> without
>>>> > > >> > support
>>>> > > >> > > for
>>>> > > >> > > >>>>> pausable splits.
>>>> > > >> > > >>>>> (2) It may not be too critical but I also find it
>>>> suboptimal
>>>> > > >> that
>>>> > > >> > > with
>>>> > > >> > > >>>>> a NOOP default implementation there is no way to check
>>>> at
>>>> > > >> runtime
>>>> > > >> > if
>>>> > > >> > > >>>>> SourceReader or SplitReader actually support pausing.
>>>> (To
>>>> > do so,
>>>> > > >> > one
>>>> > > >> > > would
>>>> > > >> > > >>>>> need a supportsX method which makes it again more
>>>> > complicated.)
>>>> > > >> > > >>>>>
>>>> > > >> > > >>>>> However, we haven't changed it because Option a) is
>>>> also not
>>>> > > >> > optimal
>>>> > > >> > > >>>>> or straight-forward:
>>>> > > >> > > >>>>> (1) We need to add two distinct yet similar decorative
>>>> > > >> interfaces
>>>> > > >> > > >>>>> since, as mentioned, the signatures of the methods are
>>>> > > >> different.
>>>> > > >> > For
>>>> > > >> > > >>>>> example, we would need decorative interfaces like
>>>> > > >> > > >>>>> `SplitReaderWithPausableSplits` and
>>>> > > >> > `SourceReaderWithPausableSplits`.
>>>> > > >> > > >>>>> (2) As a consequence, we would need to somehow document
>>>> > > >> how/where
>>>> > > >> > to
>>>> > > >> > > >>>>> implement both interfaces and how this relates to each
>>>> > other.
>>>> > > >> This
>>>> > > >> > > we could
>>>> > > >> > > >>>>> solve by adding a note in the interface of
>>>> SourceReader and
>>>> > > >> > > SplitReader and
>>>> > > >> > > >>>>> reference to the decorative interfaces but it still
>>>> > increases
>>>> > > >> > > complexity
>>>> > > >> > > >>>>> too.
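For comparison, a minimal Java sketch of Option a). The two decorative interface names are the ones mentioned above; everything else (BaseSourceReader, BaseSplitReader, PlainReader, PausingReader, DecorativeSketch, and the choice to let the decorative interfaces extend the base ones) is an assumption made for illustration, not the Flink API. It shows the two similar-but-different signatures and the runtime check that the decorative approach allows.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified stand-ins for the base interfaces; NOT the real Flink types. */
interface BaseSourceReader {}

interface BaseSplitReader<SplitT> {}

/** Decorative interface for the reader: splits are identified by their IDs. */
interface SourceReaderWithPausableSplits extends BaseSourceReader {
    void pauseOrResumeSplits(
            Collection<String> splitIdsToPause, Collection<String> splitIdsToResume);
}

/** Decorative interface for the split reader: the splits are passed directly. */
interface SplitReaderWithPausableSplits<SplitT> extends BaseSplitReader<SplitT> {
    void pauseOrResumeSplits(
            Collection<SplitT> splitsToPause, Collection<SplitT> splitsToResume);
}

/** A reader that does not opt in to the feature. */
class PlainReader implements BaseSourceReader {}

/** A reader that opts in by implementing the decorative interface. */
class PausingReader implements SourceReaderWithPausableSplits {
    final Set<String> pausedSplitIds = new HashSet<>();

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitIdsToPause, Collection<String> splitIdsToResume) {
        pausedSplitIds.addAll(splitIdsToPause);
        pausedSplitIds.removeAll(splitIdsToResume);
    }
}

class DecorativeSketch {
    /** With decorative interfaces, support can be checked at runtime via instanceof. */
    static boolean supportsPausing(BaseSourceReader reader) {
        return reader instanceof SourceReaderWithPausableSplits;
    }

    /** Hypothetical caller: fails loudly if the reader did not opt in. */
    static void pause(BaseSourceReader reader, Collection<String> splitIds) {
        if (!(reader instanceof SourceReaderWithPausableSplits)) {
            throw new UnsupportedOperationException("Reader does not support pausable splits.");
        }
        ((SourceReaderWithPausableSplits) reader).pauseOrResumeSplits(splitIds, List.of());
    }

    public static void main(String[] args) {
        System.out.println(supportsPausing(new PlainReader()));
        System.out.println(supportsPausing(new PausingReader()));
    }
}
```

The sketch also makes drawback (1) visible: two interfaces with nearly identical methods have to be documented and kept in sync.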
>>>> > > >> > > >>>>>
>>>> > > >> > > >>>>> In summary, we see both as acceptable and preferred
>>>> over
>>>> > other
>>>> > > >> > > >>>>> options. The question is if we can find a solution or
>>>> > compromise
>>>> > > >> > > that is
>>>> > > >> > > >>>>> acceptable for everybody to reach consensus.
>>>> > > >> > > >>>>>
>>>> > > >> > > >>>>> Please let us know what you think because we would be
>>>> happy
>>>> > if
>>>> > > >> we
>>>> > > >> > can
>>>> > > >> > > >>>>> conclude the discussion to avoid dropping the
>>>> initiative on
>>>> > this
>>>> > > >> > > FLIP.
>>>> > > >> > > >>>>>
>>>> > > >> > > >>>>> Regards,
>>>> > > >> > > >>>>> Sebastian
>>>> > > >> > > >>>>>
>>>> > > >> > > >>>>> [1]
>>>> > > >> > > >>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
>>>> > > >> > > >>>>> (v. 12)
>>>> > > >> > > >>>>>
>>>> > > >> > > >>>>> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski <
>>>> > > >> > > >>>>> pnowojski@apache.org> wrote:
>>>> > > >> > > >>>>>
>>>> > > >> > > >>>>>> Hi Guowei,
>>>> > > >> > > >>>>>>
>>>> > > >> > > >>>>>> as Dawid wrote a couple of messages back:
>>>> > > >> > > >>>>>>
>>>> > > >> > > >>>>>> > This is covered in the previous FLIP[1] which has
>>>> been
>>>> > > >> already
>>>> > > >> > > >>>>>> implemented in 1.15. In short, it must be enabled
>>>> with the
>>>> > > >> > watermark
>>>> > > >> > > >>>>>> strategy which also configures drift and update
>>>> interval
>>>> > > >> > > >>>>>>
>>>> > > >> > > >>>>>> So by default watermark alignment is disabled,
>>>> regardless
>>>> > of whether a
>>>> > > >> > > source
>>>> > > >> > > >>>>>> supports it or not.
>>>> > > >> > > >>>>>>
>>>> > > >> > > >>>>>> Best,
>>>> > > >> > > >>>>>> Piotrek
>>>> > > >> > > >>>>>>
>>>> > > >> > > >>>>>> czw., 5 maj 2022 o 09:56 Guowei Ma <
>>>> guowei.mgw@gmail.com>
>>>> > > >> > > napisał(a):
>>>> > > >> > > >>>>>>
>>>> > > >> > > >>>>>>> Hi,
>>>> > > >> > > >>>>>>>
>>>> > > >> > > >>>>>>> We know that in the case of Bounded input Flink
>>>> supports
>>>> > the
>>>> > > >> > Batch
>>>> > > >> > > >>>>>>> execution mode. Currently in Batch execution mode,
>>>> Flink
>>>> > is
>>>> > > >> > > executed
>>>> > > >> > > >>>>>>> on a
>>>> > > >> > > >>>>>>> stage-by-stage basis. In this way, perhaps watermark
>>>> > alignment
>>>> > > >> > > might
>>>> > > >> > > >>>>>>> not
>>>> > > >> > > >>>>>>> gain much.
>>>> > > >> > > >>>>>>>
>>>> > > >> > > >>>>>>> So my question is: Is watermark alignment the default
>>>> > > >> > behavior (for
>>>> > > >> > > >>>>>>> implemented source only)? If so, have you considered
>>>> > > >> evaluating
>>>> > > >> > the
>>>> > > >> > > >>>>>>> impact
>>>> > > >> > > >>>>>>> of this behavior on the Batch execution mode? Or
>>>> do you think
>>>> > it is
>>>> > > >> not
>>>> > > >> > > >>>>>>> necessary?
>>>> > > >> > > >>>>>>>
>>>> > > >> > > >>>>>>> Correct me if I miss something.
>>>> > > >> > > >>>>>>>
>>>> > > >> > > >>>>>>> Best,
>>>> > > >> > > >>>>>>> Guowei
>>>> > > >> > > >>>>>>>
>>>> > > >> > > >>>>>>>
>>>> > > >> > > >>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <
>>>> > > >> > > >>>>>>> piotr.nowojski@gmail.com>
>>>> > > >> > > >>>>>>> wrote:
>>>> > > >> > > >>>>>>>
>>>> > > >> > > >>>>>>> > Hi Becket and Dawid,
>>>> > > >> > > >>>>>>> >
>>>> > > >> > > >>>>>>> > > I feel that no matter which option we choose
>>>> this can
>>>> > not
>>>> > > >> be
>>>> > > >> > > >>>>>>> solved
>>>> > > >> > > >>>>>>> > entirely in either of the options, because of the
>>>> point
>>>> > > >> above
>>>> > > >> > and
>>>> > > >> > > >>>>>>> because
>>>> > > >> > > >>>>>>> > the signature of SplitReader#pauseOrResumeSplits
>>>> and
>>>> > > >> > > >>>>>>> > SourceReader#pauseOrResumeSplits are slightly
>>>> different
>>>> > (one
>>>> > > >> > > >>>>>>> identifies
>>>> > > >> > > >>>>>>> > splits with splitId the other one passes the splits
>>>> > > >> directly).
>>>> > > >> > > >>>>>>> >
>>>> > > >> > > >>>>>>> > Yes, that's a good point in this case and for
>>>> features
>>>> > that
>>>> > > >> > need
>>>> > > >> > > >>>>>>> to be
>>>> > > >> > > >>>>>>> > implemented in more than one place.
>>>> > > >> > > >>>>>>> >
>>>> > > >> > > >>>>>>> > > Is there any reason to make pausing reading from a
>>>> split
>>>> > an
>>>> > > >> > > optional
>>>> > > >> > > >>>>>>> feature,
>>>> > > >> > > >>>>>>> > > other than that this was not included in the
>>>> original
>>>> > > >> > > interface?
>>>> > > >> > > >>>>>>> >
>>>> > > >> > > >>>>>>> > An additional argument in favor of making it
>>>> optional
>>>> > is to
>>>> > > >> > > >>>>>>> simplify source
>>>> > > >> > > >>>>>>> > implementation. But on its own I'm not sure if that
>>>> > would be
>>>> > > >> > > >>>>>>> enough to
>>>> > > >> > > >>>>>>> > justify making this feature optional. Maybe.
>>>> > > >> > > >>>>>>> >
>>>> > > >> > > >>>>>>> > > I think it would be way simpler and clearer to
>>>> just
>>>> > let
>>>> > > >> end
>>>> > > >> > > >>>>>>> users and
>>>> > > >> > > >>>>>>> > Flink
>>>> > > >> > > >>>>>>> > > assume all the connectors will implement this
>>>> feature.
>>>> > > >> > > >>>>>>> >
>>>> > > >> > > >>>>>>> > As I wrote above that would be an interesting
>>>> choice to
>>>> > make
>>>> > > >> > > (ease
>>>> > > >> > > >>>>>>> of
>>>> > > >> > > >>>>>>> > implementation for new users, vs system
>>>> consistency).
>>>> > > >> > Regardless
>>>> > > >> > > >>>>>>> of that,
>>>> > > >> > > >>>>>>> > yes, for me the main argument is the API backward
>>>> > > >> > compatibility.
>>>> > > >> > > >>>>>>> But let's
>>>> > > >> > > >>>>>>> > clear a couple of points:
>>>> > > >> > > >>>>>>> > - The current proposal adding methods to the base
>>>> > interface
>>>> > > >> > with
>>>> > > >> > > >>>>>>> default
>>>> > > >> > > >>>>>>> > implementations is an OPTIONAL feature. Same as the
>>>> > > >> decorative
>>>> > > >> > > >>>>>>> version
>>>> > > >> > > >>>>>>> > would be.
>>>> > > >> > > >>>>>>> > - Decorative version could implement "throw
>>>> > > >> > > >>>>>>> UnsupportedOperationException"
>>>> > > >> > > >>>>>>> > if user enabled watermark alignment just as well
>>>> and I
>>>> > agree
>>>> > > >> > > >>>>>>> that's a
>>>> > > >> > > >>>>>>> > better option compared to logging a warning.
>>>> > > >> > > >>>>>>> >
>>>> > > >> > > >>>>>>> > Best,
>>>> > > >> > > >>>>>>> > Piotrek
>>>> > > >> > > >>>>>>> >
>>>> > > >> > > >>>>>>> >
>>>> > > >> > > >>>>>>> > śr., 4 maj 2022 o 15:40 Becket Qin <
>>>> > becket.qin@gmail.com>
>>>> > > >> > > >>>>>>> napisał(a):
>>>> > > >> > > >>>>>>> >
>>>> > > >> > > >>>>>>> > > Thanks for the reply and patient discussion,
>>>> Piotr and
>>>> > > >> Dawid.
>>>> > > >> > > >>>>>>> > >
>>>> > > >> > > >>>>>>> > > Is there any reason to make pausing reading from a
>>>> split
>>>> > an
>>>> > > >> > > optional
>>>> > > >> > > >>>>>>> feature,
>>>> > > >> > > >>>>>>> > > other than that this was not included in the
>>>> original
>>>> > > >> > > interface?
>>>> > > >> > > >>>>>>> > >
>>>> > > >> > > >>>>>>> > > To be honest I am really worried about the
>>>> complexity
>>>> > of
>>>> > > >> the
>>>> > > >> > > >>>>>>> user story
>>>> > > >> > > >>>>>>> > > here. Optional features like this have a high
>>>> > overhead.
>>>> > > >> > Imagine
>>>> > > >> > > >>>>>>> this
>>>> > > >> > > >>>>>>> > > feature is optional, now a user enabled watermark
>>>> > > >> alignment
>>>> > > >> > and
>>>> > > >> > > >>>>>>> defined a
>>>> > > >> > > >>>>>>> > > few watermark groups. Would it work? Hmm, that
>>>> > depends on
>>>> > > >> > > >>>>>>> whether the
>>>> > > >> > > >>>>>>> > > involved Source has implemented this feature.
>>>> If the
>>>> > > >> Sources
>>>> > > >> > > >>>>>>> are well
>>>> > > >> > > >>>>>>> > > documented, good luck. Otherwise end users may
>>>> have to
>>>> > > >> look
>>>> > > >> > > into
>>>> > > >> > > >>>>>>> the code
>>>> > > >> > > >>>>>>> > > of the Source to see whether the feature is
>>>> supported.
>>>> > > >> Which
>>>> > > >> > is
>>>> > > >> > > >>>>>>> something
>>>> > > >> > > >>>>>>> > > they shouldn't have to do.
>>>> > > >> > > >>>>>>> > >
>>>> > > >> > > >>>>>>> > > I think it would be way simpler and clearer to
>>>> just
>>>> > let
>>>> > > >> end
>>>> > > >> > > >>>>>>> users and
>>>> > > >> > > >>>>>>> > Flink
>>>> > > >> > > >>>>>>> > > assume all the connectors will implement this
>>>> feature.
>>>> > > >> After
>>>> > > >> > > all
>>>> > > >> > > >>>>>>> the
>>>> > > >> > > >>>>>>> > > watermark group is not optional to the end
>>>> users. If
>>>> > in
>>>> > > >> some
>>>> > > >> > > >>>>>>> rare cases,
>>>> > > >> > > >>>>>>> > > the feature cannot be supported, a clear
>>>> > > >> > > >>>>>>> UnsupportedOperationException
>>>> > > >> > > >>>>>>> > will
>>>> > > >> > > >>>>>>> > > be thrown to tell users to explicitly remove this
>>>> > Source
>>>> > > >> from
>>>> > > >> > > the
>>>> > > >> > > >>>>>>> > watermark
>>>> > > >> > > >>>>>>> > > group. I don't think we should have a warning
>>>> message
>>>> > > >> here,
>>>> > > >> > as
>>>> > > >> > > >>>>>>> they tend
>>>> > > >> > > >>>>>>> > to
>>>> > > >> > > >>>>>>> > > be ignored in many cases. If we do this, we
>>>> don't even
>>>> > > >> need
>>>> > > >> > the
>>>> > > >> > > >>>>>>> > supportXXX
>>>> > > >> > > >>>>>>> > > method in the Source for this feature. In fact
>>>> this is
>>>> > > >> > exactly
>>>> > > >> > > >>>>>>> how many
>>>> > > >> > > >>>>>>> > > interfaces work today. For example,
>>>> > > >> > > >>>>>>> SplitEnumerator#addSplitsBack() is
>>>> > > >> > > >>>>>>> > not
>>>> > > >> > > >>>>>>> > > supported by Pravega source because it does not
>>>> > support
>>>> > > >> > partial
>>>> > > >> > > >>>>>>> failover.
>>>> > > >> > > >>>>>>> > > In that case, it simply throws an exception to
>>>> > trigger a
>>>> > > >> > global
>>>> > > >> > > >>>>>>> recovery.
>>>> > > >> > > >>>>>>> > >
>>>> > > >> > > >>>>>>> > > The reason we add a default implementation in
>>>> this
>>>> > case
>>>> > > >> would
>>>> > > >> > > >>>>>>> just be for
>>>> > > >> > > >>>>>>> > the
>>>> > > >> > > >>>>>>> > > sake of backwards compatibility so the old
>>>> source can
>>>> > > >> still
>>>> > > >> > > >>>>>>> compile.
>>>> > > >> > > >>>>>>> > Sure,
>>>> > > >> > > >>>>>>> > > in short term, this feature might not be
>>>> supported by
>>>> > many
>>>> > > >> > > >>>>>>> existing
>>>> > > >> > > >>>>>>> > > sources. That is OK, and it is quite visible to
>>>> the
>>>> > source
>>>> > > >> > > >>>>>>> developers
>>>> > > >> > > >>>>>>> > that
>>>> > > >> > > >>>>>>> > > they did not override the default impl which
>>>> throws an
>>>> > > >> > > >>>>>>> > > UnsupportedOperationException.
>>>> > > >> > > >>>>>>> > >
>>>> > > >> > > >>>>>>> > > @Dawid,
>>>> > > >> > > >>>>>>> > >
>>>> > > >> > > >>>>>>> > > the Java doc of the SupportXXX() method in the
>>>> Source
>>>> > > >> would
>>>> > > >> > be
>>>> > > >> > > >>>>>>> the single
>>>> > > >> > > >>>>>>> > > >> source of truth regarding how to implement
>>>> this
>>>> > > >> feature.
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > >
>>>> > > >> > > >>>>>>> > > I also don't find it entirely true. Half of the
>>>> > classes
>>>> > > >> are
>>>> > > >> > > >>>>>>> theoretically
>>>> > > >> > > >>>>>>> > > > optional and are utility classes from the
>>>> point of
>>>> > view
>>>> > > >> how
>>>> > > >> > > the
>>>> > > >> > > >>>>>>> > > interfaces
>>>> > > >> > > >>>>>>> > > > are organized. Theoretically users do not need
>>>> to
>>>> > use
>>>> > > >> any
>>>> > > >> > of
>>>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird
>>>> to
>>>> > list
>>>> > > >> > their
>>>> > > >> > > >>>>>>> methods in
>>>> > > >> > > >>>>>>> > > the
>>>> > > >> > > >>>>>>> > > > Source interface.
>>>> > > >> > > >>>>>>> > >
>>>> > > >> > > >>>>>>> > > I think the ultimate goal of java docs is to
>>>> guide
>>>> > users
>>>> > > >> to
>>>> > > >> > > >>>>>>> implement the
>>>> > > >> > > >>>>>>> > > Source. If SourceReaderBase is the preferred way
>>>> to
>>>> > > >> > implement a
>>>> > > >> > > >>>>>>> > > SourceReader, it seems worth mentioning that.
>>>> Even the
>>>> > > >> Java
>>>> > > >> > > >>>>>>> language
>>>> > > >> > > >>>>>>> > > documentation of interfaces lists the known
>>>> > implementations
>>>> > > >> [1]
>>>> > > >> > so
>>>> > > >> > > >>>>>>> people
>>>> > > >> > > >>>>>>> > can
>>>> > > >> > > >>>>>>> > > leverage them. But for this particular case, if
>>>> we
>>>> > make
>>>> > > >> the
>>>> > > >> > > >>>>>>> feature
>>>> > > >> > > >>>>>>> > > non-optional, we don't even need the supportXXX()
>>>> > method
>>>> > > >> for
>>>> > > >> > > now.
>>>> > > >> > > >>>>>>> > >
>>>> > > >> > > >>>>>>> > > Thanks,
>>>> > > >> > > >>>>>>> > >
>>>> > > >> > > >>>>>>> > > Jiangjie (Becket) Qin
>>>> > > >> > > >>>>>>> > >
>>>> > > >> > > >>>>>>> > >
>>>> > > >> > > >>>>>>> > >
>>>> > > >> > > >>>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <
>>>> > > >> > > >>>>>>> dwysakowicz@apache.org>
>>>> > > >> > > >>>>>>> > > wrote:
>>>> > > >> > > >>>>>>> > >
>>>> > > >> > > >>>>>>> > > > Hey Piotr and Becket,
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > First of all, let me say I am happy with
>>>> whichever
>>>> > > >> option
>>>> > > >> > is
>>>> > > >> > > >>>>>>> agreed in
>>>> > > >> > > >>>>>>> > > the
>>>> > > >> > > >>>>>>> > > > discussion.
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > I wanted to clarify a few points from the
>>>> discussion
>>>> > > >> > though:
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > @Becket:
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > The main argument for adding the methods to the
>>>> > > >> > SourceReader
>>>> > > >> > > >>>>>>> is that
>>>> > > >> > > >>>>>>> > > these
>>>> > > >> > > >>>>>>> > > > methods are effectively NON-OPTIONAL to
>>>> SourceReader
>>>> > > >> impl,
>>>> > > >> > > i.e.
>>>> > > >> > > >>>>>>> > starting
>>>> > > >> > > >>>>>>> > > > from this FLIP, all the SourceReaders impl are
>>>> > expected
>>>> > > >> to
>>>> > > >> > > >>>>>>> support this
>>>> > > >> > > >>>>>>> > > > method, although some old implementations may
>>>> not
>>>> > have
>>>> > > >> > > >>>>>>> implemented this
>>>> > > >> > > >>>>>>> > > > feature. I think we should distinguish the new
>>>> > features
>>>> > > >> > from
>>>> > > >> > > >>>>>>> the
>>>> > > >> > > >>>>>>> > optional
>>>> > > >> > > >>>>>>> > > > features. While the public decorative
>>>> interface is a
>>>> > > >> > solution
>>>> > > >> > > >>>>>>> to the
>>>> > > >> > > >>>>>>> > > > optional features, we should not use it for the
>>>> > features
>>>> > > >> > that
>>>> > > >> > > >>>>>>> are
>>>> > > >> > > >>>>>>> > > > non-optional.
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > I don't think that this feature is
>>>> NON-OPTIONAL.
>>>> > Even
>>>> > > >> > though
>>>> > > >> > > >>>>>>> > preferred, I
>>>> > > >> > > >>>>>>> > > > still think it can be simply optional.
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > the Java doc of the SupportXXX() method in the
>>>> > Source
>>>> > > >> would
>>>> > > >> > > be
>>>> > > >> > > >>>>>>> the
>>>> > > >> > > >>>>>>> > single
>>>> > > >> > > >>>>>>> > > > source of truth regarding how to implement this
>>>> > feature.
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > I also don't find it entirely true. Half of the
>>>> > classes
>>>> > > >> are
>>>> > > >> > > >>>>>>> > theoretically
>>>> > > >> > > >>>>>>> > > > optional and are utility classes from the
>>>> point of
>>>> > view
>>>> > > >> how
>>>> > > >> > > the
>>>> > > >> > > >>>>>>> > > interfaces
>>>> > > >> > > >>>>>>> > > > are organized. Theoretically users do not need
>>>> to
>>>> > use
>>>> > > >> any
>>>> > > >> > of
>>>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird
>>>> to
>>>> > list
>>>> > > >> > their
>>>> > > >> > > >>>>>>> methods in
>>>> > > >> > > >>>>>>> > > the
>>>> > > >> > > >>>>>>> > > > Source interface.
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > @Piotr
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > If we have all of the methods with default
>>>> > > >> implementation
>>>> > > >> > in
>>>> > > >> > > >>>>>>> the base
>>>> > > >> > > >>>>>>> > > > interface, the API doesn't give any clue to
>>>> the user
>>>> > > >> which
>>>> > > >> > > set
>>>> > > >> > > >>>>>>> of
>>>> > > >> > > >>>>>>> > methods
>>>> > > >> > > >>>>>>> > > > are required to be implemented at the same
>>>> time.
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > I feel that no matter which option we choose
>>>> this
>>>> > can
>>>> > > >> not
>>>> > > >> > be
>>>> > > >> > > >>>>>>> solved
>>>> > > >> > > >>>>>>> > > > entirely in either of the options, because of
>>>> the
>>>> > point
>>>> > > >> > above
>>>> > > >> > > >>>>>>> and
>>>> > > >> > > >>>>>>> > because
>>>> > > >> > > >>>>>>> > > > the signature of
>>>> SplitReader#pauseOrResumeSplits and
>>>> > > >> > > >>>>>>> > > > SourceReader#pauseOrResumeSplits are slightly
>>>> > different
>>>> > > >> > (one
>>>> > > >> > > >>>>>>> identifies
>>>> > > >> > > >>>>>>> > > > splits with splitId the other one passes the
>>>> splits
>>>> > > >> > > directly).
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > Best,
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > Dawid
>>>> > > >> > > >>>>>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > Hi Piotr,
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > Thanks for the comment.
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > Just to clarify, I am not against the
>>>> decorative
>>>> > > >> > interfaces,
>>>> > > >> > > >>>>>>> but I do
>>>> > > >> > > >>>>>>> > > > think we should use them with caution. The main
>>>> > argument
>>>> > > >> > for
>>>> > > >> > > >>>>>>> adding the
>>>> > > >> > > >>>>>>> > > > methods to the SourceReader is that these
>>>> methods
>>>> > are
>>>> > > >> > > >>>>>>> > > > effectively NON-OPTIONAL to SourceReader impl,
>>>> i.e.
>>>> > > >> > starting
>>>> > > >> > > >>>>>>> from this
>>>> > > >> > > >>>>>>> > > > FLIP, all the SourceReaders impl are expected
>>>> to
>>>> > support
>>>> > > >> > this
>>>> > > >> > > >>>>>>> > > > method, although some old implementations may
>>>> not
>>>> > have
>>>> > > >> > > >>>>>>> implemented this
>>>> > > >> > > >>>>>>> > > > feature. I think we should distinguish the new
>>>> > features
>>>> > > >> > from
>>>> > > >> > > >>>>>>> the
>>>> > > >> > > >>>>>>> > optional
>>>> > > >> > > >>>>>>> > > > features. While the public decorative
>>>> interface is a
>>>> > > >> > solution
>>>> > > >> > > >>>>>>> to the
>>>> > > >> > > >>>>>>> > > > optional features, we should not use it for the
>>>> > features
>>>> > > >> > that
>>>> > > >> > > >>>>>>> are
>>>> > > >> > > >>>>>>> > > > non-optional.
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > That said, this feature is optional for
>>>> > SplitReaders.
>>>> > > >> > > Arguably
>>>> > > >> > > >>>>>>> we can
>>>> > > >> > > >>>>>>> > > have
>>>> > > >> > > >>>>>>> > > > a decorative interface for that, but for
>>>> simplicity
>>>> > and
>>>> > > >> > > >>>>>>> symmetry of the
>>>> > > >> > > >>>>>>> > > > interface, personally I prefer just adding a
>>>> new
>>>> > method.
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > Regarding the advantages you mentioned about
>>>> the
>>>> > > >> decorative
>>>> > > >> > > >>>>>>> interfaces,
>>>> > > >> > > >>>>>>> > > > they would make sense if:
>>>> > > >> > > >>>>>>> > > > 1. The feature is optional.
>>>> > > >> > > >>>>>>> > > > 2. There is only one decorative interface
>>>> involved
>>>> > for a
>>>> > > >> > > >>>>>>> feature.
>>>> > > >> > > >>>>>>> > > > Otherwise the argument that all the methods are
>>>> > grouped
>>>> > > >> > > >>>>>>> together will
>>>> > > >> > > >>>>>>> > not
>>>> > > >> > > >>>>>>> > > > stand.
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > Compared with that, I think the current
>>>> solution
>>>> > works
>>>> > > >> fine
>>>> > > >> > > in
>>>> > > >> > > >>>>>>> all
>>>> > > >> > > >>>>>>> > cases,
>>>> > > >> > > >>>>>>> > > > i.e. "having supportXXX() method in Source, and
>>>> > default
>>>> > > >> > > >>>>>>> methods /
>>>> > > >> > > >>>>>>> > > > decorative interfaces in base interfaces.".
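As an illustration of the "supportXXX() method in Source" idea, a minimal Java sketch follows. SketchSource, OldSource, NewSource and JobSetupSketch are hypothetical names, and the method name supportsPauseOrResumeSplits is only borrowed from the proposal discussed in this thread; whether it stays in the FLIP is still open. The sketch assumes the flag is checked once, before the job runs, and only when watermark alignment is enabled.

```java
/** Simplified stand-in for the Source interface; NOT the real Flink type. */
interface SketchSource {
    /** Capability flag with a conservative default, so old sources keep compiling. */
    default boolean supportsPauseOrResumeSplits() {
        return false;
    }
}

/** An existing source that was never updated. */
class OldSource implements SketchSource {}

/** A source whose readers implement pausing and which declares it. */
class NewSource implements SketchSource {
    @Override
    public boolean supportsPauseOrResumeSplits() {
        return true;
    }
}

class JobSetupSketch {
    /** Hypothetical up-front validation: the flag only matters if alignment is enabled. */
    static void validate(SketchSource source, boolean alignmentEnabled) {
        if (alignmentEnabled && !source.supportsPauseOrResumeSplits()) {
            throw new UnsupportedOperationException(
                    "Watermark alignment is enabled, but the source does not support"
                            + " pausing splits; remove it from the alignment group.");
        }
    }

    public static void main(String[] args) {
        validate(new OldSource(), false); // fine: alignment is off
        validate(new NewSource(), true);  // fine: the source declares support
        try {
            validate(new OldSource(), true);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The design point is that the flag gives one discoverable place to document and check the feature, at the cost of one more method that has to agree with what the readers actually implement.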
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > The advantages are:
>>>> > > >> > > >>>>>>> > > >> - clean and easy to implement base interface
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > In the current approach, the Java doc of the
>>>> > > >> SupportXXX()
>>>> > > >> > > >>>>>>> method in the
>>>> > > >> > > >>>>>>> > > > Source would be the single source of truth
>>>> regarding
>>>> > > >> how to
>>>> > > >> > > >>>>>>> implement
>>>> > > >> > > >>>>>>> > > this
>>>> > > >> > > >>>>>>> > > > feature. It lists the method that has to be
>>>> > implemented
>>>> > > >> to
>>>> > > >> > > >>>>>>> support this
>>>> > > >> > > >>>>>>> > > > feature, regardless of how many classes /
>>>> > interfaces are
>>>> > > >> > > >>>>>>> involved.
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > When implementing the base interface, users do
>>>> not
>>>> > need
>>>> > > >> to
>>>> > > >> > > >>>>>>> implement a
>>>> > > >> > > >>>>>>> > > > method with default implementation. If they are
>>>> > curious
>>>> > > >> > what
>>>> > > >> > > >>>>>>> the method
>>>> > > >> > > >>>>>>> > > is
>>>> > > >> > > >>>>>>> > > > for, the java doc of that method simply points
>>>> > users to
>>>> > > >> the
>>>> > > >> > > >>>>>>> > SupportXXX()
>>>> > > >> > > >>>>>>> > > > method in the Source. This does not seem to add work
>>>> for
>>>> > the
>>>> > > >> users
>>>> > > >> > > >>>>>>> compared
>>>> > > >> > > >>>>>>> > with
>>>> > > >> > > >>>>>>> > > > decorative interfaces, but gives much better
>>>> > > >> > discoverability.
>>>> > > >> > > >>>>>>> > > >
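To make the "supportXXX() in Source plus default methods in the base interfaces" idea concrete, here is a minimal sketch. It assumes simplified stand-in interfaces rather than the real Flink API: the nested Source and SourceReader types, the no-op default body, and the String split ids are all hypothetical. The names supportsPausingSplits and pauseOrResumeSplits are taken from suggestions made elsewhere in this thread.

```java
import java.util.Collection;
import java.util.Collections;

public class DefaultMethodSketch {

    /** Hypothetical stand-in for Source; its Javadoc acts as the single source of truth. */
    interface Source {
        /**
         * Returns true if readers created by this source can pause and resume splits.
         * To support the feature, also override SourceReader#pauseOrResumeSplits.
         */
        default boolean supportsPausingSplits() {
            return false;
        }
    }

    /** Hypothetical stand-in for SourceReader. */
    interface SourceReader {
        String pollNext();

        /** Optional; see Source#supportsPausingSplits. Assumed here to default to a no-op. */
        default void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume) {}
    }

    /** A basic reader compiles without mentioning the optional feature at all. */
    static class BasicReader implements SourceReader {
        @Override
        public String pollNext() {
            return "record";
        }
    }

    public static void main(String[] args) {
        Source source = new Source() {};
        SourceReader reader = new BasicReader();
        // The framework consults the Source before it ever calls the optional method.
        if (source.supportsPausingSplits()) {
            reader.pauseOrResumeSplits(
                    Collections.singletonList("split-0"), Collections.emptyList());
        }
        System.out.println("supportsPausingSplits = " + source.supportsPausingSplits());
    }
}
```

In this sketch a new implementer only has to write pollNext(); the optional method stays out of the way until the Source opts in.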
>>>> > > >> > > >>>>>>> > > > - all of the methods from a single feature are
>>>> > grouped
>>>> > > >> in a
>>>> > > >> > > >>>>>>> single
>>>> > > >> > > >>>>>>> > > >> decorator interface, together with their
>>>> dedicated
>>>> > java
>>>> > > >> > doc.
>>>> > > >> > > >>>>>>> It's also
>>>> > > >> > > >>>>>>> > > >> easier to google search for help using the
>>>> > decorator
>>>> > > >> name
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > - if an optional feature requires two methods
>>>> to be
>>>> > > >> > > >>>>>>> implemented at
>>>> > > >> > > >>>>>>> > once,
>>>> > > >> > > >>>>>>> > > >> decorator can guarantee that
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > These two points are not true when multiple
>>>> > components
>>>> > > >> and
>>>> > > >> > > >>>>>>> classes are
>>>> > > >> > > >>>>>>> > > > involved collaboratively to provide a feature.
>>>> In
>>>> > our
>>>> > > >> case,
>>>> > > >> > > we
>>>> > > >> > > >>>>>>> have
>>>> > > >> > > >>>>>>> > both
>>>> > > >> > > >>>>>>> > > > SourceReader and SplitReader involved. And
>>>> there
>>>> > might
>>>> > > >> be
>>>> > > >> > > other
>>>> > > >> > > >>>>>>> > > interfaces
>>>> > > >> > > >>>>>>> > > > on the JM side involved for some future
>>>> features.
>>>> > So the
>>>> > > >> > > >>>>>>> relevant
>>>> > > >> > > >>>>>>> > methods
>>>> > > >> > > >>>>>>> > > > can actually be scattered over the places. That
>>>> > said, we
>>>> > > >> > may
>>>> > > >> > > >>>>>>> still use
>>>> > > >> > > >>>>>>> > > > decorative interfaces for each component, if
>>>> the
>>>> > > >> feature is
>>>> > > >> > > >>>>>>> optional,
>>>> > > >> > > >>>>>>> > > given
>>>> > > >> > > >>>>>>> > > > there is a single source of truth for the
>>>> feature.
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > Here I would strongly lean towards making life
>>>> > easier
>>>> > > >> for
>>>> > > >> > new
>>>> > > >> > > >>>>>>> users,
>>>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo)
>>>> slight
>>>> > expense
>>>> > > >> > for
>>>> > > >> > > >>>>>>> the power
>>>> > > >> > > >>>>>>> > > >> users.
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > I actually think the current approach is
>>>> simpler,
>>>> > more
>>>> > > >> > > >>>>>>> extensible and
>>>> > > >> > > >>>>>>> > > more
>>>> > > >> > > >>>>>>> > > > general for all the users. Can you articulate
>>>> a bit
>>>> > > >> more on
>>>> > > >> > > >>>>>>> which part
>>>> > > >> > > >>>>>>> > > you
>>>> > > >> > > >>>>>>> > > > think makes it harder for users to understand?
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > There is another benefit of the decorative
>>>> > interfaces
>>>> > > >> which
>>>> > > >> > > is
>>>> > > >> > > >>>>>>> not
>>>> > > >> > > >>>>>>> > > > mentioned, but might be worth considering here.
>>>> > Usually
>>>> > > >> the
>>>> > > >> > > >>>>>>> decorative
>>>> > > >> > > >>>>>>> > > > interfaces give slightly better backwards
>>>> > compatibility
>>>> > > >> > than
>>>> > > >> > > >>>>>>> the new
>>>> > > >> > > >>>>>>> > > > default method in the interfaces. That is when
>>>> > users are
>>>> > > >> > > using
>>>> > > >> > > >>>>>>> a jar
>>>> > > >> > > >>>>>>> > that
>>>> > > >> > > >>>>>>> > > > was compiled with an older version of Flink
>>>> which
>>>> > does
>>>> > > >> not
>>>> > > >> > > >>>>>>> have the
>>>> > > >> > > >>>>>>> > > default
>>>> > > >> > > >>>>>>> > > > method in the interfaces in question. A
>>>> decorative
>>>> > > >> > interface
>>>> > > >> > > >>>>>>> may still
>>>> > > >> > > >>>>>>> > > > provide backwards compatibility in that case,
>>>> while
>>>> > > >> default
>>>> > > >> > > >>>>>>> method impl
>>>> > > >> > > >>>>>>> > > > cannot.
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > I think in Flink we in general do not guarantee
>>>> > custom
>>>> > > >> > > >>>>>>> components
>>>> > > >> > > >>>>>>> > > compiled
>>>> > > >> > > >>>>>>> > > > with an older version can run with a newer
>>>> version
>>>> > of
>>>> > > >> > Flink.
>>>> > > >> > > A
>>>> > > >> > > >>>>>>> > recompile
>>>> > > >> > > >>>>>>> > > > with a newer version would be required. That
>>>> said,
>>>> > if
>>>> > > >> we do
>>>> > > >> > > >>>>>>> care about
>>>> > > >> > > >>>>>>> > > > this, we can just change the "supportXXX()"
>>>> method
>>>> > in
>>>> > > >> the
>>>> > > >> > > >>>>>>> Source
>>>> > > >> > > >>>>>>> > > interface
>>>> > > >> > > >>>>>>> > > > to use decorative interfaces, and leave the
>>>> other
>>>> > parts
>>>> > > >> > > >>>>>>> unchanged.
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > Thanks,
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > Jiangjie (Becket) Qin
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <
>>>> > > >> > > >>>>>>> pnowojski@apache.org>
>>>> > > >> > > >>>>>>> > > > wrote:
>>>> > > >> > > >>>>>>> > > >
>>>> > > >> > > >>>>>>> > > >> Hi,
>>>> > > >> > > >>>>>>> > > >>
>>>> > > >> > > >>>>>>> > > >> Sorry for chipping in so late, but I was OoO
>>>> for
>>>> > the
>>>> > > >> last
>>>> > > >> > > two
>>>> > > >> > > >>>>>>> weeks.
>>>> > > >> > > >>>>>>> > > >> Regarding the interfaces, I would be actually
>>>> > against
>>>> > > >> > adding
>>>> > > >> > > >>>>>>> those
>>>> > > >> > > >>>>>>> > > methods
>>>> > > >> > > >>>>>>> > > >> to the base interfaces for the reasons
>>>> mentioned
>>>> > above.
>>>> > > >> > > >>>>>>> Clogging the
>>>> > > >> > > >>>>>>> > > base
>>>> > > >> > > >>>>>>> > > >> interface for new users with tons of methods
>>>> that
>>>> > they
>>>> > > >> do
>>>> > > >> > > not
>>>> > > >> > > >>>>>>> need, do
>>>> > > >> > > >>>>>>> > > not
>>>> > > >> > > >>>>>>> > > >> understand and do not know what to do with
>>>> them.
>>>> > > >> Moreover,
>>>> > > >> > > >>>>>>> such
>>>> > > >> > > >>>>>>> > > decorative
>>>> > > >> > > >>>>>>> > > >> interfaces are solving a problem if a feature
>>>> > requires
>>>> > > >> two
>>>> > > >> > > or
>>>> > > >> > > >>>>>>> more
>>>> > > >> > > >>>>>>> > > methods
>>>> > > >> > > >>>>>>> > > >> to be implemented at the same time. If we
>>>> have all
>>>> > of
>>>> > > >> the
>>>> > > >> > > >>>>>>> methods with
>>>> > > >> > > >>>>>>> > > >> default implementation in the base interface,
>>>> the
>>>> > API
>>>> > > >> > > doesn't
>>>> > > >> > > >>>>>>> give any
>>>> > > >> > > >>>>>>> > > >> clue
>>>> > > >> > > >>>>>>> > > >> to the user which set of methods are required
>>>> to be
>>>> > > >> > > >>>>>>> implemented at the
>>>> > > >> > > >>>>>>> > > >> same
>>>> > > >> > > >>>>>>> > > >> time.
>>>> > > >> > > >>>>>>> > > >>
>>>> > > >> > > >>>>>>> > > >> > a) I feel the biggest drawback of decorative
>>>> > > >> interfaces
>>>> > > >> > is
>>>> > > >> > > >>>>>>> which
>>>> > > >> > > >>>>>>> > > >> interface
>>>> > > >> > > >>>>>>> > > >> > they can decorate and which combinations of
>>>> > multiple
>>>> > > >> > > >>>>>>> decorative
>>>> > > >> > > >>>>>>> > > >> interfaces
>>>> > > >> > > >>>>>>> > > >> > are valid. (...)
>>>> > > >> > > >>>>>>> > > >> > In the future, if there is a new feature
>>>> added
>>>> > > >> > > >>>>>>> > > >> > (e.g. sorted or pre-partitioned data
>>>> aware), are
>>>> > we
>>>> > > >> > going
>>>> > > >> > > >>>>>>> to create
>>>> > > >> > > >>>>>>> > > >> another
>>>> > > >> > > >>>>>>> > > >> > interface of SplitReader such as
>>>> > SortedSplitReader or
>>>> > > >> > > >>>>>>> > > >> PrePartitionedAware?
>>>> > > >> > > >>>>>>> > > >> > Can they be combined? So I think the
>>>> additional
>>>> > > >> > decorative
>>>> > > >> > > >>>>>>> interface
>>>> > > >> > > >>>>>>> > > >> like
>>>> > > >> > > >>>>>>> > > >> > withSplitsAlignment actually increases the
>>>> > > >> understanding
>>>> > > >> > > >>>>>>> cost of
>>>> > > >> > > >>>>>>> > users
>>>> > > >> > > >>>>>>> > > >> > because they have to know what decorative
>>>> > interfaces
>>>> > > >> are
>>>> > > >> > > >>>>>>> there,
>>>> > > >> > > >>>>>>> > which
>>>> > > >> > > >>>>>>> > > >> > interface they can decorate and which
>>>> > combinations of
>>>> > > >> > the
>>>> > > >> > > >>>>>>> decorative
>>>> > > >> > > >>>>>>> > > >> > interfaces are valid and which are not.
>>>> Ideally
>>>> > we
>>>> > > >> want
>>>> > > >> > to
>>>> > > >> > > >>>>>>> avoid
>>>> > > >> > > >>>>>>> > that.
>>>> > > >> > > >>>>>>> > > >>
>>>> > > >> > > >>>>>>> > > >> I'm not sure if I understand how embedding
>>>> default
>>>> > > >> methods
>>>> > > >> > > in
>>>> > > >> > > >>>>>>> the base
>>>> > > >> > > >>>>>>> > > >> interface is solving the problem: what can be
>>>> > combined
>>>> > > >> or
>>>> > > >> > > >>>>>>> not? If
>>>> > > >> > > >>>>>>> > there
>>>> > > >> > > >>>>>>> > > >> are
>>>> > > >> > > >>>>>>> > > >> two conflicting features, having decorative
>>>> > interfaces
>>>> > > >> > that
>>>> > > >> > > >>>>>>> can not be
>>>> > > >> > > >>>>>>> > > >> mixed together actually makes much more sense
>>>> to me
>>>> > > >> rather
>>>> > > >> > > >>>>>>> than having
>>>> > > >> > > >>>>>>> > > >> them
>>>> > > >> > > >>>>>>> > > >> all in one base class. How would you allow
>>>> users to
>>>> > > >> > > implement
>>>> > > >> > > >>>>>>> only one
>>>> > > >> > > >>>>>>> > > of
>>>> > > >> > > >>>>>>> > > >> those two features?
>>>> > > >> > > >>>>>>> > > >>
>>>> > > >> > > >>>>>>> > > >> To reiterate on the issue. Yes, there are
>>>> > drawbacks:
>>>> > > >> > > >>>>>>> > > >> - how can a user discover what decorators are
>>>> > there?
>>>> > > >> > > >>>>>>> > > >> - how can a user know where the decorator can
>>>> be
>>>> > > >> applied
>>>> > > >> > to?
>>>> > > >> > > >>>>>>> > > >>
>>>> > > >> > > >>>>>>> > > >> However those are drawbacks for more power
>>>> users,
>>>> > that
>>>> > > >> can
>>>> > > >> > > be
>>>> > > >> > > >>>>>>> > mitigated
>>>> > > >> > > >>>>>>> > > by
>>>> > > >> > > >>>>>>> > > >> the documentation. For example listing all of
>>>> the
>>>> > > >> > decorators
>>>> > > >> > > >>>>>>> with
>>>> > > >> > > >>>>>>> > > >> detailed explanation both in the docs and in
>>>> the
>>>> > java
>>>> > > >> > docs.
>>>> > > >> > > >>>>>>> More
>>>> > > >> > > >>>>>>> > > >> experienced users will be able to deal with
>>>> those
>>>> > > >> issues
>>>> > > >> > > >>>>>>> easier, as
>>>> > > >> > > >>>>>>> > they
>>>> > > >> > > >>>>>>> > > >> will already have some basic understanding of
>>>> > Flink.
>>>> > > >> Also
>>>> > > >> > if
>>>> > > >> > > >>>>>>> a user has
>>>> > > >> > > >>>>>>> > a
>>>> > > >> > > >>>>>>> > > >> problem that he wants to solve, he will google
>>>> > search a
>>>> > > >> > > >>>>>>> potential
>>>> > > >> > > >>>>>>> > > solution
>>>> > > >> > > >>>>>>> > > >> to his problem anyway, and while doing that
>>>> he is
>>>> > very
>>>> > > >> > > likely
>>>> > > >> > > >>>>>>> to
>>>> > > >> > > >>>>>>> > > discover
>>>> > > >> > > >>>>>>> > > >> the decorator that he needs anyway in the
>>>> docs.
>>>> > > >> > > >>>>>>> > > >>
>>>> > > >> > > >>>>>>> > > >> The advantages are:
>>>> > > >> > > >>>>>>> > > >> - clean and easy to implement base interface
>>>> > > >> > > >>>>>>> > > >> - all of the methods from a single feature are
>>>> > grouped
>>>> > > >> in
>>>> > > >> > a
>>>> > > >> > > >>>>>>> single
>>>> > > >> > > >>>>>>> > > >> decorator interface, together with their
>>>> dedicated
>>>> > java
>>>> > > >> > doc.
>>>> > > >> > > >>>>>>> It's also
>>>> > > >> > > >>>>>>> > > >> easier to google search for help using the
>>>> > decorator
>>>> > > >> name
>>>> > > >> > > >>>>>>> > > >> - if an optional feature requires two methods
>>>> to be
>>>> > > >> > > >>>>>>> implemented at
>>>> > > >> > > >>>>>>> > once,
>>>> > > >> > > >>>>>>> > > >> decorator can guarantee that
>>>> > > >> > > >>>>>>> > > >>
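The last advantage in the list above (a decorator can guarantee that two methods are implemented together) can be sketched as follows. This is only an illustration under stated assumptions: SplitReader here is a simplified stand-in, WithSplitsAlignment borrows a name mentioned in this thread, and pauseSplits / resumeSplits are hypothetical methods, not the real Flink API.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class DecoratorSketch {

    /** Hypothetical, simplified base interface with nothing optional in it. */
    interface SplitReader {
        String fetch();
    }

    /** Hypothetical decorative interface: the compiler forces both methods to be implemented. */
    interface WithSplitsAlignment {
        void pauseSplits(Collection<String> splitIds);

        void resumeSplits(Collection<String> splitIds);
    }

    static class PlainReader implements SplitReader {
        @Override
        public String fetch() {
            return "record";
        }
    }

    static class AligningReader implements SplitReader, WithSplitsAlignment {
        private final Set<String> paused = new HashSet<>();

        @Override
        public String fetch() {
            return "record";
        }

        @Override
        public void pauseSplits(Collection<String> splitIds) {
            paused.addAll(splitIds);
        }

        @Override
        public void resumeSplits(Collection<String> splitIds) {
            paused.removeAll(splitIds);
        }

        Set<String> pausedSplits() {
            return paused;
        }
    }

    /** Framework side: discovers the optional capability with an instanceof check. */
    static boolean tryPause(SplitReader reader, Collection<String> splitIds) {
        if (reader instanceof WithSplitsAlignment) {
            ((WithSplitsAlignment) reader).pauseSplits(splitIds);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(tryPause(new PlainReader(), Collections.singletonList("split-0")));
        System.out.println(tryPause(new AligningReader(), Collections.singletonList("split-0")));
    }
}
```

Leaving out either pauseSplits or resumeSplits in AligningReader is a compile error, which is the guarantee that default methods in the base interface cannot give.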
>>>> > > >> > > >>>>>>> > > >> Here I would strongly lean towards making life
>>>> > easier
>>>> > > >> for
>>>> > > >> > > new
>>>> > > >> > > >>>>>>> users,
>>>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo)
>>>> slight
>>>> > expense
>>>> > > >> > for
>>>> > > >> > > >>>>>>> the power
>>>> > > >> > > >>>>>>> > > >> users.
>>>> > > >> > > >>>>>>> > > >>
>>>> > > >> > > >>>>>>> > > >> Best,
>>>> > > >> > > >>>>>>> > > >> Piotrek
>>>> > > >> > > >>>>>>> > > >>
>>>> > > >> > > >>>>>>> > > >>
>>>> > > >> > > >>>>>>> > > >> wt., 26 kwi 2022 o 15:32 Becket Qin <
>>>> > > >> becket.qin@gmail.com
>>>> > > >> > >
>>>> > > >> > > >>>>>>> > napisał(a):
>>>> > > >> > > >>>>>>> > > >>
>>>> > > >> > > >>>>>>> > > >> > Thanks for the reply Sebastian and Dawid,
>>>> > > >> > > >>>>>>> > > >> >
>>>> > > >> > > >>>>>>> > > >> > I think Sebastian has a good summary. This
>>>> is a
>>>> > > >> really
>>>> > > >> > > >>>>>>> helpful
>>>> > > >> > > >>>>>>> > > >> discussion.
>>>> > > >> > > >>>>>>> > > >> >
>>>> > > >> > > >>>>>>> > > >> > Thinking a bit more, I feel that it might
>>>> still
>>>> > be
>>>> > > >> > better
>>>> > > >> > > >>>>>>> to add the
>>>> > > >> > > >>>>>>> > > >> > supportsXXX() method in the Source rather
>>>> than
>>>> > > >> > > SourceReader.
>>>> > > >> > > >>>>>>> > > >> >
>>>> > > >> > > >>>>>>> > > >> > Generally speaking, what we are trying to do
>>>> > here is
>>>> > > >> to
>>>> > > >> > > let
>>>> > > >> > > >>>>>>> the
>>>> > > >> > > >>>>>>> > Flink
>>>> > > >> > > >>>>>>> > > >> > framework know what the Source is capable
>>>> of. In
>>>> > this
>>>> > > >> > > FLIP,
>>>> > > >> > > >>>>>>> it
>>>> > > >> > > >>>>>>> > happens
>>>> > > >> > > >>>>>>> > > >> to
>>>> > > >> > > >>>>>>> > > >> > be the capability that only involves
>>>> > SourceReader.
>>>> > > >> But
>>>> > > >> > in
>>>> > > >> > > >>>>>>> the
>>>> > > >> > > >>>>>>> > future,
>>>> > > >> > > >>>>>>> > > >> it is
>>>> > > >> > > >>>>>>> > > >> > possible that another functionality involves
>>>> > both the
>>>> > > >> > > >>>>>>> > SplitEnumerator
>>>> > > >> > > >>>>>>> > > >> and
>>>> > > >> > > >>>>>>> > > >> > SourceReader. In that case, following the
>>>> current
>>>> > > >> > > approach,
>>>> > > >> > > >>>>>>> we
>>>> > > >> > > >>>>>>> > should
>>>> > > >> > > >>>>>>> > > >> put
>>>> > > >> > > >>>>>>> > > >> > the "supportsXXX()" method in both
>>>> > SplitEnumerator
>>>> > > >> and
>>>> > > >> > > >>>>>>> SourceReader.
>>>> > > >> > > >>>>>>> > > >> > Because if we only put this in the
>>>> SourceReader,
>>>> > then
>>>> > > >> > the
>>>> > > >> > > >>>>>>> JM would
>>>> > > >> > > >>>>>>> > > have
>>>> > > >> > > >>>>>>> > > >> to
>>>> > > >> > > >>>>>>> > > >> > create a SourceReader in order to know
>>>> whether
>>>> > this
>>>> > > >> > > feature
>>>> > > >> > > >>>>>>> is
>>>> > > >> > > >>>>>>> > > >> supported,
>>>> > > >> > > >>>>>>> > > >> > which is a little ugly. But if we put the
>>>> > > >> > "supportsXXX()"
>>>> > > >> > > >>>>>>> method in
>>>> > > >> > > >>>>>>> > > the
>>>> > > >> > > >>>>>>> > > >> > Source, we will break the "symmetric" design
>>>> > because
>>>> > > >> > this
>>>> > > >> > > >>>>>>> FLIP
>>>> > > >> > > >>>>>>> > chose a
>>>> > > >> > > >>>>>>> > > >> > different way.
>>>> > > >> > > >>>>>>> > > >> >
>>>> > > >> > > >>>>>>> > > >> > This is also why I think supportsXXX()
>>>> method
>>>> > seems a
>>>> > > >> > good
>>>> > > >> > > >>>>>>> thing to
>>>> > > >> > > >>>>>>> > > >> have,
>>>> > > >> > > >>>>>>> > > >> > because when there are a few interfaces /
>>>> methods
>>>> > > >> that
>>>> > > >> > are
>>>> > > >> > > >>>>>>> expected
>>>> > > >> > > >>>>>>> > to
>>>> > > >> > > >>>>>>> > > >> be
>>>> > > >> > > >>>>>>> > > >> > implemented at the same time in order to
>>>> deliver
>>>> > a
>>>> > > >> > > feature,
>>>> > > >> > > >>>>>>> it is
>>>> > > >> > > >>>>>>> > > always
>>>> > > >> > > >>>>>>> > > >> > good to have a single source of truth to
>>>> tell the
>>>> > > >> > > framework
>>>> > > >> > > >>>>>>> what to
>>>> > > >> > > >>>>>>> > > do,
>>>> > > >> > > >>>>>>> > > >> so
>>>> > > >> > > >>>>>>> > > >> > the framework can do consistent things in
>>>> > different
>>>> > > >> > parts.
>>>> > > >> > > >>>>>>> > > >> >
>>>> > > >> > > >>>>>>> > > >> > @Sebastian Mattheis <
>>>> sebastian@ververica.com>
>>>> > > >> > > >>>>>>> > > >> >
>>>> > > >> > > >>>>>>> > > >> > Regarding interface flavor b), i.e.
>>>> > > >> AlignedSourceReader
>>>> > > >> > +
>>>> > > >> > > >>>>>>> > > >> > AlignedSplitReader, what I feel awkward
>>>> about is
>>>> > > >> that we
>>>> > > >> > > are
>>>> > > >> > > >>>>>>> > > essentially
>>>> > > >> > > >>>>>>> > > >> > expecting almost all the SourceReader
>>>> > > >> implementations to
>>>> > > >> > > >>>>>>> extend
>>>> > > >> > > >>>>>>> > > >> > SourceReaderBase, which effectively makes
>>>> the
>>>> > > >> > SourceReader
>>>> > > >> > > >>>>>>> interface
>>>> > > >> > > >>>>>>> > > >> > without the pausing support useless. So this
>>>> > > >> indicates
>>>> > > >> > > that
>>>> > > >> > > >>>>>>> public
>>>> > > >> > > >>>>>>> > > >> > decorative interfaces (or sub-interfaces
>>>> for the
>>>> > same
>>>> > > >> > > >>>>>>> purpose) only
>>>> > > >> > > >>>>>>> > > >> > make sense if the original interface is also
>>>> > > >> expected to
>>>> > > >> > > be
>>>> > > >> > > >>>>>>> used.
>>>> > > >> > > >>>>>>> > > >> > Otherwise, it seems to make more sense to add
>>>> the
>>>> > > >> method
>>>> > > >> > to
>>>> > > >> > > >>>>>>> the
>>>> > > >> > > >>>>>>> > original
>>>> > > >> > > >>>>>>> > > >> > interface itself.
>>>> > > >> > > >>>>>>> > > >> >
>>>> > > >> > > >>>>>>> > > >> > Cheers,
>>>> > > >> > > >>>>>>> > > >> >
>>>> > > >> > > >>>>>>> > > >> > Jiangjie (Becket) Qin
>>>> > > >> > > >>>>>>> > > >> >
>>>> > > >> > > >>>>>>> > > >> >
>>>> > > >> > > >>>>>>> > > >> >
>>>> > > >> > > >>>>>>> > > >> >
>>>> > > >> > > >>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid
>>>> Wysakowicz
>>>> > <
>>>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org>
>>>> > > >> > > >>>>>>> > > >> > wrote:
>>>> > > >> > > >>>>>>> > > >> >
>>>> > > >> > > >>>>>>> > > >> > > Thanks @Sebastian for the nice summary.
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > I think most of your points aligned with
>>>> the
>>>> > > >> > suggestions
>>>> > > >> > > >>>>>>> I made to
>>>> > > >> > > >>>>>>> > > the
>>>> > > >> > > >>>>>>> > > >> > > FLIP, while you were writing your reply (I
>>>> > believe
>>>> > > >> we
>>>> > > >> > > hit
>>>> > > >> > > >>>>>>> enter
>>>> > > >> > > >>>>>>> > > >> nearly at
>>>> > > >> > > >>>>>>> > > >> > > the same time ;) )
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > Two points after we synced offline
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > 1. I also changed the
>>>> > > >> supportsWatermarksSplitAlignment
>>>> > > >> > > to
>>>> > > >> > > >>>>>>> > > >> > > supportsPausingSplits to express the
>>>> general
>>>> > > >> > capability
>>>> > > >> > > of
>>>> > > >> > > >>>>>>> > pausing.
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > 2. As for whether we should add
>>>> > > >> > > >>>>>>> PausingSourceReader/PausingSplitReader
>>>> > > >> > > >>>>>>> > > (option
>>>> > > >> > > >>>>>>> > > >> b)
>>>> > > >> > > >>>>>>> > > >> > > or if we should just add the methods
>>>> (option
>>>> > c), I
>>>> > > >> > > >>>>>>> suggest to
>>>> > > >> > > >>>>>>> > simply
>>>> > > >> > > >>>>>>> > > >> add
>>>> > > >> > > >>>>>>> > > >> > > the two methods as I felt this is the much
>>>> > preferred
>>>> > > >> > > approach of
>>>> > > >> > > >>>>>>> Becket,
>>>> > > >> > > >>>>>>> > > >> to which
>>>> > > >> > > >>>>>>> > > >> > > others do not object. Unless there is an
>>>> > opposition
>>>> > > >> > > let's
>>>> > > >> > > >>>>>>> go with
>>>> > > >> > > >>>>>>> > > this
>>>> > > >> > > >>>>>>> > > >> > > option c.
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > Best,
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > Dawid
>>>> > > >> > > >>>>>>> > > >> > > On 26/04/2022 10:06, Sebastian Mattheis
>>>> wrote:
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > Hi folks,
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > Sorry for being a bit silent. Many thanks
>>>> for
>>>> > all
>>>> > > >> the
>>>> > > >> > > >>>>>>> input and
>>>> > > >> > > >>>>>>> > > >> > > suggestions. As I'm a bit new, I needed
>>>> some
>>>> > time
>>>> > > >> to
>>>> > > >> > > >>>>>>> catch up and
>>>> > > >> > > >>>>>>> > > >> > structure
>>>> > > >> > > >>>>>>> > > >> > > (for myself) the discussion and I wanted
>>>> to
>>>> > find a
>>>> > > >> way
>>>> > > >> > > to
>>>> > > >> > > >>>>>>> > structure
>>>> > > >> > > >>>>>>> > > >> the
>>>> > > >> > > >>>>>>> > > >> > > conclusions. (Also because I had the
>>>> feeling
>>>> > that
>>>> > > >> some
>>>> > > >> > > >>>>>>> concerns
>>>> > > >> > > >>>>>>> > got
>>>> > > >> > > >>>>>>> > > >> lost
>>>> > > >> > > >>>>>>> > > >> > in
>>>> > > >> > > >>>>>>> > > >> > > the discussion.) This is my attempt and
>>>> please
>>>> > > >> correct
>>>> > > >> > > me
>>>> > > >> > > >>>>>>> if
>>>> > > >> > > >>>>>>> > > >> something is
>>>> > > >> > > >>>>>>> > > >> > > wrong or misunderstood. I tried to
>>>> collect and
>>>> > > >> > assemble
>>>> > > >> > > >>>>>>> the
>>>> > > >> > > >>>>>>> > > opinions,
>>>> > > >> > > >>>>>>> > > >> > > suggestions, and conclusions (to the best
>>>> of my
>>>> > > >> > > >>>>>>> knowledge):
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > # Top A: Should split alignment
>>>> (pause/resume
>>>> > > >> > behavior)
>>>> > > >> > > >>>>>>> be a
>>>> > > >> > > >>>>>>> > general
>>>> > > >> > > >>>>>>> > > >> > > capability?
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > I personally don't see any reason not to
>>>> have
>>>> > it as a
>>>> > > >> > > general
>>>> > > >> > > >>>>>>> > capability
>>>> > > >> > > >>>>>>> > > >> > > because for the alignSplit method it is
>>>> > actually
>>>> > > >> > > >>>>>>> independent of
>>>> > > >> > > >>>>>>> > the
>>>> > > >> > > >>>>>>> > > >> > > watermarks. If we agree here to have it as a
>>>> > general
>>>> > > >> > > >>>>>>> capability, we
>>>> > > >> > > >>>>>>> > > >> should
>>>> > > >> > > >>>>>>> > > >> > > also agree on the right wording. Does
>>>> > > >> > > >>>>>>> "alignSplits(splitsToResume,
>>>> > > >> > > >>>>>>> > > >> > > splitsToPause)" refer to what is then
>>>> actually
>>>> > > >> meant?
>>>> > > >> > (I
>>>> > > >> > > >>>>>>> see it as
>>>> > > >> > > >>>>>>> > > >> okay.
>>>> > > >> > > >>>>>>> > > >> > I
>>>> > > >> > > >>>>>>> > > >> > > don't have any better idea whilst Arvid
>>>> > suggested
>>>> > > >> > > >>>>>>> > > >> "pauseOrResumeSplits".)
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > # Top B: Should it be possible to
>>>> > enable/disable
>>>> > > >> split
>>>> > > >> > > >>>>>>> alignment?
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > I would personally not disable the split
>>>> > alignment
>>>> > > >> on
>>>> > > >> > > the
>>>> > > >> > > >>>>>>> source
>>>> > > >> > > >>>>>>> > > >> reader
>>>> > > >> > > >>>>>>> > > >> > > side because if split alignment is used
>>>> for
>>>> > some
>>>> > > >> other
>>>> > > >> > > >>>>>>> use case
>>>> > > >> > > >>>>>>> > (see
>>>> > > >> > > >>>>>>> > > >> A)
>>>> > > >> > > >>>>>>> > > >> > it
>>>> > > >> > > >>>>>>> > > >> > > could have nasty side effects on
>>>> other/future
>>>> > use
>>>> > > >> > cases.
>>>> > > >> > > >>>>>>> Instead,
>>>> > > >> > > >>>>>>> > I
>>>> > > >> > > >>>>>>> > > >> would
>>>> > > >> > > >>>>>>> > > >> > > disable "watermark split alignment" where
>>>> I
>>>> > think
>>>> > > >> it
>>>> > > >> > > >>>>>>> should
>>>> > > >> > > >>>>>>> > disable
>>>> > > >> > > >>>>>>> > > >> the
>>>> > > >> > > >>>>>>> > > >> > > watermark-dependent trigger for split
>>>> > alignment.
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > # Top C: Should we add a supportsX method?
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > I find it difficult to define the scope
>>>> of a
>>>> > > >> supportsX
>>>> > > >> > > >>>>>>> method
>>>> > > >> > > >>>>>>> > w.r.t.
>>>> > > >> > > >>>>>>> > > >> > > the following questions: a) Where is it
>>>> used?
>>>> > and
>>>> > > >> b)
>>>> > > >> > > What
>>>> > > >> > > >>>>>>> is the
>>>> > > >> > > >>>>>>> > > >> expected
>>>> > > >> > > >>>>>>> > > >> > > output? To b), it's not straight-forward
>>>> to
>>>> > > >> provide a
>>>> > > >> > > >>>>>>> meaningful
>>>> > > >> > > >>>>>>> > > >> output,
>>>> > > >> > > >>>>>>> > > >> > > e.g., if SourceReader supports split
>>>> alignment
>>>> > but
>>>> > > >> > > >>>>>>> SplitReader
>>>> > > >> > > >>>>>>> > does not.
>>>> > > >> > > >>>>>>> > > >> This
>>>> > > >> > > >>>>>>> > > >> > is
>>>> > > >> > > >>>>>>> > > >> > > because with the current implementation,
>>>> we can
>>>> > > >> > > determine
>>>> > > >> > > >>>>>>> whether
>>>> > > >> > > >>>>>>> > > >> split
>>>> > > >> > > >>>>>>> > > >> > > alignment is fully supported only during
>>>> > runtime
>>>> > > >> and
>>>> > > >> > > >>>>>>> specifically
>>>> > > >> > > >>>>>>> > > >> > actually
>>>> > > >> > > >>>>>>> > > >> > > only when calling alignSplits down the
>>>> call
>>>> > > >> hierarchy
>>>> > > >> > up
>>>> > > >> > > >>>>>>> to the
>>>> > > >> > > >>>>>>> > > actual
>>>> > > >> > > >>>>>>> > > >> > > SplitReaders.
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > Therefore, I would suggest to either
>>>> raise an
>>>> > > >> error or
>>>> > > >> > > >>>>>>> warning if
>>>> > > >> > > >>>>>>> > > the
>>>> > > >> > > >>>>>>> > > >> > > alignment is called but not supported at
>>>> some
>>>> > > >> point. I
>>>> > > >> > > >>>>>>> know we
>>>> > > >> > > >>>>>>> > > should
>>>> > > >> > > >>>>>>> > > >> > > carefully think about when this could be
>>>> the
>>>> > case
>>>> > > >> > > because
>>>> > > >> > > >>>>>>> we don't
>>>> > > >> > > >>>>>>> > > >> want
>>>> > > >> > > >>>>>>> > > >> > to
>>>> > > >> > > >>>>>>> > > >> > > flood anybody with such warnings. However,
>>>> > warnings
>>>> > > >> > > could
>>>> > > >> > > >>>>>>> be an
>>>> > > >> > > >>>>>>> > > >> indicator
>>>> > > >> > > >>>>>>> > > >> > > for the user that for watermark split
>>>> > alignment use
>>>> > > >> > case
>>>> > > >> > > >>>>>>> split
>>>> > > >> > > >>>>>>> > > >> reading is
>>>> > > >> > > >>>>>>> > > >> > > imbalanced with the conclusion to either
>>>> > disable
>>>> > > >> the
>>>> > > >> > > >>>>>>> trigger for
>>>> > > >> > > >>>>>>> > > >> > watermark
>>>> > > >> > > >>>>>>> > > >> > > split alignment (see Top B) or to
>>>> > use/implement a
>>>> > > >> > source
>>>> > > >> > > >>>>>>> and
>>>> > > >> > > >>>>>>> > reader
>>>> > > >> > > >>>>>>> > > >> that
>>>> > > >> > > >>>>>>> > > >> > > fully supports split alignment.
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > # Top D: How to design interfaces?
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > Thanks for structuring the discussion
>>>> with the
>>>> > > >> > > various
>>>> > > >> > > >>>>>>> > > >> possibilities
>>>> > > >> > > >>>>>>> > > >> > > (a-d). From the discussion and emails, I
>>>> would
>>>> > > >> like to
>>>> > > >> > > >>>>>>> summarize
>>>> > > >> > > >>>>>>> > the
>>>> > > >> > > >>>>>>> > > >> > > following requirements:
>>>> > > >> > > >>>>>>> > > >> > > - Interfaces should be consistent
>>>> > ("symmetric"),
>>>> > > >> i.e.,
>>>> > > >> > > >>>>>>> similar
>>>> > > >> > > >>>>>>> > > >> semantics
>>>> > > >> > > >>>>>>> > > >> > > should have similar interfaces with
>>>> similar
>>>> > usage.
>>>> > > >> > > >>>>>>> > > >> > > - Make explicit which implementations
>>>> implement
>>>> > > >> > > >>>>>>> interfaces/support
>>>> > > >> > > >>>>>>> > > >> > > behavior.
>>>> > > >> > > >>>>>>> > > >> > > - Make clear what are default
>>>> implementations
>>>> > and
>>>> > > >> how
>>>> > > >> > to
>>>> > > >> > > >>>>>>> implement
>>>> > > >> > > >>>>>>> > > >> > > interfaces with desired behavior.
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > This is a simplified view of the relations
>>>> > between
>>>> > > >> > > >>>>>>> relevant
>>>> > > >> > > >>>>>>> > classes
>>>> > > >> > > >>>>>>> > > of
>>>> > > >> > > >>>>>>> > > >> > the
>>>> > > >> > > >>>>>>> > > >> > > PoC implementation:
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > SourceReader (Public) <|--
>>>> SourceReaderBase
>>>> > > >> (Internal)
>>>> > > >> > > >>>>>>> <|-- ..
>>>> > > >> > > >>>>>>> > <|--
>>>> > > >> > > >>>>>>> > > >> > > MySourceReader
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > MySourceReader <>-- SplitFetcherManager
>>>> > (Internal)
>>>> > > >> > <>--
>>>> > > >> > > >>>>>>> > SplitFetcher
>>>> > > >> > > >>>>>>> > > >> > > (Internal) <>-- SplitReader (Public) <|--
>>>> > > >> > MySplitReader
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > (A <|-- B: B inherits from A; A <>-- B: A
>>>> "has
>>>> > a"
>>>> > > >> B)
>>>> > > >> > > >>>>>>> > > >> > >
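The relations in the diagram above can be sketched as a delegation chain. All types below are simplified, hypothetical stand-ins that only mirror the class names from the diagram; they are not the real Flink classes. The sketch also assumes that every fetcher receives every split id and that both layers share one alignSplits(splitsToResume, splitsToPause) signature, whereas the thread notes that the real signatures differ.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DelegationSketch {

    interface SourceReader {
        void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause);
    }

    interface SplitReader {
        void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause);
    }

    /** User-provided reader at the bottom of the chain; it tracks the paused split ids. */
    static class MySplitReader implements SplitReader {
        final Set<String> paused = new HashSet<>();

        @Override
        public void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause) {
            paused.removeAll(splitsToResume);
            paused.addAll(splitsToPause);
        }
    }

    /** A SplitFetcher "has a" SplitReader and forwards the call to it. */
    static class SplitFetcher {
        private final SplitReader reader;

        SplitFetcher(SplitReader reader) {
            this.reader = reader;
        }

        void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause) {
            reader.alignSplits(splitsToResume, splitsToPause);
        }
    }

    /** A SplitFetcherManager "has" SplitFetchers and forwards the call to each of them. */
    static class SplitFetcherManager {
        private final List<SplitFetcher> fetchers = new ArrayList<>();

        void addFetcher(SplitFetcher fetcher) {
            fetchers.add(fetcher);
        }

        void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause) {
            for (SplitFetcher fetcher : fetchers) {
                fetcher.alignSplits(splitsToResume, splitsToPause);
            }
        }
    }

    /** The base class carries the shared logic, so MySourceReader inherits it unchanged. */
    static class SourceReaderBase implements SourceReader {
        private final SplitFetcherManager manager;

        SourceReaderBase(SplitFetcherManager manager) {
            this.manager = manager;
        }

        @Override
        public void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause) {
            manager.alignSplits(splitsToResume, splitsToPause);
        }
    }

    static class MySourceReader extends SourceReaderBase {
        MySourceReader(SplitFetcherManager manager) {
            super(manager);
        }
    }

    public static void main(String[] args) {
        MySplitReader splitReader = new MySplitReader();
        SplitFetcherManager manager = new SplitFetcherManager();
        manager.addFetcher(new SplitFetcher(splitReader));
        SourceReader reader = new MySourceReader(manager);
        reader.alignSplits(Collections.emptyList(), Collections.singletonList("split-0"));
        System.out.println("paused = " + splitReader.paused);
    }
}
```

Whether the request is actually honored is only known once the call reaches MySplitReader, which illustrates why full support can only be determined at runtime.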
>>>> > > >> > > >>>>>>> > > >> > > Note that SourceReaderBase and
>>>> > SplitFetcherManager
>>>> > > >> > > >>>>>>> implement most
>>>> > > >> > > >>>>>>> > of
>>>> > > >> > > >>>>>>> > > >> the
>>>> > > >> > > >>>>>>> > > >> > > "logic" for split alignment just because
>>>> we
>>>> > wanted
>>>> > > >> to
>>>> > > >> > > >>>>>>> implement
>>>> > > >> > > >>>>>>> > > split
>>>> > > >> > > >>>>>>> > > >> > > alignment and wanted it to be available as
>>>> > kind of
>>>> > > >> a
>>>> > > >> > > >>>>>>> default. As a
>>>> > > >> > > >>>>>>> > > >> > > consequence, we have a "default
>>>> > implementation" for
>>>> > > >> > > >>>>>>> SourceReader
>>>> > > >> > > >>>>>>> > > that
>>>> > > >> > > >>>>>>> > > >> > > implements the actual logic for split
>>>> > alignment.
>>>> > > >> For
>>>> > > >> > > that
>>>> > > >> > > >>>>>>> reason,
>>>> > > >> > > >>>>>>> > I
>>>> > > >> > > >>>>>>> > > >> find
>>>> > > >> > > >>>>>>> > > >> > it
>>>> > > >> > > >>>>>>> > > >> > > very confusing to have a NOOP default
>>>> > > >> implementation
>>>> > > >> > in
>>>> > > >> > > >>>>>>> the
>>>> > > >> > > >>>>>>> > > interface
>>>> > > >> > > >>>>>>> > > >> for
>>>> > > >> > > >>>>>>> > > >> > > the SourceReader. As a consequence,
>>>> interface
>>>> > > >> strategy
>>>> > > >> > > c)
>>>> > > >> > > >>>>>>> is
>>>> > > >> > > >>>>>>> > > difficult
>>>> > > >> > > >>>>>>> > > >> > > because this would require NOOP default
>>>> > > >> > implementations
>>>> > > >> > > >>>>>>> in the
>>>> > > >> > > >>>>>>> > > public
>>>> > > >> > > >>>>>>> > > >> > > interfaces of SourceReader and
>>>> SplitReader.
>>>> > This is
>>>> > > >> > the
>>>> > > >> > > >>>>>>> same for
>>>> > > >> > > >>>>>>> > > >> strategy
>>>> > > >> > > >>>>>>> > > >> > > d) because it would require NOOP default
>>>> > > >> > implementation
>>>> > > >> > > >>>>>>> in the
>>>> > > >> > > >>>>>>> > > >> > > SourceReader. Further, as Dawid described
>>>> > method
>>>> > > >> > > >>>>>>> signatures of
>>>> > > >> > > >>>>>>> > > >> alignSplit
>>>> > > >> > > >>>>>>> > > >> > > for SourceReader and SplitReader differ
>>>> and it
>>>> > > >> would
>>>> > > >> > be
>>>> > > >> > > >>>>>>> extremely
>>>> > > >> > > >>>>>>> > > >> > difficult
>>>> > > >> > > >>>>>>> > > >> > > to make the signatures the same (with even
>>>> > > >> potential
>>>> > > >> > > >>>>>>> performance
>>>> > > >> > > >>>>>>> > > >> impact
>>>> > > >> > > >>>>>>> > > >> > > because of additional look-ups of split
>>>> ids).
>>>> > > >> > Therefore,
>>>> > > >> > > >>>>>>> having a
>>>> > > >> > > >>>>>>> > > >> > symmetric
>>>> > > >> > > >>>>>>> > > >> > > decorative interface as of strategy a) is
>>>> > actually
>>>> > > >> not
>>>> > > >> > > >>>>>>> possible
>>>> > > >> > > >>>>>>> > and
>>>> > > >> > > >>>>>>> > > >> > having
>>>> > > >> > > >>>>>>> > > >> > > two decorative interfaces with different
>>>> method
>>>> > > >> > > >>>>>>> signatures is
>>>> > > >> > > >>>>>>> > > >> confusing.
>>>> > > >> > > >>>>>>> > > >> > My
>>>> > > >> > > >>>>>>> > > >> > > conclusion is that we are best with
>>>> strategy b)
>>>> > > >> which
>>>> > > >> > > >>>>>>> means to
>>>> > > >> > > >>>>>>> > have
>>>> > > >> > > >>>>>>> > > >> > > specializing sub-interfaces that inherit
>>>> from
>>>> > the
>>>> > > >> > parent
>>>> > > >> > > >>>>>>> > interface:
>>>> > > >> > > >>>>>>> > > >> > > SourceReader <|-- AlignedSourceReader,
>>>> > SplitReader
>>>> > > >> > <|--
>>>> > > >> > > >>>>>>> > > >> > AlignedSplitReader
>>>> > > >> > > >>>>>>> > > >> > > With this option, I'm not 100% sure what
>>>> the
>>>> > > >> > > implications
>>>> > > >> > > >>>>>>> are and
>>>> > > >> > > >>>>>>> > if
>>>> > > >> > > >>>>>>> > > >> this
>>>> > > >> > > >>>>>>> > > >> > > could get nasty. I would suggest that
>>>> Dawid
>>>> > and I
>>>> > > >> just
>>>> > > >> > > >>>>>>> try to
>>>> > > >> > > >>>>>>> > > >> implement
>>>> > > >> > > >>>>>>> > > >> > and
>>>> > > >> > > >>>>>>> > > >> > > see if we like it. :)
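For illustration, strategy b) could be sketched roughly as below. This is only a sketch under assumptions: SourceReader and SplitReader are reduced to empty stand-ins, the SplitFetcherManager and SplitFetcher layers are left out, and the alignSplits signatures, MySplit and the lookUp helper are hypothetical. It mainly shows why the two signatures differ: the source reader is handed split ids, while the split reader needs the actual splits, so an extra look-up sits in between.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-ins for the public Flink interfaces (assumption: the
// real interfaces carry many more methods, all omitted here).
interface SourceReader {}
interface SplitReader<SplitT> {}

// Strategy b): specializing sub-interfaces. Names and signatures are hypothetical.
interface AlignedSourceReader extends SourceReader {
    // The source reader is only told about split ids.
    void alignSplits(Collection<String> idsToPause, Collection<String> idsToResume);
}

interface AlignedSplitReader<SplitT> extends SplitReader<SplitT> {
    // The split reader needs the actual split objects to pause them.
    void alignSplits(Collection<SplitT> toPause, Collection<SplitT> toResume);
}

// Hypothetical split type that is identified by a string id.
class MySplit {
    final String id;
    MySplit(String id) { this.id = id; }
}

class MySplitReader implements AlignedSplitReader<MySplit> {
    // Ids of the splits that are currently paused.
    final Set<String> paused = new HashSet<>();

    @Override
    public void alignSplits(Collection<MySplit> toPause, Collection<MySplit> toResume) {
        for (MySplit s : toPause) paused.add(s.id);
        for (MySplit s : toResume) paused.remove(s.id);
    }
}

class MySourceReader implements AlignedSourceReader {
    private final Map<String, MySplit> assigned = new HashMap<>();
    private final MySplitReader splitReader;

    MySourceReader(MySplitReader splitReader) { this.splitReader = splitReader; }

    void addSplit(MySplit split) { assigned.put(split.id, split); }

    @Override
    public void alignSplits(Collection<String> idsToPause, Collection<String> idsToResume) {
        // This id -> split translation is the additional look-up mentioned above.
        splitReader.alignSplits(lookUp(idsToPause), lookUp(idsToResume));
    }

    private List<MySplit> lookUp(Collection<String> ids) {
        List<MySplit> result = new ArrayList<>();
        for (String id : ids) {
            MySplit split = assigned.get(id);
            if (split != null) result.add(split); // unknown ids are ignored
        }
        return result;
    }
}
```

With this shape a connector that does not care about split alignment keeps implementing the plain SourceReader and SplitReader and never sees alignSplits.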
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > # Summary
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > In conclusion, please let me know your
>>>> > > >> perspectives.
>>>> > > >> > > >>>>>>> Please
>>>> > > >> > > >>>>>>> > correct
>>>> > > >> > > >>>>>>> > > >> me,
>>>> > > >> > > >>>>>>> > > >> > if
>>>> > > >> > > >>>>>>> > > >> > > something is wrong or if I misunderstood
>>>> > > >> something. My
>>>> > > >> > > >>>>>>> perspective
>>>> > > >> > > >>>>>>> > > >> would
>>>> > > >> > > >>>>>>> > > >> > be:
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > Top A: Yes
>>>> > > >> > > >>>>>>> > > >> > > Top B: Yes (but disable watermark trigger
>>>> for
>>>> > split
>>>> > > >> > > >>>>>>> alignment)
>>>> > > >> > > >>>>>>> > > >> > > Top C: No
>>>> > > >> > > >>>>>>> > > >> > > Top D: b)
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > Best,
>>>> > > >> > > >>>>>>> > > >> > > Sebastian
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid
>>>> > Wysakowicz <
>>>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > > wrote:
>>>> > > >> > > >>>>>>> > > >> > >
>>>> > > >> > > >>>>>>> > > >> > >> @Arvid:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's capability
>>>> > approach, I
>>>> > > >> > fear
>>>> > > >> > > >>>>>>> that it
>>>> > > >> > > >>>>>>> > > >> doesn't
>>>> > > >> > > >>>>>>> > > >> > work
>>>> > > >> > > >>>>>>> > > >> > >> for this particular use case: Sources can
>>>> > always
>>>> > > >> be
>>>> > > >> > > >>>>>>> aligned
>>>> > > >> > > >>>>>>> > > >> cross-task
>>>> > > >> > > >>>>>>> > > >> > and
>>>> > > >> > > >>>>>>> > > >> > >> this is just about intra-task alignment.
>>>> So
>>>> > it's
>>>> > > >> > > >>>>>>> plausible to put
>>>> > > >> > > >>>>>>> > > >> > sources
>>>> > > >> > > >>>>>>> > > >> > >> into an alignment group even though they
>>>> do
>>>> > not
>>>> > > >> use
>>>> > > >> > any
>>>> > > >> > > >>>>>>> of the
>>>> > > >> > > >>>>>>> > > >> presented
>>>> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They should just issue a
>>>> > > >> warning, if
>>>> > > >> > > >>>>>>> they handle
>>>> > > >> > > >>>>>>> > > >> > multiple
>>>> > > >> > > >>>>>>> > > >> > >> splits (see motivation section).
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Yes, but the "supportXXX" method would
>>>> be for
>>>> > > >> telling
>>>> > > >> > > if
>>>> > > >> > > >>>>>>> it
>>>> > > >> > > >>>>>>> > > supports
>>>> > > >> > > >>>>>>> > > >> > that
>>>> > > >> > > >>>>>>> > > >> > >> intra-task alignment. Cross-task
>>>> alignment
>>>> > would
>>>> > > >> > always
>>>> > > >> > > >>>>>>> be
>>>> > > >> > > >>>>>>> > > supported.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I updated interfaces to what I believe
>>>> to be
>>>> > > >> closest
>>>> > > >> > > to a
>>>> > > >> > > >>>>>>> > consensus
>>>> > > >> > > >>>>>>> > > >> > >> between all participants. Do you mind
>>>> taking a
>>>> > > >> look?
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> @Sebastian Do you mind addressing the
>>>> nits?
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Best,
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Dawid
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Thanks for pushing this effort.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I'd actually be in favor of 1b). I fully
>>>> agree
>>>> > > >> that
>>>> > > >> > > >>>>>>> decorator
>>>> > > >> > > >>>>>>> > > >> interfaces
>>>> > > >> > > >>>>>>> > > >> > >> should be avoided but I'm also not a big
>>>> fan
>>>> > of
>>>> > > >> > > >>>>>>> overloading the
>>>> > > >> > > >>>>>>> > > base
>>>> > > >> > > >>>>>>> > > >> > >> interfaces (they are hard to implement as
>>>> > is). The
>>>> > > >> > > usual
>>>> > > >> > > >>>>>>> feedback
>>>> > > >> > > >>>>>>> > > to
>>>> > > >> > > >>>>>>> > > >> > >> Source-related interfaces is always that
>>>> > they are
>>>> > > >> > > >>>>>>> overwhelming
>>>> > > >> > > >>>>>>> > and
>>>> > > >> > > >>>>>>> > > >> too
>>>> > > >> > > >>>>>>> > > >> > >> hard to implement. However, I'd also not
>>>> > oppose
>>>> > > >> 1c)
>>>> > > >> > as
>>>> > > >> > > >>>>>>> scattered
>>>> > > >> > > >>>>>>> > > >> > interfaces
>>>> > > >> > > >>>>>>> > > >> > >> also have drawbacks. I'd just dislike
>>>> 1a) and
>>>> > 1d).
>>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's capability
>>>> > approach, I
>>>> > > >> > fear
>>>> > > >> > > >>>>>>> that it
>>>> > > >> > > >>>>>>> > > >> doesn't
>>>> > > >> > > >>>>>>> > > >> > work
>>>> > > >> > > >>>>>>> > > >> > >> for this particular use case: Sources can
>>>> > always
>>>> > > >> be
>>>> > > >> > > >>>>>>> aligned
>>>> > > >> > > >>>>>>> > > >> cross-task
>>>> > > >> > > >>>>>>> > > >> > and
>>>> > > >> > > >>>>>>> > > >> > >> this is just about intra-task alignment.
>>>> So
>>>> > it's
>>>> > > >> > > >>>>>>> plausible to put
>>>> > > >> > > >>>>>>> > > >> > sources
>>>> > > >> > > >>>>>>> > > >> > >> into an alignment group even though they
>>>> do
>>>> > not
>>>> > > >> use
>>>> > > >> > any
>>>> > > >> > > >>>>>>> of the
>>>> > > >> > > >>>>>>> > > >> presented
>>>> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They should just issue a
>>>> > > >> warning, if
>>>> > > >> > > >>>>>>> they handle
>>>> > > >> > > >>>>>>> > > >> > multiple
>>>> > > >> > > >>>>>>> > > >> > >> splits (see motivation section).
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I think renaming alignSplits to
>>>> facilitate
>>>> > future
>>>> > > >> use
>>>> > > >> > > >>>>>>> cases makes
>>>> > > >> > > >>>>>>> > > >> sense
>>>> > > >> > > >>>>>>> > > >> > but
>>>> > > >> > > >>>>>>> > > >> > >> then all interfaces (if 1c) is chosen)
>>>> should
>>>> > be
>>>> > > >> > > adjusted
>>>> > > >> > > >>>>>>> > > >> accordingly.
>>>> > > >> > > >>>>>>> > > >> > >> AlignedSourceReader could be
>>>> > PausingSourceReader
>>>> > > >> and
>>>> > > >> > > I'd
>>>> > > >> > > >>>>>>> go for
>>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits (Becket's proposal
>>>> > afaik). We
>>>> > > >> > could
>>>> > > >> > > >>>>>>> also
>>>> > > >> > > >>>>>>> > split
>>>> > > >> > > >>>>>>> > > it
>>>> > > >> > > >>>>>>> > > >> > into
>>>> > > >> > > >>>>>>> > > >> > >> pauseSplit and resumeSplit. While
>>>> > > >> pauseOrResumeSplits
>>>> > > >> > > >>>>>>> may allow
>>>> > > >> > > >>>>>>> > > >> Sources
>>>> > > >> > > >>>>>>> > > >> > to
>>>> > > >> > > >>>>>>> > > >> > >> just use 1 instead of 2 library calls (as
>>>> > written
>>>> > > >> in
>>>> > > >> > > the
>>>> > > >> > > >>>>>>> > Javadoc),
>>>> > > >> > > >>>>>>> > > >> both
>>>> > > >> > > >>>>>>> > > >> > >> Kafka and Pulsar can't use it and I'm not
>>>> > sure if
>>>> > > >> > there
>>>> > > >> > > >>>>>>> is a
>>>> > > >> > > >>>>>>> > system
>>>> > > >> > > >>>>>>> > > >> that
>>>> > > >> > > >>>>>>> > > >> > >> can.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Some nits for the FLIP:
>>>> > > >> > > >>>>>>> > > >> > >> - Please replace "stop" with "pause".
>>>> > > >> > > >>>>>>> > > >> > >> - Not sure if it's worth it in the
>>>> capability
>>>> > > >> > section:
>>>> > > >> > > >>>>>>> Sources
>>>> > > >> > > >>>>>>> > that
>>>> > > >> > > >>>>>>> > > >> > adopt
>>>> > > >> > > >>>>>>> > > >> > >> this interface cannot be used in earlier
>>>> > > >> versions. So
>>>> > > >> > > it
>>>> > > >> > > >>>>>>> feels
>>>> > > >> > > >>>>>>> > like
>>>> > > >> > > >>>>>>> > > >> we
>>>> > > >> > > >>>>>>> > > >> > are
>>>> > > >> > > >>>>>>> > > >> > >> only forward compatible (old sources can
>>>> be
>>>> > used
>>>> > > >> > after
>>>> > > >> > > >>>>>>> the
>>>> > > >> > > >>>>>>> > change);
>>>> > > >> > > >>>>>>> > > >> but
>>>> > > >> > > >>>>>>> > > >> > I
>>>> > > >> > > >>>>>>> > > >> > >> guess this holds for any API addition.
>>>> > > >> > > >>>>>>> > > >> > >> - You might want to add what happens
>>>> when all
>>>> > > >> splits
>>>> > > >> > > are
>>>> > > >> > > >>>>>>> paused.
>>>> > > >> > > >>>>>>> > > >> > >> - You may want to describe how the 3
>>>> flavors
>>>> > of
>>>> > > >> > > >>>>>>> SourceReaderBase
>>>> > > >> > > >>>>>>> > > >> > interact
>>>> > > >> > > >>>>>>> > > >> > >> with the interface.
>>>> > > >> > > >>>>>>> > > >> > >> - I'm not sure if it makes sense to
>>>> include
>>>> > Kafka
>>>> > > >> and
>>>> > > >> > > >>>>>>> Pulsar in
>>>> > > >> > > >>>>>>> > the
>>>> > > >> > > >>>>>>> > > >> > FLIP.
>>>> > > >> > > >>>>>>> > > >> > >> For me, this is rather immediate
>>>> follow-up
>>>> > work.
>>>> > > >> > (could
>>>> > > >> > > >>>>>>> be in the
>>>> > > >> > > >>>>>>> > > >> same
>>>> > > >> > > >>>>>>> > > >> > >> umbrella ticket)
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Best,
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Arvid
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid
>>>> > > >> > > >>>>>>> > > >> > >> Wysakowicz <dwysakowicz@apache.org> wrote:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements
>>>> SourceReader,
>>>> > > >> > > >>>>>>> WithSplitsAlignment",
>>>> > > >> > > >>>>>>> > > >> along
>>>> > > >> > > >>>>>>> > > >> > >> with "MySplitReader implements
>>>> SplitReader,
>>>> > > >> > > >>>>>>> WithSplitsAlignment",
>>>> > > >> > > >>>>>>> > > or
>>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements
>>>> > AlignedSourceReader"
>>>> > > >> > and
>>>> > > >> > > >>>>>>> > > "MySplitReader
>>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
>>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements
>>>> SourceReader"
>>>> > and
>>>> > > >> > > >>>>>>> "MySplitReader
>>>> > > >> > > >>>>>>> > > >> > implements
>>>> > > >> > > >>>>>>> > > >> > >> SplitReader".
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according to
>>>> Dawid
>>>> > > >> would
>>>> > > >> > > be:
>>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements
>>>> SourceReader"
>>>> > and
>>>> > > >> > > >>>>>>> "MySplitReader
>>>> > > >> > > >>>>>>> > > >> > implements
>>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
>>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although
>>>> personally
>>>> > > >> > speaking I
>>>> > > >> > > >>>>>>> think it
>>>> > > >> > > >>>>>>> > is
>>>> > > >> > > >>>>>>> > > >> > simpler
>>>> > > >> > > >>>>>>> > > >> > >> to just add a new method to the split
>>>> reader
>>>> > with
>>>> > > >> > > >>>>>>> default impl.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I think that is a good idea to have it
>>>> > aligned as
>>>> > > >> > much
>>>> > > >> > > as
>>>> > > >> > > >>>>>>> > possible.
>>>> > > >> > > >>>>>>> > > >> I'd
>>>> > > >> > > >>>>>>> > > >> > be
>>>> > > >> > > >>>>>>> > > >> > >> +1 for your option c). We can merge
>>>> > > >> > AlignedSplitReader
>>>> > > >> > > >>>>>>> with
>>>> > > >> > > >>>>>>> > > >> > SplitReader. We
>>>> > > >> > > >>>>>>> > > >> > >> will update the FLIP shortly.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Best,
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Dawid
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Thanks for the comment, Jark.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align other
>>>> > things in
>>>> > > >> > the
>>>> > > >> > > >>>>>>> future?
>>>> > > >> > > >>>>>>> > For
>>>> > > >> > > >>>>>>> > > >> > example,
>>>> > > >> > > >>>>>>> > > >> > >> align read speed, I have
>>>> > > >> > > >>>>>>> > > >> > >> seen users requesting global rate
>>>> limits. This
>>>> > > >> > feature
>>>> > > >> > > >>>>>>> may also
>>>> > > >> > > >>>>>>> > > need
>>>> > > >> > > >>>>>>> > > >> an
>>>> > > >> > > >>>>>>> > > >> > >> interface like this.
>>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this
>>>> interface to
>>>> > > >> support
>>>> > > >> > > >>>>>>> aligning other
>>>> > > >> > > >>>>>>> > > >> > things, I
>>>> > > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
>>>> > > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
>>>> > > >> > > >>>>>>> `alignWatermarksForSplits`
>>>> > > >> > > >>>>>>> > > >> instead
>>>> > > >> > > >>>>>>> > > >> > of
>>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> This is a good point. Naming wise, it
>>>> would
>>>> > > >> usually
>>>> > > >> > be
>>>> > > >> > > >>>>>>> more
>>>> > > >> > > >>>>>>> > > >> extensible
>>>> > > >> > > >>>>>>> > > >> > to
>>>> > > >> > > >>>>>>> > > >> > >> just describe what the method actually
>>>> does,
>>>> > > >> instead
>>>> > > >> > of
>>>> > > >> > > >>>>>>> assuming
>>>> > > >> > > >>>>>>> > > the
>>>> > > >> > > >>>>>>> > > >> > >> purpose of doing this. For example, in
>>>> this
>>>> > case,
>>>> > > >> > > >>>>>>> > > >> pauseOrResumeSplits()
>>>> > > >> > > >>>>>>> > > >> > >> would be more extensible because this
>>>> can be
>>>> > used
>>>> > > >> for
>>>> > > >> > > >>>>>>> any kind of
>>>> > > >> > > >>>>>>> > > >> flow
>>>> > > >> > > >>>>>>> > > >> > >> control, be it watermark alignment or
>>>> simple
>>>> > rate
>>>> > > >> > > >>>>>>> limiting.
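To make the naming argument concrete, here is a minimal sketch in which one pauseOrResumeSplits() method serves two unrelated flow-control policies. Everything in it is an assumption for illustration: PausableReader, RecordingReader, FlowControl, the split-id-based signature and both policies are hypothetical and not taken from the FLIP.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical reader abstraction: the method name only says what it does.
interface PausableReader {
    void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

// Test double that just records which splits are currently paused.
class RecordingReader implements PausableReader {
    final Set<String> paused = new TreeSet<>();

    @Override
    public void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.addAll(splitsToPause);
        paused.removeAll(splitsToResume);
    }
}

class FlowControl {
    // Policy 1: watermark alignment. Pause splits that are ahead of the allowed watermark.
    static void alignWatermarks(PausableReader reader, Map<String, Long> splitWatermarks, long maxAllowed) {
        List<String> pause = new ArrayList<>();
        List<String> resume = new ArrayList<>();
        for (Map.Entry<String, Long> e : splitWatermarks.entrySet()) {
            (e.getValue() > maxAllowed ? pause : resume).add(e.getKey());
        }
        reader.pauseOrResumeSplits(pause, resume);
    }

    // Policy 2: simple rate limiting. Pause splits that used up their record budget.
    static void limitRate(PausableReader reader, Map<String, Integer> recordsThisInterval, int budget) {
        List<String> pause = new ArrayList<>();
        List<String> resume = new ArrayList<>();
        for (Map.Entry<String, Integer> e : recordsThisInterval.entrySet()) {
            (e.getValue() > budget ? pause : resume).add(e.getKey());
        }
        reader.pauseOrResumeSplits(pause, resume);
    }
}
```

Neither policy needs the reader to know why a split is being paused, which is the extensibility that a name like alignWatermarksForSplits would give up.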
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I
>>>> think
>>>> > > >> they
>>>> > > >> > > have
>>>> > > >> > > >>>>>>> their
>>>> > > >> > > >>>>>>> > own
>>>> > > >> > > >>>>>>> > > >> > >> advantages.
>>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces
>>>> for
>>>> > > >> extending
>>>> > > >> > > >>>>>>> abilities
>>>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
>>>> > > >> > > >>>>>>> > > >> > >> and I prefer Interfaces rather than
>>>> methods in
>>>> > > >> this
>>>> > > >> > > >>>>>>> case. When
>>>> > > >> > > >>>>>>> > you
>>>> > > >> > > >>>>>>> > > >> have
>>>> > > >> > > >>>>>>> > > >> > a
>>>> > > >> > > >>>>>>> > > >> > >> bunch of abilities and each ability
>>>> > > >> > > >>>>>>> > > >> > >> has more than one method, Interfaces can
>>>> help
>>>> > to
>>>> > > >> > > >>>>>>> organize them
>>>> > > >> > > >>>>>>> > and
>>>> > > >> > > >>>>>>> > > >> make
>>>> > > >> > > >>>>>>> > > >> > >> it clear to users which methods
>>>> > > >> > > >>>>>>> > > >> > >> they need to implement when they want to have
>>>> an
>>>> > > >> ability.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I am OK with decorative interfaces if
>>>> this is
>>>> > a
>>>> > > >> > general
>>>> > > >> > > >>>>>>> design
>>>> > > >> > > >>>>>>> > > >> pattern
>>>> > > >> > > >>>>>>> > > >> > in
>>>> > > >> > > >>>>>>> > > >> > >> the other components in Flink. But it
>>>> looks
>>>> > like
>>>> > > >> the
>>>> > > >> > > >>>>>>> current API
>>>> > > >> > > >>>>>>> > > >> > proposal
>>>> > > >> > > >>>>>>> > > >> > >> is not symmetric.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> The current proposal is essentially
>>>> > > >> "MySourceReader
>>>> > > >> > > >>>>>>> implements
>>>> > > >> > > >>>>>>> > > >> > >> SourceReader, WithSplitsAlignment",
>>>> along with
>>>> > > >> > > >>>>>>> "MySplitReader
>>>> > > >> > > >>>>>>> > > >> implements
>>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitsReader".
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Should we make the API symmetric? I'd
>>>> > consider any
>>>> > > >> > one
>>>> > > >> > > >>>>>>> of the
>>>> > > >> > > >>>>>>> > > >> following
>>>> > > >> > > >>>>>>> > > >> > as
>>>> > > >> > > >>>>>>> > > >> > >> symmetric.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements
>>>> SourceReader,
>>>> > > >> > > >>>>>>> WithSplitsAlignment",
>>>> > > >> > > >>>>>>> > > >> along
>>>> > > >> > > >>>>>>> > > >> > >> with "MySplitReader implements
>>>> SplitReader,
>>>> > > >> > > >>>>>>> WithSplitsAlignment",
>>>> > > >> > > >>>>>>> > > or
>>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements
>>>> > AlignedSourceReader"
>>>> > > >> > and
>>>> > > >> > > >>>>>>> > > "MySplitReader
>>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
>>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements
>>>> SourceReader"
>>>> > and
>>>> > > >> > > >>>>>>> "MySplitReader
>>>> > > >> > > >>>>>>> > > >> > implements
>>>> > > >> > > >>>>>>> > > >> > >> SplitReader".
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according to
>>>> Dawid
>>>> > > >> would
>>>> > > >> > > be:
>>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements
>>>> SourceReader"
>>>> > and
>>>> > > >> > > >>>>>>> "MySplitReader
>>>> > > >> > > >>>>>>> > > >> > implements
>>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
>>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although
>>>> personally
>>>> > > >> > speaking I
>>>> > > >> > > >>>>>>> think it
>>>> > > >> > > >>>>>>> > is
>>>> > > >> > > >>>>>>> > > >> > simpler
>>>> > > >> > > >>>>>>> > > >> > >> to just add a new method to the split
>>>> reader
>>>> > with
>>>> > > >> > > >>>>>>> default impl.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> @Dawid Wysakowicz <dwysakowicz@apache.org>,
>>>> > > >> > > >>>>>>> > > >> > >> thanks for the reply.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a
>>>> preference
>>>> > > >> and I
>>>> > > >> > > >>>>>>> agree most
>>>> > > >> > > >>>>>>> > of
>>>> > > >> > > >>>>>>> > > >> the
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> sources will support the alignment I am
>>>> fine
>>>> > > >> > following
>>>> > > >> > > >>>>>>> your
>>>> > > >> > > >>>>>>> > > >> suggestion
>>>> > > >> > > >>>>>>> > > >> > to
>>>> > > >> > > >>>>>>> > > >> > >> have the SourceReader extending from
>>>> > > >> > > >>>>>>> > WithWatermarksSplitsAlignment,
>>>> > > >> > > >>>>>>> > > >> but
>>>> > > >> > > >>>>>>> > > >> > >> would put the "supportsXXX" there, not
>>>> in the
>>>> > > >> Source
>>>> > > >> > to
>>>> > > >> > > >>>>>>> keep the
>>>> > > >> > > >>>>>>> > > two
>>>> > > >> > > >>>>>>> > > >> > >> methods together.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> One benefit of having the "supportsXXX"
>>>> in
>>>> > Source
>>>> > > >> is
>>>> > > >> > > >>>>>>> that this
>>>> > > >> > > >>>>>>> > > allows
>>>> > > >> > > >>>>>>> > > >> > some
>>>> > > >> > > >>>>>>> > > >> > >> compile time check. For example, if a
>>>> user
>>>> > enabled
>>>> > > >> > > >>>>>>> watermark
>>>> > > >> > > >>>>>>> > > >> alignment
>>>> > > >> > > >>>>>>> > > >> > >> while it is not supported by the Source,
>>>> an
>>>> > > >> exception
>>>> > > >> > > >>>>>>> can be
>>>> > > >> > > >>>>>>> > thrown
>>>> > > >> > > >>>>>>> > > >> at
>>>> > > >> > > >>>>>>> > > >> > >> compile time. It seems in general
>>>> useful. That
>>>> > > >> said,
>>>> > > >> > I
>>>> > > >> > > >>>>>>> agree that
>>>> > > >> > > >>>>>>> > > API
>>>> > > >> > > >>>>>>> > > >> > >> cleanliness wise it is better to put the
>>>> two
>>>> > > >> methods
>>>> > > >> > > >>>>>>> together.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Thanks,
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu
>>>> > > >> > > >>>>>>> > > >> > >> <imjark@gmail.com> wrote:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Thanks, Dawid, for the reminder on FLIP-182.
>>>> > Sorry I
>>>> > > >> did
>>>> > > >> > > >>>>>>> miss it.
>>>> > > >> > > >>>>>>> > > >> > >> I don't have other concerns then.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Best,
>>>> > > >> > > >>>>>>> > > >> > >> Jark
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid
>>>> > > >> > > >>>>>>> > > >> > >> Wysakowicz <dwysakowicz@apache.org> wrote:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> @Jark:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with
>>>> > watermarks
>>>> > > >> > when
>>>> > > >> > > >>>>>>> the
>>>> > > >> > > >>>>>>> > source
>>>> > > >> > > >>>>>>> > > >> > >> implements the interface?
>>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark
>>>> > > >> alignment
>>>> > > >> > > even
>>>> > > >> > > >>>>>>> if Kafka
>>>> > > >> > > >>>>>>> > > >> > >> implements the interface,
>>>> > > >> > > >>>>>>> > > >> > >> and this will affect the throughput
>>>> somehow. I
>>>> > > >> agree
>>>> > > >> > > >>>>>>> with Becket
>>>> > > >> > > >>>>>>> > > >> > >> we may need a
>>>> > > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for
>>>> users to
>>>> > > >> > > configure
>>>> > > >> > > >>>>>>> the
>>>> > > >> > > >>>>>>> > source
>>>> > > >> > > >>>>>>> > > >> to
>>>> > > >> > > >>>>>>> > > >> > >> enable/disable the alignment.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate
>>>> > > >> > > maxDesiredWatermark?
>>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of
>>>> maxDesiredWatermark
>>>> > will
>>>> > > >> > > >>>>>>> greatly affect
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> throughput
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> if the reader is constantly
>>>> > > >> > > >>>>>>> > > >> > >>  switching between pause and resume. Can
>>>> users
>>>> > > >> > > configure
>>>> > > >> > > >>>>>>> the
>>>> > > >> > > >>>>>>> > > >> alignment
>>>> > > >> > > >>>>>>> > > >> > >> offset?
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> This is covered in the previous FLIP[1]
>>>> which
>>>> > has
>>>> > > >> > been
>>>> > > >> > > >>>>>>> already
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> implemented
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> in 1.15. In short, it must be enabled
>>>> with the
>>>> > > >> > > watermark
>>>> > > >> > > >>>>>>> strategy
>>>> > > >> > > >>>>>>> > > >> which
>>>> > > >> > > >>>>>>> > > >> > >> also configures drift and update
>>>> interval.
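As a rough illustration of how drift enters the picture, the sketch below assumes that the allowed watermark is the minimum watermark reported in the alignment group plus the configured maximum drift, and that any split whose watermark is ahead of that value gets paused until the next update. The class and method names are hypothetical, and the real computation may differ in detail. As far as I know, in 1.15 the group, drift and update interval are set through WatermarkStrategy#withWatermarkAlignment.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AlignmentSketch {
    // Assumption: allowed watermark = slowest reported watermark in the group + max drift.
    static long maxAllowedWatermark(Collection<Long> reportedWatermarks, Duration maxDrift) {
        return Collections.min(reportedWatermarks) + maxDrift.toMillis();
    }

    // Splits whose watermark is ahead of the allowed value are candidates for pausing.
    static List<String> splitsToPause(Map<String, Long> splitWatermarks, long maxAllowed) {
        List<String> result = new ArrayList<>();
        // TreeMap only to get a deterministic iteration order for the example.
        for (Map.Entry<String, Long> e : new TreeMap<>(splitWatermarks).entrySet()) {
            if (e.getValue() > maxAllowed) result.add(e.getKey());
        }
        return result;
    }
}
```

A larger drift means fewer pause/resume switches at the cost of more buffered state downstream, which is the throughput trade-off raised in question 2.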
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this
>>>> interface to
>>>> > > >> support
>>>> > > >> > > >>>>>>> aligning other
>>>> > > >> > > >>>>>>> > > >> things,
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
>>>> > > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
>>>> > > >> > > >>>>>>> `alignWatermarksForSplits`
>>>> > > >> > > >>>>>>> > > >> instead
>>>> > > >> > > >>>>>>> > > >> > of
>>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Sure let's rename it.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> @Becket:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I understand your point. On the other
>>>> hand
>>>> > putting
>>>> > > >> > all
>>>> > > >> > > >>>>>>> methods,
>>>> > > >> > > >>>>>>> > > even
>>>> > > >> > > >>>>>>> > > >> > with
>>>> > > >> > > >>>>>>> > > >> > >> "supportsXXX" methods for enabling
>>>> certain
>>>> > > >> features,
>>>> > > >> > > >>>>>>> makes the
>>>> > > >> > > >>>>>>> > > entry
>>>> > > >> > > >>>>>>> > > >> > >> threshold for writing a new source
>>>> higher.
>>>> > > >> Instead of
>>>> > > >> > > >>>>>>> focusing on
>>>> > > >> > > >>>>>>> > > the
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> basic
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> and required properties of the Source,
>>>> the
>>>> > person
>>>> > > >> > > >>>>>>> implementing a
>>>> > > >> > > >>>>>>> > > >> source
>>>> > > >> > > >>>>>>> > > >> > >> must bother with and need to figure out
>>>> what
>>>> > all
>>>> > > >> of
>>>> > > >> > the
>>>> > > >> > > >>>>>>> extra
>>>> > > >> > > >>>>>>> > > >> features
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> are
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> about and how to deal with them. It
>>>> makes it
>>>> > also
>>>> > > >> > > harder
>>>> > > >> > > >>>>>>> to
>>>> > > >> > > >>>>>>> > > organize
>>>> > > >> > > >>>>>>> > > >> > >> methods in coupled groups as Jark said.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a
>>>> preference
>>>> > > >> and I
>>>> > > >> > > >>>>>>> agree most
>>>> > > >> > > >>>>>>> > of
>>>> > > >> > > >>>>>>> > > >> the
>>>> > > >> > > >>>>>>> > > >> > >> sources will support the alignment I am
>>>> fine
>>>> > > >> > following
>>>> > > >> > > >>>>>>> your
>>>> > > >> > > >>>>>>> > > >> suggestion
>>>> > > >> > > >>>>>>> > > >> > to
>>>> > > >> > > >>>>>>> > > >> > >> have the SourceReader extending from
>>>> > > >> > > >>>>>>> > WithWatermarksSplitsAlignment,
>>>> > > >> > > >>>>>>> > > >> but
>>>> > > >> > > >>>>>>> > > >> > >> would put the "supportsXXX" there, not
>>>> in the
>>>> > > >> Source
>>>> > > >> > to
>>>> > > >> > > >>>>>>> keep the
>>>> > > >> > > >>>>>>> > > two
>>>> > > >> > > >>>>>>> > > >> > >> methods together.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Lastly, I agree it is really unfortunate
>>>> the
>>>> > > >> > > >>>>>>> "alignSplits"
>>>> > > >> > > >>>>>>> > methods
>>>> > > >> > > >>>>>>> > > >> > differ
>>>> > > >> > > >>>>>>> > > >> > >> slightly for SourceReader and
>>>> SplitReader. The
>>>> > > >> reason
>>>> > > >> > > for
>>>> > > >> > > >>>>>>> that is
>>>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase deals only with
>>>> SplitIds,
>>>> > whereas
>>>> > > >> > > >>>>>>> SplitReader
>>>> > > >> > > >>>>>>> > > needs
>>>> > > >> > > >>>>>>> > > >> the
>>>> > > >> > > >>>>>>> > > >> > >> actual splits to pause them. I found the
>>>> > > >> discrepancy
>>>> > > >> > > >>>>>>> acceptable
>>>> > > >> > > >>>>>>> > for
>>>> > > >> > > >>>>>>> > > >> the
>>>> > > >> > > >>>>>>> > > >> > >> sake of simplifying changes
>>>> significantly,
>>>> > > >> especially
>>>> > > >> > > as
>>>> > > >> > > >>>>>>> they
>>>> > > >> > > >>>>>>> > would
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> highly
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> likely impact performance as we would
>>>> have to
>>>> > > >> perform
>>>> > > >> > > >>>>>>> additional
>>>> > > >> > > >>>>>>> > > >> > lookups.
>>>> > > >> > > >>>>>>> > > >> > >> Moreover the SplitReader is a secondary
>>>> > interface.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Best,
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Dawid
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> [1]
>>>> > https://cwiki.apache.org/confluence/x/hQYBCw
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Thanks for the effort, Dawid and
>>>> Sebastian!
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I just have some minor questions (maybe I
>>>> > missed
>>>> > > >> > > >>>>>>> something).
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with
>>>> > watermarks
>>>> > > >> > when
>>>> > > >> > > >>>>>>> the
>>>> > > >> > > >>>>>>> > source
>>>> > > >> > > >>>>>>> > > >> > >> implements the interface?
>>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark
>>>> > > >> alignment
>>>> > > >> > > even
>>>> > > >> > > >>>>>>> if Kafka
>>>> > > >> > > >>>>>>> > > >> > >> implements the interface,
>>>> > > >> > > >>>>>>> > > >> > >> and this will affect the throughput
>>>> somehow. I
>>>> > > >> agree
>>>> > > >> > > >>>>>>> with Becket
>>>> > > >> > > >>>>>>> > > >> > >> we may need a
>>>> > > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for
>>>> users to
>>>> > > >> > > configure
>>>> > > >> > > >>>>>>> the
>>>> > > >> > > >>>>>>> > source
>>>> > > >> > > >>>>>>> > > >> to
>>>> > > >> > > >>>>>>> > > >> > >> enable/disable the alignment.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate
>>>> > > >> > > maxDesiredWatermark?
>>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of
>>>> maxDesiredWatermark
>>>> > will
>>>> > > >> > > >>>>>>> greatly affect
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> throughput
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> if the reader is constantly
>>>> > > >> > > >>>>>>> > > >> > >>  switching between pause and resume. Can
>>>> users
>>>> > > >> > > configure
>>>> > > >> > > >>>>>>> the
>>>> > > >> > > >>>>>>> > > >> alignment
>>>> > > >> > > >>>>>>> > > >> > >> offset?
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
>>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align other
>>>> > things in
>>>> > > >> > the
>>>> > > >> > > >>>>>>> future?
>>>> > > >> > > >>>>>>> > For
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> example,
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> align read speed, I have
>>>> > > >> > > >>>>>>> > > >> > >> seen users requesting global rate
>>>> limits. This
>>>> > > >> > feature
>>>> > > >> > > >>>>>>> may also
>>>> > > >> > > >>>>>>> > > need
>>>> > > >> > > >>>>>>> > > >> an
>>>> > > >> > > >>>>>>> > > >> > >> interface like this.
>>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this
>>>> interface to
>>>> > > >> support
>>>> > > >> > > >>>>>>> aligning other
>>>> > > >> > > >>>>>>> > > >> things,
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
>>>> > > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
>>>> > > >> > > >>>>>>> `alignWatermarksForSplits`
>>>> > > >> > > >>>>>>> > > >> instead
>>>> > > >> > > >>>>>>> > > >> > of
>>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
>>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I
>>>> think
>>>> > > >> they
>>>> > > >> > > have
>>>> > > >> > > >>>>>>> their
>>>> > > >> > > >>>>>>> > own
>>>> > > >> > > >>>>>>> > > >> > >> advantages.
>>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces
>>>> for
>>>> > > >> extending
>>>> > > >> > > >>>>>>> abilities
>>>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
>>>> > > >> > > >>>>>>> > > >> > >> and I prefer Interfaces rather than
>>>> methods in
>>>> > > >> this
>>>> > > >> > > >>>>>>> case. When
>>>> > > >> > > >>>>>>> > you
>>>> > > >> > > >>>>>>> > > >> have
>>>> > > >> > > >>>>>>> > > >> > a
>>>> > > >> > > >>>>>>> > > >> > >> bunch of abilities and each ability
>>>> > > >> > > >>>>>>> > > >> > >> has more than one method, Interfaces can
>>>> help
>>>> > to
>>>> > > >> > > >>>>>>> organize them
>>>> > > >> > > >>>>>>> > and
>>>> > > >> > > >>>>>>> > > >> make
>>>> > > >> > > >>>>>>> > > >> > >> it clear to users which methods
>>>> > > >> > > >>>>>>> > > >> > >> they need to implement to have
>>>> an
>>>> > > >> ability.
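
A minimal sketch of the ability-interface pattern described in point 4, in plain Java with no Flink dependencies. Every name in it (ReaderSketch, SupportsSplitsAlignmentSketch, pauseSplit, resumeSplit, FrameworkSketch) is hypothetical and is not taken from the FLIP or from Flink's API. It only illustrates how an opt-in interface groups the methods that have to be implemented together, and how a framework could detect the ability.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical base interface: what every reader has to implement.
interface ReaderSketch {
    String name();
}

// Hypothetical ability interface: both methods belong together, so a reader
// that opts in is forced by the compiler to implement both of them.
interface SupportsSplitsAlignmentSketch {
    void pauseSplit(String splitId);

    void resumeSplit(String splitId);
}

// A reader that does not opt in sees none of the alignment methods.
final class BasicReaderSketch implements ReaderSketch {
    @Override
    public String name() {
        return "basic";
    }
}

// A reader that opts in by implementing the ability interface.
final class PausableReaderSketch implements ReaderSketch, SupportsSplitsAlignmentSketch {
    final Set<String> paused = new HashSet<>();

    @Override
    public String name() {
        return "pausable";
    }

    @Override
    public void pauseSplit(String splitId) {
        paused.add(splitId);
    }

    @Override
    public void resumeSplit(String splitId) {
        paused.remove(splitId);
    }
}

final class FrameworkSketch {
    // Returns true if the reader has the ability and the split was paused,
    // false if the reader does not implement the ability interface.
    static boolean tryPause(ReaderSketch reader, String splitId) {
        if (reader instanceof SupportsSplitsAlignmentSketch) {
            ((SupportsSplitsAlignmentSketch) reader).pauseSplit(splitId);
            return true;
        }
        return false;
    }
}
```

The cost of this design is the instanceof branch in the framework, which is the logic branching that the default-method alternative discussed in this thread tries to avoid.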
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Best,
>>>> > > >> > > >>>>>>> > > >> > >> Jark
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin
>>>> <becket.qin@gmail.com> wrote:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Hi Dawid,
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Thanks for the explanation. Apologies
>>>> that I
>>>> > > >> somehow
>>>> > > >> > > >>>>>>> misread a
>>>> > > >> > > >>>>>>> > > bunch
>>>> > > >> > > >>>>>>> > > >> of
>>>> > > >> > > >>>>>>> > > >> > >> "align" and thought they were "assign".
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Regarding 1, by default implementation,
>>>> I was
>>>> > > >> > thinking
>>>> > > >> > > >>>>>>> of the
>>>> > > >> > > >>>>>>> > > default
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> no-op
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> implementation. I am a little worried
>>>> about
>>>> > the
>>>> > > >> > > >>>>>>> proliferation of
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> decorative
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> interfaces. I think the most important
>>>> thing
>>>> > about
>>>> > > >> > > >>>>>>> interfaces is
>>>> > > >> > > >>>>>>> > > that
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> they
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> are easy to understand. In this case, I
>>>> prefer
>>>> > > >> adding
>>>> > > >> > > >>>>>>> a new method
>>>> > > >> > > >>>>>>> > to
>>>> > > >> > > >>>>>>> > > >> the
>>>> > > >> > > >>>>>>> > > >> > >> existing interface for the following
>>>> reasons:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> a) I feel the biggest drawback of
>>>> decorative
>>>> > > >> > interfaces
>>>> > > >> > > >>>>>>> is which
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> interface
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> they can decorate and which combinations
>>>> of
>>>> > > >> multiple
>>>> > > >> > > >>>>>>> decorative
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> interfaces
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> are valid. In the current FLIP, the
>>>> > > >> > withSplitsAlignment
>>>> > > >> > > >>>>>>> interface
>>>> > > >> > > >>>>>>> > > is
>>>> > > >> > > >>>>>>> > > >> > only
>>>> > > >> > > >>>>>>> > > >> > >> applicable to the SourceReader which
>>>> means it
>>>> > > >> can't
>>>> > > >> > > >>>>>>> decorate any
>>>> > > >> > > >>>>>>> > > >> other
>>>> > > >> > > >>>>>>> > > >> > >> interface. From an interface design
>>>> > perspective, a
>>>> > > >> > > >>>>>>> natural
>>>> > > >> > > >>>>>>> > question
>>>> > > >> > > >>>>>>> > > >> is
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> why
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> not let "AlignedSplitReader" extend
>>>> > > >> > > >>>>>>> "withSplitsAlignment"? And it
>>>> > > >> > > >>>>>>> > > is
>>>> > > >> > > >>>>>>> > > >> > also
>>>> > > >> > > >>>>>>> > > >> > >> natural to assume that a split reader
>>>> > implementing
>>>> > > >> > both
>>>> > > >> > > >>>>>>> > SplitReader
>>>> > > >> > > >>>>>>> > > >> and
>>>> > > >> > > >>>>>>> > > >> > >> withSplitsAlignment would work, because a
>>>> > source
>>>> > > >> > reader
>>>> > > >> > > >>>>>>> > implementing
>>>> > > >> > > >>>>>>> > > >> > >> SourceReader and withSplitsAlignment
>>>> works.
>>>> > So why
>>>> > > >> > > isn't
>>>> > > >> > > >>>>>>> there an
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> interface
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> of AlignedSourceReader? In the future, if
>>>> > there
>>>> > > >> is a
>>>> > > >> > > new
>>>> > > >> > > >>>>>>> feature
>>>> > > >> > > >>>>>>> > > >> added
>>>> > > >> > > >>>>>>> > > >> > >> (e.g. sorted or pre-partitioned data
>>>> aware),
>>>> > are
>>>> > > >> we
>>>> > > >> > > >>>>>>> going to
>>>> > > >> > > >>>>>>> > create
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> another
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> interface of SplitReader such as
>>>> > > >> SortedSplitReader or
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> PrePartitionedAware?
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Can they be combined? So I think the
>>>> > additional
>>>> > > >> > > >>>>>>> decorative
>>>> > > >> > > >>>>>>> > > interface
>>>> > > >> > > >>>>>>> > > >> > like
>>>> > > >> > > >>>>>>> > > >> > >> withSplitsAlignment actually increases
>>>> the
>>>> > > >> > > understanding
>>>> > > >> > > >>>>>>> cost of
>>>> > > >> > > >>>>>>> > > >> users
>>>> > > >> > > >>>>>>> > > >> > >> because they have to know what decorative
>>>> > > >> interfaces
>>>> > > >> > > are
>>>> > > >> > > >>>>>>> there,
>>>> > > >> > > >>>>>>> > > which
>>>> > > >> > > >>>>>>> > > >> > >> interface they can decorate and which
>>>> > > >> combinations of
>>>> > > >> > > the
>>>> > > >> > > >>>>>>> > > decorative
>>>> > > >> > > >>>>>>> > > >> > >> interfaces are valid and which are not.
>>>> > Ideally we
>>>> > > >> > want
>>>> > > >> > > >>>>>>> to avoid
>>>> > > >> > > >>>>>>> > > >> that.
>>>> > > >> > > >>>>>>> > > >> > To
>>>> > > >> > > >>>>>>> > > >> > >> be clear, I am not opposing having an
>>>> > interface of
>>>> > > >> > > >>>>>>> > > >> withSplitsAlignment,
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> it
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> is completely OK to have it as an
>>>> internal
>>>> > > >> interface
>>>> > > >> > > and
>>>> > > >> > > >>>>>>> let
>>>> > > >> > > >>>>>>> > > >> > SourceReader
>>>> > > >> > > >>>>>>> > > >> > >> and SplitReader both extend it.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> b) Adding a new method to the
>>>> SourceReader
>>>> > with a
>>>> > > >> > > default
>>>> > > >> > > >>>>>>> > > >> implementation
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> of
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> no-op would help avoid logic branching
>>>> in the
>>>> > > >> source
>>>> > > >> > > >>>>>>> logic,
>>>> > > >> > > >>>>>>> > > >> especially
>>>> > > >> > > >>>>>>> > > >> > >> given that we agree that the vast
>>>> majority of
>>>> > the
>>>> > > >> > > >>>>>>> SourceReader
>>>> > > >> > > >>>>>>> > > >> > >> implementations, if not all, would just
>>>> extend
>>>> > > >> from
>>>> > > >> > the
>>>> > > >> > > >>>>>>> > > >> > SourceReaderBase.
>>>> > > >> > > >>>>>>> > > >> > >> That means adding a new method to the
>>>> > interface
>>>> > > >> would
>>>> > > >> > > >>>>>>> effectively
>>>> > > >> > > >>>>>>> > > >> give
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> the
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> same user experience, but simpler.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> c) A related design principle that may be
>>>> > worth
>>>> > > >> > > >>>>>>> discussing is how
>>>> > > >> > > >>>>>>> > > do
>>>> > > >> > > >>>>>>> > > >> we
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> let
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> the Source implementations tell Flink
>>>> what
>>>> > > >> capability
>>>> > > >> > > is
>>>> > > >> > > >>>>>>> > supported
>>>> > > >> > > >>>>>>> > > >> and
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> what
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> is not. Personally speaking I feel the
>>>> most
>>>> > > >> intuitive
>>>> > > >> > > >>>>>>> place to me
>>>> > > >> > > >>>>>>> > > is
>>>> > > >> > > >>>>>>> > > >> in
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> the
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Source itself, because that is the
>>>> entry point
>>>> > of the
>>>> > > >> > > >>>>>>> entire Source
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> connector
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> logic.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Based on the above thoughts, I am
>>>> wondering
>>>> > if the
>>>> > > >> > > >>>>>>> following
>>>> > > >> > > >>>>>>> > > >> interface
>>>> > > >> > > >>>>>>> > > >> > >> would be easier to understand by the
>>>> users.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> - Change "withSplitsAlignment" to
>>>> internal
>>>> > > >> interface,
>>>> > > >> > > >>>>>>> let both
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> SourceReader
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> and SplitReader extend from it, with a
>>>> default
>>>> > > >> no-op
>>>> > > >> > > >>>>>>> > > implementation.
>>>> > > >> > > >>>>>>> > > >> > >> - Add a new method "boolean
>>>> > > >> supportSplitsAlignment()"
>>>> > > >> > > to
>>>> > > >> > > >>>>>>> the
>>>> > > >> > > >>>>>>> > Source
>>>> > > >> > > >>>>>>> > > >> > >> interface, with a default implementation
>>>> > returning
>>>> > > >> > > >>>>>>> false. Sources
>>>> > > >> > > >>>>>>> > > >> that
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> have
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> implemented the alignment logic can
>>>> change
>>>> > this to
>>>> > > >> > > >>>>>>> return true,
>>>> > > >> > > >>>>>>> > and
>>>> > > >> > > >>>>>>> > > >> > >> override the alignSplits() methods in the
>>>> > > >> > SourceReader
>>>> > > >> > > /
>>>> > > >> > > >>>>>>> > > SplitReader
>>>> > > >> > > >>>>>>> > > >> if
>>>> > > >> > > >>>>>>> > > >> > >> needed.
>>>> > > >> > > >>>>>>> > > >> > >> - In the future, if a new optional
>>>> feature is
>>>> > > >> going
>>>> > > >> > to
>>>> > > >> > > >>>>>>> be added
>>>> > > >> > > >>>>>>> > to
>>>> > > >> > > >>>>>>> > > >> the
>>>> > > >> > > >>>>>>> > > >> > >> Source, and that feature requires the
>>>> > awareness
>>>> > > >> from
>>>> > > >> > > >>>>>>> Flink, we
>>>> > > >> > > >>>>>>> > can
>>>> > > >> > > >>>>>>> > > >> add
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> more
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> such methods to the Source.
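
The three bullets above can be sketched roughly as follows. This is plain Java with no Flink dependencies; the interface names (SplitsAlignmentSketch, SourceSketch) and the alignSplits signature are assumptions made for illustration, not the FLIP's actual definitions. Only the method name supportSplitsAlignment() and its default of false come from the proposal itself.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical internal interface with a default no-op, as in the first bullet.
// The generic parameter stands for "split id" or "split", depending on who extends it.
interface SplitsAlignmentSketch<T> {
    default void alignSplits(Collection<T> splitsToPause, Collection<T> splitsToResume) {
        // No-op by default: readers that do not care need no changes.
    }
}

// Hypothetical stand-in for the Source interface, as in the second bullet.
interface SourceSketch {
    default boolean supportSplitsAlignment() {
        return false;
    }
}

// A source that keeps the defaults and therefore does not advertise alignment.
final class PlainSourceSketch implements SourceSketch {
}

// A source that has implemented the alignment logic and says so.
final class AligningSourceSketch implements SourceSketch {
    @Override
    public boolean supportSplitsAlignment() {
        return true;
    }
}

// A reader that overrides the no-op and tracks which splits are paused.
final class AligningReaderSketch implements SplitsAlignmentSketch<String> {
    final Set<String> paused = new HashSet<>();

    @Override
    public void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.addAll(splitsToPause);
        paused.removeAll(splitsToResume);
    }
}
```

With this shape the framework can call alignSplits unconditionally and consult supportSplitsAlignment() only to decide whether alignment is meaningful for a given source. The compiler, however, no longer tells an implementer that the flag and the override belong together.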
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> What do you think?
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Thanks,
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid
>>>> > Wysakowicz <dwysakowicz@apache.org>
>>>> > > >> > > >>>>>>> > > >> > >> wrote:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> @Konstantin:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the
>>>> `AlignedSplitReader`
>>>> > > >> > > interface
>>>> > > >> > > >>>>>>> (aka the
>>>> > > >> > > >>>>>>> > > >> stop &
>>>> > > >> > > >>>>>>> > > >> > >> resume behavior) will be implemented for
>>>> > Kafka and
>>>> > > >> > > >>>>>>> Pulsar only,
>>>> > > >> > > >>>>>>> > > >> correct?
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Correct, as far as I know though, those
>>>> are
>>>> > the
>>>> > > >> only
>>>> > > >> > > >>>>>>> sources
>>>> > > >> > > >>>>>>> > which
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> consume
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> concurrently from multiple splits and
>>>> thus
>>>> > > >> alignment
>>>> > > >> > > >>>>>>> applies.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> @Thomas:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment
>>>> in
>>>> > > >> > > >>>>>>> SourceReaderBase and
>>>> > > >> > > >>>>>>> > > then
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> doing
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> nothing if the split reader does not
>>>> implement
>>>> > > >> > > >>>>>>> AlignedSplitReader
>>>> > > >> > > >>>>>>> > > >> could
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> be
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment
>>>> can
>>>> > > >> instead
>>>> > > >> > be
>>>> > > >> > > >>>>>>> added to
>>>> > > >> > > >>>>>>> > the
>>>> > > >> > > >>>>>>> > > >> > >> specific source reader (i.e.
>>>> > KafkaSourceReader) to
>>>> > > >> > make
>>>> > > >> > > >>>>>>> it
>>>> > > >> > > >>>>>>> > explicit
>>>> > > >> > > >>>>>>> > > >> that
>>>> > > >> > > >>>>>>> > > >> > >> the source actually supports it.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I understand your concern. Hmm, I think
>>>> we
>>>> > could
>>>> > > >> > > >>>>>>> actually do
>>>> > > >> > > >>>>>>> > that.
>>>> > > >> > > >>>>>>> > > >> Given
>>>> > > >> > > >>>>>>> > > >> > >> the actual implementation of the
>>>> > > >> > > >>>>>>> SourceReaderBase#alignSplits is
>>>> > > >> > > >>>>>>> > > >> rather
>>>> > > >> > > >>>>>>> > > >> > >> short (just a forward to the
>>>> corresponding
>>>> > method
>>>> > > >> of
>>>> > > >> > > >>>>>>> > SplitFetcher),
>>>> > > >> > > >>>>>>> > > >> we
>>>> > > >> > > >>>>>>> > > >> > >> could reimplement it in the actual source
>>>> > > >> > > >>>>>>> implementations. This
>>>> > > >> > > >>>>>>> > > >> solution
>>>> > > >> > > >>>>>>> > > >> > >> has a downside though. Authors of new
>>>> > sources
>>>> > > >> would
>>>> > > >> > > >>>>>>> have to do
>>>> > > >> > > >>>>>>> > > two
>>>> > > >> > > >>>>>>> > > >> > >> things: extend from AlignedSplitReader
>>>> and
>>>> > > >> implement
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> WithSplitsAlignment,
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> instead of just extending
>>>> AlignedSplitReader.
>>>> > I
>>>> > > >> would
>>>> > > >> > > be
>>>> > > >> > > >>>>>>> fine
>>>> > > >> > > >>>>>>> > with
>>>> > > >> > > >>>>>>> > > >> such
>>>> > > >> > > >>>>>>> > > >> > a
>>>> > > >> > > >>>>>>> > > >> > >> tradeoff though. What do others think?
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> @Steven:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> For this part from the motivation
>>>> section, is
>>>> > it
>>>> > > >> > > >>>>>>> accurate? Let's
>>>> > > >> > > >>>>>>> > > >> assume
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> one
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> source task consumes from 3 partitions
>>>> and
>>>> > one of
>>>> > > >> the
>>>> > > >> > > >>>>>>> partitions
>>>> > > >> > > >>>>>>> > is
>>>> > > >> > > >>>>>>> > > >> > >> significantly slower. In this situation,
>>>> > watermark
>>>> > > >> > for
>>>> > > >> > > >>>>>>> this
>>>> > > >> > > >>>>>>> > source
>>>> > > >> > > >>>>>>> > > >> task
>>>> > > >> > > >>>>>>> > > >> > >> won't hold back as it is reading recent
>>>> data
>>>> > from
>>>> > > >> > other
>>>> > > >> > > >>>>>>> two Kafka
>>>> > > >> > > >>>>>>> > > >> > >> partitions. As a result, it won't hold
>>>> back
>>>> > the
>>>> > > >> > overall
>>>> > > >> > > >>>>>>> > watermark.
>>>> > > >> > > >>>>>>> > > I
>>>> > > >> > > >>>>>>> > > >> > >> thought the problem is that we may have
>>>> late
>>>> > data
>>>> > > >> for
>>>> > > >> > > >>>>>>> this slow
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> partition.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> It will hold back the watermark.
>>>> Watermark of
>>>> > an
>>>> > > >> > > >>>>>>> operator is the
>>>> > > >> > > >>>>>>> > > >> minimum
>>>> > > >> > > >>>>>>> > > >> > >> of watermarks of all splits[1]
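
To make the "minimum of watermarks of all splits" rule concrete, here is a small sketch in plain Java. The class and method names are hypothetical and it is not Flink's implementation. The second method additionally assumes that a split becomes a candidate for pausing once its watermark exceeds some maxDesiredWatermark, which is how the alignment is described in this thread rather than a documented algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SplitWatermarkSketch {

    // The operator's watermark is the minimum over the per-split watermarks,
    // so a single slow split holds the whole operator back.
    // With no splits at all this returns Long.MAX_VALUE.
    static long operatorWatermark(Map<String, Long> splitWatermarks) {
        long min = Long.MAX_VALUE;
        for (long watermark : splitWatermarks.values()) {
            min = Math.min(min, watermark);
        }
        return min;
    }

    // Assumption for illustration: splits that have run ahead of
    // maxDesiredWatermark are the ones to pause until the others catch up.
    static List<String> splitsToPause(Map<String, Long> splitWatermarks, long maxDesiredWatermark) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : splitWatermarks.entrySet()) {
            if (entry.getValue() > maxDesiredWatermark) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

For the three-partition example from the question, with watermarks 100, 95 and 10, the operator watermark is 10. The two fast partitions keep reading ahead unless they are paused, which is the buffering problem the FLIP's motivation section describes.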
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I have another question about the
>>>> restart. Say
>>>> > > >> split
>>>> > > >> > > >>>>>>> alignment is
>>>> > > >> > > >>>>>>> > > >> > >> triggered. Checkpoint is completed. Job
>>>> > failed and
>>>> > > >> > > >>>>>>> restored from
>>>> > > >> > > >>>>>>> > > the
>>>> > > >> > > >>>>>>> > > >> > last
>>>> > > >> > > >>>>>>> > > >> > >> checkpoint. Because alignment decision
>>>> is not
>>>> > > >> > > >>>>>>> checkpointed,
>>>> > > >> > > >>>>>>> > > initially
>>>> > > >> > > >>>>>>> > > >> > >> alignment won't be enforced until we get
>>>> a
>>>> > cycle
>>>> > > >> of
>>>> > > >> > > >>>>>>> watermark
>>>> > > >> > > >>>>>>> > > >> > aggregation
>>>> > > >> > > >>>>>>> > > >> > >> and propagation, right? Not saying this
>>>> > corner case is
>>>> > > >> a
>>>> > > >> > > >>>>>>> problem. Just
>>>> > > >> > > >>>>>>> > > >> want
>>>> > > >> > > >>>>>>> > > >> > to
>>>> > > >> > > >>>>>>> > > >> > >> understand it more.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Your understanding is correct.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> @Becket:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort
>>>> of a
>>>> > > >> general
>>>> > > >> > use
>>>> > > >> > > >>>>>>> case, so
>>>> > > >> > > >>>>>>> > > >> should
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> we
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> just add the related methods to
>>>> SourceReader
>>>> > > >> directly
>>>> > > >> > > >>>>>>> instead of
>>>> > > >> > > >>>>>>> > > >> > >> introducing the new interface of
>>>> > > >> WithSplitAssignment?
>>>> > > >> > > We
>>>> > > >> > > >>>>>>> can
>>>> > > >> > > >>>>>>> > > provide
>>>> > > >> > > >>>>>>> > > >> > >> default implementations, so backwards
>>>> > > >> compatibility
>>>> > > >> > > >>>>>>> won't be an
>>>> > > >> > > >>>>>>> > > >> issue.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I don't think we can provide a default
>>>> > > >> > implementation.
>>>> > > >> > > >>>>>>> How would
>>>> > > >> > > >>>>>>> > we
>>>> > > >> > > >>>>>>> > > >> do
>>>> > > >> > > >>>>>>> > > >> > >> that? Would it be just a no-op? Is it
>>>> better
>>>> > than
>>>> > > >> > > having
>>>> > > >> > > >>>>>>> an
>>>> > > >> > > >>>>>>> > opt-in
>>>> > > >> > > >>>>>>> > > >> > >> interface? The default implementation
>>>> would
>>>> > have
>>>> > > >> to
>>>> > > >> > be
>>>> > > >> > > >>>>>>> added
>>>> > > >> > > >>>>>>> > > >> exclusively
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> in
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> a *Public* SourceReader interface. By
>>>> the way
>>>> > > >> notice
>>>> > > >> > > >>>>>>> > > SourceReaderBase
>>>> > > >> > > >>>>>>> > > >> > >> does extend from WithSplitsAlignment, so
>>>> > > >> effectively
>>>> > > >> > > all
>>>> > > >> > > >>>>>>> > > >> implementations
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> do
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> handle the alignment case. To be honest I
>>>> > think
>>>> > > >> it is
>>>> > > >> > > >>>>>>> impossible
>>>> > > >> > > >>>>>>> > to
>>>> > > >> > > >>>>>>> > > >> > >> implement the SourceReader interface
>>>> directly
>>>> > by
>>>> > > >> end
>>>> > > >> > > >>>>>>> users.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader
>>>> interface
>>>> > > >> > probably
>>>> > > >> > > >>>>>>> also
>>>> > > >> > > >>>>>>> > needs
>>>> > > >> > > >>>>>>> > > >> some
>>>> > > >> > > >>>>>>> > > >> > >> change to support throttling at the split
>>>> > > >> > granularity.
>>>> > > >> > > >>>>>>> Can you
>>>> > > >> > > >>>>>>> > add
>>>> > > >> > > >>>>>>> > > >> that
>>>> > > >> > > >>>>>>> > > >> > >> interface change into the public
>>>> interface
>>>> > > >> section as
>>>> > > >> > > >>>>>>> well?
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> It has been added from the beginning. See
>>>> > > >> > > >>>>>>> *AlignedSplitReader.*
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method
>>>> name
>>>> > > >> > assignSplits
>>>> > > >> > > >>>>>>> here,
>>>> > > >> > > >>>>>>> > given
>>>> > > >> > > >>>>>>> > > >> that
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> it
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> is not actually changing the split
>>>> > assignments? It
>>>> > > >> > > seems
>>>> > > >> > > >>>>>>> > something
>>>> > > >> > > >>>>>>> > > >> like
>>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or
>>>> > > >> adjustSplitsThrottling() is
>>>> > > >> > > >>>>>>> more
>>>> > > >> > > >>>>>>> > > accurate.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> The method's called *alignSplits*, not
>>>> > assign. Do
>>>> > > >> you
>>>> > > >> > > >>>>>>> still
>>>> > > >> > > >>>>>>> > prefer
>>>> > > >> > > >>>>>>> > > a
>>>> > > >> > > >>>>>>> > > >> > >> different name for that? Personally, I am
>>>> > open to
>>>> > > >> > > >>>>>>> suggestions
>>>> > > >> > > >>>>>>> > > here.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Best,
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Dawid
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> [1]
>>>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Thanks for driving the effort,
>>>> Sebastian. I
>>>> > think
>>>> > > >> the
>>>> > > >> > > >>>>>>> motivation
>>>> > > >> > > >>>>>>> > > >> makes a
>>>> > > >> > > >>>>>>> > > >> > >> lot of sense. Just a few suggestions /
>>>> > questions.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort
>>>> of a
>>>> > > >> general
>>>> > > >> > use
>>>> > > >> > > >>>>>>> case, so
>>>> > > >> > > >>>>>>> > > >> should
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> we
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> just add the related methods to
>>>> SourceReader
>>>> > > >> directly
>>>> > > >> > > >>>>>>> instead of
>>>> > > >> > > >>>>>>> > > >> > >> introducing the new interface of
>>>> > > >> WithSplitAssignment?
>>>> > > >> > > We
>>>> > > >> > > >>>>>>> can
>>>> > > >> > > >>>>>>> > > provide
>>>> > > >> > > >>>>>>> > > >> > >> default implementations, so backwards
>>>> > > >> compatibility
>>>> > > >> > > >>>>>>> won't be an
>>>> > > >> > > >>>>>>> > > >> issue.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader
>>>> interface
>>>> > > >> > probably
>>>> > > >> > > >>>>>>> also
>>>> > > >> > > >>>>>>> > needs
>>>> > > >> > > >>>>>>> > > >> some
>>>> > > >> > > >>>>>>> > > >> > >> change to support throttling at the split
>>>> > > >> > granularity.
>>>> > > >> > > >>>>>>> Can you
>>>> > > >> > > >>>>>>> > add
>>>> > > >> > > >>>>>>> > > >> that
>>>> > > >> > > >>>>>>> > > >> > >> interface change into the public
>>>> interface
>>>> > > >> section as
>>>> > > >> > > >>>>>>> well?
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method
>>>> name
>>>> > > >> > assignSplits
>>>> > > >> > > >>>>>>> here,
>>>> > > >> > > >>>>>>> > given
>>>> > > >> > > >>>>>>> > > >> that
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> it
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> is not actually changing the split
>>>> > assignments? It
>>>> > > >> > > seems
>>>> > > >> > > >>>>>>> > something
>>>> > > >> > > >>>>>>> > > >> like
>>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or
>>>> > > >> adjustSplitsThrottling() is
>>>> > > >> > > >>>>>>> more
>>>> > > >> > > >>>>>>> > > accurate.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Thanks,
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven
>>>> Wu <stevenz3wu@gmail.com> wrote:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> However, a single source operator may
>>>> read
>>>> > data
>>>> > > >> from
>>>> > > >> > > >>>>>>> multiple
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> splits/partitions, e.g., multiple Kafka
>>>> > > >> partitions,
>>>> > > >> > > such
>>>> > > >> > > >>>>>>> that
>>>> > > >> > > >>>>>>> > even
>>>> > > >> > > >>>>>>> > > >> with
>>>> > > >> > > >>>>>>> > > >> > >> watermark alignment the source operator
>>>> may
>>>> > need
>>>> > > >> to
>>>> > > >> > > >>>>>>> buffer
>>>> > > >> > > >>>>>>> > > excessive
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> amount
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> of data if one split emits data faster
>>>> than
>>>> > > >> another.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> For this part from the motivation
>>>> section, is
>>>> > it
>>>> > > >> > > >>>>>>> accurate? Let's
>>>> > > >> > > >>>>>>> > > >> assume
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> one
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> source task consumes from 3 partitions
>>>> and
>>>> > one of
>>>> > > >> the
>>>> > > >> > > >>>>>>> partition
>>>> > > >> > > >>>>>>> > is
>>>> > > >> > > >>>>>>> > > >> > >> significantly slower. In this situation,
>>>> > watermark
>>>> > > >> > for
>>>> > > >> > > >>>>>>> this
>>>> > > >> > > >>>>>>> > source
>>>> > > >> > > >>>>>>> > > >> task
>>>> > > >> > > >>>>>>> > > >> > >> won't hold back as it is reading recent
>>>> data
>>>> > from
>>>> > > >> > other
>>>> > > >> > > >>>>>>> two Kafka
>>>> > > >> > > >>>>>>> > > >> > >> partitions. As a result, it won't hold
>>>> back
>>>> > the
>>>> > > >> > overall
>>>> > > >> > > >>>>>>> > watermark.
>>>> > > >> > > >>>>>>> > > I
>>>> > > >> > > >>>>>>> > > >> > >> thought the problem is that we may have
>>>> late
>>>> > data
>>>> > > >> for
>>>> > > >> > > >>>>>>> this slow
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> partition.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I have another question about the
>>>> restart. Say
>>>> > > >> split
>>>> > > >> > > >>>>>>> alignment is
>>>> > > >> > > >>>>>>> > > >> > >> triggered. checkpoint is completed. job
>>>> > failed and
>>>> > > >> > > >>>>>>> restored from
>>>> > > >> > > >>>>>>> > > the
>>>> > > >> > > >>>>>>> > > >> > last
>>>> > > >> > > >>>>>>> > > >> > >> checkpoint. because alignment decision
>>>> is not
>>>> > > >> > > >>>>>>> checkpointed,
>>>> > > >> > > >>>>>>> > > initially
>>>> > > >> > > >>>>>>> > > >> > >> alignment won't be enforced until we get
>>>> a
>>>> > cycle
>>>> > > >> of
>>>> > > >> > > >>>>>>> watermark
>>>> > > >> > > >>>>>>> > > >> > aggregation
>>>> > > >> > > >>>>>>> > > >> > >> and propagation, right? Not saying this
>>>> > corner is
>>>> > > >> a
>>>> > > >> > > >>>>>>> problem. Just
>>>> > > >> > > >>>>>>> > > >> want
>>>> > > >> > > >>>>>>> > > >> > to
>>>> > > >> > > >>>>>>> > > >> > >> understand it more.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise
>>>> > > >> > > >>>>>>> > > >> > >> <thw@apache.org> wrote:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Thanks for working on this!
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in
>>>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase and then doing nothing if the
>>>> > > >> > > >>>>>>> > > >> > >> split reader does not implement
>>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader could be misleading?
>>>> > > >> > > >>>>>>> > > >> > >> Perhaps WithSplitsAlignment can instead be
>>>> > > >> > > >>>>>>> > > >> > >> added to the specific source reader (i.e.
>>>> > > >> > > >>>>>>> > > >> > >> KafkaSourceReader) to make it explicit that
>>>> > > >> > > >>>>>>> > > >> > >> the source actually supports it.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Thanks,
>>>> > > >> > > >>>>>>> > > >> > >> Thomas
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin
>>>> > > >> > > >>>>>>> > > >> > >> Knauf <knaufk@apache.org> wrote:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader`
>>>> > > >> > > >>>>>>> > > >> > >> interface (aka the stop & resume behavior)
>>>> > > >> > > >>>>>>> > > >> > >> will be implemented for Kafka and Pulsar only,
>>>> > > >> > > >>>>>>> > > >> > >> correct?
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> +1 in general. I believe it is valuable to
>>>> > > >> > > >>>>>>> > > >> > >> complete the watermark aligned story with this
>>>> > > >> > > >>>>>>> > > >> > >> FLIP.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Cheers,
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Konstantin
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid
>>>> > > >> > > >>>>>>> > > >> > >> Wysakowicz <dwysakowicz@apache.org> wrote:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> To be explicit, having worked on it, I support
>>>> > > >> > > >>>>>>> > > >> > >> it ;) I think we can start a vote thread
>>>> > > >> > > >>>>>>> > > >> > >> soonish, as there are no concerns so far.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Best,
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Dawid
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Dear Flink developers,
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> I would like to open a discussion on FLIP 217
>>>> > > >> > > >>>>>>> > > >> > >> [1] for an extension of Watermark Alignment to
>>>> > > >> > > >>>>>>> > > >> > >> perform alignment also in SplitReaders. To do
>>>> > > >> > > >>>>>>> > > >> > >> so, SplitReaders must be able to suspend and
>>>> > > >> > > >>>>>>> > > >> > >> resume reading from split sources where the
>>>> > > >> > > >>>>>>> > > >> > >> SourceOperator coordinates and controls
>>>> > > >> > > >>>>>>> > > >> > >> suspend and resume. To gather information
>>>> > > >> > > >>>>>>> > > >> > >> about current watermarks of the SplitReaders,
>>>> > > >> > > >>>>>>> > > >> > >> we extend the internal
>>>> > > >> > > >>>>>>> > > >> > >> WatermarkOutputMultiplexer and report
>>>> > > >> > > >>>>>>> > > >> > >> watermarks to the SourceOperator.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> There is a PoC for this FLIP [2], prototyped
>>>> > > >> > > >>>>>>> > > >> > >> by Arvid Heise and revised and reworked by
>>>> > > >> > > >>>>>>> > > >> > >> Dawid Wysakowicz (He did most of the work.)
>>>> > > >> > > >>>>>>> > > >> > >> and me. The changes are backwards compatible
>>>> > > >> > > >>>>>>> > > >> > >> in a way that if affected components do not
>>>> > > >> > > >>>>>>> > > >> > >> support split alignment the behavior is as
>>>> > > >> > > >>>>>>> > > >> > >> before.
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Best,
>>>> > > >> > > >>>>>>> > > >> > >> Sebastian
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
>>>> > > >> > > >>>>>>> > > >> > >> --
>>>> > > >> > > >>>>>>> > > >> > >>
>>>> > > >> > > >>>>>>> > > >> > >> Konstantin Knauf
>>>> > > >> > > >>>>>>> > > >> > >> https://twitter.com/snntrable
>>>> > > >> > > >>>>>>> > > >> > >> https://github.com/knaufk

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Sebastian Mattheis <se...@ververica.com>.
(Sorry I didn't send to the list but only to Becket. My bad and thanks
Piotr. Next attempt:)

Hi all,

Thanks for pushing the FLIP. I will drive it and would be happy to get
back to you, @Thomas, for reviews. (Sorry for the low responsiveness; there
were several high-priority efforts on my side ...) As a next step, I
will revise the FLIP to get the discussion concluded.

However, as Becket mentioned I feel that some things are still not clear
yet:

Re: Thomas
>
> However, from a user perspective, should the split level alignment be
>> an opt-in feature, at least for a few releases? If yes, then we would
>> require a knob similar to supportsPausingSplits(), which I understand
>> won't be part of the revised FLIP. Such control may be beneficial:
>
>
>> * Compare runtime behavior with split level alignment on/off
>> * Allow use of sources that don't implement pausing splits yet
>
>
>> The second point would, from the user's perspective, be necessary for
>> backward compatibility? While the interface aspect and source
>> compatibility has been discussed in great detail, I don't think it
>> would be desirable if an application that already uses alignment fails
>> after upgrading to the new Flink version, forcing users to lock step
>> modify sources for the new non-optional split level alignment.
>
>
>> So I think clarification of the compatibility aspect on the FLIP page
>> would be necessary.
>
>
> That is a good point. Currently FLIP-182 is marked as experimental. So
> technically speaking it could change. That said, I agree that it would be
> good to not break the existing sources.
>
> My main concern with a supportsPausingSplits() knob is that it might
> require a code change when it is deprecated in the future. I am
> thinking of two potential solutions:
>
> Solution 1:
> In the SourceReaderBase, when pauseOrResumeSplits() is invoked, if the
> source reader has only one split assigned, the source reader simply stops
> polling and returns NOTHING_AVAILABLE. If more than one split is
> assigned, it throws an exception with a message such as "The
> unpausable SplitReader CLASS_NAME only works with watermark alignment
> when assigned a single split. There is more than one split assigned to
> the SplitReader".
> The reasoning behind this solution is that existing users should only use
> the coarse watermark alignment when a source reader only reads from a
> single split. Reading from more than one split might have unwanted side
> effects, so we might as well throw an exception in this case.
>
> Solution 2:
> Add a configuration "enable.coarse-grained.watermark.alignment" whose
> default value is false. Once it is set to true, we will allow
> coarse-grained watermark alignment if a SplitReader is pausable.
> This solution allows users to keep the current FLIP-182 behavior, with the
> risk of side effects.
>
> Personally speaking, I feel solution 1 seems better because coarse-grained
> watermark alignment could be frustrating to the users when more than one
> split is assigned. So we might as well not support it at all. And also
> there is nothing to deprecate in the future with this solution.
>
> What do you think?
>

@Thomas: My understanding is that you want a simple switch/knob to test
with and without (split-level) watermark alignment, right? Isn't switching
the coarse-grained watermark alignment on and off sufficient for that? Or
do you think that switching watermark alignment explicitly at the split
level is required?
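
To make the two levels concrete, here is a minimal, self-contained sketch of
how I understand them. It is not Flink code: the class and method names are
made up for illustration, and it assumes that a reader's combined watermark
is the minimum of its per-split watermarks and that maxDesiredWatermark is
the threshold announced to the reader.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper contrasting reader-level and split-level alignment decisions. */
class AlignmentLevelsSketch {

    /**
     * Coarse-grained view: the whole reader counts as "ahead" only if its
     * combined (minimum) watermark exceeds the desired maximum.
     */
    static boolean readerIsAhead(Map<String, Long> splitWatermarks, long maxDesiredWatermark) {
        if (splitWatermarks.isEmpty()) {
            return false;
        }
        long combined = Collections.min(splitWatermarks.values());
        return combined > maxDesiredWatermark;
    }

    /**
     * Split-level view: every split whose own watermark exceeds the desired
     * maximum would be paused, independently of the other splits.
     */
    static Set<String> splitsToPause(Map<String, Long> splitWatermarks, long maxDesiredWatermark) {
        Set<String> ahead = new TreeSet<>();
        for (Map.Entry<String, Long> entry : splitWatermarks.entrySet()) {
            if (entry.getValue() > maxDesiredWatermark) {
                ahead.add(entry.getKey());
            }
        }
        return ahead;
    }
}
```

With one fast split and two slow ones, the coarse-grained check does not
trigger at all, while the split-level check singles out the fast split. That
difference is what a split-level switch would make observable.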

@Becket: I'm not sure about the intention of solution 1. Can you explain
that a bit more? In particular, I don't understand: "The reasoning behind
this solution is that existing users should only use the coarse watermark
alignment when a source reader only reads from a single split." Why should
a user not use coarse watermark alignment when a source reader reads from
multiple splits? The split alignment uses the "coarse watermark", i.e.,
maxDesiredWatermark, as described in the FLIP for alignment.
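
To check that we are talking about the same thing, this is how I read
solution 1, as a rough self-contained sketch. It is not the actual
SourceReaderBase: the class, the Status enum and pollNext() are simplified
stand-ins I made up, and only pauseOrResumeSplits() and NOTHING_AVAILABLE
are names taken from this thread.

```java
import java.util.Collection;
import java.util.List;

/** Illustrative stand-in for a reader whose SplitReader cannot pause individual splits. */
class UnpausableReaderSketch {

    /** Simplified stand-in for the status a reader reports when polled. */
    enum Status { MORE_AVAILABLE, NOTHING_AVAILABLE }

    private final List<String> assignedSplitIds;
    private boolean paused = false;

    UnpausableReaderSketch(List<String> assignedSplitIds) {
        this.assignedSplitIds = assignedSplitIds;
    }

    /** Solution 1: tolerate pausing with a single split, fail with more than one. */
    void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        if (assignedSplitIds.size() > 1) {
            throw new UnsupportedOperationException(
                    "The unpausable SplitReader only works with watermark alignment "
                            + "when assigned a single split. There is more than one split assigned.");
        }
        if (assignedSplitIds.isEmpty()) {
            return; // nothing to pause or resume
        }
        String onlySplit = assignedSplitIds.get(0);
        if (splitsToPause.contains(onlySplit)) {
            paused = true; // the whole reader stops polling
        } else if (splitsToResume.contains(onlySplit)) {
            paused = false;
        }
    }

    /** While paused, the reader does not poll and reports that nothing is available. */
    Status pollNext() {
        return paused ? Status.NOTHING_AVAILABLE : Status.MORE_AVAILABLE;
    }
}
```

If this matches the intention, then a job that today uses alignment with a
multi-split reader would start failing under solution 1, which is the part I
would like to understand better.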

Could you please clarify?

Regards,
Sebastian

On Wed, Jun 22, 2022 at 4:23 AM Becket Qin <be...@gmail.com> wrote:

> Thanks for the feedback, Thomas and Steve. And thanks Piotr for the
> patient and detailed discussion.
>
> Let's move forward with option 1 then.
>
> Re: Thomas
>
> However, from a user perspective, should the split level alignment be
>> an opt-in feature, at least for a few releases? If yes, then we would
>> require a knob similar to supportsPausingSplits(), which I understand
>> won't be part of the revised FLIP. Such control may be beneficial:
>
>
>> * Compare runtime behavior with split level alignment on/off
>> * Allow use of sources that don't implement pausing splits yet
>
>
>> The second point would, from the user's perspective, be necessary for
>> backward compatibility? While the interface aspect and source
>> compatibility has been discussed in great detail, I don't think it
>> would be desirable if an application that already uses alignment fails
>> after upgrading to the new Flink version, forcing users to lock step
>> modify sources for the new non-optional split level alignment.
>
>
>> So I think clarification of the compatibility aspect on the FLIP page
>> would be necessary.
>
>
> That is a good point. Currently FLIP-182 is marked as experimental. So
> technically speaking it could change. That said, I agree that it would be
> good to not break the existing sources.
>
> My main concern with a supportsPausingSplits() knob is that it might
> require a code change when it is deprecated in the future. I am
> thinking of two potential solutions:
>
> Solution 1:
> In the SourceReaderBase, when pauseOrResumeSplits() is invoked, if the
> source reader has only one split assigned, the source reader simply stops
> polling and returns NOTHING_AVAILABLE. If more than one split is
> assigned, it throws an exception with a message such as "The
> unpausable SplitReader CLASS_NAME only works with watermark alignment
> when assigned a single split. There is more than one split assigned to
> the SplitReader".
> The reasoning behind this solution is that existing users should only use
> the coarse watermark alignment when a source reader only reads from a
> single split. Reading from more than one split might have unwanted side
> effects, so we might as well throw an exception in this case.
>
> Solution 2:
> Add a configuration "enable.coarse-grained.watermark.alignment" whose
> default value is false. Once it is set to true, we will allow
> coarse-grained watermark alignment if a SplitReader is pausable.
> This solution allows users to keep the current FLIP-182 behavior, with the
> risk of side effects.
>
> Personally speaking, I feel solution 1 seems better because coarse-grained
> watermark alignment could be frustrating to the users when more than one
> split is assigned. So we might as well not support it at all. And also
> there is nothing to deprecate in the future with this solution.
>
> What do you think?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Tue, Jun 21, 2022 at 8:48 PM Piotr Nowojski <pn...@apache.org>
> wrote:
>
>> Hi,
>>
>> It looks like option 1 wins overall? So let's go with that.
>>
>> Best,
>> Piotrek
>>
>> On Wed, Jun 15, 2022 at 04:13, Steven Wu <st...@gmail.com> wrote:
>>
>>> Both option 1 (default impl in base interface) and option 2 (decorative
>>> interface) are pretty common patterns. I would also be fine with either.
>>> The important thing is that an exception is thrown if a source doesn't
>>> support the alignment capability.
>>>
>>> The other point is that we can validate the source capability if
>>> alignment is enabled in WatermarkStrategy. I believe either option can
>>> achieve this goal too.
>>>
>>> public interface WatermarkStrategy<T> {
>>>     WatermarkStrategy<T> withWatermarkAlignment(
>>>             String watermarkGroup, Duration maxAllowedWatermarkDrift);
>>> }
>>>
>>> If I have to pick one, I am slightly favoring option 1 (base interface).
>>> As watermark is already an essential concept of source, maybe watermark
>>> alignment capability can also be a property of the base source/reader
>>> interface.
>>>
>>> On Tue, Jun 14, 2022 at 3:44 PM Thomas Weise <th...@apache.org> wrote:
>>>
>>> > Hi everyone,
>>> >
>>> > Thank you for all the effort that went into this discussion. The split
>>> > level watermark alignment will be an important feature for Flink that
>>> > will address operational problems for various use cases. From reading
>>> > through this thread it appears that not too much remains to bring this
>>> > FLIP to acceptance and allow development to move forward. I would like
>>> > to contribute if possible.
>>> >
>>> > Regarding option 1 vs. option 2: I don't have a strong preference,
>>> > perhaps slightly leaning towards option 1.
>>> >
>>> > However, from a user perspective, should the split level alignment be
>>> > an opt-in feature, at least for a few releases? If yes, then we would
>>> > require a knob similar to supportsPausingSplits(), which I understand
>>> > won't be part of the revised FLIP. Such control may be beneficial:
>>> >
>>> > * Compare runtime behavior with split level alignment on/off
>>> > * Allow use of sources that don't implement pausing splits yet
>>> >
>>> > The second point would, from the user's perspective, be necessary for
>>> > backward compatibility? While the interface aspect and source
>>> > compatibility has been discussed in great detail, I don't think it
>>> > would be desirable if an application that already uses alignment fails
>>> > after upgrading to the new Flink version, forcing users to lock step
>>> > modify sources for the new non-optional split level alignment.
>>> >
>>> > So I think clarification of the compatibility aspect on the FLIP page
>>> > would be necessary.
>>> >
>>> > Thanks,
>>> > Thomas
>>> >
>>> > On Mon, May 30, 2022 at 3:29 AM Piotr Nowojski
>>> > <piotr.nowojski@gmail.com> wrote:
>>> > >
>>> > > Hi Becket,
>>> > >
>>> > > Thanks for summing this up. Just one correction:
>>> > >
>>> > > > Piotr prefers option 2, his opinions are:
>>> > > >   e) It is OK that the code itself in option 2 indicates to the
>>> > > > developers that a feature is optional. We will rely on the
>>> > > > documentation to correct that and clarify that the feature is
>>> > > > actually obligatory.
>>> > >
>>> > > I would say, based on a) and b), that the feature would still be
>>> > > optional. So both the implementation and the documentation would be
>>> > > saying that. We could add a mention to the docs and release notes that
>>> > > this feature will be obligatory in the next major release and plan
>>> > > such a release accordingly.
>>> > >
>>> > > Re option 1, as you mentioned:
>>> > > > As for option 1: For developers, the feature is still optional due
>>> > > > to the default implementation in the interface, regardless of what
>>> > > > the default implementation does, because the code compiles without
>>> > > > overriding these methods
>>> > >
>>> > > Also importantly, the code will work in most cases.
>>> > >
>>> > > > Obligatory: Jobs may fail if these methods are not implemented
>>> > > > properly, e.g. SourceReader#pauseOrResumeSplits(). This is a common
>>> > > > pattern in Java, e.g. Iterator.remove() by default throws
>>> > > > "UnsupportedOperationException", informing the implementation that
>>> > > > things may go wrong if this method is not implemented.
>>> > >
>>> > > For me `Iterator#remove()` is an optional feature. Personally, I don't
>>> > > remember if I have ever implemented it.
>>> > >
>>> > > Best,
>>> > > Piotrek
>>> > >
>>> > > On Fri, May 27, 2022 at 10:14, Becket Qin <be...@gmail.com> wrote:
>>> > >
>>> > > > I had an offline discussion with Piotr and here is the summary.
>>> > > > Please correct me if I miss something, Piotr.
>>> > > >
>>> > > > There are two things on which we would like to seek more opinions
>>> > > > from the community, so we can make progress on this FLIP.
>>> > > >
>>> > > > 1. The general pattern to add obligatory features to existing
>>> > > > interfaces.
>>> > > >
>>> > > > ***********************************************************************************
>>> > > > Interfaces exposed to developers for implementation are intended to
>>> > > > be either *optional* or *obligatory*. While it is quite clear how to
>>> > > > convey that intention when creating an interface, there is less
>>> > > > agreement on how to do so when adding new features to an existing
>>> > > > interface. In general, Flink uses decorative interfaces when adding
>>> > > > optional features to existing interfaces. Both Piotr and I agree
>>> > > > that this looks good.
>>> > > >
>>> > > > Different opinions are mainly about how to add obligatory features to
>>> > > > the existing interfaces, probably due to different understandings of
>>> > > > "obligatory".
>>> > > >
>>> > > > We have discussed four options:
>>> > > >
>>> > > > *Option 1:*
>>> > > >
>>> > > >    - Just add a new method to the existing interface.
>>> > > >    - For backwards compatibility, the method would have a default
>>> > > >    implementation throwing "UnsupportedOperationException".
>>> > > >    - In the next major version, remove the default implementation.
>>> > > >    - For the developers, any method with a default implementation
>>> > > >    throwing an "UnsupportedOperationException" should be taken as
>>> > > >    obligatory.
>>> > > >
>>> > > > *Option 2:*
>>> > > >
>>> > > >    - Always make the features optional by adding a decorative
>>> > > >    interface, just like ordinary optional features.
>>> > > >    - Inform the developers via documentation that this feature is
>>> > > >    obligatory, although it looks optional from the code.
>>> > > >    - In case the developers did not implement the decorative
>>> > > >    interface, throw an exception.
>>> > > >    - In the next major version, move the methods in the decorative
>>> > > >    interface to the base interface, and deprecate the decorative
>>> > > >    interface.
>>> > > >
>>> > > > *Option 3:*
>>> > > >
>>> > > >    - Always bump the major version when a new obligatory feature is
>>> > > >    added, even if we may have to do it frequently.
>>> > > >
>>> > > > *Option 4:*
>>> > > >
>>> > > >    - Add a V2, V3... of the interface affected by the new obligatory
>>> > > >    feature.
>>> > > >    - In the next major versions, deprecate old versions of the
>>> > > >    interfaces.
>>> > > >
>>> > > > Both Piotr and I agreed that option 3 and option 4 have big side
>>> > > > effects and should be avoided. We have different preferences between
>>> > > > option 1 and option 2.
>>> > > >
>>> > > > Personally I prefer option 1, the reasons are:
>>> > > >   a) simple and intuitive. Java 8 introduced the default impl in
>>> > > > interfaces exactly for interface evolving, and this is a common
>>> > > > pattern in many projects.
>>> > > >   b) prominent to the developers that the feature is expected to be
>>> > > > implemented, because it explicitly throws an exception in the
>>> > > > default impl.
>>> > > >   c) low maintenance overhead - the Flink framework can always
>>> > > > assume the method exists, so no special handling logic is needed.
>>> > > >   d) communicate a clear semantic boundary between optional and
>>> > > > obligatory features in Flink to the developers.
>>> > > >       - Optional: Jobs still run without exception if these methods
>>> > > > are not implemented, e.g. all the SupportsXXXPushDown interfaces.
>>> > > >       - Obligatory: Jobs may fail if these methods are not
>>> > > > implemented properly, e.g. SourceReader#pauseOrResumeSplits(). This
>>> > > > is a common pattern in Java, e.g. Iterator.remove() by default
>>> > > > throws "UnsupportedOperationException", informing the implementation
>>> > > > that things may go wrong if this method is not implemented.
>>> > > >
>>> > > > As for option 2, although the API itself sounds clean, it misleads
>>> > > > people to think of an obligatory feature as optional - from the
>>> > > > code the feature is optional, but the documents say it is
>>> > > > obligatory. We probably should avoid such code-doc inconsistency,
>>> > > > as people will be confused. And I would actually be bewildered that
>>> > > > sometimes not implementing an "optional" feature is fine, but
>>> > > > sometimes it causes the jobs to fail.
>>> > > >
>>> > > > In response to the argument that a method with a default
>>> > > > implementation is always optional: if that is true, it actually
>>> > > > means all the interfaces should be immutable once they are created.
>>> > > > If we want to add a method to an existing interface, for backwards
>>> > > > compatibility, we will have to provide a default implementation.
>>> > > > And the fact that it has a default implementation indicates the
>>> > > > method is optional. If that method is optional, it should reside in
>>> > > > a separate decorative interface, otherwise it clogs that existing
>>> > > > interface. Therefore, people should never add a method to an
>>> > > > existing interface. I find this conclusion a bit extreme.
>>> > > >
>>> > > > Piotr prefers option 2, his opinions are:
>>> > > >     a) Obligatory methods are the methods that fail the code
>>> > > > compilation if not implemented.
>>> > > >     b) All obligatory methods should reside in the base interface,
>>> > > > without a default implementation. And all the optional methods
>>> > > > should be in decorative interfaces. This is a clean API.
>>> > > >     c) due to b), there isn't a viable solution to add an obligatory
>>> > > > method to an existing interface in a backwards compatible way.
>>> > > > Unless we are OK with breaking backwards compatibility, all the
>>> > > > interfaces should be treated as immutable. As a compromise, we
>>> > > > might as well just treat all the features added later as optional
>>> > > > features. This way we keep the API clean.
>>> > > >     d) based on b) and c), option 2 has a clean API, while option 1
>>> > > > does not.
>>> > > >     e) It is OK that the code itself in option 2 indicates to the
>>> > > > developers that a feature is optional. We will rely on the
>>> > > > documentation to correct that and clarify that the feature is
>>> > > > actually obligatory.
>>> > > >     f) Regarding the effectiveness of making people aware that the
>>> > > > feature is obligatory, Option 1 and Option 2 are similar. People
>>> > > > who do not read the release notes / documentation will mistake the
>>> > > > feature to be optional anyway.
>>> > > >
>>> > > > As for option 1: For developers, the feature is still optional due
>>> > > > to the default implementation in the interface, regardless of what
>>> > > > the default implementation does, because the code compiles without
>>> > > > overriding these methods. Also, another problem of this option is
>>> > > > that users who do not know the history of the interface may be
>>> > > > confused by the default implementation throwing an exception.
>>> > > >
>>> > > >
>>> > > > 2. For this particular FLIP, should it be optional or not?
>>> > > >
>>> > > > ***********************************************************************************
>>> > > > As mentioned in the previous email, I feel this FLIP should be
>>> > > > obligatory, for the following reasons:
>>> > > > 1. The Flink framework exposes the watermark alignment API to the
>>> > > > end users. From the end users' perspective, the feature should be
>>> > > > available regardless of the implementation details in the
>>> > > > pluggables. This is true for any other methods exposed as the Flink
>>> > > > API.
>>> > > > 2. If a Source is not pausable, the end user should receive an
>>> > > > exception when enabling watermark alignment (both Piotr and I agree
>>> > > > on this). In that case, it meets my criteria of an obligatory
>>> > > > feature because not implementing the feature causes a framework API
>>> > > > to throw an exception and fail the job.
>>> > > >
>>> > > > On the other hand, Piotr does not have a strong opinion regarding
>>> > > > whether this feature should be optional or not.
>>> > > >
>>> > > >
>>> > > > Thanks for reading through this long email. So basically, in order
>>> > > > to make progress on this FLIP, we would like to hear how people
>>> > > > feel about the above two topics.
>>> > > >
>>> > > > Thanks,
>>> > > >
>>> > > > Jiangjie (Becket) Qin
>>> > > >
>>> > > >
>>> > > > On Thu, May 26, 2022 at 3:06 PM Piotr Nowojski <
>>> pnowojski@apache.org>
>>> > > > wrote:
>>> > > >
>>> > > >> Hi Becket,
>>> > > >>
>>> > > >> I still stand by what I wrote before:
>>> > > >> > I think I would still vote soft -1 on this option, but I
>>> wouldn't
>>> > block
>>> > > >> it in case I am out-voted.
>>> > > >>
>>> > > >> > I think it might be helpful to agree on the definition of
>>> optional
>>> > in
>>> > > >> our
>>> > > >> case.
>>> > > >>
>>> > > >> For me it doesn't matter whether we call a default method that throws an
>>> > exception
>>> > > >> optional or non-optional. As long as we keep it this way, the
>>> > effect
>>> > > >> is the same. It's effectively a method that a user doesn't have to
>>> > > >> implement. If interface/system allows some methods to be not
>>> > implemented,
>>> > > >> some users will do just that, regardless of whether we call and
>>> document it as
>>> > > >> non-optional. And at the same time it's clogging the base
>>> interface.
>>> > > >>
>>> > > >> By the way, just the need for a java-doc/documentation explaining
>>> the
>>> > > >> existence of some construct is a bad smell (code should be
>>> > > >> self-documenting
>>> > > >> and a default method throwing an UnsupportedOperationException is
>>> not).
>>> > > >>
>>> > > >> > Please note that so far we do not assume whether the feature is
>>> in
>>> > > >> > the original API or it is added later. A newly added feature can
>>> > also be
>>> > > >> > non-optional, although it might take some time for all the
>>> pluggable
>>> > > >> > developers to catch up, and they should still work if the new
>>> > feature is
>>> > > >> > not used until they catch up. In contrast, we may never expect
>>> an
>>> > > >> optional
>>> > > >> > feature to catch up, because leaving it unimplemented is also
>>> > blessed.
>>> > > >> >
>>> > > >> > Let's take the checkpointing as an example. Imagine Flink did
>>> not
>>> > > >> support
>>> > > >> > checkpointing before release 1.16. And now we are trying to add
>>> > > >> > checkpointing to Flink. So we exposed the checkpoint
>>> configuration
>>> > to
>>> > > >> the
>>> > > >> > end users. In the meantime, will we tell the pluggable (e.g.
>>> > operators,
>>> > > >> > connectors) developers that methods like "snapshotState()" are
>>> > optional?
>>> > > >> If
>>> > > >> > we do that, the availability of checkpointing in Flink would be
>>> > severely
>>> > > >> > weakened. But apparently we should still allow the existing
>>> > > >> implementations
>>> > > >> > to work without checkpointing. It looks to me that adding the
>>> > method to
>>> > > >> the
>>> > > >> > pluggable interfaces with a default implementation throwing
>>> > > >> > "UnsupportedOperationException" would be the solution here.
>>> Please
>>> > note
>>> > > >> > that in this case, having the default implementation does not
>>> mean
>>> > this
>>> > > >> is
>>> > > >> > optional. It is just the technique to support backwards
>>> > compatibility in
>>> > > >> > the feature evolution. The fact that this method is in the base
>>> > > >> interface
>>> > > >> > suggests it is not optional, so the developers SHOULD implement
>>> it.
>>> > > >>
>>> > > >> I would soft vote -1 for having the default method throwing
>>> > > >> UnsupportedOperationException as a one-off thing for these (FLIP-217)
>>> > special
>>> > > >> circumstances.
>>> > > >>
>>> > > >> At the moment, without thinking this over too much, I would vote
>>> > harder -1
>>> > > >> for having this as a general rule when adding new features. If we
>>> > ever end
>>> > > >> up with an API that is littered with default methods throwing
>>> > > >> UnsupportedOperationException that are documented as "non
>>> optional" it
>>> > > >> would be IMO a big design failure. I would be ok-ish with that,
>>> only
>>> > if
>>> > > >> that was a temporary thing and we had an aggressive plan to
>>> release
>>> > more
>>> > > >> often new major Flink versions (2.x.y, 3.x.y, ...) breaking API
>>> > > >> compatibility, that would get rid of those default methods. Adding
>>> > > >> checkpointing and methods like "snapshotState()" would IMO easily
>>> > justify
>>> > > >> a
>>> > > >> new major Flink release. In that case we could add those methods
>>> with
>>> > > >> default implementation for some transition period, one or two
>>> minor
>>> > > >> releases, followed by a clean up in a major release. However I
>>> would
>>> > still
>>> > > >> argue that it would be cleaner/better to add a decorative
>>> interface
>>> > like
>>> > > >> `CheckpointedOperator` interface instead of adding those default
>>> > methods
>>> > > >> to
>>> > > >> the base `Operator` interface.
>>> > > >>
>>> > > >> I think I can sum up our disagreement here: I would like to
>>> > keep
>>> > > >> the
>>> > > >> interfaces simpler, with only obligatory methods/features on one
>>> side
>>> > and
>>> > > >> clearly optional features on the other. While you would like to
>>> add an
>>> > > >> extra third state in between those two?
>>> > > >>
>>> > > >> Best,
>>> > > >> Piotrek
>>> > > >>
>>> > > >>
>>> > > >>
>>> > > >> On Thu, May 12, 2022 at 04:25 Becket Qin <be...@gmail.com>
>>> > wrote:
>>> > > >>
>>> > > >> > Thanks for the clarification, Piotr and Sebastian.
>>> > > >> >
>>> > > >> > It looks like the key problem is still whether the
>>> implementation of
>>> > > >> > pausable splits in the Sources should be optional or not.
>>> > > >> >
>>> > > >> > I think it might be helpful to agree on the definition of
>>> optional
>>> > in
>>> > > >> our
>>> > > >> > case. To me:
>>> > > >> > Optional = "You CAN leave the method unimplemented, and that is
>>> > fine."
>>> > > >> > Non-Optional = "You CAN leave the method unimplemented, but you
>>> > SHOULD
>>> > > >> NOT,
>>> > > >> > because people assume this works."
>>> > > >> >
>>> > > >> > I think one sufficient condition of a Non-Optional feature is
>>> that
>>> > if
>>> > > >> the
>>> > > >> > feature is exposed through the framework API, Flink should
>>> expect
>>> > the
>>> > > >> > pluggables to support this feature by default. Otherwise the
>>> > > >> availability
>>> > > >> > of that feature becomes undefined.
>>> > > >> >
>>> > > >> > Please note that so far we do not assume whether the feature is
>>> in
>>> > > >> > the original API or it is added later. A newly added feature can
>>> > also be
>>> > > >> > non-optional, although it might take some time for all the
>>> pluggable
>>> > > >> > developers to catch up, and they should still work if the new
>>> > feature is
>>> > > >> > not used until they catch up. In contrast, we may never expect
>>> an
>>> > > >> optional
>>> > > >> > feature to catch up, because leaving it unimplemented is also
>>> > blessed.
>>> > > >> >
>>> > > >> > Let's take the checkpointing as an example. Imagine Flink did
>>> not
>>> > > >> support
>>> > > >> > checkpointing before release 1.16. And now we are trying to add
>>> > > >> > checkpointing to Flink. So we exposed the checkpoint
>>> configuration
>>> > to
>>> > > >> the
>>> > > >> > end users. In the meantime, will we tell the pluggable (e.g.
>>> > operators,
>>> > > >> > connectors) developers that methods like "snapshotState()" are
>>> > optional?
>>> > > >> If
>>> > > >> > we do that, the availability of checkpointing in Flink would be
>>> > severely
>>> > > >> > weakened. But apparently we should still allow the existing
>>> > > >> implementations
>>> > > >> > to work without checkpointing. It looks to me that adding the
>>> > method to
>>> > > >> the
>>> > > >> > pluggable interfaces with a default implementation throwing
>>> > > >> > "UnsupportedOperationException" would be the solution here.
>>> Please
>>> > note
>>> > > >> > that in this case, having the default implementation does not
>>> mean
>>> > this
>>> > > >> is
>>> > > >> > optional. It is just the technique to support backwards
>>> > compatibility in
>>> > > >> > the feature evolution. The fact that this method is in the base
>>> > > >> interface
>>> > > >> > suggests it is not optional, so the developers SHOULD implement
>>> it.
>>> > > >> >
>>> > > >> > When it comes to this FLIP, I think it meets the criteria of
>>> > > >> non-optional
>>> > > >> > features, so we should just use the evolution path of
>>> non-optional
>>> > > >> > features.
>>> > > >> >
>>> > > >> > Thanks,
>>> > > >> >
>>> > > >> > Jiangjie (Becket) Qin
>>> > > >> >
>>> > > >> >
>>> > > >> >
>>> > > >> > On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski <
>>> > pnowojski@apache.org>
>>> > > >> > wrote:
>>> > > >> >
>>> > > >> > > Hi,
>>> > > >> > >
>>> > > >> > > Actually previously I thought about having a decorative
>>> interface
>>> > and
>>> > > >> > > whenever watermark alignment is enabled, checking that the
>>> source
>>> > > >> > > implements the decorative interface. If not, throwing an
>>> > exception.
>>> > > >> > >
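A minimal sketch of the decorative-interface check described above, assuming a simplified reader model. The names BasicReader, SupportsPausableSplits, PlainReader, PausableDemoReader and AlignmentValidator are hypothetical and are not part of Flink; the point is only that the capability is visible in the type and can be validated up front when watermark alignment is enabled.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical base interface: contains only what every reader must implement.
interface BasicReader {
    String name();
}

// Hypothetical decorative interface: implemented only by readers that can pause splits.
interface SupportsPausableSplits {
    void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

// A reader that does not opt in to the decorative interface.
class PlainReader implements BasicReader {
    @Override
    public String name() {
        return "plain";
    }
}

// A reader that opts in by implementing the decorative interface.
class PausableDemoReader implements BasicReader, SupportsPausableSplits {
    final Set<String> paused = new HashSet<>();

    @Override
    public String name() {
        return "pausable";
    }

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.addAll(splitsToPause);
        paused.removeAll(splitsToResume);
    }
}

class AlignmentValidator {
    // Rejects the reader if alignment is requested but the capability is missing.
    static void validate(BasicReader reader, boolean watermarkAlignmentEnabled) {
        if (watermarkAlignmentEnabled && !(reader instanceof SupportsPausableSplits)) {
            throw new UnsupportedOperationException(
                    "Reader '" + reader.name() + "' does not implement SupportsPausableSplits, "
                            + "so watermark alignment cannot be enabled for it.");
        }
    }

    public static void main(String[] args) {
        validate(new PlainReader(), false);       // alignment disabled: accepted
        validate(new PausableDemoReader(), true); // capability present: accepted
        try {
            validate(new PlainReader(), true);    // capability missing: rejected
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this shape the base interface stays unchanged, and a reader without the capability fails at validation time rather than on the first pause request.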
>>> > > >> > > The option with default methods in the source interfaces
>>> throwing
>>> > > >> > > `UnsupportedOperationException` I think still suffers from the
>>> > same
>>> > > >> > > problems I mentioned before. It's still an optional
>>> implementation
>>> > > >> and at
>>> > > >> > > the same time it's clogging the base interface. I think I
>>> would
>>> > still
>>> > > >> > vote
>>> > > >> > > soft -1 on this option, but I wouldn't block it in case I am
>>> > > >> out-voted.
>>> > > >> > >
>>> > > >> > > Best,
>>> > > >> > > Piotrek
>>> > > >> > >
>>> > > >> > > On Wed, May 11, 2022 at 14:22 Sebastian Mattheis <
>>> > sebastian@ververica.com>
>>> > > >> > > wrote:
>>> > > >> > >
>>> > > >> > > > Hi Becket,
>>> > > >> > > >
>>> > > >> > > > Thanks a lot for your fast and detailed response. For me, it
>>> > > >> converges
>>> > > >> > > and
>>> > > >> > > > dropping the supportsX method sounds very reasonable to me.
>>> > (Side
>>> > > >> note:
>>> > > >> > > > With "pausable splits" enabled as "default" I think we
>>> > > >> misunderstood.
>>> > > >> > As
>>> > > >> > > > you described now "default" I understand as that it should
>>> be
>>> > the
>>> > > >> new
>>> > > >> > > > recommended way of implementation, and I think that is fully
>>> > valid.
>>> > > >> > > Before,
>>> > > >> > > > I understood "default" here as the default implementation,
>>> i.e.,
>>> > > >> > throwing
>>> > > >> > > > UnsupportedOperationException, which is the exact opposite.
>>> :) )
>>> > > >> > > >
>>> > > >> > > > Nevertheless: As mentioned, an open question for me is if
>>> > watermark
>>> > > >> > > > alignment should enforce pausable splits. For
>>> clarification, the
>>> > > >> > current
>>> > > >> > > > documentation [1] says:
>>> > > >> > > >
>>> > > >> > > > *Note:* As of 1.15, Flink supports aligning across tasks of
>>> the
>>> > same
>>> > > >> > > >> source and/or different sources. It does not support
>>> aligning
>>> > > >> > > >> splits/partitions/shards in the same task.
>>> > > >> > > >>
>>> > > >> > > >> In a case where there are e.g. two Kafka partitions that
>>> > produce
>>> > > >> > > >> watermarks at different pace, that get assigned to the same
>>> > task
>>> > > >> > > watermark
>>> > > >> > > >> might not behave as expected. Fortunately, worst case it
>>> > should not
>>> > > >> > > perform
>>> > > >> > > >> worse than without alignment.
>>> > > >> > > >>
>>> > > >> > > >> Given the limitation above, we suggest applying watermark
>>> > > >> alignment in
>>> > > >> > > >> two situations:
>>> > > >> > > >>
>>> > > >> > > >>    1. You have two different sources (e.g. Kafka and File)
>>> that
>>> > > >> > produce
>>> > > >> > > >>    watermarks at different speeds
>>> > > >> > > >>    2. You run your source with parallelism equal to the
>>> number
>>> > of
>>> > > >> > > >>    splits/shards/partitions, which results in every subtask
>>> > being
>>> > > >> > > assigned a
>>> > > >> > > >>    single unit of work.
>>> > > >> > > >>
>>> > > >> > > >> I personally see no issue in implementing and I see no
>>> reason
>>> > > >> against
>>> > > >> > > > implementing this dependency of watermark alignment and
>>> pausable
>>> > > >> > splits.
>>> > > >> > > (I
>>> > > >> > > > think this would even be a good path towards shaping
>>> watermark
>>> > > >> > alignment
>>> > > >> > > in
>>> > > >> > > > 1.16.) However, "I don't see" means that I would be happy to
>>> > hear
>>> > > >> > Dawid's
>>> > > >> > > > and Piotrek's opinions as they implemented watermark
>>> alignment
>>> > > >> based on
>>> > > >> > > > FLIP-182 [2] and I don't want to miss relevant
>>> > rationale/background
>>> > > >> > info
>>> > > >> > > > from their side.
>>> > > >> > > >
>>> > > >> > > > *@Piotrek* *@Dawid *What do you think?
>>> > > >> > > >
>>> > > >> > > > Regards,
>>> > > >> > > > Sebastian
>>> > > >> > > >
>>> > > >> > > > [1]
>>> > > >> > > >
>>> > > >> > >
>>> > > >> >
>>> > > >>
>>> >
>>> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
>>> > > >> > > > [2]
>>> > > >> > > >
>>> > > >> > >
>>> > > >> >
>>> > > >>
>>> >
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
>>> > > >> > > >
>>> > > >> > > > On Wed, May 11, 2022 at 1:30 PM Becket Qin <
>>> > becket.qin@gmail.com>
>>> > > >> > wrote:
>>> > > >> > > >
>>> > > >> > > >> +dev
>>> > > >> > > >>
>>> > > >> > > >> Hi Sebastian,
>>> > > >> > > >>
>>> > > >> > > >> Thank you for the summary. Please see the detailed replies
>>> > inline.
>>> > > >> As
>>> > > >> > a
>>> > > >> > > >> recap of my suggestions.
>>> > > >> > > >>
>>> > > >> > > >> 1. Pausable splits API.
>>> > > >> > > >>   a) Add default implementations to methods
>>> > "pauseOrResumeSplits"
>>> > > >> in
>>> > > >> > > both
>>> > > >> > > >> SourceReader and SplitReader where both default
>>> implementations
>>> > > >> throw
>>> > > >> > > >>  UnsupportedOperationException.
>>> > > >> > > >>
>>> > > >> > > >> 2. User story.
>>> > > >> > > >>     a) We tell users to enable the watermark alignment as
>>> they
>>> > > >> like.
>>> > > >> > > This
>>> > > >> > > >> is exactly what the current Flink API is.
>>> > > >> > > >>     b) We tell the source developers, please implement
>>> pausable
>>> > > >> > splits,
>>> > > >> > > >> otherwise bad things may happen. Think of it like you are
>>> > expected
>>> > > >> to
>>> > > >> > > >> implement SourceReader#snapshotState() properly, otherwise
>>> > > >> exceptions
>>> > > >> > > will
>>> > > >> > > >> be thrown when users enable checkpointing.
>>> > > >> > > >>
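A minimal sketch of suggestion 1 above, assuming heavily simplified stand-ins for the two interfaces. SimpleSourceReader, SimpleSplitReader, LegacyReader, PausableReader and DefaultMethodDemo are hypothetical names, and the parameter names are assumptions; the real SourceReader and SplitReader have more methods and type parameters. It illustrates that an existing reader still compiles unchanged and only fails once pausing is actually requested.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the source-level reader; identifies splits by ID.
interface SimpleSourceReader {
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException(
                "This source reader does not support pausing or resuming splits.");
    }
}

// Hypothetical stand-in for the split-level reader; receives the splits directly.
interface SimpleSplitReader<SplitT> {
    default void pauseOrResumeSplits(
            Collection<SplitT> splitsToPause, Collection<SplitT> splitsToResume) {
        throw new UnsupportedOperationException(
                "This split reader does not support pausing or resuming splits.");
    }
}

// A reader written before the new method existed: compiles without any change.
class LegacyReader implements SimpleSourceReader {}

// A reader that overrides the default and therefore supports pausable splits.
class PausableReader implements SimpleSourceReader {
    final Set<String> paused = new HashSet<>();

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.addAll(splitsToPause);
        paused.removeAll(splitsToResume);
    }
}

class DefaultMethodDemo {
    // Returns true if the reader accepted the pause request, false if it is unsupported.
    static boolean tryPause(SimpleSourceReader reader, String splitId) {
        try {
            reader.pauseOrResumeSplits(List.of(splitId), List.of());
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("legacy supports pausing: " + tryPause(new LegacyReader(), "split-0"));
        System.out.println("pausable supports pausing: " + tryPause(new PausableReader(), "split-0"));
    }
}
```

Note that nothing in the type of LegacyReader reveals the missing support; the error surfaces only at the call site, which is the trade-off debated in this thread.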
>>> > > >> > > >> Thanks,
>>> > > >> > > >>
>>> > > >> > > >> Jiangjie (Becket) Qin
>>> > > >> > > >>
>>> > > >> > > >> On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis <
>>> > > >> > > >> sebastian@ververica.com> wrote:
>>> > > >> > > >>
>>> > > >> > > >>> Hi Becket, Hi everybody,
>>> > > >> > > >>>
>>> > > >> > > >>> I'm sorry if I misread the messages but I could not
>>> derive an
>>> > > >> > agreement
>>> > > >> > > >>> from the mailing list. Nevertheless, if I understand you
>>> > right the
>>> > > >> > > >>> suggestion is:
>>> > > >> > > >>>
>>> > > >> > > >>> * Add default implementations to methods
>>> > "pauseOrResumeSplits" in
>>> > > >> > both
>>> > > >> > > >>> SourceReader and SplitReader where both default
>>> > implementations
>>> > > >> throw
>>> > > >> > > >>> UnsupportedOperationException.
>>> > > >> > > >>>
>>> > > >> > > >> Yes.
>>> > > >> > > >>
>>> > > >> > > >> * Add "supportsPauseOrResumeSplits" to the Source
>>> interface.
>>> > (In
>>> > > >> the
>>> > > >> > > >>> following, I refer to supporting this as "pausable
>>> splits".)
>>> > > >> > > >>>
>>> > > >> > > >> We may no longer need this if pausable splits are expected
>>> to
>>> > be
>>> > > >> > > >> implemented by the source developers, i.e. non-optional.
>>> Having
>>> > > >> this
>>> > > >> > > method
>>> > > >> > > >> would then be somewhat misleading as it looks like the
>>> sources
>>> > > >> that do
>>> > > >> > > not
>>> > > >> > > >> support pausable splits are also acceptable in the long
>>> term.
>>> > So
>>> > > >> API
>>> > > >> > > wise,
>>> > > >> > > >> I'd say maybe we should remove this for this FLIP,
>>> although I
>>> > > >> believe
>>> > > >> > > this
>>> > > >> > > >> supportXXX pattern itself is still attractive for optional
>>> > > >> features.
>>> > > >> > > >>
>>> > > >> > > >>
>>> > > >> > > >>>
>>> > > >> > > >>> To make the conclusions explicit:
>>> > > >> > > >>>
>>> > > >> > > >>> 1. The implementation of pauseOrResumeSplits in both
>>> > interfaces
>>> > > >> > > >>> SourceReader and SplitReader is optional where the
>>> default is
>>> > > >> that
>>> > > >> > it
>>> > > >> > > >>> doesn't support it. (--> This means that the
>>> implementation is
>>> > > >> still
>>> > > >> > > >>> optional for the source developer.)
>>> > > >> > > >>>
>>> > > >> > > >> It is optional for backwards compatibility with existing
>>> > sources,
>>> > > >> as
>>> > > >> > > they
>>> > > >> > > >> may still compile without code change. But starting from
>>> this
>>> > FLIP,
>>> > > >> > > Flink
>>> > > >> > > >> will always optimistically assume that all the sources
>>> support
>>> > > >> > pausable
>>> > > >> > > >> splits. If a source does not support pausable splits, it
>>> goes
>>> > to an
>>> > > >> > > error
>>> > > >> > > >> handling path when watermark alignment is enabled on it.
>>> This
>>> > is
>>> > > >> > > different
>>> > > >> > > >> from a usual optional feature, where no error is expected.
>>> > > >> > > >>
>>> > > >> > > >>
>>> > > >> > > >>> 2. If watermark alignment is enabled in the application
>>> code
>>> > by
>>> > > >> > adding
>>> > > >> > > >>> withWatermarkAlignment to the WatermarkStrategy while
>>> > > >> SourceReader or
>>> > > >> > > >>> SplitReader do not support pausableSplits, we throw an
>>> > > >> > > >>> UnsupportedOperationException.
>>> > > >> > > >>>
>>> > > >> > > >> Yes.
>>> > > >> > > >>
>>> > > >> > > >>
>>> > > >> > > >>> 3. With regard to your statement:
>>> > > >> > > >>>
>>> > > >> > > >>>> [...] basically means watermark alignment is a
>>> non-optional
>>> > > >> feature
>>> > > >> > > to
>>> > > >> > > >>>> the end users.
>>> > > >> > > >>>
>>> > > >> > > >>> You actually mean that "pausable splits" are non-optional
>>> for
>>> > the
>>> > > >> app
>>> > > >> > > >>> developer if watermark alignment is enabled. However,
>>> > watermark
>>> > > >> > > alignment
>>> > > >> > > >>> is optional and can be enabled/disabled.
>>> > > >> > > >>>
>>> > > >> > > >> Yes, watermark alignment can be enabled/disabled in
>>> individual
>>> > > >> sources
>>> > > >> > > in
>>> > > >> > > >> Flink jobs, which basically means the code supporting
>>> watermark
>>> > > >> > > alignment
>>> > > >> > > >> has to already be there. That again means the Source
>>> > developers are
>>> > > >> > also
>>> > > >> > > >> expected to support pausable splits by default. So this
>>> way we
>>> > > >> > > essentially
>>> > > >> > > >> tell the end users that you may enable / disable this
>>> feature
>>> > as
>>> > > >> you
>>> > > >> > > wish,
>>> > > >> > > >> and tell the source developers that you SHOULD implement
>>> this
>>> > > >> because
>>> > > >> > > the
>>> > > >> > > >> end users may turn it on/off at will. And if the source
>>> does
>>> > not
>>> > > >> > support
>>> > > >> > > >> pausable splits, that goes to an error handling path when
>>> > watermark
>>> > > >> > > >> alignment is enabled on it. So users know they have to
>>> > explicitly
>>> > > >> > > exclude
>>> > > >> > > >> this source.
>>> > > >> > > >>
>>> > > >> > > >>
>>> > > >> > > >>>
>>> > > >> > > >>> So far it's totally clear to me and I hope this is what
>>> you
>>> > mean.
>>> > > >> I
>>> > > >> > > also
>>> > > >> > > >>> agree with both statements:
>>> > > >> > > >>>
>>> > > >> > > >>> So making that expectation aligned with the source
>>> developers
>>> > > >> seems
>>> > > >> > > >>>> reasonable.
>>> > > >> > > >>>>
>>> > > >> > > >>>
>>> > > >> > > >>> I think this is a simple and clean solution from both the
>>> end
>>> > user
>>> > > >> > and
>>> > > >> > > >>>> source developers' standpoint.
>>> > > >> > > >>>>
>>> > > >> > > >>>
>>> > > >> > > >>> However, a last conclusion derives from 3. and is an open
>>> > question
>>> > > >> > for
>>> > > >> > > >>> me:
>>> > > >> > > >>>
>>> > > >> > > >>> 4. The feature of "pausable splits" is now tightly bound
>>> to
>>> > > >> watermark
>>> > > >> > > >>> alignment, i.e., if sources do not support "pausable
>>> splits"
>>> > one
>>> > > >> can
>>> > > >> > > not
>>> > > >> > > >>> enable watermark alignment for these sources. This
>>> dependency
>>> > is
>>> > > >> not
>>> > > >> > > the
>>> > > >> > > >>> current status of watermark alignment implementation
>>> because
>>> > it
>>> > > >> > is/was
>>> > > >> > > >>> implemented without pausable splits. Do we want to
>>> introduce
>>> > this
>>> > > >> > > >>> dependency? (This is an open question. I cannot judge
>>> that.)
>>> > > >> > > >>>
>>> > > >> > > >> The watermark alignment basically relies on the pausable
>>> > splits,
>>> > > >> > right?
>>> > > >> > > >> So personally I found it quite reasonable that if the
>>> source
>>> > does
>>> > > >> not
>>> > > >> > > >> support pausable splits, end users cannot enable watermark
>>> > > >> alignment
>>> > > >> > on
>>> > > >> > > it.
>>> > > >> > > >>
>>> > > >> > > >>
>>> > > >> > > >>> If something is wrong, please correct me.
>>> > > >> > > >>>
>>> > > >> > > >>> Regards,
>>> > > >> > > >>> Sebastian
>>> > > >> > > >>>
>>> > > >> > > >>> On Wed, May 11, 2022 at 9:05 AM Becket Qin <
>>> > becket.qin@gmail.com>
>>> > > >> > > wrote:
>>> > > >> > > >>>
>>> > > >> > > >>>> Hi Sebastian,
>>> > > >> > > >>>>
>>> > > >> > > >>>> Thanks for the reply and patient discussion. I agree
>>> this is
>>> > a
>>> > > >> > tricky
>>> > > >> > > >>>> decision.
>>> > > >> > > >>>>
>>> > > >> > > >>>>
>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about Option c)
>>> > which I
>>> > > >> see
>>> > > >> > as
>>> > > >> > > >>>>> follows:
>>> > > >> > > >>>>> (1) An interface with default NOOP implementation makes
>>> the
>>> > > >> > > >>>>> implementation optional. And in my opinion, a default
>>> > > >> > implementation
>>> > > >> > > is and
>>> > > >> > > >>>>> will remain a way of making implementation optional
>>> because
>>> > > >> even in
>>> > > >> > > future
>>> > > >> > > >>>>> a developer can decide to implement the "old flavor"
>>> without
>>> > > >> > support
>>> > > >> > > for
>>> > > >> > > >>>>> pausable splits.
>>> > > >> > > >>>>> (2) It may not be too critical but I also find it
>>> suboptimal
>>> > > >> that
>>> > > >> > > with
>>> > > >> > > >>>>> a NOOP default implementation there is no way to check
>>> at
>>> > > >> runtime
>>> > > >> > if
>>> > > >> > > >>>>> SourceReader or SplitReader actually support pausing.
>>> (To
>>> > do so,
>>> > > >> > one
>>> > > >> > > would
>>> > > >> > > >>>>> need a supportsX method which makes it again more
>>> > complicated.)
>>> > > >> > > >>>>
>>> > > >> > > >>>>
>>> > > >> > > >>>> Based on the last few messages in the mailing list,
>>> Piotr
>>> > and I
>>> > > >> > > agreed
>>> > > >> > > >>>> that the default implementation should just throw an
>>> > > >> > > >>>> UnsupportedOperationException if the source is
>>> unpausable. So
>>> > > >> this
>>> > > >> > > >>>> basically tells the Source developers that this feature
>>> is
>>> > > >> expected
>>> > > >> > > to be
>>> > > >> > > >>>> supported. Because we cannot prevent end users from
>>> putting
>>> > an
>>> > > >> > > unpausable
>>> > > >> > > >>>> source into the watermark alignment group, that basically
>>> > means
>>> > > >> > > watermark
>>> > > >> > > >>>> alignment is a non-optional feature to the end users. So
>>> > making
>>> > > >> > that
>>> > > >> > > >>>> expectation aligned with the source developers seems
>>> > reasonable.
>>> > > >> > And
>>> > > >> > > if a
>>> > > >> > > >>>> source does not support this feature, the end users
>>> should
>>> > > >> > explicitly
>>> > > >> > > >>>> remove that source from the watermark alignment group.
>>> > > >> > > >>>>
>>> > > >> > > >>>> Personally speaking I think this is a simple and clean
>>> > solution
>>> > > >> from
>>> > > >> > > >>>> both the end user and source developers' standpoint.
>>> > > >> > > >>>>
>>> > > >> > > >>>> Does this address your concerns?
>>> > > >> > > >>>>
>>> > > >> > > >>>> Thanks,
>>> > > >> > > >>>>
>>> > > >> > > >>>> Jiangjie (Becket) Qin
>>> > > >> > > >>>>
>>> > > >> > > >>>> On Wed, May 11, 2022 at 2:52 PM Sebastian Mattheis <
>>> > > >> > > >>>> sebastian@ververica.com> wrote:
>>> > > >> > > >>>>
>>> > > >> > > >>>>> Hi Piotr, Hi Becket, Hi everybody,
>>> > > >> > > >>>>>
>>> > > >> > > >>>>> we, Dawid and I, discussed the various
>>> suggestions/options
>>> > and
>>> > > >> we
>>> > > >> > > >>>>> would be okay either way because we find neither
>>> solution is
>>> > > >> > perfect
>>> > > >> > > just
>>> > > >> > > >>>>> because of the already present complexity.
>>> > > >> > > >>>>>
>>> > > >> > > >>>>> Option c) Adding methods to the interfaces of
>>> SourceReader
>>> > and
>>> > > >> > > >>>>> SplitReader
>>> > > >> > > >>>>> Option a) Adding decorative interfaces to be used by
>>> > > >> SourceReader
>>> > > >> > and
>>> > > >> > > >>>>> SplitReader
>>> > > >> > > >>>>>
>>> > > >> > > >>>>> As of the current status (v. 12) of the FLIP [1], it is
>>> > based on
>>> > > >> > > >>>>> Option c) which we find acceptable because the
>>> complexity
>>> > added
>>> > > >> is
>>> > > >> > > only a
>>> > > >> > > >>>>> single method.
>>> > > >> > > >>>>>
>>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about Option c)
>>> > which I
>>> > > >> see
>>> > > >> > as
>>> > > >> > > >>>>> follows:
>>> > > >> > > >>>>> (1) An interface with default NOOP implementation makes
>>> the
>>> > > >> > > >>>>> implementation optional. And in my opinion, a default
>>> > > >> > implementation
>>> > > >> > > is and
>>> > > >> > > >>>>> will remain a way of making implementation optional
>>> because
>>> > > >> even in
>>> > > >> > > future
>>> > > >> > > >>>>> a developer can decide to implement the "old flavor"
>>> without
>>> > > >> > support
>>> > > >> > > for
>>> > > >> > > >>>>> pausable splits.
>>> > > >> > > >>>>> (2) It may not be too critical but I also find it
>>> suboptimal
>>> > > >> that
>>> > > >> > > with
>>> > > >> > > >>>>> a NOOP default implementation there is no way to check
>>> at
>>> > > >> runtime
>>> > > >> > if
>>> > > >> > > >>>>> SourceReader or SplitReader actually support pausing.
>>> (To
>>> > do so,
>>> > > >> > one
>>> > > >> > > would
>>> > > >> > > >>>>> need a supportsX method which makes it again more
>>> > complicated.)
>>> > > >> > > >>>>>
>>> > > >> > > >>>>> However, we haven't changed it because Option a) is
>>> also not
>>> > > >> > optimal
>>> > > >> > > >>>>> or straight-forward:
>>> > > >> > > >>>>> (1) We need to add two distinct yet similar decorative
>>> > > >> interfaces
>>> > > >> > > >>>>> since, as mentioned, the signatures of the methods are
>>> > > >> different.
>>> > > >> > For
>>> > > >> > > >>>>> example, we would need decorative interfaces like
>>> > > >> > > >>>>> `SplitReaderWithPausableSplits` and
>>> > > >> > `SourceReaderWithPausableSplits`.
>>> > > >> > > >>>>> (2) As a consequence, we would need to somehow document
>>> > > >> how/where
>>> > > >> > to
>>> > > >> > > >>>>> implement both interfaces and how this relates to each
>>> > other.
>>> > > >> This
>>> > > >> > > we could
>>> > > >> > > >>>>> solve by adding a note in the interface of SourceReader
>>> and
>>> > > >> > > SplitReader and
>>> > > >> > > >>>>> reference to the decorative interfaces but it still
>>> > increases
>>> > > >> > > complexity
>>> > > >> > > >>>>> too.
>>> > > >> > > >>>>>
>>> > > >> > > >>>>> In summary, we see both as acceptable and preferred over
>>> > other
>>> > > >> > > >>>>> options. The question is if we can find a solution or
>>> > compromise
>>> > > >> > > that is
>>> > > >> > > >>>>> acceptable for everybody to reach consensus.
>>> > > >> > > >>>>>
>>> > > >> > > >>>>> Please let us know what you think because we would be
>>> happy
>>> > if
>>> > > >> we
>>> > > >> > can
>>> > > >> > > >>>>> conclude the discussion to avoid dropping the
>>> initiative on
>>> > this
>>> > > >> > > FLIP.
>>> > > >> > > >>>>>
>>> > > >> > > >>>>> Regards,
>>> > > >> > > >>>>> Sebastian
>>> > > >> > > >>>>>
>>> > > >> > > >>>>> [1]
>>> > > >> > > >>>>>
>>> > > >> > >
>>> > > >> >
>>> > > >>
>>> >
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
>>> > > >> > > >>>>> (v. 12)
>>> > > >> > > >>>>>
>>> > > >> > > >>>>> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski <
>>> > > >> > pnowojski@apache.org
>>> > > >> > > >
>>> > > >> > > >>>>> wrote:
>>> > > >> > > >>>>>
>>> > > >> > > >>>>>> Hi Guowei,
>>> > > >> > > >>>>>>
>>> > > >> > > >>>>>> as Dawid wrote a couple of messages back:
>>> > > >> > > >>>>>>
>>> > > >> > > >>>>>> > This is covered in the previous FLIP[1] which has
>>> been
>>> > > >> already
>>> > > >> > > >>>>>> implemented in 1.15. In short, it must be enabled with
>>> the
>>> > > >> > watermark
>>> > > >> > > >>>>>> strategy which also configures drift and update
>>> interval
>>> > > >> > > >>>>>>
>>> > > >> > > >>>>>> So by default watermark alignment is disabled,
>>> regardless
>>> > if a
>>> > > >> > > source
>>> > > >> > > >>>>>> supports it or not.
>>> > > >> > > >>>>>>
>>> > > >> > > >>>>>> Best,
>>> > > >> > > >>>>>> Piotrek
>>> > > >> > > >>>>>>
>>> > > >> > > >>>>>> On Thu, May 5, 2022 at 09:56 Guowei Ma <
>>> guowei.mgw@gmail.com>
>>> > > >> > > wrote:
>>> > > >> > > >>>>>>
>>> > > >> > > >>>>>>> Hi,
>>> > > >> > > >>>>>>>
>>> > > >> > > >>>>>>> We know that in the case of Bounded input Flink
>>> supports
>>> > the
>>> > > >> > Batch
>>> > > >> > > >>>>>>> execution mode. Currently in Batch execution mode,
>>> flink
>>> > is
>>> > > >> > > executed
>>> > > >> > > >>>>>>> on a
>>> > > >> > > >>>>>>> stage-by-stage basis. In this way, perhaps watermark
>>> > alignment
>>> > > >> > > might
>>> > > >> > > >>>>>>> not
>>> > > >> > > >>>>>>> gain much.
>>> > > >> > > >>>>>>>
>>> > > >> > > >>>>>>> So my question is: Is watermark alignment the default
>>> > > >> > behavior(for
>>> > > >> > > >>>>>>> implemented source only)? If so, have you considered
>>> > > >> evaluating
>>> > > >> > the
>>> > > >> > > >>>>>>> impact
>>> > > >> > > >>>>>>> of this behavior on the Batch execution mode? Or do you
>>> think
>>> > it is
>>> > > >> not
>>> > > >> > > >>>>>>> necessary?
>>> > > >> > > >>>>>>>
>>> > > >> > > >>>>>>> Correct me if I miss something.
>>> > > >> > > >>>>>>>
>>> > > >> > > >>>>>>> Best,
>>> > > >> > > >>>>>>> Guowei
>>> > > >> > > >>>>>>>
>>> > > >> > > >>>>>>>
>>> > > >> > > >>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <
>>> > > >> > > >>>>>>> piotr.nowojski@gmail.com>
>>> > > >> > > >>>>>>> wrote:
>>> > > >> > > >>>>>>>
>>> > > >> > > >>>>>>> > Hi Becket and Dawid,
>>> > > >> > > >>>>>>> >
>>> > > >> > > >>>>>>> > > I feel that no matter which option we choose this
>>> can
>>> > not
>>> > > >> be
>>> > > >> > > >>>>>>> solved
>>> > > >> > > >>>>>>> > entirely in either of the options, because of the
>>> point
>>> > > >> above
>>> > > >> > and
>>> > > >> > > >>>>>>> because
>>> > > >> > > >>>>>>> > the signature of SplitReader#pauseOrResumeSplits and
>>> > > >> > > >>>>>>> > SourceReader#pauseOrResumeSplits are slightly
>>> different
>>> > (one
>>> > > >> > > >>>>>>> identifies
>>> > > >> > > >>>>>>> > splits with splitId the other one passes the splits
>>> > > >> directly).
>>> > > >> > > >>>>>>> >
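A minimal sketch of the signature difference mentioned above, assuming a simplified model in which the source-level reader receives split IDs and resolves them to split objects before delegating to the split-level reader. DemoSplit, DemoSplitReader and DemoSourceReader are hypothetical names and do not reflect how Flink wires these components internally.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical split type carrying only an ID.
final class DemoSplit {
    private final String id;

    DemoSplit(String id) {
        this.id = id;
    }

    String splitId() {
        return id;
    }
}

// Split-level reader: the splits are passed directly.
class DemoSplitReader {
    final Set<String> pausedIds = new HashSet<>();

    void pauseOrResumeSplits(Collection<DemoSplit> toPause, Collection<DemoSplit> toResume) {
        for (DemoSplit split : toPause) {
            pausedIds.add(split.splitId());
        }
        for (DemoSplit split : toResume) {
            pausedIds.remove(split.splitId());
        }
    }
}

// Source-level reader: splits are identified by ID and resolved before delegating.
class DemoSourceReader {
    private final Map<String, DemoSplit> assigned = new HashMap<>();
    final DemoSplitReader splitReader = new DemoSplitReader();

    void addSplit(DemoSplit split) {
        assigned.put(split.splitId(), split);
    }

    void pauseOrResumeSplits(Collection<String> idsToPause, Collection<String> idsToResume) {
        splitReader.pauseOrResumeSplits(resolve(idsToPause), resolve(idsToResume));
    }

    // Maps split IDs to the split objects currently assigned to this reader.
    private List<DemoSplit> resolve(Collection<String> ids) {
        List<DemoSplit> result = new ArrayList<>();
        for (String id : ids) {
            DemoSplit split = assigned.get(id);
            if (split == null) {
                throw new IllegalArgumentException("Unknown split: " + id);
            }
            result.add(split);
        }
        return result;
    }
}
```

Because the two methods take different argument types, a single shared decorative interface cannot cover both levels without an extra type parameter, which is why the thread talks about two separate decorative interfaces.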
>>> > > >> > > >>>>>>> > Yes, that's a good point in this case and for
>>> features
>>> > that
>>> > > >> > need
>>> > > >> > > >>>>>>> to be
>>> > > >> > > >>>>>>> > implemented in more than one place.
>>> > > >> > > >>>>>>> >
>>> > > >> > > >>>>>>> > > Is there any reason for pausing reading from a
>>> split
>>> > an
>>> > > >> > > optional
>>> > > >> > > >>>>>>> feature,
>>> > > >> > > >>>>>>> > > other than that this was not included in the
>>> original
>>> > > >> > > interface?
>>> > > >> > > >>>>>>> >
>>> > > >> > > >>>>>>> > An additional argument in favor of making it
>>> optional
>>> > is to
>>> > > >> > > >>>>>>> simplify source
>>> > > >> > > >>>>>>> > implementation. But on its own I'm not sure if that
>>> > would be
>>> > > >> > > >>>>>>> enough to
>>> > > >> > > >>>>>>> > justify making this feature optional. Maybe.
>>> > > >> > > >>>>>>> >
>>> > > >> > > >>>>>>> > > I think it would be way simpler and clearer to
>>> just
>>> > let
>>> > > >> end
>>> > > >> > > >>>>>>> users and
>>> > > >> > > >>>>>>> > Flink
>>> > > >> > > >>>>>>> > > assume all the connectors will implement this
>>> feature.
>>> > > >> > > >>>>>>> >
>>> > > >> > > >>>>>>> > As I wrote above that would be an interesting
>>> choice to
>>> > make
>>> > > >> > > (ease
>>> > > >> > > >>>>>>> of
>>> > > >> > > >>>>>>> > implementation for new users, vs system
>>> consistency).
>>> > > >> > Regardless
>>> > > >> > > >>>>>>> of that,
>>> > > >> > > >>>>>>> > yes, for me the main argument is the API backward
>>> > > >> > compatibility.
>>> > > >> > > >>>>>>> But let's
>>> > > >> > > >>>>>>> > clear a couple of points:
>>> > > >> > > >>>>>>> > - The current proposal adding methods to the base
>>> > interface
>>> > > >> > with
>>> > > >> > > >>>>>>> default
>>> > > >> > > >>>>>>> > implementations is an OPTIONAL feature. Same as the
>>> > > >> decorative
>>> > > >> > > >>>>>>> version
>>> > > >> > > >>>>>>> > would be.
>>> > > >> > > >>>>>>> > - Decorative version could implement "throw
>>> > > >> > > >>>>>>> UnsupportedOperationException"
>>> > > >> > > >>>>>>> > if user enabled watermark alignment just as well
>>> and I
>>> > agree
>>> > > >> > > >>>>>>> that's a
>>> > > >> > > >>>>>>> > better option compared to logging a warning.
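The two API shapes compared above can be sketched side by side. This is only a minimal illustration, not Flink's actual interfaces: BaseReader, MinimalReader, SupportsPausingSplits and the align* helpers are hypothetical names, and the method signature is assumed. It shows that either shape can fail loudly with an UnsupportedOperationException when alignment is enabled but the reader cannot pause splits.

```java
import java.util.Collection;
import java.util.Collections;

public class OptionalPauseSketch {

    /** Option 1 (hypothetical): the base interface carries the method with a throwing default. */
    interface BaseReader {
        default void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume) {
            throw new UnsupportedOperationException(
                    "This reader does not support pausing or resuming splits.");
        }
    }

    /** Option 2 (hypothetical): the base interface stays minimal... */
    interface MinimalReader {}

    /** ...and the feature lives in a separate decorative interface. */
    interface SupportsPausingSplits {
        void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume);
    }

    /** An old reader that predates the feature (option 1 flavor). */
    static final class LegacyBaseReader implements BaseReader {}

    /** An old reader that predates the feature (option 2 flavor). */
    static final class LegacyMinimalReader implements MinimalReader {}

    /** A reader that opts in through the decorative interface. */
    static final class PausableReader implements MinimalReader, SupportsPausingSplits {
        int calls = 0;

        @Override
        public void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume) {
            calls++; // a real reader would stop or restart fetching here
        }
    }

    /** Framework side, option 1: call the method; the default throws if not overridden. */
    static void alignViaDefaultMethod(BaseReader reader) {
        reader.pauseOrResumeSplits(Collections.singletonList("split-0"), Collections.emptyList());
    }

    /** Framework side, option 2: check for the decorator and throw if it is missing. */
    static void alignViaDecorator(MinimalReader reader) {
        if (!(reader instanceof SupportsPausingSplits)) {
            throw new UnsupportedOperationException(
                    "Watermark alignment is enabled, but the reader cannot pause splits.");
        }
        ((SupportsPausingSplits) reader)
                .pauseOrResumeSplits(Collections.singletonList("split-0"), Collections.emptyList());
    }

    public static void main(String[] args) {
        PausableReader pausable = new PausableReader();
        alignViaDecorator(pausable);
        System.out.println("pausable reader was called " + pausable.calls + " time(s)");
        try {
            alignViaDefaultMethod(new LegacyBaseReader());
        } catch (UnsupportedOperationException e) {
            System.out.println("default method: " + e.getMessage());
        }
        try {
            alignViaDecorator(new LegacyMinimalReader());
        } catch (UnsupportedOperationException e) {
            System.out.println("decorator: " + e.getMessage());
        }
    }
}
```

In both variants the user-visible behavior for an unsupported source is the same exception; what differs is where an implementer discovers the method.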
>>> > > >> > > >>>>>>> >
>>> > > >> > > >>>>>>> > Best,
>>> > > >> > > >>>>>>> > Piotrek
>>> > > >> > > >>>>>>> >
>>> > > >> > > >>>>>>> >
>>> > > >> > > >>>>>>> > śr., 4 maj 2022 o 15:40 Becket Qin <
>>> > becket.qin@gmail.com>
>>> > > >> > > >>>>>>> napisał(a):
>>> > > >> > > >>>>>>> >
>>> > > >> > > >>>>>>> > > Thanks for the reply and patient discussion,
>>> Piotr and
>>> > > >> Dawid.
>>> > > >> > > >>>>>>> > >
>>> > > >> > > >>>>>>> > > Is there any reason for pausing reading from a
>>> split
>>> > an
>>> > > >> > > optional
>>> > > >> > > >>>>>>> feature,
>>> > > >> > > >>>>>>> > > other than that this was not included in the
>>> original
>>> > > >> > > interface?
>>> > > >> > > >>>>>>> > >
>>> > > >> > > >>>>>>> > > To be honest I am really worried about the
>>> complexity
>>> > of
>>> > > >> the
>>> > > >> > > >>>>>>> user story
>>> > > >> > > >>>>>>> > > here. Optional features like this have a high
>>> > overhead.
>>> > > >> > Imagine
>>> > > >> > > >>>>>>> this
>>> > > >> > > >>>>>>> > > feature is optional, now a user enabled watermark
>>> > > >> alignment
>>> > > >> > and
>>> > > >> > > >>>>>>> defined a
>>> > > >> > > >>>>>>> > > few watermark groups. Would it work? Hmm, that
>>> > depends on
>>> > > >> > > >>>>>>> whether the
>>> > > >> > > >>>>>>> > > involved Source has implemented this feature. If
>>> the
>>> > > >> Sources
>>> > > >> > > >>>>>>> are well
>>> > > >> > > >>>>>>> > > documented, good luck. Otherwise end users may
>>> have to
>>> > > >> look
>>> > > >> > > into
>>> > > >> > > >>>>>>> the code
>>> > > >> > > >>>>>>> > > of the Source to see whether the feature is
>>> supported.
>>> > > >> Which
>>> > > >> > is
>>> > > >> > > >>>>>>> something
>>> > > >> > > >>>>>>> > > they shouldn't have to do.
>>> > > >> > > >>>>>>> > >
>>> > > >> > > >>>>>>> > > I think it would be way simpler and clearer to
>>> just
>>> > let
>>> > > >> end
>>> > > >> > > >>>>>>> users and
>>> > > >> > > >>>>>>> > Flink
>>> > > >> > > >>>>>>> > > assume all the connectors will implement this
>>> feature.
>>> > > >> After
>>> > > >> > > all
>>> > > >> > > >>>>>>> the
>>> > > >> > > >>>>>>> > > watermark group is not optional to the end users.
>>> If
>>> > in
>>> > > >> some
>>> > > >> > > >>>>>>> rare cases,
>>> > > >> > > >>>>>>> > > the feature cannot be supported, a clear
>>> > > >> > > >>>>>>> UnsupportedOperationException
>>> > > >> > > >>>>>>> > will
>>> > > >> > > >>>>>>> > > be thrown to tell users to explicitly remove this
>>> > Source
>>> > > >> from
>>> > > >> > > the
>>> > > >> > > >>>>>>> > watermark
>>> > > >> > > >>>>>>> > > group. I don't think we should have a warning
>>> message
>>> > > >> here,
>>> > > >> > as
>>> > > >> > > >>>>>>> they tend
>>> > > >> > > >>>>>>> > to
>>> > > >> > > >>>>>>> > > be ignored in many cases. If we do this, we don't
>>> even
>>> > > >> need
>>> > > >> > the
>>> > > >> > > >>>>>>> > supportXXX
>>> > > >> > > >>>>>>> > > method in the Source for this feature. In fact
>>> this is
>>> > > >> > exactly
>>> > > >> > > >>>>>>> how many
>>> > > >> > > >>>>>>> > > interfaces work today. For example,
>>> > > >> > > >>>>>>> SplitEnumerator#addSplitsBack() is
>>> > > >> > > >>>>>>> > not
>>> > > >> > > >>>>>>> > > supported by Pravega source because it does not
>>> > support
>>> > > >> > partial
>>> > > >> > > >>>>>>> failover.
>>> > > >> > > >>>>>>> > > In that case, it simply throws an exception to
>>> > trigger a
>>> > > >> > global
>>> > > >> > > >>>>>>> recovery.
>>> > > >> > > >>>>>>> > >
>>> > > >> > > >>>>>>> > > The reason we add a default implementation in this
>>> > case
>>> > > >> would be
>>> > > >> > > >>>>>>> just for
>>> > > >> > > >>>>>>> > the
>>> > > >> > > >>>>>>> > > sake of backwards compatibility so the old source
>>> can
>>> > > >> still
>>> > > >> > > >>>>>>> compile.
>>> > > >> > > >>>>>>> > Sure,
>>> > > >> > > >>>>>>> > > in short term, this feature might not be
>>> supported by
>>> > many
>>> > > >> > > >>>>>>> existing
>>> > > >> > > >>>>>>> > > sources. That is OK, and it is quite visible to
>>> the
>>> > source
>>> > > >> > > >>>>>>> developers
>>> > > >> > > >>>>>>> > that
>>> > > >> > > >>>>>>> > > they did not override the default impl which
>>> throws an
>>> > > >> > > >>>>>>> > > UnsupportedOperationException.
>>> > > >> > > >>>>>>> > >
>>> > > >> > > >>>>>>> > > @Dawid,
>>> > > >> > > >>>>>>> > >
>>> > > >> > > >>>>>>> > > the Java doc of the SupportXXX() method in the
>>> Source
>>> > > >> would
>>> > > >> > be
>>> > > >> > > >>>>>>> the single
>>> > > >> > > >>>>>>> > > >> source of truth regarding how to implement this
>>> > > >> feature.
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > >
>>> > > >> > > >>>>>>> > > I also don't find it entirely true. Half of the
>>> > classes
>>> > > >> are
>>> > > >> > > >>>>>>> theoretically
>>> > > >> > > >>>>>>> > > > optional and are utility classes from the point
>>> of
>>> > view
>>> > > >> how
>>> > > >> > > the
>>> > > >> > > >>>>>>> > > interfaces
>>> > > >> > > >>>>>>> > > > are organized. Theoretically users do not need
>>> to
>>> > use
>>> > > >> any
>>> > > >> > of
>>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird
>>> to
>>> > list
>>> > > >> > their
>>> > > >> > > >>>>>>> methods in
>>> > > >> > > >>>>>>> > > the
>>> > > >> > > >>>>>>> > > > Source interface.
>>> > > >> > > >>>>>>> > >
>>> > > >> > > >>>>>>> > > I think the ultimate goal of java docs is to guide
>>> > users
>>> > > >> to
>>> > > >> > > >>>>>>> implement the
>>> > > >> > > >>>>>>> > > Source. If SourceReaderBase is the preferred way
>>> to
>>> > > >> > implement a
>>> > > >> > > >>>>>>> > > SourceReader, it seems worth mentioning that.
>>> Even the
>>> > > >> Java
>>> > > >> > > >>>>>>> language
>>> > > >> > > >>>>>>> > > documentation for interfaces lists the known
>>> > implementations
>>> > > >> [1]
>>> > > >> > so
>>> > > >> > > >>>>>>> people
>>> > > >> > > >>>>>>> > can
>>> > > >> > > >>>>>>> > > leverage them. But for this particular case, if we
>>> > make
>>> > > >> the
>>> > > >> > > >>>>>>> feature
>>> > > >> > > >>>>>>> > > non-optional, we don't even need the supportXXX()
>>> > method
>>> > > >> for
>>> > > >> > > now.
>>> > > >> > > >>>>>>> > >
>>> > > >> > > >>>>>>> > > Thanks,
>>> > > >> > > >>>>>>> > >
>>> > > >> > > >>>>>>> > > Jiangjie (Becket) Qin
>>> > > >> > > >>>>>>> > >
>>> > > >> > > >>>>>>> > >
>>> > > >> > > >>>>>>> > >
>>> > > >> > > >>>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <
>>> > > >> > > >>>>>>> dwysakowicz@apache.org>
>>> > > >> > > >>>>>>> > > wrote:
>>> > > >> > > >>>>>>> > >
>>> > > >> > > >>>>>>> > > > Hey Piotr and Becket,
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > First of all, let me say I am happy with
>>> whichever
>>> > > >> option
>>> > > >> > is
>>> > > >> > > >>>>>>> agreed in
>>> > > >> > > >>>>>>> > > the
>>> > > >> > > >>>>>>> > > > discussion.
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > I wanted to clarify a few points from the
>>> discussion
>>> > > >> > though:
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > @Becket:
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > The main argument for adding the methods to the
>>> > > >> > SourceReader
>>> > > >> > > >>>>>>> is that
>>> > > >> > > >>>>>>> > > these
>>> > > >> > > >>>>>>> > > > methods are effectively NON-OPTIONAL to
>>> SourceReader
>>> > > >> impl,
>>> > > >> > > i.e.
>>> > > >> > > >>>>>>> > starting
>>> > > >> > > >>>>>>> > > > from this FLIP, all the SourceReaders impl are
>>> > expected
>>> > > >> to
>>> > > >> > > >>>>>>> support this
>>> > > >> > > >>>>>>> > > > method, although some old implementations may
>>> not
>>> > have
>>> > > >> > > >>>>>>> implemented this
>>> > > >> > > >>>>>>> > > > feature. I think we should distinguish the new
>>> > features
>>> > > >> > from
>>> > > >> > > >>>>>>> the
>>> > > >> > > >>>>>>> > optional
>>> > > >> > > >>>>>>> > > > features. While the public decorative interface
>>> is a
>>> > > >> > solution
>>> > > >> > > >>>>>>> to the
>>> > > >> > > >>>>>>> > > > optional features, we should not use it for the
>>> > features
>>> > > >> > that
>>> > > >> > > >>>>>>> are
>>> > > >> > > >>>>>>> > > > non-optional.
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > I don't think that this feature is NON-OPTIONAL.
>>> > Even
>>> > > >> > though
>>> > > >> > > >>>>>>> > preferred, I
>>> > > >> > > >>>>>>> > > > still think it can be simply optional.
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > the Java doc of the SupportXXX() method in the
>>> > Source
>>> > > >> would
>>> > > >> > > be
>>> > > >> > > >>>>>>> the
>>> > > >> > > >>>>>>> > single
>>> > > >> > > >>>>>>> > > > source of truth regarding how to implement this
>>> > feature.
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > I also don't find it entirely true. Half of the
>>> > classes
>>> > > >> are
>>> > > >> > > >>>>>>> > theoretically
>>> > > >> > > >>>>>>> > > > optional and are utility classes from the point
>>> of
>>> > view
>>> > > >> how
>>> > > >> > > the
>>> > > >> > > >>>>>>> > > interfaces
>>> > > >> > > >>>>>>> > > > are organized. Theoretically users do not need
>>> to
>>> > use
>>> > > >> any
>>> > > >> > of
>>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird
>>> to
>>> > list
>>> > > >> > their
>>> > > >> > > >>>>>>> methods in
>>> > > >> > > >>>>>>> > > the
>>> > > >> > > >>>>>>> > > > Source interface.
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > @Piotr
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > If we have all of the methods with default
>>> > > >> implementation
>>> > > >> > in
>>> > > >> > > >>>>>>> the base
>>> > > >> > > >>>>>>> > > > interface, the API doesn't give any clue to the
>>> user
>>> > > >> which
>>> > > >> > > set
>>> > > >> > > >>>>>>> of
>>> > > >> > > >>>>>>> > methods
>>> > > >> > > >>>>>>> > > > are required to be implemented at the same time.
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > I feel that no matter which option we choose
>>> this
>>> > can
>>> > > >> not
>>> > > >> > be
>>> > > >> > > >>>>>>> solved
>>> > > >> > > >>>>>>> > > > entirely in either of the options, because of
>>> the
>>> > point
>>> > > >> > above
>>> > > >> > > >>>>>>> and
>>> > > >> > > >>>>>>> > because
>>> > > >> > > >>>>>>> > > > the signature of
>>> SplitReader#pauseOrResumeSplits and
>>> > > >> > > >>>>>>> > > > SourceReader#pauseOrResumeSplits are slightly
>>> > different
>>> > > >> > (one
>>> > > >> > > >>>>>>> identifies
>>> > > >> > > >>>>>>> > > > splits with splitId the other one passes the
>>> splits
>>> > > >> > > directly).
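The signature mismatch mentioned above can be made concrete with a small sketch. It assumes, for illustration only, that the SourceReader-level method is the one taking split IDs and the SplitReader-level method is the one taking split objects; DemoSplit, DemoSplitReader, RecordingSplitReader and DemoSourceReader are hypothetical stand-ins, not Flink classes. The reader-level method has to resolve IDs to splits before delegating, which is why one shared decorative interface cannot cover both levels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PauseSignatureSketch {

    /** Hypothetical split type that carries its own ID. */
    static final class DemoSplit {
        final String splitId;

        DemoSplit(String splitId) {
            this.splitId = splitId;
        }
    }

    /** Split-level contract (assumed shape): receives the split objects directly. */
    interface DemoSplitReader<SplitT> {
        void pauseOrResumeSplits(Collection<SplitT> splitsToPause, Collection<SplitT> splitsToResume);
    }

    /** Records which splits are currently paused so the behavior can be observed. */
    static final class RecordingSplitReader implements DemoSplitReader<DemoSplit> {
        final Set<String> pausedIds = new TreeSet<>();

        @Override
        public void pauseOrResumeSplits(
                Collection<DemoSplit> splitsToPause, Collection<DemoSplit> splitsToResume) {
            for (DemoSplit split : splitsToPause) {
                pausedIds.add(split.splitId);
            }
            for (DemoSplit split : splitsToResume) {
                pausedIds.remove(split.splitId);
            }
        }
    }

    /** Reader-level contract (assumed shape): receives split IDs and must resolve them. */
    static final class DemoSourceReader {
        private final Map<String, DemoSplit> assignedSplits = new HashMap<>();
        private final DemoSplitReader<DemoSplit> splitReader;

        DemoSourceReader(DemoSplitReader<DemoSplit> splitReader) {
            this.splitReader = splitReader;
        }

        void addSplit(DemoSplit split) {
            assignedSplits.put(split.splitId, split);
        }

        void pauseOrResumeSplits(Collection<String> idsToPause, Collection<String> idsToResume) {
            splitReader.pauseOrResumeSplits(resolve(idsToPause), resolve(idsToResume));
        }

        /** Maps IDs to assigned splits; unknown IDs are skipped in this sketch. */
        private List<DemoSplit> resolve(Collection<String> splitIds) {
            List<DemoSplit> resolved = new ArrayList<>();
            for (String id : splitIds) {
                DemoSplit split = assignedSplits.get(id);
                if (split != null) {
                    resolved.add(split);
                }
            }
            return resolved;
        }
    }

    public static void main(String[] args) {
        RecordingSplitReader splitReader = new RecordingSplitReader();
        DemoSourceReader sourceReader = new DemoSourceReader(splitReader);
        sourceReader.addSplit(new DemoSplit("a"));
        sourceReader.addSplit(new DemoSplit("b"));

        sourceReader.pauseOrResumeSplits(Arrays.asList("a", "b"), Collections.emptyList());
        sourceReader.pauseOrResumeSplits(Collections.emptyList(), Collections.singletonList("a"));
        System.out.println("still paused: " + splitReader.pausedIds);
    }
}
```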
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > Best,
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > Dawid
>>> > > >> > > >>>>>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > Hi Piotr,
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > Thanks for the comment.
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > Just to clarify, I am not against the decorative
>>> > > >> > interfaces,
>>> > > >> > > >>>>>>> but I do
>>> > > >> > > >>>>>>> > > > think we should use them with caution. The main
>>> > argument
>>> > > >> > for
>>> > > >> > > >>>>>>> adding the
>>> > > >> > > >>>>>>> > > > methods to the SourceReader is that these
>>> methods
>>> > are
>>> > > >> > > >>>>>>> > > > effectively NON-OPTIONAL to SourceReader impl,
>>> i.e.
>>> > > >> > starting
>>> > > >> > > >>>>>>> from this
>>> > > >> > > >>>>>>> > > > FLIP, all the SourceReaders impl are expected to
>>> > support
>>> > > >> > this
>>> > > >> > > >>>>>>> > > > method, although some old implementations may
>>> not
>>> > have
>>> > > >> > > >>>>>>> implemented this
>>> > > >> > > >>>>>>> > > > feature. I think we should distinguish the new
>>> > features
>>> > > >> > from
>>> > > >> > > >>>>>>> the
>>> > > >> > > >>>>>>> > optional
>>> > > >> > > >>>>>>> > > > features. While the public decorative interface
>>> is a
>>> > > >> > solution
>>> > > >> > > >>>>>>> to the
>>> > > >> > > >>>>>>> > > > optional features, we should not use it for the
>>> > features
>>> > > >> > that
>>> > > >> > > >>>>>>> are
>>> > > >> > > >>>>>>> > > > non-optional.
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > That said, this feature is optional for
>>> > SplitReaders.
>>> > > >> > > Arguably
>>> > > >> > > >>>>>>> we can
>>> > > >> > > >>>>>>> > > have
>>> > > >> > > >>>>>>> > > > a decorative interface for that, but for
>>> simplicity
>>> > and
>>> > > >> > > >>>>>>> symmetry of the
>>> > > >> > > >>>>>>> > > > interface, personally I prefer just adding a new
>>> > method.
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > Regarding the advantages you mentioned about the
>>> > > >> decorative
>>> > > >> > > >>>>>>> interfaces,
>>> > > >> > > >>>>>>> > > > they would make sense if:
>>> > > >> > > >>>>>>> > > > 1. The feature is optional.
>>> > > >> > > >>>>>>> > > > 2. There is only one decorative interface
>>> involved
>>> > for a
>>> > > >> > > >>>>>>> feature.
>>> > > >> > > >>>>>>> > > > Otherwise the argument that all the methods are
>>> > grouped
>>> > > >> > > >>>>>>> together will
>>> > > >> > > >>>>>>> > not
>>> > > >> > > >>>>>>> > > > stand.
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > Compared with that, I think the current solution
>>> > works
>>> > > >> fine
>>> > > >> > > in
>>> > > >> > > >>>>>>> all
>>> > > >> > > >>>>>>> > cases,
>>> > > >> > > >>>>>>> > > > i.e. "having supportXXX() method in Source, and
>>> > default
>>> > > >> > > >>>>>>> methods /
>>> > > >> > > >>>>>>> > > > decorative interfaces in base interfaces.".
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > The advantages are:
>>> > > >> > > >>>>>>> > > >> - clean and easy to implement base interface
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > In the current approach, the Java doc of the
>>> > > >> SupportXXX()
>>> > > >> > > >>>>>>> method in the
>>> > > >> > > >>>>>>> > > > Source would be the single source of truth
>>> regarding
>>> > > >> how to
>>> > > >> > > >>>>>>> implement
>>> > > >> > > >>>>>>> > > this
>>> > > >> > > >>>>>>> > > > feature. It lists the method that has to be
>>> > implemented
>>> > > >> to
>>> > > >> > > >>>>>>> support this
>>> > > >> > > >>>>>>> > > > feature, regardless of how many classes /
>>> > interfaces are
>>> > > >> > > >>>>>>> involved.
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > When implementing the base interface, users do
>>> not
>>> > need
>>> > > >> to
>>> > > >> > > >>>>>>> implement a
>>> > > >> > > >>>>>>> > > > method with default implementation. If they are
>>> > curious
>>> > > >> > what
>>> > > >> > > >>>>>>> the method
>>> > > >> > > >>>>>>> > > is
>>> > > >> > > >>>>>>> > > > for, the java doc of that method simply points
>>> > users to
>>> > > >> the
>>> > > >> > > >>>>>>> > SupportXXX()
>>> > > >> > > >>>>>>> > > > method in the Source. This does not seem to add work
>>> to
>>> > the
>>> > > >> users
>>> > > >> > > >>>>>>> compared
>>> > > >> > > >>>>>>> > with
>>> > > >> > > >>>>>>> > > > decorative interfaces, but gives much better
>>> > > >> > discoverability.
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > - all of the methods from a single feature are
>>> > grouped
>>> > > >> in a
>>> > > >> > > >>>>>>> single
>>> > > >> > > >>>>>>> > > >> decorator interface, together with their
>>> dedicated
>>> > java
>>> > > >> > doc.
>>> > > >> > > >>>>>>> It's also
>>> > > >> > > >>>>>>> > > >> easier to google search for help using the
>>> > decorator
>>> > > >> name
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > - if an optional feature requires two methods
>>> to be
>>> > > >> > > >>>>>>> implemented at
>>> > > >> > > >>>>>>> > once,
>>> > > >> > > >>>>>>> > > >> decorator can guarantee that
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > These two points are not true when multiple
>>> > components
>>> > > >> and
>>> > > >> > > >>>>>>> classes are
>>> > > >> > > >>>>>>> > > > involved collaboratively to provide a feature.
>>> In
>>> > our
>>> > > >> case,
>>> > > >> > > we
>>> > > >> > > >>>>>>> have
>>> > > >> > > >>>>>>> > both
>>> > > >> > > >>>>>>> > > > SourceReader and SplitReader involved. And there
>>> > might
>>> > > >> be
>>> > > >> > > other
>>> > > >> > > >>>>>>> > > interfaces
>>> > > >> > > >>>>>>> > > > on the JM side involved for some future
>>> features.
>>> > So the
>>> > > >> > > >>>>>>> relevant
>>> > > >> > > >>>>>>> > methods
>>> > > >> > > >>>>>>> > > > can actually be scattered over the places. That
>>> > said, we
>>> > > >> > may
>>> > > >> > > >>>>>>> still use
>>> > > >> > > >>>>>>> > > > decorative interfaces for each component, if the
>>> > > >> feature is
>>> > > >> > > >>>>>>> optional,
>>> > > >> > > >>>>>>> > > given
>>> > > >> > > >>>>>>> > > > there is a single source of truth for the
>>> feature.
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > Here I would strongly lean towards making life
>>> > easier
>>> > > >> for
>>> > > >> > new
>>> > > >> > > >>>>>>> users,
>>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo) slight
>>> > expense
>>> > > >> > for
>>> > > >> > > >>>>>>> the power
>>> > > >> > > >>>>>>> > > >> users.
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > I actually think the current approach is
>>> simpler,
>>> > more
>>> > > >> > > >>>>>>> extensible and
>>> > > >> > > >>>>>>> > > more
>>> > > >> > > >>>>>>> > > > general for all the users. Can you articulate a
>>> bit
>>> > > >> more on
>>> > > >> > > >>>>>>> which part
>>> > > >> > > >>>>>>> > > you
>>> > > >> > > >>>>>>> > > > think makes it harder for users to understand?
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > There is another benefit of the decorative
>>> > interfaces
>>> > > >> which
>>> > > >> > > is
>>> > > >> > > >>>>>>> not
>>> > > >> > > >>>>>>> > > > mentioned, but might be worth considering here.
>>> > Usually
>>> > > >> the
>>> > > >> > > >>>>>>> decorative
>>> > > >> > > >>>>>>> > > > interfaces give slightly better backwards
>>> > compatibility
>>> > > >> > than
>>> > > >> > > >>>>>>> the new
>>> > > >> > > >>>>>>> > > > default method in the interfaces. That is when
>>> > users are
>>> > > >> > > using
>>> > > >> > > >>>>>>> a jar
>>> > > >> > > >>>>>>> > that
>>> > > >> > > >>>>>>> > > > was compiled with an older version of Flink
>>> which
>>> > does
>>> > > >> not
>>> > > >> > > >>>>>>> have the
>>> > > >> > > >>>>>>> > > default
>>> > > >> > > >>>>>>> > > > method in the interfaces in question. A
>>> decorative
>>> > > >> > interface
>>> > > >> > > >>>>>>> may still
>>> > > >> > > >>>>>>> > > > provide backwards compatibility in that case,
>>> while
>>> > > >> default
>>> > > >> > > >>>>>>> method impl
>>> > > >> > > >>>>>>> > > > cannot.
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > I think in Flink we in general do not guarantee
>>> > custom
>>> > > >> > > >>>>>>> components
>>> > > >> > > >>>>>>> > > compiled
>>> > > >> > > >>>>>>> > > > with an older version can run with a newer
>>> version
>>> > of
>>> > > >> > Flink.
>>> > > >> > > A
>>> > > >> > > >>>>>>> > recompile
>>> > > >> > > >>>>>>> > > > with a newer version would be required. That
>>> said,
>>> > if
>>> > > >> we do
>>> > > >> > > >>>>>>> care about
>>> > > >> > > >>>>>>> > > > this, we can just change the "supportXXX()"
>>> method
>>> > in
>>> > > >> the
>>> > > >> > > >>>>>>> Source
>>> > > >> > > >>>>>>> > > interface
>>> > > >> > > >>>>>>> > > > to use decorative interfaces, and leave the
>>> other
>>> > parts
>>> > > >> > > >>>>>>> unchanged.
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > Thanks,
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > Jiangjie (Becket) Qin
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <
>>> > > >> > > >>>>>>> pnowojski@apache.org>
>>> > > >> > > >>>>>>> > > > wrote:
>>> > > >> > > >>>>>>> > > >
>>> > > >> > > >>>>>>> > > >> Hi,
>>> > > >> > > >>>>>>> > > >>
>>> > > >> > > >>>>>>> > > >> Sorry for chipping in so late, but I was OoO
>>> for
>>> > the
>>> > > >> last
>>> > > >> > > two
>>> > > >> > > >>>>>>> weeks.
>>> > > >> > > >>>>>>> > > >> Regarding the interfaces, I would be actually
>>> > against
>>> > > >> > adding
>>> > > >> > > >>>>>>> those
>>> > > >> > > >>>>>>> > > methods
>>> > > >> > > >>>>>>> > > >> to the base interfaces for the reasons
>>> mentioned
>>> > above.
>>> > > >> > > >>>>>>> Clogging the
>>> > > >> > > >>>>>>> > > base
>>> > > >> > > >>>>>>> > > >> interface for new users with tons of methods
>>> that
>>> > they
>>> > > >> do
>>> > > >> > > not
>>> > > >> > > >>>>>>> need, do
>>> > > >> > > >>>>>>> > > not
>>> > > >> > > >>>>>>> > > >> understand and do not know what to do with
>>> them.
>>> > > >> Moreover,
>>> > > >> > > >>>>>>> such
>>> > > >> > > >>>>>>> > > decorative
>>> > > >> > > >>>>>>> > > >> interfaces are solving a problem if a feature
>>> > requires
>>> > > >> two
>>> > > >> > > or
>>> > > >> > > >>>>>>> more
>>> > > >> > > >>>>>>> > > methods
>>> > > >> > > >>>>>>> > > >> to be implemented at the same time. If we have
>>> all
>>> > of
>>> > > >> the
>>> > > >> > > >>>>>>> methods with
>>> > > >> > > >>>>>>> > > >> default implementation in the base interface,
>>> the
>>> > API
>>> > > >> > > doesn't
>>> > > >> > > >>>>>>> give any
>>> > > >> > > >>>>>>> > > >> clue
>>> > > >> > > >>>>>>> > > >> to the user which set of methods are required
>>> to be
>>> > > >> > > >>>>>>> implemented at the
>>> > > >> > > >>>>>>> > > >> same
>>> > > >> > > >>>>>>> > > >> time.
>>> > > >> > > >>>>>>> > > >>
>>> > > >> > > >>>>>>> > > >> > a) I feel the biggest drawback of decorative
>>> > > >> interfaces
>>> > > >> > is
>>> > > >> > > >>>>>>> which
>>> > > >> > > >>>>>>> > > >> interface
>>> > > >> > > >>>>>>> > > >> > they can decorate and which combinations of
>>> > multiple
>>> > > >> > > >>>>>>> decorative
>>> > > >> > > >>>>>>> > > >> interfaces
>>> > > >> > > >>>>>>> > > >> > are valid. (...)
>>> > > >> > > >>>>>>> > > >> > In the future, if there is a new feature
>>> added
>>> > > >> > > >>>>>>> > > >> > (e.g. sorted or pre-partitioned data aware),
>>> are
>>> > we
>>> > > >> > going
>>> > > >> > > >>>>>>> to create
>>> > > >> > > >>>>>>> > > >> another
>>> > > >> > > >>>>>>> > > >> > interface of SplitReader such as
>>> > SortedSplitReader or
>>> > > >> > > >>>>>>> > > >> PrePartitionedAware?
>>> > > >> > > >>>>>>> > > >> > Can they be combined? So I think the
>>> additional
>>> > > >> > decorative
>>> > > >> > > >>>>>>> interface
>>> > > >> > > >>>>>>> > > >> like
>>> > > >> > > >>>>>>> > > >> > withSplitsAlignment actually increases the
>>> > > >> understanding
>>> > > >> > > >>>>>>> cost of
>>> > > >> > > >>>>>>> > users
>>> > > >> > > >>>>>>> > > >> > because they have to know what decorative
>>> > interfaces
>>> > > >> are
>>> > > >> > > >>>>>>> there,
>>> > > >> > > >>>>>>> > which
>>> > > >> > > >>>>>>> > > >> > interface they can decorate and which
>>> > combinations of
>>> > > >> > the
>>> > > >> > > >>>>>>> decorative
>>> > > >> > > >>>>>>> > > >> > interfaces are valid and which are not.
>>> Ideally
>>> > we
>>> > > >> want
>>> > > >> > to
>>> > > >> > > >>>>>>> avoid
>>> > > >> > > >>>>>>> > that.
>>> > > >> > > >>>>>>> > > >>
>>> > > >> > > >>>>>>> > > >> I'm not sure if I understand how embedding
>>> default
>>> > > >> methods
>>> > > >> > > in
>>> > > >> > > >>>>>>> the base
>>> > > >> > > >>>>>>> > > >> interface is solving the problem: what can be
>>> > combined
>>> > > >> or
>>> > > >> > > >>>>>>> not? If
>>> > > >> > > >>>>>>> > there
>>> > > >> > > >>>>>>> > > >> are
>>> > > >> > > >>>>>>> > > >> two conflicting features, having decorative
>>> > interfaces
>>> > > >> > that
>>> > > >> > > >>>>>>> can not be
>>> > > >> > > >>>>>>> > > >> mixed together actually makes much more sense
>>> to me
>>> > > >> rather
>>> > > >> > > >>>>>>> than having
>>> > > >> > > >>>>>>> > > >> them
>>> > > >> > > >>>>>>> > > >> all in one base class. How would you allow
>>> users to
>>> > > >> > > implement
>>> > > >> > > >>>>>>> only one
>>> > > >> > > >>>>>>> > > of
>>> > > >> > > >>>>>>> > > >> those two features?
>>> > > >> > > >>>>>>> > > >>
>>> > > >> > > >>>>>>> > > >> To reiterate on the issue. Yes, there are
>>> > drawbacks:
>>> > > >> > > >>>>>>> > > >> - how can a user discover what decorators are
>>> > there?
>>> > > >> > > >>>>>>> > > >> - how can a user know where the decorator can
>>> be
>>> > > >> applied
>>> > > >> > to?
>>> > > >> > > >>>>>>> > > >>
>>> > > >> > > >>>>>>> > > >> However, those are drawbacks more for power
>>> users,
>>> > that
>>> > > >> can
>>> > > >> > > be
>>> > > >> > > >>>>>>> > mitigated
>>> > > >> > > >>>>>>> > > by
>>> > > >> > > >>>>>>> > > >> the documentation. For example listing all of
>>> the
>>> > > >> > decorators
>>> > > >> > > >>>>>>> with
>>> > > >> > > >>>>>>> > > >> detailed explanation both in the docs and in
>>> the
>>> > java
>>> > > >> > docs.
>>> > > >> > > >>>>>>> More
>>> > > >> > > >>>>>>> > > >> experienced users will be able to deal with
>>> those
>>> > > >> issues
>>> > > >> > > >>>>>>> easier, as
>>> > > >> > > >>>>>>> > they
>>> > > >> > > >>>>>>> > > >> will already have some basic understanding of
>>> > Flink.
>>> > > >> Also
>>> > > >> > if
>>> > > >> > > >>>>>>> a user has
>>> > > >> > > >>>>>>> > a
>>> > > >> > > >>>>>>> > > >> problem that he wants to solve, he will google
>>> > search a
>>> > > >> > > >>>>>>> potential
>>> > > >> > > >>>>>>> > > solution
>>> > > >> > > >>>>>>> > > >> to his problem anyway, and while doing that he
>>> is
>>> > very
>>> > > >> > > likely
>>> > > >> > > >>>>>>> to
>>> > > >> > > >>>>>>> > > discover
>>> > > >> > > >>>>>>> > > >> the decorator that he needs anyway in the docs.
>>> > > >> > > >>>>>>> > > >>
>>> > > >> > > >>>>>>> > > >> The advantages are:
>>> > > >> > > >>>>>>> > > >> - clean and easy to implement base interface
>>> > > >> > > >>>>>>> > > >> - all of the methods from a single feature are
>>> > grouped
>>> > > >> in
>>> > > >> > a
>>> > > >> > > >>>>>>> single
>>> > > >> > > >>>>>>> > > >> decorator interface, together with their
>>> dedicated
>>> > java
>>> > > >> > doc.
>>> > > >> > > >>>>>>> It's also
>>> > > >> > > >>>>>>> > > >> easier to google search for help using the
>>> > decorator
>>> > > >> name
>>> > > >> > > >>>>>>> > > >> - if an optional feature requires two methods
>>> to be
>>> > > >> > > >>>>>>> implemented at
>>> > > >> > > >>>>>>> > once,
>>> > > >> > > >>>>>>> > > >> decorator can guarantee that
>>> > > >> > > >>>>>>> > > >>
>>> > > >> > > >>>>>>> > > >> Here I would strongly lean towards making life
>>> > easier
>>> > > >> for
>>> > > >> > > new
>>> > > >> > > >>>>>>> users,
>>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo) slight
>>> > expense
>>> > > >> > for
>>> > > >> > > >>>>>>> the power
>>> > > >> > > >>>>>>> > > >> users.
>>> > > >> > > >>>>>>> > > >>
>>> > > >> > > >>>>>>> > > >> Best,
>>> > > >> > > >>>>>>> > > >> Piotrek
>>> > > >> > > >>>>>>> > > >>
>>> > > >> > > >>>>>>> > > >>
>>> > > >> > > >>>>>>> > > >> wt., 26 kwi 2022 o 15:32 Becket Qin <
>>> > > >> becket.qin@gmail.com
>>> > > >> > >
>>> > > >> > > >>>>>>> > napisał(a):
>>> > > >> > > >>>>>>> > > >>
>>> > > >> > > >>>>>>> > > >> > Thanks for the reply Sebastian and Dawid,
>>> > > >> > > >>>>>>> > > >> >
>>> > > >> > > >>>>>>> > > >> > I think Sebastian has a good summary. This
>>> is a
>>> > > >> really
>>> > > >> > > >>>>>>> helpful
>>> > > >> > > >>>>>>> > > >> discussion.
>>> > > >> > > >>>>>>> > > >> >
>>> > > >> > > >>>>>>> > > >> > Thinking a bit more, I feel that it might
>>> still
>>> > be
>>> > > >> > better
>>> > > >> > > >>>>>>> to add the
>>> > > >> > > >>>>>>> > > >> > supportsXXX() method in the Source rather
>>> than
>>> > > >> > > SourceReader.
>>> > > >> > > >>>>>>> > > >> >
>>> > > >> > > >>>>>>> > > >> > Generally speaking, what we are trying to do
>>> > here is
>>> > > >> to
>>> > > >> > > let
>>> > > >> > > >>>>>>> the
>>> > > >> > > >>>>>>> > Flink
>>> > > >> > > >>>>>>> > > >> > framework know what the Source is capable
>>> of. In
>>> > this
>>> > > >> > > FLIP,
>>> > > >> > > >>>>>>> it
>>> > > >> > > >>>>>>> > happens
>>> > > >> > > >>>>>>> > > >> to
>>> > > >> > > >>>>>>> > > >> > be the capability that only involves
>>> > SourceReader.
>>> > > >> But
>>> > > >> > in
>>> > > >> > > >>>>>>> the
>>> > > >> > > >>>>>>> > future,
>>> > > >> > > >>>>>>> > > >> it is
>>> > > >> > > >>>>>>> > > >> > possible that another functionality involves
>>> > both the
>>> > > >> > > >>>>>>> > SplitEnumerator
>>> > > >> > > >>>>>>> > > >> and
>>> > > >> > > >>>>>>> > > >> > SourceReader. In that case, following the
>>> current
>>> > > >> > > approach,
>>> > > >> > > >>>>>>> we
>>> > > >> > > >>>>>>> > should
>>> > > >> > > >>>>>>> > > >> put
>>> > > >> > > >>>>>>> > > >> > the "supportsXXX()" method in both
>>> > SplitEnumerator
>>> > > >> and
>>> > > >> > > >>>>>>> SourceReader.
>>> > > >> > > >>>>>>> > > >> > Because if we only put this in the
>>> SourceReader,
>>> > then
>>> > > >> > the
>>> > > >> > > >>>>>>> JM would
>>> > > >> > > >>>>>>> > > have
>>> > > >> > > >>>>>>> > > >> to
>>> > > >> > > >>>>>>> > > >> > create a SourceReader in order to know
>>> whether
>>> > this
>>> > > >> > > feature
>>> > > >> > > >>>>>>> is
>>> > > >> > > >>>>>>> > > >> supported,
>>> > > >> > > >>>>>>> > > >> > which is a little ugly. But if we put the
>>> > > >> > "supportsXXX()"
>>> > > >> > > >>>>>>> method in
>>> > > >> > > >>>>>>> > > the
>>> > > >> > > >>>>>>> > > >> > Source, we will break the "symmetric" design
>>> > because
>>> > > >> > this
>>> > > >> > > >>>>>>> FLIP
>>> > > >> > > >>>>>>> > chose a
>>> > > >> > > >>>>>>> > > >> > different way.
>>> > > >> > > >>>>>>> > > >> >
>>> > > >> > > >>>>>>> > > >> > This is also why I think a supportsXXX() method
>>> > > >> > > >>>>>>> > > >> > seems a good thing to have, because when there are
>>> > > >> > > >>>>>>> > > >> > a few interfaces / methods that are expected to be
>>> > > >> > > >>>>>>> > > >> > implemented at the same time in order to deliver a
>>> > > >> > > >>>>>>> > > >> > feature, it is always good to have a single source
>>> > > >> > > >>>>>>> > > >> > of truth to tell the framework what to do, so the
>>> > > >> > > >>>>>>> > > >> > framework can do consistent things in different
>>> > > >> > > >>>>>>> > > >> > parts.
>>> > > >> > > >>>>>>> > > >> >
>>> > > >> > > >>>>>>> > > >> > @Sebastian Mattheis <sebastian@ververica.com>
>>> > > >> > > >>>>>>> > > >> >
>>> > > >> > > >>>>>>> > > >> > Regarding interface flavor b), i.e.
>>> > > >> > > >>>>>>> > > >> > AlignedSourceReader + AlignedSplitReader, what I
>>> > > >> > > >>>>>>> > > >> > feel awkward about is that we are essentially
>>> > > >> > > >>>>>>> > > >> > expecting almost all the SourceReader
>>> > > >> > > >>>>>>> > > >> > implementations to extend SourceReaderBase, which
>>> > > >> > > >>>>>>> > > >> > effectively makes the SourceReader interface
>>> > > >> > > >>>>>>> > > >> > without the pausing support useless. So this
>>> > > >> > > >>>>>>> > > >> > indicates that public decorative interfaces (or
>>> > > >> > > >>>>>>> > > >> > sub-interfaces for the same purpose) only make
>>> > > >> > > >>>>>>> > > >> > sense if the original interface is also expected
>>> > > >> > > >>>>>>> > > >> > to be used. Otherwise, it seems to make more sense
>>> > > >> > > >>>>>>> > > >> > to add the method to the original interface
>>> > > >> > > >>>>>>> > > >> > itself.
>>> > > >> > > >>>>>>> > > >> >
>>> > > >> > > >>>>>>> > > >> > Cheers,
>>> > > >> > > >>>>>>> > > >> >
>>> > > >> > > >>>>>>> > > >> > Jiangjie (Becket) Qin
>>> > > >> > > >>>>>>> > > >> >
>>> > > >> > > >>>>>>> > > >> >
>>> > > >> > > >>>>>>> > > >> >
>>> > > >> > > >>>>>>> > > >> >
>>> > > >> > > >>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz
>>> > > >> > > >>>>>>> > > >> > <dwysakowicz@apache.org> wrote:
>>> > > >> > > >>>>>>> > > >> >
>>> > > >> > > >>>>>>> > > >> > > Thanks @Sebastian for the nice summary.
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > I think most of your points aligned with the
>>> > > >> > > >>>>>>> > > >> > > suggestions I made to the FLIP, while you were
>>> > > >> > > >>>>>>> > > >> > > writing your reply (I believe we hit enter nearly
>>> > > >> > > >>>>>>> > > >> > > at the same time ;) )
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > Two points after we synced offline
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > 1. I also changed supportsWatermarksSplitAlignment
>>> > > >> > > >>>>>>> > > >> > > to supportsPausingSplits to express the general
>>> > > >> > > >>>>>>> > > >> > > capability of pausing.
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > 2. As for whether we should have
>>> > > >> > > >>>>>>> > > >> > > PausingSourceReader/PausingSplitReader (option b)
>>> > > >> > > >>>>>>> > > >> > > or just add the methods (option c), I suggest
>>> > > >> > > >>>>>>> > > >> > > simply adding the two methods, as I felt this is
>>> > > >> > > >>>>>>> > > >> > > the approach much preferred by Becket, to which
>>> > > >> > > >>>>>>> > > >> > > the others do not object. Unless there is
>>> > > >> > > >>>>>>> > > >> > > opposition, let's go with this option c.
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > Best,
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > Dawid
>>> > > >> > > >>>>>>> > > >> > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > Hi folks,
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > Sorry for being a bit silent. Many thanks for all
>>> > > >> > > >>>>>>> > > >> > > the input and suggestions. As I'm a bit new, I
>>> > > >> > > >>>>>>> > > >> > > needed some time to catch up and structure (for
>>> > > >> > > >>>>>>> > > >> > > myself) the discussion and I wanted to find a way
>>> > > >> > > >>>>>>> > > >> > > to structure the conclusions. (Also because I had
>>> > > >> > > >>>>>>> > > >> > > the feeling that some concerns got lost in the
>>> > > >> > > >>>>>>> > > >> > > discussion.) This is my attempt and please correct
>>> > > >> > > >>>>>>> > > >> > > me if something is wrong or misunderstood. I tried
>>> > > >> > > >>>>>>> > > >> > > to collect and assemble the opinions, suggestions,
>>> > > >> > > >>>>>>> > > >> > > and conclusions (to the best of my knowledge):
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > # Top A: Should split alignment (pause/resume
>>> > > >> > > >>>>>>> > > >> > > behavior) be a general capability?
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > I personally don't see any reason not to make it a
>>> > > >> > > >>>>>>> > > >> > > general capability, because the alignSplit method
>>> > > >> > > >>>>>>> > > >> > > itself is actually independent of the watermarks.
>>> > > >> > > >>>>>>> > > >> > > If we agree here to make it a general capability,
>>> > > >> > > >>>>>>> > > >> > > we should also agree on the right wording. Does
>>> > > >> > > >>>>>>> > > >> > > "alignSplits(splitsToResume, splitsToPause)"
>>> > > >> > > >>>>>>> > > >> > > describe what is then actually meant? (I see it as
>>> > > >> > > >>>>>>> > > >> > > okay. I don't have any better idea, whilst Arvid
>>> > > >> > > >>>>>>> > > >> > > suggested "pauseOrResumeSplits".)
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > # Top B: Should it be possible to enable/disable
>>> > > >> > > >>>>>>> > > >> > > split alignment?
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > I would personally not disable the split alignment
>>> > > >> > > >>>>>>> > > >> > > on the source reader side because if split
>>> > > >> > > >>>>>>> > > >> > > alignment is used for some other use case (see A)
>>> > > >> > > >>>>>>> > > >> > > it could have nasty side effects on other/future
>>> > > >> > > >>>>>>> > > >> > > use cases. Instead, I would disable "watermark
>>> > > >> > > >>>>>>> > > >> > > split alignment" where I think it should disable
>>> > > >> > > >>>>>>> > > >> > > the watermark-dependent trigger for split
>>> > > >> > > >>>>>>> > > >> > > alignment.
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > # Top C: Should we add a supportsX method?
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > I find it difficult to define the scope of a
>>> > > >> > > >>>>>>> > > >> > > supportsX method w.r.t. the following questions:
>>> > > >> > > >>>>>>> > > >> > > a) Where is it used? and b) What is the expected
>>> > > >> > > >>>>>>> > > >> > > output? Regarding b), it's not straightforward to
>>> > > >> > > >>>>>>> > > >> > > provide a meaningful output, e.g., if SourceReader
>>> > > >> > > >>>>>>> > > >> > > supports split alignment but SplitReader does not.
>>> > > >> > > >>>>>>> > > >> > > This is because, with the current implementation,
>>> > > >> > > >>>>>>> > > >> > > we can determine whether split alignment is fully
>>> > > >> > > >>>>>>> > > >> > > supported only at runtime, and specifically only
>>> > > >> > > >>>>>>> > > >> > > when calling alignSplits down the call hierarchy
>>> > > >> > > >>>>>>> > > >> > > to the actual SplitReaders.
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > Therefore, I would suggest raising either an error
>>> > > >> > > >>>>>>> > > >> > > or a warning if the alignment is called but not
>>> > > >> > > >>>>>>> > > >> > > supported at some point. I know we should
>>> > > >> > > >>>>>>> > > >> > > carefully think about when this could be the case
>>> > > >> > > >>>>>>> > > >> > > because we don't want to flood anybody with such
>>> > > >> > > >>>>>>> > > >> > > warnings. However, warnings could be an indicator
>>> > > >> > > >>>>>>> > > >> > > for the user that, for the watermark split
>>> > > >> > > >>>>>>> > > >> > > alignment use case, split reading is imbalanced,
>>> > > >> > > >>>>>>> > > >> > > with the conclusion to either disable the trigger
>>> > > >> > > >>>>>>> > > >> > > for watermark split alignment (see Top B) or to
>>> > > >> > > >>>>>>> > > >> > > use/implement a source and reader that fully
>>> > > >> > > >>>>>>> > > >> > > support split alignment.
>>> > > >> > > >>>>>>> > > >> > >
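The "error or warning" idea from Top C can be illustrated with a minimal Java sketch. This is only an illustration under stated assumptions: SketchSplitReader, PlainReader, PausableReader and AlignmentCaller are hypothetical names rather than Flink classes, splits are represented by plain String ids, and the alignSplits signature is simplified from the one quoted in the thread. It shows a caller that only learns at call time whether a reader supports alignment, and that warns once instead of flooding the log.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for a reader interface; not the actual Flink API.
interface SketchSplitReader {
    // Default behaviour: signal that this reader cannot pause or resume splits.
    default void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause) {
        throw new UnsupportedOperationException("split alignment is not supported");
    }
}

// A reader that keeps the default and therefore does not support alignment.
class PlainReader implements SketchSplitReader {}

// A reader that supports alignment and remembers which splits are paused.
class PausableReader implements SketchSplitReader {
    final List<String> paused = new ArrayList<>();

    @Override
    public void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause) {
        paused.removeAll(splitsToResume);
        paused.addAll(splitsToPause);
    }
}

// Hypothetical caller: support is only discovered when the call is made.
class AlignmentCaller {
    final List<String> warnings = new ArrayList<>();
    private boolean warned = false;

    void align(SketchSplitReader reader, Collection<String> toResume, Collection<String> toPause) {
        try {
            reader.alignSplits(toResume, toPause);
        } catch (UnsupportedOperationException e) {
            // Record the warning only once per caller to avoid flooding.
            if (!warned) {
                warnings.add(reader.getClass().getSimpleName() + " cannot pause splits");
                warned = true;
            }
        }
    }
}
```

Whether the unsupported case should surface as an exception, a single warning, or both is exactly the open question in the paragraph above; the sketch picks one combination only to make the trade-off concrete.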
>>> > > >> > > >>>>>>> > > >> > > # Top D: How to design interfaces?
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > Thanks for structuring the discussion with the
>>> > > >> > > >>>>>>> > > >> > > various possibilities (a-d). From the discussion
>>> > > >> > > >>>>>>> > > >> > > and emails, I would like to summarize the
>>> > > >> > > >>>>>>> > > >> > > following requirements:
>>> > > >> > > >>>>>>> > > >> > > - Interfaces should be consistent ("symmetric"),
>>> > > >> > > >>>>>>> > > >> > > i.e., similar semantics should have similar
>>> > > >> > > >>>>>>> > > >> > > interfaces with similar usage.
>>> > > >> > > >>>>>>> > > >> > > - Make explicit which implementations implement
>>> > > >> > > >>>>>>> > > >> > > interfaces/support behavior.
>>> > > >> > > >>>>>>> > > >> > > - Make clear what are default implementations and
>>> > > >> > > >>>>>>> > > >> > > how to implement interfaces with desired behavior.
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > This is a simplified view of the relations between
>>> > > >> > > >>>>>>> > > >> > > relevant classes of the PoC implementation:
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > SourceReader (Public) <|-- SourceReaderBase
>>> > > >> > > >>>>>>> > > >> > > (Internal) <|-- .. <|-- MySourceReader
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > MySourceReader <>-- SplitFetcherManager (Internal)
>>> > > >> > > >>>>>>> > > >> > > <>-- SplitFetcher (Internal) <>-- SplitReader
>>> > > >> > > >>>>>>> > > >> > > (Public) <|-- MySplitReader
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > (A <|-- B: B inherits from A; A <>-- B: A "has a"
>>> > > >> > > >>>>>>> > > >> > > B)
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > Note that SourceReaderBase and SplitFetcherManager
>>> > > >> > > >>>>>>> > > >> > > implement most of the "logic" for split alignment
>>> > > >> > > >>>>>>> > > >> > > just because we wanted to implement split
>>> > > >> > > >>>>>>> > > >> > > alignment and wanted it to be available as kind of
>>> > > >> > > >>>>>>> > > >> > > a default. As a consequence, we have a "default
>>> > > >> > > >>>>>>> > > >> > > implementation" for SourceReader that implements
>>> > > >> > > >>>>>>> > > >> > > the actual logic for split alignment. For that
>>> > > >> > > >>>>>>> > > >> > > reason, I find it very confusing to have a NOOP
>>> > > >> > > >>>>>>> > > >> > > default implementation in the interface for the
>>> > > >> > > >>>>>>> > > >> > > SourceReader. As a consequence, interface strategy
>>> > > >> > > >>>>>>> > > >> > > c) is difficult because this would require NOOP
>>> > > >> > > >>>>>>> > > >> > > default implementations in the public interfaces
>>> > > >> > > >>>>>>> > > >> > > of SourceReader and SplitReader. This is the same
>>> > > >> > > >>>>>>> > > >> > > for strategy d) because it would require a NOOP
>>> > > >> > > >>>>>>> > > >> > > default implementation in the SourceReader.
>>> > > >> > > >>>>>>> > > >> > > Further, as Dawid described, the method signatures
>>> > > >> > > >>>>>>> > > >> > > of alignSplit for SourceReader and SplitReader
>>> > > >> > > >>>>>>> > > >> > > differ and it would be extremely difficult to make
>>> > > >> > > >>>>>>> > > >> > > the signatures the same (with even a potential
>>> > > >> > > >>>>>>> > > >> > > performance impact because of additional lookups
>>> > > >> > > >>>>>>> > > >> > > of split ids). Therefore, having a symmetric
>>> > > >> > > >>>>>>> > > >> > > decorative interface as of strategy a) is actually
>>> > > >> > > >>>>>>> > > >> > > not possible and having two decorative interfaces
>>> > > >> > > >>>>>>> > > >> > > with different method signatures is confusing. My
>>> > > >> > > >>>>>>> > > >> > > conclusion is that we are best with strategy b),
>>> > > >> > > >>>>>>> > > >> > > which means to have specializing sub-interfaces
>>> > > >> > > >>>>>>> > > >> > > that inherit from the parent interface:
>>> > > >> > > >>>>>>> > > >> > > SourceReader <|-- AlignedSourceReader,
>>> > > >> > > >>>>>>> > > >> > > SplitReader <|-- AlignedSplitReader
>>> > > >> > > >>>>>>> > > >> > > With this option, I'm not 100% sure what the
>>> > > >> > > >>>>>>> > > >> > > implications are and if this could get nasty. I
>>> > > >> > > >>>>>>> > > >> > > would suggest that Dawid and I just try to
>>> > > >> > > >>>>>>> > > >> > > implement and see if we like it. :)
>>> > > >> > > >>>>>>> > > >> > >
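Strategy b) from Top D (specializing sub-interfaces that inherit from the parent interface) can be sketched in Java as below. The sketch rests on assumptions: BaseReader, AlignedReader, SimpleReader, MyAlignedReader and Framework are hypothetical names chosen for illustration, splits are plain String ids, and the real SourceReader and SplitReader interfaces have more methods and, as noted in the thread, differing alignment signatures.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical parent interface, standing in for a public reader interface.
interface BaseReader {
    String describe();
}

// Strategy b): a sub-interface adds the alignment method, so the parent stays small.
interface AlignedReader extends BaseReader {
    void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause);
}

// An implementation that opts out of alignment simply implements the parent.
class SimpleReader implements BaseReader {
    @Override
    public String describe() {
        return "reader without alignment";
    }
}

// An implementation that opts in implements the sub-interface.
class MyAlignedReader implements AlignedReader {
    final Set<String> paused = new HashSet<>();

    @Override
    public String describe() {
        return "reader with alignment";
    }

    @Override
    public void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause) {
        paused.removeAll(splitsToResume);
        paused.addAll(splitsToPause);
    }
}

// Hypothetical framework-side helper: the type itself tells whether alignment is possible.
class Framework {
    // Returns true if the alignment request could be delivered to the reader.
    static boolean tryAlign(BaseReader reader, Collection<String> resume, Collection<String> pause) {
        if (reader instanceof AlignedReader) {
            ((AlignedReader) reader).alignSplits(resume, pause);
            return true;
        }
        return false;
    }
}
```

With this shape, no NOOP default is needed in the parent interface, at the cost of an instanceof check wherever the framework wants to align.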
>>> > > >> > > >>>>>>> > > >> > > # Summary
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > In conclusion, please let me know your
>>> > > >> > > >>>>>>> > > >> > > perspectives. Please correct me if something is
>>> > > >> > > >>>>>>> > > >> > > wrong or if I misunderstood something. My
>>> > > >> > > >>>>>>> > > >> > > perspective would be:
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > Top A: Yes
>>> > > >> > > >>>>>>> > > >> > > Top B: Yes (but disable watermark trigger for
>>> > > >> > > >>>>>>> > > >> > > split alignment)
>>> > > >> > > >>>>>>> > > >> > > Top C: No
>>> > > >> > > >>>>>>> > > >> > > Top D: b)
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > Best,
>>> > > >> > > >>>>>>> > > >> > > Sebastian
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz
>>> > > >> > > >>>>>>> > > >> > > <dwysakowicz@apache.org> wrote:
>>> > > >> > > >>>>>>> > > >> > >
>>> > > >> > > >>>>>>> > > >> > >> @Arvid:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's capability approach, I
>>> > > >> > > >>>>>>> > > >> > >> fear that it doesn't work for this particular use
>>> > > >> > > >>>>>>> > > >> > >> case: Sources can always be aligned cross-task
>>> > > >> > > >>>>>>> > > >> > >> and this is just about intra-task alignment. So
>>> > > >> > > >>>>>>> > > >> > >> it's plausible to put sources into an alignment
>>> > > >> > > >>>>>>> > > >> > >> group even though they do not use any of the
>>> > > >> > > >>>>>>> > > >> > >> presented API of FLIP-217. They should just issue
>>> > > >> > > >>>>>>> > > >> > >> a warning, if they handle multiple splits (see
>>> > > >> > > >>>>>>> > > >> > >> motivation section).
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Yes, but the "supportXXX" method would be for
>>> > > >> > > >>>>>>> > > >> > >> telling if it supports that intra-task alignment.
>>> > > >> > > >>>>>>> > > >> > >> Cross-task alignment would always be supported.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I updated interfaces to what I believe to be
>>> > > >> > > >>>>>>> > > >> > >> closest to a consensus between all participants.
>>> > > >> > > >>>>>>> > > >> > >> Do you mind taking a look?
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> @Sebastian Do you mind addressing the nits?
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Best,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Dawid
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Thanks for pushing this effort.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I'd actually be in favor of 1b). I fully agree
>>> > > >> > > >>>>>>> > > >> > >> that decorator interfaces should be avoided but
>>> > > >> > > >>>>>>> > > >> > >> I'm also not a big fan of overloading the base
>>> > > >> > > >>>>>>> > > >> > >> interfaces (they are hard to implement as is).
>>> > > >> > > >>>>>>> > > >> > >> The usual feedback on Source-related interfaces
>>> > > >> > > >>>>>>> > > >> > >> is always that they are overwhelming and too hard
>>> > > >> > > >>>>>>> > > >> > >> to implement. However, I'd also not oppose 1c) as
>>> > > >> > > >>>>>>> > > >> > >> scattered interfaces also have drawbacks. I'd
>>> > > >> > > >>>>>>> > > >> > >> just dislike 1a) and 1d).
>>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's capability approach, I
>>> > > >> > > >>>>>>> > > >> > >> fear that it doesn't work for this particular use
>>> > > >> > > >>>>>>> > > >> > >> case: Sources can always be aligned cross-task
>>> > > >> > > >>>>>>> > > >> > >> and this is just about intra-task alignment. So
>>> > > >> > > >>>>>>> > > >> > >> it's plausible to put sources into an alignment
>>> > > >> > > >>>>>>> > > >> > >> group even though they do not use any of the
>>> > > >> > > >>>>>>> > > >> > >> presented API of FLIP-217. They should just issue
>>> > > >> > > >>>>>>> > > >> > >> a warning, if they handle multiple splits (see
>>> > > >> > > >>>>>>> > > >> > >> motivation section).
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I think renaming alignSplits to facilitate future
>>> > > >> > > >>>>>>> > > >> > >> use cases makes sense but then all interfaces (if
>>> > > >> > > >>>>>>> > > >> > >> 1c) is chosen) should be adjusted accordingly.
>>> > > >> > > >>>>>>> > > >> > >> AlignedSourceReader could be PausingSourceReader
>>> > > >> > > >>>>>>> > > >> > >> and I'd go for pauseOrResumeSplits (Becket's
>>> > > >> > > >>>>>>> > > >> > >> proposal afaik). We could also split it into
>>> > > >> > > >>>>>>> > > >> > >> pauseSplit and resumeSplit. While
>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits may allow Sources to just use
>>> > > >> > > >>>>>>> > > >> > >> 1 instead of 2 library calls (as written in the
>>> > > >> > > >>>>>>> > > >> > >> Javadoc), both Kafka and Pulsar can't use it and
>>> > > >> > > >>>>>>> > > >> > >> I'm not sure if there is a system that can.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Some nits for the FLIP:
>>> > > >> > > >>>>>>> > > >> > >> - Please replace "stop" with "pause".
>>> > > >> > > >>>>>>> > > >> > >> - Not sure if it's worth it in the capability
>>> > > >> > > >>>>>>> > > >> > >> section: Sources that adopt this interface cannot
>>> > > >> > > >>>>>>> > > >> > >> be used in earlier versions. So it feels like we
>>> > > >> > > >>>>>>> > > >> > >> are only forward compatible (old sources can be
>>> > > >> > > >>>>>>> > > >> > >> used after the change); but I guess this holds
>>> > > >> > > >>>>>>> > > >> > >> for any API addition.
>>> > > >> > > >>>>>>> > > >> > >> - You might want to add what happens when all
>>> > > >> > > >>>>>>> > > >> > >> splits are paused.
>>> > > >> > > >>>>>>> > > >> > >> - You may want to describe how the 3 flavors of
>>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase interact with the interface.
>>> > > >> > > >>>>>>> > > >> > >> - I'm not sure if it makes sense to include Kafka
>>> > > >> > > >>>>>>> > > >> > >> and Pulsar in the FLIP. For me, this is rather
>>> > > >> > > >>>>>>> > > >> > >> immediate follow-up work. (could be in the same
>>> > > >> > > >>>>>>> > > >> > >> umbrella ticket)
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Best,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Arvid
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz
>>> > > >> > > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
>>> > > >> > > >>>>>>> > > >> > >> WithSplitsAlignment", along with "MySplitReader
>>> > > >> > > >>>>>>> > > >> > >> implements SplitReader, WithSplitsAlignment", or
>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader"
>>> > > >> > > >>>>>>> > > >> > >> and "MySplitReader implements AlignedSplitReader",
>>> > > >> > > >>>>>>> > > >> > >> or
>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
>>> > > >> > > >>>>>>> > > >> > >> "MySplitReader implements SplitReader".
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according to Dawid
>>> > > >> > > >>>>>>> > > >> > >> would be:
>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
>>> > > >> > > >>>>>>> > > >> > >> "MySplitReader implements AlignedSplitReader".
>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although personally
>>> > > >> > > >>>>>>> > > >> > >> speaking I think it is simpler to just add a new
>>> > > >> > > >>>>>>> > > >> > >> method to the split reader with default impl.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I think that is a good idea to have it aligned as
>>> > > >> > > >>>>>>> > > >> > >> much as possible. I'd be +1 for your option c).
>>> > > >> > > >>>>>>> > > >> > >> We can merge AlignedSplitReader with SplitReader.
>>> > > >> > > >>>>>>> > > >> > >> We will update the FLIP shortly.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Best,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Dawid
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Thanks for the comment, Jark.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align other things
>>> > > >> > > >>>>>>> > > >> > >> in the future? For example, to align read speed:
>>> > > >> > > >>>>>>> > > >> > >> I have seen users requesting global rate limits.
>>> > > >> > > >>>>>>> > > >> > >> This feature may also need an interface like
>>> > > >> > > >>>>>>> > > >> > >> this.
>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to
>>> > > >> > > >>>>>>> > > >> > >> support aligning other things, I suggest
>>> > > >> > > >>>>>>> > > >> > >> explicitly declaring the purpose of the methods,
>>> > > >> > > >>>>>>> > > >> > >> such as `alignWatermarksForSplits` instead of
>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> This is a good point. Naming wise, it would
>>> > > >> > > >>>>>>> > > >> > >> usually be more extensible to just describe what
>>> > > >> > > >>>>>>> > > >> > >> the method actually does, instead of assuming the
>>> > > >> > > >>>>>>> > > >> > >> purpose of doing this. For example, in this case,
>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits() would be more extensible
>>> > > >> > > >>>>>>> > > >> > >> because this can be used for any kind of flow
>>> > > >> > > >>>>>>> > > >> > >> control, be it watermark alignment or simple rate
>>> > > >> > > >>>>>>> > > >> > >> limiting.
>>> > > >> > > >>>>>>> > > >> > >>
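The naming argument above, that a method describing what it does can serve any kind of flow control, can be made concrete with a small Java sketch. Everything here is assumed for illustration: FlowControlledReader, WatermarkTrigger and RateLimitTrigger are hypothetical names, splits are plain String ids, the parameter order of pauseOrResumeSplits is an arbitrary choice, and the two trigger policies are deliberately simplistic.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical reader exposing a purpose-neutral flow-control method.
class FlowControlledReader {
    final Set<String> paused = new TreeSet<>();

    void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.removeAll(splitsToResume);
        paused.addAll(splitsToPause);
    }
}

// Trigger 1 (assumed policy): pause splits whose watermark is ahead of the allowed maximum.
class WatermarkTrigger {
    static void apply(FlowControlledReader reader, Map<String, Long> watermarks, long maxAllowed) {
        List<String> pause = new ArrayList<>();
        List<String> resume = new ArrayList<>();
        for (Map.Entry<String, Long> entry : watermarks.entrySet()) {
            if (entry.getValue() > maxAllowed) {
                pause.add(entry.getKey());
            } else {
                resume.add(entry.getKey());
            }
        }
        reader.pauseOrResumeSplits(pause, resume);
    }
}

// Trigger 2 (assumed policy): pause splits that exceeded a per-interval record budget.
class RateLimitTrigger {
    static void apply(FlowControlledReader reader, Map<String, Integer> recordsRead, int budget) {
        List<String> pause = new ArrayList<>();
        List<String> resume = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : recordsRead.entrySet()) {
            if (entry.getValue() > budget) {
                pause.add(entry.getKey());
            } else {
                resume.add(entry.getKey());
            }
        }
        reader.pauseOrResumeSplits(pause, resume);
    }
}
```

Both triggers reach the reader through the same method, which is the extensibility being argued for; a name such as alignWatermarksForSplits would fit only the first trigger.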
>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I
>>> think
>>> > > >> they
>>> > > >> > > have
>>> > > >> > > >>>>>>> their
>>> > > >> > > >>>>>>> > own
>>> > > >> > > >>>>>>> > > >> > >> advantages.
>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces
>>> for
>>> > > >> extending
>>> > > >> > > >>>>>>> abilities
>>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
>>> > > >> > > >>>>>>> > > >> > >> and I prefer Interfaces rather than
>>> methods in
>>> > > >> this
>>> > > >> > > >>>>>>> case. When
>>> > > >> > > >>>>>>> > you
>>> > > >> > > >>>>>>> > > >> have
>>> > > >> > > >>>>>>> > > >> > a
>>> > > >> > > >>>>>>> > > >> > >> bunch of abilities and each ability
>>> > > >> > > >>>>>>> > > >> > >> has more than one method, Interfaces can
>>> help
>>> > to
>>> > > >> > > >>>>>>> organize them
>>> > > >> > > >>>>>>> > and
>>> > > >> > > >>>>>>> > > >> make
>>> > > >> > > >>>>>>> > > >> > >> it clear to users which methods they
>>> > > >> > > >>>>>>> > > >> > >> need to implement when they want to have an
>>> > > >> ability.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I am OK with decorative interfaces if
>>> this is
>>> > a
>>> > > >> > general
>>> > > >> > > >>>>>>> design
>>> > > >> > > >>>>>>> > > >> pattern
>>> > > >> > > >>>>>>> > > >> > in
>>> > > >> > > >>>>>>> > > >> > >> the other components in Flink. But it
>>> looks
>>> > like
>>> > > >> the
>>> > > >> > > >>>>>>> current API
>>> > > >> > > >>>>>>> > > >> > proposal
>>> > > >> > > >>>>>>> > > >> > >> is not symmetric.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> The current proposal is essentially
>>> > > >> "MySourceReader
>>> > > >> > > >>>>>>> implements
>>> > > >> > > >>>>>>> > > >> > >> SourceReader, WithSplitsAlignment", along
>>> with
>>> > > >> > > >>>>>>> "MySplitReader
>>> > > >> > > >>>>>>> > > >> implements
>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Should we make the API symmetric? I'd
>>> > consider any
>>> > > >> > one
>>> > > >> > > >>>>>>> of the
>>> > > >> > > >>>>>>> > > >> following
>>> > > >> > > >>>>>>> > > >> > as
>>> > > >> > > >>>>>>> > > >> > >> symmetric.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements
>>> SourceReader,
>>> > > >> > > >>>>>>> WithSplitsAlignment",
>>> > > >> > > >>>>>>> > > >> along
>>> > > >> > > >>>>>>> > > >> > >> with "MySplitReader implements
>>> SplitReader,
>>> > > >> > > >>>>>>> WithSplitsAlignment",
>>> > > >> > > >>>>>>> > > or
>>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements
>>> > AlignedSourceReader"
>>> > > >> > and
>>> > > >> > > >>>>>>> > > "MySplitReader
>>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
>>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements
>>> SourceReader"
>>> > and
>>> > > >> > > >>>>>>> "MySplitReader
>>> > > >> > > >>>>>>> > > >> > implements
>>> > > >> > > >>>>>>> > > >> > >> SplitReader".
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according to
>>> Dawid
>>> > > >> would
>>> > > >> > > be:
>>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements
>>> SourceReader"
>>> > and
>>> > > >> > > >>>>>>> "MySplitReader
>>> > > >> > > >>>>>>> > > >> > implements
>>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
>>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although
>>> personally
>>> > > >> > speaking I
>>> > > >> > > >>>>>>> think it
>>> > > >> > > >>>>>>> > is
>>> > > >> > > >>>>>>> > > >> > simpler
>>> > > >> > > >>>>>>> > > >> > >> to just add a new method to the split
>>> reader
>>> > with
>>> > > >> > > >>>>>>> default impl.
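
To make option d) above concrete, here is a minimal Java sketch of a decorative interface and of how a framework could detect it. All type names (SketchSplitReader, SketchAlignedSplitReader, PlainReader, PausableReader) are hypothetical simplifications, not Flink's actual interfaces, and splits are represented as plain strings. It also assumes that the aligned variant extends the base split reader, which the thread implies but does not spell out.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for the base split reader (not Flink's SplitReader).
interface SketchSplitReader {
    String fetch();
}

// Decorative interface: groups what the optional alignment ability needs.
interface SketchAlignedSplitReader extends SketchSplitReader {
    void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

// A reader that does not opt in to alignment.
class PlainReader implements SketchSplitReader {
    public String fetch() {
        return "record";
    }
}

// A reader that opts in by implementing the decorative interface.
class PausableReader implements SketchAlignedSplitReader {
    final List<String> paused = new ArrayList<>();

    public String fetch() {
        return "record";
    }

    public void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.removeAll(splitsToResume);
        paused.addAll(splitsToPause);
    }
}

class DecorativeInterfaceSketch {
    // Framework side: the capability is discovered from the reader's type.
    static boolean tryAlign(SketchSplitReader reader,
                            Collection<String> toPause,
                            Collection<String> toResume) {
        if (reader instanceof SketchAlignedSplitReader) {
            ((SketchAlignedSplitReader) reader).alignSplits(toPause, toResume);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(tryAlign(new PlainReader(), List.of("split-1"), List.of()));
        System.out.println(tryAlign(new PausableReader(), List.of("split-1"), List.of()));
    }
}
```

With this shape the type system tells the implementer which methods belong together, at the price of a type check on the framework side.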
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> @Dawid Wysakowicz <dw...@apache.org>, thanks for the reply.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a
>>> preference
>>> > > >> and I
>>> > > >> > > >>>>>>> agree most
>>> > > >> > > >>>>>>> > of
>>> > > >> > > >>>>>>> > > >> the
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> sources will support the alignment I am
>>> fine
>>> > > >> > following
>>> > > >> > > >>>>>>> your
>>> > > >> > > >>>>>>> > > >> suggestion
>>> > > >> > > >>>>>>> > > >> > to
>>> > > >> > > >>>>>>> > > >> > >> have the SourceReader extending from
>>> > > >> > > >>>>>>> > WithWatermarksSplitsAlignment,
>>> > > >> > > >>>>>>> > > >> but
>>> > > >> > > >>>>>>> > > >> > >> would put the "supportsXXX" there, not in
>>> the
>>> > > >> Source
>>> > > >> > to
>>> > > >> > > >>>>>>> keep the
>>> > > >> > > >>>>>>> > > two
>>> > > >> > > >>>>>>> > > >> > >> methods together.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> One benefit of having the "supportsXXX" in
>>> > Source
>>> > > >> is
>>> > > >> > > >>>>>>> that this
>>> > > >> > > >>>>>>> > > allows
>>> > > >> > > >>>>>>> > > >> > some
>>> > > >> > > >>>>>>> > > >> > >> compile time check. For example, if a user
>>> > enabled
>>> > > >> > > >>>>>>> watermark
>>> > > >> > > >>>>>>> > > >> alignment
>>> > > >> > > >>>>>>> > > >> > >> while it is not supported by the Source,
>>> an
>>> > > >> exception
>>> > > >> > > >>>>>>> can be
>>> > > >> > > >>>>>>> > thrown
>>> > > >> > > >>>>>>> > > >> at
>>> > > >> > > >>>>>>> > > >> > >> compile time. It seems in general useful.
>>> That
>>> > > >> said,
>>> > > >> > I
>>> > > >> > > >>>>>>> agree that
>>> > > >> > > >>>>>>> > > API
>>> > > >> > > >>>>>>> > > >> > >> cleanliness wise it is better to put the
>>> two
>>> > > >> methods
>>> > > >> > > >>>>>>> together.
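
A rough Java sketch of the early check described above, reading "compile time" as the moment the job is assembled rather than javac compilation. The names CapabilitySource, supportsWatermarkSplitsAlignment and validate are hypothetical and only illustrate the idea; they are not existing Flink API.

```java
// Hypothetical source interface carrying the capability flag.
interface CapabilitySource {
    // Sources are assumed not to support alignment unless they say so.
    default boolean supportsWatermarkSplitsAlignment() {
        return false;
    }
}

// A source that keeps the default and therefore reports no support.
class LegacySource implements CapabilitySource {
}

// A source that has implemented the alignment logic.
class AligningSource implements CapabilitySource {
    @Override
    public boolean supportsWatermarkSplitsAlignment() {
        return true;
    }
}

class AlignmentValidationSketch {
    // Meant to run while the job is being built, before any reader starts.
    static void validate(CapabilitySource source, boolean alignmentEnabled) {
        if (alignmentEnabled && !source.supportsWatermarkSplitsAlignment()) {
            throw new IllegalStateException(
                "Watermark alignment is enabled, but the source does not support it.");
        }
    }

    public static void main(String[] args) {
        validate(new AligningSource(), true);  // passes
        validate(new LegacySource(), false);   // passes, alignment not requested
        try {
            validate(new LegacySource(), true);
        } catch (IllegalStateException e) {
            System.out.println("rejected early: " + e.getMessage());
        }
    }
}
```

The flag has to be reachable from the Source for such a check to run before readers exist; if it lives on the reader instead, the same check can only happen once a reader has been created.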
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Thanks,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu
>>> > > >> > > >>>>>>> > > >> > >> <im...@gmail.com> wrote:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Thanks, Dawid, for the reminder on FLIP-182.
>>> > > >> > > >>>>>>> > > >> > >> Sorry, I missed it.
>>> > > >> > > >>>>>>> > > >> > >> I don't have other concerns then.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Best,
>>> > > >> > > >>>>>>> > > >> > >> Jark
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz
>>> > > >> > > >>>>>>> > > >> > >> <dw...@apache.org> wrote:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> @Jark:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with
>>> > watermarks
>>> > > >> > when
>>> > > >> > > >>>>>>> the
>>> > > >> > > >>>>>>> > source
>>> > > >> > > >>>>>>> > > >> > >> implements the interface?
>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark
>>> > > >> alignment
>>> > > >> > > even
>>> > > >> > > >>>>>>> if Kafka
>>> > > >> > > >>>>>>> > > >> > >> implements the interface,
>>> > > >> > > >>>>>>> > > >> > >> and this will affect the throughput
>>> somehow. I
>>> > > >> agree
>>> > > >> > > >>>>>>> with Becket
>>> > > >> > > >>>>>>> > > >> > >> we may need a
>>> > > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for
>>> users to
>>> > > >> > > configure
>>> > > >> > > >>>>>>> the
>>> > > >> > > >>>>>>> > source
>>> > > >> > > >>>>>>> > > >> to
>>> > > >> > > >>>>>>> > > >> > >> enable/disable the alignment.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate
>>> > > >> > > maxDesiredWatermark?
>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of
>>> maxDesiredWatermark
>>> > will
>>> > > >> > > >>>>>>> greatly affect
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> throughput
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> if the reader is constantly
>>> > > >> > > >>>>>>> > > >> > >>  switching between pause and resume. Can
>>> users
>>> > > >> > > configure
>>> > > >> > > >>>>>>> the
>>> > > >> > > >>>>>>> > > >> alignment
>>> > > >> > > >>>>>>> > > >> > >> offset?
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> This is covered in the previous FLIP[1]
>>> which
>>> > has
>>> > > >> > been
>>> > > >> > > >>>>>>> already
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> implemented
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> in 1.15. In short, it must be enabled
>>> with the
>>> > > >> > > watermark
>>> > > >> > > >>>>>>> strategy
>>> > > >> > > >>>>>>> > > >> which
>>> > > >> > > >>>>>>> > > >> > >> also configures drift and update interval.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface
>>> to
>>> > > >> support
>>> > > >> > > >>>>>>> aligning other
>>> > > >> > > >>>>>>> > > >> things,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
>>> > > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
>>> > > >> > > >>>>>>> `alignWatermarksForSplits`
>>> > > >> > > >>>>>>> > > >> instead
>>> > > >> > > >>>>>>> > > >> > of
>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Sure let's rename it.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> @Becket:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I understand your point. On the other hand
>>> > putting
>>> > > >> > all
>>> > > >> > > >>>>>>> methods,
>>> > > >> > > >>>>>>> > > even
>>> > > >> > > >>>>>>> > > >> > with
>>> > > >> > > >>>>>>> > > >> > >> "supportsXXX" methods for enabling certain
>>> > > >> features,
>>> > > >> > > >>>>>>> makes the
>>> > > >> > > >>>>>>> > > entry
>>> > > >> > > >>>>>>> > > >> > >> threshold for writing a new source higher.
>>> > > >> Instead of
>>> > > >> > > >>>>>>> focusing on
>>> > > >> > > >>>>>>> > > the
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> basic
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> and required properties of the Source, the
>>> > person
>>> > > >> > > >>>>>>> implementing a
>>> > > >> > > >>>>>>> > > >> source
>>> > > >> > > >>>>>>> > > >> > >> must bother with and need to figure out
>>> what
>>> > all
>>> > > >> of
>>> > > >> > the
>>> > > >> > > >>>>>>> extra
>>> > > >> > > >>>>>>> > > >> features
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> are
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> about and how to deal with them. It also
>>> > makes it
>>> > > >> > > harder
>>> > > >> > > >>>>>>> to
>>> > > >> > > >>>>>>> > > organize
>>> > > >> > > >>>>>>> > > >> > >> methods in coupled groups as Jark said.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a
>>> preference
>>> > > >> and I
>>> > > >> > > >>>>>>> agree most
>>> > > >> > > >>>>>>> > of
>>> > > >> > > >>>>>>> > > >> the
>>> > > >> > > >>>>>>> > > >> > >> sources will support the alignment I am
>>> fine
>>> > > >> > following
>>> > > >> > > >>>>>>> your
>>> > > >> > > >>>>>>> > > >> suggestion
>>> > > >> > > >>>>>>> > > >> > to
>>> > > >> > > >>>>>>> > > >> > >> have the SourceReader extending from
>>> > > >> > > >>>>>>> > WithWatermarksSplitsAlignment,
>>> > > >> > > >>>>>>> > > >> but
>>> > > >> > > >>>>>>> > > >> > >> would put the "supportsXXX" there, not in
>>> the
>>> > > >> Source
>>> > > >> > to
>>> > > >> > > >>>>>>> keep the
>>> > > >> > > >>>>>>> > > two
>>> > > >> > > >>>>>>> > > >> > >> methods together.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Lastly, I agree it is really unfortunate
>>> the
>>> > > >> > > >>>>>>> "alignSplits"
>>> > > >> > > >>>>>>> > methods
>>> > > >> > > >>>>>>> > > >> > differ
>>> > > >> > > >>>>>>> > > >> > >> slightly for SourceReader and SplitReader.
>>> The
>>> > > >> reason
>>> > > >> > > for
>>> > > >> > > >>>>>>> that is
>>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase deals only with SplitIds,
>>> > whereas
>>> > > >> > > >>>>>>> SplitReader
>>> > > >> > > >>>>>>> > > needs
>>> > > >> > > >>>>>>> > > >> the
>>> > > >> > > >>>>>>> > > >> > >> actual splits to pause them. I found the
>>> > > >> discrepancy
>>> > > >> > > >>>>>>> acceptable
>>> > > >> > > >>>>>>> > for
>>> > > >> > > >>>>>>> > > >> the
>>> > > >> > > >>>>>>> > > >> > >> sake of simplifying changes significantly,
>>> > > >> especially
>>> > > >> > > as
>>> > > >> > > >>>>>>> they
>>> > > >> > > >>>>>>> > would
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> highly
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> likely impact performance as we would
>>> have to
>>> > > >> perform
>>> > > >> > > >>>>>>> additional
>>> > > >> > > >>>>>>> > > >> > lookups.
>>> > > >> > > >>>>>>> > > >> > >> Moreover the SplitReader is a secondary
>>> > interface.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Best,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Dawid
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> [1]
>>> > https://cwiki.apache.org/confluence/x/hQYBCw
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Thanks for the effort, Dawid and
>>> Sebastian!
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I just have some minor questions (maybe I
>>> > missed
>>> > > >> > > >>>>>>> something).
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with
>>> > watermarks
>>> > > >> > when
>>> > > >> > > >>>>>>> the
>>> > > >> > > >>>>>>> > source
>>> > > >> > > >>>>>>> > > >> > >> implements the interface?
>>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark
>>> > > >> alignment
>>> > > >> > > even
>>> > > >> > > >>>>>>> if Kafka
>>> > > >> > > >>>>>>> > > >> > >> implements the interface,
>>> > > >> > > >>>>>>> > > >> > >> and this will affect the throughput
>>> somehow. I
>>> > > >> agree
>>> > > >> > > >>>>>>> with Becket
>>> > > >> > > >>>>>>> > > >> > >> we may need a
>>> > > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for
>>> users to
>>> > > >> > > configure
>>> > > >> > > >>>>>>> the
>>> > > >> > > >>>>>>> > source
>>> > > >> > > >>>>>>> > > >> to
>>> > > >> > > >>>>>>> > > >> > >> enable/disable the alignment.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate
>>> > > >> > > maxDesiredWatermark?
>>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of
>>> maxDesiredWatermark
>>> > will
>>> > > >> > > >>>>>>> greatly affect
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> throughput
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> if the reader is constantly
>>> > > >> > > >>>>>>> > > >> > >>  switching between pause and resume. Can
>>> users
>>> > > >> > > configure
>>> > > >> > > >>>>>>> the
>>> > > >> > > >>>>>>> > > >> alignment
>>> > > >> > > >>>>>>> > > >> > >> offset?
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
>>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align other
>>> > things in
>>> > > >> > the
>>> > > >> > > >>>>>>> future?
>>> > > >> > > >>>>>>> > For
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> example,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> align read speed, I have
>>> > > >> > > >>>>>>> > > >> > >> seen users requesting global rate limits.
>>> This
>>> > > >> > feature
>>> > > >> > > >>>>>>> may also
>>> > > >> > > >>>>>>> > > need
>>> > > >> > > >>>>>>> > > >> an
>>> > > >> > > >>>>>>> > > >> > >> interface like this.
>>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface
>>> to
>>> > > >> support
>>> > > >> > > >>>>>>> aligning other
>>> > > >> > > >>>>>>> > > >> things,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
>>> > > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
>>> > > >> > > >>>>>>> `alignWatermarksForSplits`
>>> > > >> > > >>>>>>> > > >> instead
>>> > > >> > > >>>>>>> > > >> > of
>>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
>>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I
>>> think
>>> > > >> they
>>> > > >> > > have
>>> > > >> > > >>>>>>> their
>>> > > >> > > >>>>>>> > own
>>> > > >> > > >>>>>>> > > >> > >> advantages.
>>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces
>>> for
>>> > > >> extending
>>> > > >> > > >>>>>>> abilities
>>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
>>> > > >> > > >>>>>>> > > >> > >> and I prefer Interfaces rather than
>>> methods in
>>> > > >> this
>>> > > >> > > >>>>>>> case. When
>>> > > >> > > >>>>>>> > you
>>> > > >> > > >>>>>>> > > >> have
>>> > > >> > > >>>>>>> > > >> > a
>>> > > >> > > >>>>>>> > > >> > >> bunch of abilities and each ability
>>> > > >> > > >>>>>>> > > >> > >> has more than one method, Interfaces can
>>> help
>>> > to
>>> > > >> > > >>>>>>> organize them
>>> > > >> > > >>>>>>> > and
>>> > > >> > > >>>>>>> > > >> make
>>> > > >> > > >>>>>>> > > >> > >> it clear to users which methods they
>>> > > >> > > >>>>>>> > > >> > >> need to implement when they want to have an
>>> > > >> ability.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Best,
>>> > > >> > > >>>>>>> > > >> > >> Jark
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin
>>> > > >> > > >>>>>>> > > >> > >> <be...@gmail.com> wrote:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Hi Dawid,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Thanks for the explanation. Apologies
>>> that I
>>> > > >> somehow
>>> > > >> > > >>>>>>> misread a
>>> > > >> > > >>>>>>> > > bunch
>>> > > >> > > >>>>>>> > > >> of
>>> > > >> > > >>>>>>> > > >> > >> "align" and thought they were "assign".
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Regarding 1, by default implementation, I
>>> was
>>> > > >> > thinking
>>> > > >> > > >>>>>>> of the
>>> > > >> > > >>>>>>> > > default
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> no-op
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> implementation. I am a little worried
>>> about
>>> > the
>>> > > >> > > >>>>>>> proliferation of
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> decorative
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> interfaces. I think the most important
>>> thing
>>> > about
>>> > > >> > > >>>>>>> interfaces is
>>> > > >> > > >>>>>>> > > that
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> they
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> are easy to understand. In this case, I
>>> prefer
>>> > > >> adding
>>> > > >> > > >>>>>>> a new method
>>> > > >> > > >>>>>>> > to
>>> > > >> > > >>>>>>> > > >> the
>>> > > >> > > >>>>>>> > > >> > >> existing interface for the following
>>> reasons:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> a) I feel the biggest drawback of
>>> decorative
>>> > > >> > interfaces
>>> > > >> > > >>>>>>> is which
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> interface
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> they can decorate and which combinations
>>> of
>>> > > >> multiple
>>> > > >> > > >>>>>>> decorative
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> interfaces
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> are valid. In the current FLIP, the
>>> > > >> > withSplitsAlignment
>>> > > >> > > >>>>>>> interface
>>> > > >> > > >>>>>>> > > is
>>> > > >> > > >>>>>>> > > >> > only
>>> > > >> > > >>>>>>> > > >> > >> applicable to the SourceReader which
>>> means it
>>> > > >> can't
>>> > > >> > > >>>>>>> decorate any
>>> > > >> > > >>>>>>> > > >> other
>>> > > >> > > >>>>>>> > > >> > >> interface. From an interface design
>>> > perspective, a
>>> > > >> > > >>>>>>> natural
>>> > > >> > > >>>>>>> > question
>>> > > >> > > >>>>>>> > > >> is
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> why
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> not let "AlignedSplitReader" extend
>>> > > >> > > >>>>>>> "withSplitsAlignment"? And it
>>> > > >> > > >>>>>>> > > is
>>> > > >> > > >>>>>>> > > >> > also
>>> > > >> > > >>>>>>> > > >> > >> natural to assume that a split reader
>>> > implementing
>>> > > >> > both
>>> > > >> > > >>>>>>> > SplitReader
>>> > > >> > > >>>>>>> > > >> and
>>> > > >> > > >>>>>>> > > >> > >> WithSplitsAlignment would work, because a
>>> > source
>>> > > >> > reader
>>> > > >> > > >>>>>>> > implementing
>>> > > >> > > >>>>>>> > > >> > >> SourceReader and withSplitsAlignment
>>> works.
>>> > So why
>>> > > >> > > isn't
>>> > > >> > > >>>>>>> there an
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> interface
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> of AlignedSourceReader? In the future, if
>>> > there
>>> > > >> is a
>>> > > >> > > new
>>> > > >> > > >>>>>>> feature
>>> > > >> > > >>>>>>> > > >> added
>>> > > >> > > >>>>>>> > > >> > >> (e.g. sorted or pre-partitioned data
>>> aware),
>>> > are
>>> > > >> we
>>> > > >> > > >>>>>>> going to
>>> > > >> > > >>>>>>> > create
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> another
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> interface of SplitReader such as
>>> > > >> SortedSplitReader or
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> PrePartitionedAware?
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Can they be combined? So I think the
>>> > additional
>>> > > >> > > >>>>>>> decorative
>>> > > >> > > >>>>>>> > > interface
>>> > > >> > > >>>>>>> > > >> > like
>>> > > >> > > >>>>>>> > > >> > >> withSplitsAlignment actually increases the
>>> > > >> > > understanding
>>> > > >> > > >>>>>>> cost of
>>> > > >> > > >>>>>>> > > >> users
>>> > > >> > > >>>>>>> > > >> > >> because they have to know what decorative
>>> > > >> interfaces
>>> > > >> > > are
>>> > > >> > > >>>>>>> there,
>>> > > >> > > >>>>>>> > > which
>>> > > >> > > >>>>>>> > > >> > >> interface they can decorate and which
>>> > > >> combinations of
>>> > > >> > > the
>>> > > >> > > >>>>>>> > > decorative
>>> > > >> > > >>>>>>> > > >> > >> interfaces are valid and which are not.
>>> > Ideally we
>>> > > >> > want
>>> > > >> > > >>>>>>> to avoid
>>> > > >> > > >>>>>>> > > >> that.
>>> > > >> > > >>>>>>> > > >> > To
>>> > > >> > > >>>>>>> > > >> > >> be clear, I am not opposing having an
>>> > interface of
>>> > > >> > > >>>>>>> > > >> withSplitsAlignment,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> it
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> is completely OK to have it as an internal
>>> > > >> interface
>>> > > >> > > and
>>> > > >> > > >>>>>>> let
>>> > > >> > > >>>>>>> > > >> > SourceReader
>>> > > >> > > >>>>>>> > > >> > >> and SplitReader both extend it.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> b) Adding a new method to the SourceReader
>>> > with a
>>> > > >> > > default
>>> > > >> > > >>>>>>> > > >> implementation
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> of
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> no-op would help avoid logic branching in
>>> the
>>> > > >> source
>>> > > >> > > >>>>>>> logic,
>>> > > >> > > >>>>>>> > > >> especially
>>> > > >> > > >>>>>>> > > >> > >> given that we agree that the vast
>>> majority of
>>> > the
>>> > > >> > > >>>>>>> SourceReader
>>> > > >> > > >>>>>>> > > >> > >> implementations, if not all, would just
>>> extend
>>> > > >> from
>>> > > >> > the
>>> > > >> > > >>>>>>> > > >> > SourceReaderBase.
>>> > > >> > > >>>>>>> > > >> > >> That means adding a new method to the
>>> > interface
>>> > > >> would
>>> > > >> > > >>>>>>> effectively
>>> > > >> > > >>>>>>> > > >> give
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> the
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> same user experience, but simpler.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> c) A related design principle that may be
>>> > worth
>>> > > >> > > >>>>>>> discussing is how
>>> > > >> > > >>>>>>> > > do
>>> > > >> > > >>>>>>> > > >> we
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> let
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> the Source implementations tell Flink what
>>> > > >> capability
>>> > > >> > > is
>>> > > >> > > >>>>>>> > supported
>>> > > >> > > >>>>>>> > > >> and
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> what
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> is not. Personally speaking I feel the
>>> most
>>> > > >> intuitive
>>> > > >> > > >>>>>>> place to me
>>> > > >> > > >>>>>>> > > is
>>> > > >> > > >>>>>>> > > >> in
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> the
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Source itself, because that is the
>>> entrance
>>> > of the
>>> > > >> > > >>>>>>> entire Source
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> connector
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> logic.
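
Point b) above can be sketched in Java as follows: the optional method lives on the reader interface with a no-op default, so the caller never branches on capability. DefaultMethodReader, SimpleReader, PausingReader and onAlignmentEvent are hypothetical names used only for illustration, not Flink's actual types, and splits are again plain strings.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical reader interface with the optional ability as a default no-op.
interface DefaultMethodReader {
    String pollNext();

    default void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        // Intentionally empty: readers that cannot pause splits inherit this.
    }
}

// Inherits the no-op and never has to mention alignment.
class SimpleReader implements DefaultMethodReader {
    public String pollNext() {
        return "record";
    }
}

// Overrides the default to actually track paused splits.
class PausingReader implements DefaultMethodReader {
    final List<String> paused = new ArrayList<>();

    public String pollNext() {
        return "record";
    }

    @Override
    public void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.removeAll(splitsToResume);
        paused.addAll(splitsToPause);
    }
}

class DefaultMethodSketch {
    // Caller side: no instanceof check and no capability branch.
    static void onAlignmentEvent(DefaultMethodReader reader,
                                 Collection<String> toPause,
                                 Collection<String> toResume) {
        reader.alignSplits(toPause, toResume);
    }

    public static void main(String[] args) {
        PausingReader pausing = new PausingReader();
        onAlignmentEvent(new SimpleReader(), List.of("split-1"), List.of());
        onAlignmentEvent(pausing, List.of("split-1"), List.of());
        System.out.println(pausing.paused);
    }
}
```

The call on SimpleReader silently does nothing, which is the behavior that a separate capability flag on the Source is meant to make visible.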
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Based on the above thoughts, I am
>>> wondering
>>> > if the
>>> > > >> > > >>>>>>> following
>>> > > >> > > >>>>>>> > > >> interface
>>> > > >> > > >>>>>>> > > >> > >> would be easier to understand by the
>>> users.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> - Change "withSplitsAlignment" to internal
>>> > > >> interface,
>>> > > >> > > >>>>>>> let both
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> SourceReader
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> and SplitReader extend from it, with a
>>> default
>>> > > >> no-op
>>> > > >> > > >>>>>>> > > implementation.
>>> > > >> > > >>>>>>> > > >> > >> - Add a new method "boolean
>>> > > >> supportSplitsAlignment()"
>>> > > >> > > to
>>> > > >> > > >>>>>>> the
>>> > > >> > > >>>>>>> > Source
>>> > > >> > > >>>>>>> > > >> > >> interface, with a default implementation
>>> > returning
>>> > > >> > > >>>>>>> false. Sources
>>> > > >> > > >>>>>>> > > >> that
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> have
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> implemented the alignment logic can change
>>> > this to
>>> > > >> > > >>>>>>> return true,
>>> > > >> > > >>>>>>> > and
>>> > > >> > > >>>>>>> > > >> > >> override the alignSplits() methods in the
>>> > > >> > SourceReader
>>> > > >> > > /
>>> > > >> > > >>>>>>> > > SplitReader
>>> > > >> > > >>>>>>> > > >> if
>>> > > >> > > >>>>>>> > > >> > >> needed.
>>> > > >> > > >>>>>>> > > >> > >> - In the future, if a new optional
>>> feature is
>>> > > >> going
>>> > > >> > to
>>> > > >> > > >>>>>>> be added
>>> > > >> > > >>>>>>> > to
>>> > > >> > > >>>>>>> > > >> the
>>> > > >> > > >>>>>>> > > >> > >> Source, and that feature requires the
>>> > awareness
>>> > > >> from
>>> > > >> > > >>>>>>> Flink, we
>>> > > >> > > >>>>>>> > can
>>> > > >> > > >>>>>>> > > >> add
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> more
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> such methods to the Source.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> What do you think?
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Thanks,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz
>>> > > >> > > >>>>>>> > > >> > >> <dw...@apache.org> wrote:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> @Konstantin:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the
>>> `AlignedSplitReader`
>>> > > >> > > interface
>>> > > >> > > >>>>>>> (aka the
>>> > > >> > > >>>>>>> > > >> stop &
>>> > > >> > > >>>>>>> > > >> > >> resume behavior) will be implemented for
>>> > Kafka and
>>> > > >> > > >>>>>>> Pulsar only,
>>> > > >> > > >>>>>>> > > >> correct?
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Correct, as far as I know though, those
>>> are
>>> > the
>>> > > >> only
>>> > > >> > > >>>>>>> sources
>>> > > >> > > >>>>>>> > which
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> consume
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> concurrently from multiple splits and thus
>>> > > >> alignment
>>> > > >> > > >>>>>>> applies.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> @Thomas:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment
>>> in
>>> > > >> > > >>>>>>> SourceReaderBase and
>>> > > >> > > >>>>>>> > > then
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> doing
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> nothing if the split reader does not
>>> implement
>>> > > >> > > >>>>>>> AlignedSplitReader
>>> > > >> > > >>>>>>> > > >> could
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> be
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment
>>> can
>>> > > >> instead
>>> > > >> > be
>>> > > >> > > >>>>>>> added to
>>> > > >> > > >>>>>>> > the
>>> > > >> > > >>>>>>> > > >> > >> specific source reader (i.e.
>>> > KafkaSourceReader) to
>>> > > >> > make
>>> > > >> > > >>>>>>> it
>>> > > >> > > >>>>>>> > explicit
>>> > > >> > > >>>>>>> > > >> that
>>> > > >> > > >>>>>>> > > >> > >> the source actually supports it.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I understand your concern. Hmm, I think we
>>> > could
>>> > > >> > > >>>>>>> actually do
>>> > > >> > > >>>>>>> > that.
>>> > > >> > > >>>>>>> > > >> Given
>>> > > >> > > >>>>>>> > > >> > >> the actual implementation of the
>>> > > >> > > >>>>>>> SourceReaderBase#alignSplits is
>>> > > >> > > >>>>>>> > > >> rather
>>> > > >> > > >>>>>>> > > >> > >> short (just a forward to the corresponding
>>> > method
>>> > > >> of
>>> > > >> > > >>>>>>> > SplitFetcher),
>>> > > >> > > >>>>>>> > > >> we
>>> > > >> > > >>>>>>> > > >> > >> could reimplement it in the actual source
>>> > > >> > > >>>>>>> implementations. This
>>> > > >> > > >>>>>>> > > >> solution
>>> > > >> > > >>>>>>> > > >> > >> has the downside though. Authors of new
>>> > sources
>>> > > >> would
>>> > > >> > > >>>>>>> have to do
>>> > > >> > > >>>>>>> > > two
>>> > > >> > > >>>>>>> > > >> > >> things: extend from AlignedSplitReader and
>>> > > >> implement
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> WithSplitsAssignment,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> instead of just extending
>>> AlignedSplitReader.
>>> > I
>>> > > >> would
>>> > > >> > > be
>>> > > >> > > >>>>>>> fine
>>> > > >> > > >>>>>>> > with
>>> > > >> > > >>>>>>> > > >> such
>>> > > >> > > >>>>>>> > > >> > a
>>> > > >> > > >>>>>>> > > >> > >> tradeoff though. What others think?
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> @Steven:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> For this part from the motivation
>>> section, is
>>> > it
>>> > > >> > > >>>>>>> accurate? Let's
>>> > > >> > > >>>>>>> > > >> assume
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> one
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> source task consumes from 3 partitions and
>>> > one of
>>> > > >> the
>>> > > >> > > >>>>>>> partition
>>> > > >> > > >>>>>>> > is
>>> > > >> > > >>>>>>> > > >> > >> significantly slower. In this situation,
>>> > watermark
>>> > > >> > for
>>> > > >> > > >>>>>>> this
>>> > > >> > > >>>>>>> > source
>>> > > >> > > >>>>>>> > > >> task
>>> > > >> > > >>>>>>> > > >> > >> won't hold back as it is reading recent
>>> data
>>> > from
>>> > > >> > other
>>> > > >> > > >>>>>>> two Kafka
>>> > > >> > > >>>>>>> > > >> > >> partitions. As a result, it won't hold
>>> back
>>> > the
>>> > > >> > overall
>>> > > >> > > >>>>>>> > watermark.
>>> > > >> > > >>>>>>> > > I
>>> > > >> > > >>>>>>> > > >> > >> thought the problem is that we may have
>>> late
>>> > data
>>> > > >> for
>>> > > >> > > >>>>>>> this slow
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> partition.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> It will hold back the watermark.
>>> Watermark of
>>> > an
>>> > > >> > > >>>>>>> operator is the
>>> > > >> > > >>>>>>> > > >> minimum
>>> > > >> > > >>>>>>> > > >> > >> of watermarks of all splits[1]
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I have another question about the
>>> restart. Say
>>> > > >> split
>>> > > >> > > >>>>>>> alignment is
>>> > > >> > > >>>>>>> > > >> > >> triggered. checkpoint is completed. job
>>> > failed and
>>> > > >> > > >>>>>>> restored from
>>> > > >> > > >>>>>>> > > the
>>> > > >> > > >>>>>>> > > >> > last
>>> > > >> > > >>>>>>> > > >> > >> checkpoint. because alignment decision is
>>> not
>>> > > >> > > >>>>>>> checkpointed,
>>> > > >> > > >>>>>>> > > initially
>>> > > >> > > >>>>>>> > > >> > >> alignment won't be enforced until we get a
>>> > cycle
>>> > > >> of
>>> > > >> > > >>>>>>> watermark
>>> > > >> > > >>>>>>> > > >> > aggregation
>>> > > >> > > >>>>>>> > > >> > >> and propagation, right? Not saying this
>>> > corner is
>>> > > >> a
>>> > > >> > > >>>>>>> problem. Just
>>> > > >> > > >>>>>>> > > >> want
>>> > > >> > > >>>>>>> > > >> > to
>>> > > >> > > >>>>>>> > > >> > >> understand it more.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Your understanding is correct.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> @Becket:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of
>>> a
>>> > > >> general
>>> > > >> > use
>>> > > >> > > >>>>>>> case, so
>>> > > >> > > >>>>>>> > > >> should
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> we
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> just add the related methods to
>>> SourceReader
>>> > > >> directly
>>> > > >> > > >>>>>>> instead of
>>> > > >> > > >>>>>>> > > >> > >> introducing the new interface of
>>> > > >> WithSplitAssignment?
>>> > > >> > > We
>>> > > >> > > >>>>>>> can
>>> > > >> > > >>>>>>> > > provide
>>> > > >> > > >>>>>>> > > >> > >> default implementations, so backwards
>>> > > >> compatibility
>>> > > >> > > >>>>>>> won't be an
>>> > > >> > > >>>>>>> > > >> issue.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I don't think we can provide a default
>>> > > >> > implementation.
>>> > > >> > > >>>>>>> How would
>>> > > >> > > >>>>>>> > we
>>> > > >> > > >>>>>>> > > >> do
>>> > > >> > > >>>>>>> > > >> > >> that? Would it be just a no-op? Is it
>>> better
>>> > than
>>> > > >> > > having
>>> > > >> > > >>>>>>> an
>>> > > >> > > >>>>>>> > opt-in
>>> > > >> > > >>>>>>> > > >> > >> interface? The default implementation
>>> would
>>> > have
>>> > > >> to
>>> > > >> > be
>>> > > >> > > >>>>>>> added
>>> > > >> > > >>>>>>> > > >> exclusively
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> in
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> a *Public* SourceReader interface. By the
>>> way
>>> > > >> notice
>>> > > >> > > >>>>>>> > > SourceReaderBase
>>> > > >> > > >>>>>>> > > >> > >> does extend from WithSplitsAlignment, so
>>> > > >> effectively
>>> > > >> > > all
>>> > > >> > > >>>>>>> > > >> implementations
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> do
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> handle the alignment case. To be honest I
>>> > think
>>> > > >> it is
>>> > > >> > > >>>>>>> impossible
>>> > > >> > > >>>>>>> > to
>>> > > >> > > >>>>>>> > > >> > >> implement the SourceReader interface
>>> directly
>>> > by
>>> > > >> end
>>> > > >> > > >>>>>>> users.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader
>>> interface
>>> > > >> > probably
>>> > > >> > > >>>>>>> also
>>> > > >> > > >>>>>>> > needs
>>> > > >> > > >>>>>>> > > >> some
>>> > > >> > > >>>>>>> > > >> > >> change to support throttling at the split
>>> > > >> > granularity.
>>> > > >> > > >>>>>>> Can you
>>> > > >> > > >>>>>>> > add
>>> > > >> > > >>>>>>> > > >> that
>>> > > >> > > >>>>>>> > > >> > >> interface change into the public interface
>>> > > >> section as
>>> > > >> > > >>>>>>> well?
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> It has been added from the beginning. See
>>> > > >> > > >>>>>>> *AlignedSplitReader.*
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name
>>> > > >> > assignSplits
>>> > > >> > > >>>>>>> here,
>>> > > >> > > >>>>>>> > given
>>> > > >> > > >>>>>>> > > >> that
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> it
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> is not actually changing the split
>>> > assignments? It
>>> > > >> > > seems
>>> > > >> > > >>>>>>> > something
>>> > > >> > > >>>>>>> > > >> like
>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or
>>> > > >> adjustSplitsThrottling() is
>>> > > >> > > >>>>>>> more
>>> > > >> > > >>>>>>> > > accurate.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> The method's called *alignSplits*, not
>>> > assign. Do
>>> > > >> you
>>> > > >> > > >>>>>>> still
>>> > > >> > > >>>>>>> > prefer
>>> > > >> > > >>>>>>> > > a
>>> > > >> > > >>>>>>> > > >> > >> different name for that? Personally, I am
>>> > open for
>>> > > >> > > >>>>>>> suggestions
>>> > > >> > > >>>>>>> > > here.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Best,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Dawid
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> [1]
>>> > > >> > > >>>>>>> > > >> > >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Thanks for driving the effort, Sebastian.
>>> I
>>> > think
>>> > > >> the
>>> > > >> > > >>>>>>> motivation
>>> > > >> > > >>>>>>> > > >> makes a
>>> > > >> > > >>>>>>> > > >> > >> lot of sense. Just a few suggestions /
>>> > questions.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of
>>> a
>>> > > >> general
>>> > > >> > use
>>> > > >> > > >>>>>>> case, so
>>> > > >> > > >>>>>>> > > >> should
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> we
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> just add the related methods to
>>> SourceReader
>>> > > >> directly
>>> > > >> > > >>>>>>> instead of
>>> > > >> > > >>>>>>> > > >> > >> introducing the new interface of
>>> > > >> WithSplitAssignment?
>>> > > >> > > We
>>> > > >> > > >>>>>>> can
>>> > > >> > > >>>>>>> > > provide
>>> > > >> > > >>>>>>> > > >> > >> default implementations, so backwards
>>> > > >> compatibility
>>> > > >> > > >>>>>>> won't be an
>>> > > >> > > >>>>>>> > > >> issue.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader
>>> interface
>>> > > >> > probably
>>> > > >> > > >>>>>>> also
>>> > > >> > > >>>>>>> > needs
>>> > > >> > > >>>>>>> > > >> some
>>> > > >> > > >>>>>>> > > >> > >> change to support throttling at the split
>>> > > >> > granularity.
>>> > > >> > > >>>>>>> Can you
>>> > > >> > > >>>>>>> > add
>>> > > >> > > >>>>>>> > > >> that
>>> > > >> > > >>>>>>> > > >> > >> interface change into the public interface
>>> > > >> section as
>>> > > >> > > >>>>>>> well?
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name
>>> > > >> > assignSplits
>>> > > >> > > >>>>>>> here,
>>> > > >> > > >>>>>>> > given
>>> > > >> > > >>>>>>> > > >> that
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> it
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> is not actually changing the split
>>> > assignments? It
>>> > > >> > > seems
>>> > > >> > > >>>>>>> > something
>>> > > >> > > >>>>>>> > > >> like
>>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or
>>> > > >> adjustSplitsThrottling() is
>>> > > >> > > >>>>>>> more
>>> > > >> > > >>>>>>> > > accurate.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Thanks,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <stevenz3wu@gmail.com> wrote:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> However, a single source operator may read
>>> > data
>>> > > >> from
>>> > > >> > > >>>>>>> multiple
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> splits/partitions, e.g., multiple Kafka
>>> > > >> partitions,
>>> > > >> > > such
>>> > > >> > > >>>>>>> that
>>> > > >> > > >>>>>>> > even
>>> > > >> > > >>>>>>> > > >> with
>>> > > >> > > >>>>>>> > > >> > >> watermark alignment the source operator
>>> may
>>> > need
>>> > > >> to
>>> > > >> > > >>>>>>> buffer
>>> > > >> > > >>>>>>> > > excessive
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> amount
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> of data if one split emits data faster
>>> than
>>> > > >> another.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> For this part from the motivation
>>> section, is
>>> > it
>>> > > >> > > >>>>>>> accurate? Let's
>>> > > >> > > >>>>>>> > > >> assume
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> one
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> source task consumes from 3 partitions and
>>> > one of
>>> > > >> the
>>> > > >> > > >>>>>>> partition
>>> > > >> > > >>>>>>> > is
>>> > > >> > > >>>>>>> > > >> > >> significantly slower. In this situation,
>>> > watermark
>>> > > >> > for
>>> > > >> > > >>>>>>> this
>>> > > >> > > >>>>>>> > source
>>> > > >> > > >>>>>>> > > >> task
>>> > > >> > > >>>>>>> > > >> > >> won't hold back as it is reading recent
>>> data
>>> > from
>>> > > >> > other
>>> > > >> > > >>>>>>> two Kafka
>>> > > >> > > >>>>>>> > > >> > >> partitions. As a result, it won't hold
>>> back
>>> > the
>>> > > >> > overall
>>> > > >> > > >>>>>>> > watermark.
>>> > > >> > > >>>>>>> > > I
>>> > > >> > > >>>>>>> > > >> > >> thought the problem is that we may have
>>> late
>>> > data
>>> > > >> for
>>> > > >> > > >>>>>>> this slow
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> partition.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I have another question about the
>>> restart. Say
>>> > > >> split
>>> > > >> > > >>>>>>> alignment is
>>> > > >> > > >>>>>>> > > >> > >> triggered. checkpoint is completed. job
>>> > failed and
>>> > > >> > > >>>>>>> restored from
>>> > > >> > > >>>>>>> > > the
>>> > > >> > > >>>>>>> > > >> > last
>>> > > >> > > >>>>>>> > > >> > >> checkpoint. because alignment decision is
>>> not
>>> > > >> > > >>>>>>> checkpointed,
>>> > > >> > > >>>>>>> > > initially
>>> > > >> > > >>>>>>> > > >> > >> alignment won't be enforced until we get a
>>> > cycle
>>> > > >> of
>>> > > >> > > >>>>>>> watermark
>>> > > >> > > >>>>>>> > > >> > aggregation
>>> > > >> > > >>>>>>> > > >> > >> and propagation, right? Not saying this
>>> > corner is
>>> > > >> a
>>> > > >> > > >>>>>>> problem. Just
>>> > > >> > > >>>>>>> > > >> want
>>> > > >> > > >>>>>>> > > >> > to
>>> > > >> > > >>>>>>> > > >> > >> understand it more.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <thw@apache.org> wrote:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Thanks for working on this!
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment
>>> in
>>> > > >> > > >>>>>>> SourceReaderBase and
>>> > > >> > > >>>>>>> > > then
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> doing
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> nothing if the split reader does not
>>> implement
>>> > > >> > > >>>>>>> AlignedSplitReader
>>> > > >> > > >>>>>>> > > >> could
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> be
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment
>>> can
>>> > > >> instead
>>> > > >> > be
>>> > > >> > > >>>>>>> added to
>>> > > >> > > >>>>>>> > the
>>> > > >> > > >>>>>>> > > >> > >> specific source reader (i.e.
>>> > KafkaSourceReader) to
>>> > > >> > make
>>> > > >> > > >>>>>>> it
>>> > > >> > > >>>>>>> > explicit
>>> > > >> > > >>>>>>> > > >> that
>>> > > >> > > >>>>>>> > > >> > >> the source actually supports it.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Thanks,
>>> > > >> > > >>>>>>> > > >> > >> Thomas
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <knaufk@apache.org> wrote:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the
>>> `AlignedSplitReader`
>>> > > >> > > interface
>>> > > >> > > >>>>>>> (aka the
>>> > > >> > > >>>>>>> > > >> stop
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> &
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> resume behavior) will be implemented for
>>> > Kafka and
>>> > > >> > > >>>>>>> Pulsar only,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> correct?
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> +1 in general. I believe it is valuable to
>>> > > >> complete
>>> > > >> > the
>>> > > >> > > >>>>>>> watermark
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> aligned
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> story with this FLIP.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Cheers,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Konstantin
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> To be explicit, having worked on it, I
>>> > support it
>>> > > >> ;)
>>> > > >> > I
>>> > > >> > > >>>>>>> think we
>>> > > >> > > >>>>>>> > can
>>> > > >> > > >>>>>>> > > >> > >> start a vote thread soonish, as there are
>>> no
>>> > > >> concerns
>>> > > >> > > so
>>> > > >> > > >>>>>>> far.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Best,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Dawid
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis
>>> wrote:
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Dear Flink developers,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> I would like to open a discussion on FLIP
>>> 217
>>> > [1]
>>> > > >> for
>>> > > >> > > an
>>> > > >> > > >>>>>>> > extension
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> of
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Watermark Alignment to perform alignment
>>> also
>>> > in
>>> > > >> > > >>>>>>> SplitReaders. To
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> do
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> so,
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> SplitReaders must be able to suspend and
>>> > resume
>>> > > >> > reading
>>> > > >> > > >>>>>>> from
>>> > > >> > > >>>>>>> > split
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> sources
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> where the SourceOperator coordinates and
>>> > controls
>>> > > >> > > >>>>>>> suspend and
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> resume.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> To
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> gather information about current
>>> watermarks
>>> > of the
>>> > > >> > > >>>>>>> SplitReaders,
>>> > > >> > > >>>>>>> > we
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> extend
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> the internal WatermarkOutputMultiplexer
>>> and
>>> > report
>>> > > >> > > >>>>>>> watermarks to
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> the
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> SourceOperator.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> There is a PoC for this FLIP [2],
>>> prototyped
>>> > by
>>> > > >> Arvid
>>> > > >> > > >>>>>>> Heise and
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> revised
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> and
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> reworked by Dawid Wysakowicz (He did most
>>> of
>>> > the
>>> > > >> > work.)
>>> > > >> > > >>>>>>> and me.
>>> > > >> > > >>>>>>> > The
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> changes
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> are backwards compatible in a way that if
>>> > affected
>>> > > >> > > >>>>>>> components do
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> not
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> support split alignment the behavior is as
>>> > before.
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Best,
>>> > > >> > > >>>>>>> > > >> > >> Sebastian
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> [1]
>>> > > >> > > >>>>>>> > > >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> [2]
>>> > > >> > > >>>>>>> https://github.com/dawidwys/flink/tree/aligned-splits
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> --
>>> > > >> > > >>>>>>> > > >> > >>
>>> > > >> > > >>>>>>> > > >> > >> Konstantin Knauf
>>> > > >> > > >>>>>>> > > >> > >> https://twitter.com/snntrable
>>> > > >> > > >>>>>>> > > >> > >> https://github.com/knaufk
>>> > > >> > > >>>>>>> > > >> > >>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Becket Qin <be...@gmail.com>.
Thanks for the feedback, Thomas and Steven. And thanks Piotr for the patient
and detailed discussion.

Let's move forward with option 1 then.

Re: Thomas

> However, from a user perspective, should the split level alignment be
> an opt-in feature, at least for a few releases? If yes, then we would
> require a knob similar to supportsPausingSplits(), which I understand
> won't be part of the revised FLIP. Such control may be beneficial:


> * Compare runtime behavior with split level alignment on/off
> * Allow use of sources that don't implement pausing splits yet


> The second point would, from the user's perspective, be necessary for
> backward compatibility? While the interface aspect and source
> compatibility has been discussed in great detail, I don't think it
> would be desirable if an application that already uses alignment fails
> after upgrading to the new Flink version, forcing users to lock step
> modify sources for the new non-optional split level alignment.


> So I think clarification of the compatibility aspect on the FLIP page
> would be necessary.


That is a good point. Currently FLIP-182 is marked as experimental. So
technically speaking it could change. That said, I agree that it would be
good to not break the existing sources.

My main concern with having a supportsPausingSplits() knob is that it might
require code changes upon future deprecation. I am thinking of
two potential solutions:

Solution 1:
In the SourceReaderBase, when pauseOrResumeSplits() is invoked and the
source reader has only one split assigned, the source reader simply stops
polling and returns NOTHING_AVAILABLE. If more than one split is
assigned, it throws an exception with a message such as "The
unpausable SplitReader CLASS_NAME only works with watermark alignment when
assigned a single split. There is more than one split assigned to the
SplitReader".
The reasoning behind this solution is that existing users should only use
the coarse watermark alignment when a source reader reads from a
single split. Reading from more than one split might have unwanted side
effects, so we might as well throw an exception in this case.
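
A minimal sketch of how Solution 1 could behave. This is not the actual
SourceReaderBase code: the class name SingleSplitFallbackReader, the
PollStatus enum (a stand-in for Flink's InputStatus) and the use of plain
String split IDs are assumptions made only for illustration.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for Flink's InputStatus; only the two values needed here.
enum PollStatus { MORE_AVAILABLE, NOTHING_AVAILABLE }

// Hypothetical, simplified reader whose SplitReader cannot pause individual splits.
final class SingleSplitFallbackReader {
    private final Set<String> assignedSplits = new HashSet<>();
    private boolean paused;

    void addSplits(Collection<String> splitIds) {
        assignedSplits.addAll(splitIds);
    }

    void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        if (assignedSplits.size() > 1) {
            // Per-split pausing is impossible for this reader, so fail loudly.
            throw new UnsupportedOperationException(
                    "The unpausable SplitReader only works with watermark alignment when "
                            + "assigned a single split, but " + assignedSplits.size()
                            + " splits are assigned.");
        }
        // With a single split, pausing that split equals pausing the whole reader.
        if (!toPause.isEmpty()) {
            paused = true;
        } else if (!toResume.isEmpty()) {
            paused = false;
        }
    }

    PollStatus pollNext() {
        // While paused, stop polling and report that nothing is available.
        return paused ? PollStatus.NOTHING_AVAILABLE : PollStatus.MORE_AVAILABLE;
    }
}

final class Solution1Demo {
    public static void main(String[] args) {
        SingleSplitFallbackReader reader = new SingleSplitFallbackReader();
        reader.addSplits(List.of("split-0"));
        reader.pauseOrResumeSplits(List.of("split-0"), List.of());
        System.out.println(reader.pollNext()); // prints NOTHING_AVAILABLE
    }
}
```

The point of the sketch is that nothing user-facing has to be deprecated
later: the fallback lives entirely inside the reader.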

Solution 2:
Add a configuration "enable.coarse-grained.watermark.alignment" whose
default value is false. Once it is set to true, we will allow
coarse-grained watermark alignment if a SplitReader is pausable.
This solution allows users to keep the current FLIP-182 behavior, with the
risk of side effects.
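
A rough sketch of how the switch in Solution 2 might be evaluated. Only
the configuration key comes from the text above. The AlignmentModeResolver
class and its Mode enum are hypothetical. The sketch also assumes that the
flag is meant for readers that cannot pause individual splits, and that
such a reader is rejected while the flag is left at its default; neither
point is spelled out in the proposal.

```java
import java.util.Map;

// Hypothetical resolver; only the configuration key is taken from the proposal.
final class AlignmentModeResolver {
    static final String COARSE_GRAINED_KEY = "enable.coarse-grained.watermark.alignment";

    enum Mode { SPLIT_LEVEL, COARSE_GRAINED }

    static Mode resolve(Map<String, String> config, boolean splitReaderCanPause) {
        if (splitReaderCanPause) {
            // Readers that can pause single splits always align per split.
            return Mode.SPLIT_LEVEL;
        }
        // The option defaults to false when it is not set.
        boolean coarseGrainedAllowed =
                Boolean.parseBoolean(config.getOrDefault(COARSE_GRAINED_KEY, "false"));
        if (coarseGrainedAllowed) {
            // Keep the FLIP-182 behavior: pause the whole reader, not single splits.
            return Mode.COARSE_GRAINED;
        }
        // Assumption: without the opt-in, an unpausable reader is rejected.
        throw new UnsupportedOperationException(
                "SplitReader cannot pause splits; set " + COARSE_GRAINED_KEY
                        + "=true to fall back to coarse-grained alignment.");
    }

    public static void main(String[] args) {
        Map<String, String> optIn = Map.of(COARSE_GRAINED_KEY, "true");
        System.out.println(resolve(optIn, false)); // prints COARSE_GRAINED
    }
}
```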

Personally speaking, I feel solution 1 seems better because coarse-grained
watermark alignment could be frustrating to the users when more than one
split is assigned. So we might as well not support it at all. And also
there is nothing to deprecate in the future with this solution.

What do you think?

Thanks,

Jiangjie (Becket) Qin


On Tue, Jun 21, 2022 at 8:48 PM Piotr Nowojski <pn...@apache.org> wrote:

> Hi,
>
> It looks like option 1 wins overall? So let's go with that.
>
> Best,
> Piotrek
>
> On Wed, Jun 15, 2022 at 04:13 Steven Wu <st...@gmail.com> wrote:
>
>> Both option 1 (default impl in base interface) and option 2 (decorative
>> interface) are pretty common patterns. I would also be fine with either.
>> The important thing is that an exception is thrown if a source doesn't
>> support the alignment capability.
>>
>> The other point is that we can validate the source capability if alignment
>> is enabled in WatermarkStrategy. I believe either option can achieve this
>> goal too.
>> public interface WatermarkStrategy<T> {
>>     WatermarkStrategy<T> withWatermarkAlignment(
>>             String watermarkGroup, Duration maxAllowedWatermarkDrift);
>> }
>>
>> If I have to pick one, I am slightly favoring option 1 (base interface). As
>> watermark is already an essential concept of source, maybe watermark
>> alignment capability can also be a property of the base source/reader
>> interface.
>>
>> On Tue, Jun 14, 2022 at 3:44 PM Thomas Weise <th...@apache.org> wrote:
>>
>> > Hi everyone,
>> >
>> > Thank you for all the effort that went into this discussion. The split
>> > level watermark alignment will be an important feature for Flink that
>> > will address operational problems for various use cases. From reading
>> > through this thread it appears that not too much remains to bring this
>> > FLIP to acceptance and allow development to move forward. I would like
>> > to contribute if possible.
>> >
>> > Regarding option 1 vs. option 2: I don't have a strong preference,
>> > perhaps slightly leaning towards option 1.
>> >
>> > However, from a user perspective, should the split level alignment be
>> > an opt-in feature, at least for a few releases? If yes, then we would
>> > require a knob similar to supportsPausingSplits(), which I understand
>> > won't be part of the revised FLIP. Such control may be beneficial:
>> >
>> > * Compare runtime behavior with split level alignment on/off
>> > * Allow use of sources that don't implement pausing splits yet
>> >
>> > The second point would, from the user's perspective, be necessary for
>> > backward compatibility? While the interface aspect and source
>> > compatibility has been discussed in great detail, I don't think it
>> > would be desirable if an application that already uses alignment fails
>> > after upgrading to the new Flink version, forcing users to lock step
>> > modify sources for the new non-optional split level alignment.
>> >
>> > So I think clarification of the compatibility aspect on the FLIP page
>> > would be necessary.
>> >
>> > Thanks,
>> > Thomas
>> >
>> > On Mon, May 30, 2022 at 3:29 AM Piotr Nowojski <piotr.nowojski@gmail.com>
>> > wrote:
>> > >
>> > > Hi Becket,
>> > >
>> > > Thanks for summing this up. Just one correction:
>> > >
>> > > > Piotr prefers option 2, his opinions are:
>> > > >   e) It is OK that the code itself in option 2 indicates to the developers
>> > > > that a feature is optional. We will rely on the documentation to correct
>> > > > that and clarify that the feature is actually obligatory.
>> > >
>> > > I would say, based on a) and b), that the feature would still be optional. So
>> > > both the implementation and the documentation would be saying that. We
>> > > could add a mention to the docs and release notes that this feature will
>> > > be obligatory in the next major release and plan such a release accordingly.
>> > >
>> > > Re option 1, as you mentioned:
>> > > > As for option 1: For developers, the feature is still optional due to the
>> > > > default implementation in the interface, regardless of what the default
>> > > > implementation does, because the code compiles without overriding these
>> > > > methods
>> > >
>> > > Also importantly, the code will work in most cases.
>> > >
>> > > > Obligatory: Jobs may fail if these methods are not implemented properly,
>> > > > e.g. SourceReader#pauseOrResumeSplits(). This is a common pattern in Java,
>> > > > e.g. Iterator.remove() by default throws "UnsupportedOperationException",
>> > > > informing the implementation that things may go wrong if this method is not
>> > > > implemented.
>> > >
>> > > For me `Iterator#remove()` is an optional feature. Personally, I don't
>> > > remember if I have ever implemented it.
>> > >
>> > > Best,
>> > > Piotrek
>> > >
>> > > On Fri, May 27, 2022 at 10:14 Becket Qin <be...@gmail.com> wrote:
>> > >
>> > > > I had an offline discussion with Piotr and here is the summary. Please
>> > > > correct me if I miss something, Piotr.
>> > > >
>> > > > There are two things on which we would like to seek more opinions from the
>> > > > community, so we can make progress on this FLIP.
>> > > >
>> > > > 1. The general pattern to add obligatory features to existing interfaces.
>> > > >
>> > > > ***********************************************************************************
>> > > > Interfaces exposed to the developers for implementation are
>> > > > either intended to be *optional* or *obligatory*. While it is quite clear
>> > > > how to convey that intention when creating the interfaces, it is not
>> > > > as commonly agreed when we are adding new features to an existing
>> > > > interface. In general, Flink uses decorative interfaces when adding
>> > > > optional features to existing interfaces. Both Piotr and I agree that looks
>> > > > good.
>> > > >
>> > > > Different opinions are mainly about how to add obligatory features
>> to
>> > the
>> > > > existing interfaces, probably due to different understandings of
>> > > > "obligatory".
>> > > >
>> > > > We have discussed four options:
>> > > >
>> > > > *Option 1:*
>> > > >
>> > > >    - Just add a new method to the existing interface.
>> > > >    - For backwards compatibility, the method would have a default
>> > > >    implementation throwing "UnsupportedOperationException".
>> > > >    - In the next major version, remove the default implementation.
>> > > >    - For the developers, any method with a default implementation
>> > > >    throwing an "UnsupportedOperationException" should be taken as
>> > obligatory.
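
Option 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not Flink's actual code: `SketchReader`, `LegacyReader` and `Option1Sketch` are hypothetical names, and the method signature is simplified to split ids as strings.

```java
import java.util.Collection;
import java.util.Collections;

/** Hypothetical stand-in for a reader interface; not Flink's actual SourceReader. */
interface SketchReader {
    String pollNext();

    /** Added after the interface shipped; the default keeps old implementations compiling. */
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException(
                "This reader does not support pausing or resuming splits.");
    }
}

/** Written before the new method existed; compiles unchanged. */
class LegacyReader implements SketchReader {
    @Override
    public String pollNext() {
        return "record";
    }
}

/** Caller-side sketch: the framework invokes the method without any capability check. */
class Option1Sketch {
    static boolean tryPause(SketchReader reader, String splitId) {
        try {
            reader.pauseOrResumeSplits(
                    Collections.singletonList(splitId), Collections.<String>emptyList());
            return true;
        } catch (UnsupportedOperationException e) {
            return false; // in a real job this would surface as a job failure
        }
    }
}
```

The framework side stays simple because the method always exists, but nothing at compile time tells the author of `LegacyReader` that something is missing.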
>> > > >
>> > > > *Option 2:*
>> > > >
>> > > >    - Always make the features optional by adding a decorative
>> > interface,
>> > > >    just like ordinary optional features.
>> > > >    - Inform the developers via documentation that this feature is
>> > > >    obligatory, although it looks optional in the code.
>> > > >    - In case the developers did not implement the decorative
>> interface,
>> > > >    throw an exception
>> > > >    - In the next major version, move the methods in the decorative
>> > > >    interface to the base interface, and deprecate the decorative
>> > interface.
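
A rough sketch of Option 2, again with hypothetical names (`BaseReader`, `SupportsSplitPausing`, `Option2Sketch`) and a simplified signature. The capability lives in a decorative interface, and the framework checks for it once, at the point where the feature is switched on:

```java
import java.util.Collection;

/** Hypothetical base interface containing only the original methods. */
interface BaseReader {
    String pollNext();
}

/** Hypothetical decorative interface carrying the new capability. */
interface SupportsSplitPausing {
    void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

/** A reader that predates the feature and implements only the base interface. */
class PlainReader implements BaseReader {
    @Override
    public String pollNext() {
        return "record";
    }
}

/** A reader that opts in by also implementing the decorative interface. */
class PausingReader implements BaseReader, SupportsSplitPausing {
    @Override
    public String pollNext() {
        return "record";
    }

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        // a real reader would stop and restart fetching for the given splits here
    }
}

class Option2Sketch {
    /** Validates the reader once, when the feature that needs pausing is enabled. */
    static SupportsSplitPausing requirePausable(BaseReader reader) {
        if (!(reader instanceof SupportsSplitPausing)) {
            throw new UnsupportedOperationException(
                    reader.getClass().getSimpleName()
                            + " does not support pausing splits, which the feature requires.");
        }
        return (SupportsSplitPausing) reader;
    }
}
```

Here the base interface stays untouched, at the price of an `instanceof` check in the framework and of the code itself presenting the feature as optional.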
>> > > >
>> > > > *Option 3:*
>> > > >
>> > > >    - Always bump the major version when a new obligatory feature is
>> > > >    added, even if we may have to do it frequently.
>> > > >
>> > > > *Option 4:*
>> > > >
>> > > >    - Add a V2, V3... of the interface affected by the new obligatory
>> > > >    feature.
>> > > >    - In the next major versions, deprecate old versions of the
>> > interfaces.
>> > > >
>> > > > Both Piotr and I agreed that option 3 and option 4 have big side
>> > effects
>> > > > and should be avoided. We have different preferences between option 1
>> > and
>> > > > option 2.
>> > > >
>> > > > Personally I prefer option 1, the reasons are:
>> > > >   a) simple and intuitive. Java 8 introduced the default impl in
>> > > > interfaces exactly for interface evolving, and this is a common
>> > pattern in
>> > > > many projects.
>> > > >   b) prominent to the developers that the feature is expected to be
>> > > > implemented, because it explicitly throws an exception in the
>> default
>> > impl.
>> > > >   c) low maintenance overhead - the Flink framework can always
>> assume
>> > the
>> > > > method exists, so no special handling logic is needed.
>> > > >   d) communicate a clear semantic boundary between optional and
>> > obligatory
>> > > > features in Flink to the developers.
>> > > >       - Optional: Jobs still run without exception if these methods
>> are
>> > > > not implemented. e.g. all the SupportsXXXPushDown interfaces.
>> > > >       - Obligatory: Jobs may fail if these methods are not
>> implemented
>> > > > properly. e.g. SourceReader#pauseOrResumeSplits(). This is a common
>> > pattern
>> > > > in Java, e.g. Iterator.remove() by default throws
>> > > > "UnsupportedOperationException", informing the implementation that
>> > things
>> > > > may go wrong if this method is not implemented.
>> > > >
>> > > > As for option 2, although the API itself sounds clean, it misleads
>> > people
>> > > > into thinking an obligatory feature is optional - from the code the
>> > > > feature is optional, but the documents say it is obligatory. We
>> > probably
>> > > > should avoid such code-doc inconsistency, as people will be
>> confused.
>> > And I
>> > > > would actually be bewildered that sometimes not implementing an
>> > "optional"
>> > > > feature is fine, but sometimes it causes the jobs to fail.
>> > > >
>> > > > In response to the argument that the method with a default
>> > implementation
>> > > > is always optional, if that is true, it actually means all the
>> > interfaces
>> > > > should be immutable once they are created. If we want to add a
>> method
>> > to an
>> > > > existing interface, for backwards compatibility, we will have to
>> > provide a
>> > > > default implementation. And the fact it has a default implementation
>> > > > indicates the method is optional. If that method is optional, it
>> should
>> > > > reside in a separate decorative interface, otherwise it clogs that
>> > existing
>> > > > interface. Therefore, people should never add a method to an
>> existing
>> > > > interface. I find this conclusion a bit extreme.
>> > > >
>> > > > Piotr prefers option 2, his opinions are:
>> > > >     a) Obligatory methods are the methods that fail the code
>> > compilation
>> > > > if not implemented.
>> > > >     b) All obligatory methods should reside in the base interface,
>> > without
>> > > > a default implementation. And all the optional methods should be in
>> > > > decorative interfaces. This is a clean API.
>> > > >     c) due to b), there isn't a viable solution to add an obligatory
>> > > > method to an existing interface in a backwards compatible way.
>> Unless
>> > we
>> > > > are OK with breaking backwards compatibility, all the interfaces
>> > should be
>> > > > treated as immutable. As a compromise, we might as well just treat
>> all
>> > the
>> > > > features added later as optional features. This way we keep the API
>> > clean.
>> > > >     d) based on b) and c), option 2 has a clean API, while option 1
>> > does
>> > > > not.
>> > > >     e) It is OK that the code itself in option 2 indicates to the
>> > developers
>> > > > that a feature is optional. We will rely on the documentation to
>> > correct
>> > > > that and clarify that the feature is actually obligatory.
>> > > >     f) Regarding the effectiveness of making people aware that the
>> > feature
>> > > > is obligatory, Option 1 and Option 2 are similar. For people that do
>> > not
>> > > > read the release note / documentation, they will mistake the feature
>> > to be
>> > > > optional anyways.
>> > > >
>> > > > As for option 1: For developers, the feature is still optional due
>> to
>> > the
>> > > > default implementation in the interface, regardless of what the
>> default
>> > > > implementation does, because the code compiles without overriding
>> these
>> > > > methods. Also, another problem of this option is that for users that
>> > do not
>> > > > know about the history of the interface, they may be confused by the
>> > > > default implementation throwing an exception.
>> > > >
>> > > >
>> > > > 2. For this particular FLIP, should it be optional or not?
>> > > >
>> > > >
>> >
>> ***********************************************************************************
>> > > > As mentioned in the previous email, I feel this FLIP should be
>> > obligatory,
>> > > > for the following reasons:
>> > > > 1. The Flink framework exposes the watermark alignment API to the
>> end
>> > > > users. From the end users' perspective, the feature should be
>> available
>> > > > regardless of the implementation details in the pluggables. This is
>> > true
>> > > > for any other methods exposed as the Flink API.
>> > > > 2. If a Source is not pausable, the end user should receive an
>> > exception
>> > > > when enabling watermark alignment (both Piotr and I agree on
>> > this). In
>> > > > that case, it meets my criteria for an obligatory feature because not
>> > > > implementing the feature causes a framework API to throw an exception
>> and
>> > > > fail the job.
>> > > >
>> > > > On the other hand, Piotr does not have a strong opinion regarding
>> > whether
>> > > > this feature should be optional or not.
>> > > >
>> > > >
>> > > > Thanks for reading through this long email. So basically in order to
>> > make
>> > > > progress on this FLIP, we would like to hear what people think about the
>> > above
>> > > > two topics.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jiangjie (Becket) Qin
>> > > >
>> > > >
>> > > > On Thu, May 26, 2022 at 3:06 PM Piotr Nowojski <
>> pnowojski@apache.org>
>> > > > wrote:
>> > > >
>> > > >> Hi Becket,
>> > > >>
>> > > >> I still stand by what I wrote before:
>> > > >> > I think I would still vote soft -1 on this option, but I wouldn't
>> > block
>> > > >> it in case I am out-voted.
>> > > >>
>> > > >> > I think it might be helpful to agree on the definition of
>> optional
>> > in
>> > > >> our
>> > > >> case.
>> > > >>
>> > > >> For me it doesn't matter whether a default method throwing an
>> > exception we
>> > > >> call optional or non-optional. As long as we keep it this way, the
>> > effect
>> > > >> is the same. It's effectively a method that a user doesn't have to
>> > > >> implement. If interface/system allows some methods to be not
>> > implemented,
>> > > >> some users will do just that, regardless if we call it and
>> document as
>> > > >> non-optional. And at the same time it's clogging the base
>> interface.
>> > > >>
>> > > >> By the way, just the need for a java-doc/documentation explaining
>> the
>> > > >> existence of some construct is a bad smell (code should be
>> > > >> self-documenting
>> > > >> and default method throwing an UnsupportedOperationException is
>> not).
>> > > >>
>> > > >> > Please note that so far we do not assume whether the feature is
>> in
>> > > >> > the original API or it is added later. A newly added feature can
>> > also be
>> > > >> > non-optional, although it might take some time for all the
>> pluggable
>> > > >> > developers to catch up, and they should still work if the new
>> > feature is
>> > > >> > not used until they catch up. In contrast, we may never expect an
>> > > >> optional
>> > > >> > feature to catch up, because leaving it unimplemented is also
>> > blessed.
>> > > >> >
>> > > >> > Let's take the checkpointing as an example. Imagine Flink did not
>> > > >> support
>> > > >> > checkpointing before release 1.16. And now we are trying to add
>> > > >> > checkpointing to Flink. So we exposed the checkpoint
>> configuration
>> > to
>> > > >> the
>> > > >> > end users. In the meantime, will we tell the pluggable (e.g.
>> > operators,
>> > > >> > connectors) developers that methods like "snapshotState()" are
>> > optional?
>> > > >> If
>> > > >> > we do that, the availability of checkpointing in Flink would be
>> > severely
>> > > >> > weakened. But apparently we should still allow the existing
>> > > >> implementations
>> > > >> > to work without checkpointing. It looks to me that adding the
>> > method to
>> > > >> the
>> > > >> > pluggable interfaces with a default implementation throwing
>> > > >> > "UnsupportedOperationException" would be the solution here.
>> Please
>> > note
>> > > >> > that in this case, having the default implementation does not
>> mean
>> > this
>> > > >> is
>> > > >> > optional. It is just the technique to support backwards
>> > compatibility in
>> > > >> > the feature evolution. The fact that this method is in the base
>> > > >> interface
>> > > >> > suggests it is not optional, so the developers SHOULD implement
>> it.
>> > > >>
>> > > >> I would soft vote -1 for having the default method throwing
>> > > >> UnsupportedOperationException as a one-off thing for this (FLIP-217)
>> > special
>> > > >> circumstance.
>> > > >>
>> > > >> At the moment, without thinking this over too much, I would vote
>> > harder -1
>> > > >> for having this as a general rule when adding new features. If we
>> > ever end
>> > > >> up with an API that is littered with default methods throwing
>> > > >> UnsupportedOperationException that are documented as "non
>> optional" it
>> > > >> would be IMO a big design failure. I would be ok-ish with that,
>> only
>> > if
>> > > >> that was a temporary thing and we had an aggressive plan to release
>> > more
>> > > >> often new major Flink versions (2.x.y, 3.x.y, ...) breaking API
>> > > >> compatibility, that would get rid of those default methods. Adding
>> > > >> checkpointing and methods like "snapshotState()" would IMO easily
>> > justify
>> > > >> a
>> > > >> new major Flink release. In that case we could add those methods
>> with
>> > > >> default implementation for some transition period, one or two
>> minor
>> > > >> releases, followed by a clean up in a major release. However I
>> would
>> > still
>> > > >> argue that it would be cleaner/better to add a decorative interface
>> > like
>> > > >> `CheckpointedOperator` interface instead of adding those default
>> > methods
>> > > >> to
>> > > >> the base `Operator` interface.
>> > > >>
>> > > >> I think I can sum up our disagreement here is that I would like to
>> > keep
>> > > >> the
>> > > >> interfaces simpler, with only obligatory methods/features on one
>> side
>> > and
>> > > >> clearly optional features on the other. While you would like to
>> add an
>> > > >> extra third state in between those two?
>> > > >>
>> > > >> Best,
>> > > >> Piotrek
>> > > >>
>> > > >>
>> > > >>
>> > > >> On Thu, 12 May 2022 at 04:25, Becket Qin <be...@gmail.com>
>> > wrote:
>> > > >>
>> > > >> > Thanks for the clarification, Piotr and Sebastian.
>> > > >> >
>> > > >> > It looks like the key problem is still whether the
>> implementation of
>> > > >> > pausable splits in the Sources should be optional or not.
>> > > >> >
>> > > >> > I think it might be helpful to agree on the definition of
>> optional
>> > in
>> > > >> our
>> > > >> > case. To me:
>> > > >> > Optional = "You CAN leave the method unimplemented, and that is
>> > fine."
>> > > >> > Non-Optional = "You CAN leave the method unimplemented, but you
>> > SHOULD
>> > > >> NOT,
>> > > >> > because people assume this works."
>> > > >> >
>> > > >> > I think one sufficient condition of a Non-Optional feature is
>> that
>> > if
>> > > >> the
>> > > >> > feature is exposed through the framework API, Flink should expect
>> > the
>> > > >> > pluggables to support this feature by default. Otherwise the
>> > > >> availability
>> > > >> > of that feature becomes undefined.
>> > > >> >
>> > > >> > Please note that so far we do not assume whether the feature is
>> in
>> > > >> > the original API or it is added later. A newly added feature can
>> > also be
>> > > >> > non-optional, although it might take some time for all the
>> pluggable
>> > > >> > developers to catch up, and they should still work if the new
>> > feature is
>> > > >> > not used until they catch up. In contrast, we may never expect an
>> > > >> optional
>> > > >> > feature to catch up, because leaving it unimplemented is also
>> > blessed.
>> > > >> >
>> > > >> > Let's take the checkpointing as an example. Imagine Flink did not
>> > > >> support
>> > > >> > checkpointing before release 1.16. And now we are trying to add
>> > > >> > checkpointing to Flink. So we exposed the checkpoint
>> configuration
>> > to
>> > > >> the
>> > > >> > end users. In the meantime, will we tell the pluggable (e.g.
>> > operators,
>> > > >> > connectors) developers that methods like "snapshotState()" are
>> > optional?
>> > > >> If
>> > > >> > we do that, the availability of checkpointing in Flink would be
>> > severely
>> > > >> > weakened. But apparently we should still allow the existing
>> > > >> implementations
>> > > >> > to work without checkpointing. It looks to me that adding the
>> > method to
>> > > >> the
>> > > >> > pluggable interfaces with a default implementation throwing
>> > > >> > "UnsupportedOperationException" would be the solution here.
>> Please
>> > note
>> > > >> > that in this case, having the default implementation does not
>> mean
>> > this
>> > > >> is
>> > > >> > optional. It is just the technique to support backwards
>> > compatibility in
>> > > >> > the feature evolution. The fact that this method is in the base
>> > > >> interface
>> > > >> > suggests it is not optional, so the developers SHOULD implement
>> it.
>> > > >> >
>> > > >> > When it comes to this FLIP, I think it meets the criteria of
>> > > >> non-optional
>> > > >> > features, so we should just use the evolution path of
>> non-optional
>> > > >> > features.
>> > > >> >
>> > > >> > Thanks,
>> > > >> >
>> > > >> > Jiangjie (Becket) Qin
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >> > On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski <
>> > pnowojski@apache.org>
>> > > >> > wrote:
>> > > >> >
>> > > >> > > Hi,
>> > > >> > >
>> > > >> > > Actually previously I thought about having a decorative
>> interface
>> > and
>> > > >> > > whenever watermark alignment is enabled, checking that the
>> source
>> > > >> > > implements the decorative interface. If not, throwing an
>> > exception.
>> > > >> > >
>> > > >> > > The option with default methods in the source interfaces
>> throwing
>> > > >> > > `UnsupportedOperationException` I think still suffers from the
>> > same
>> > > >> > > problems I mentioned before. It's still an optional
>> implementation
>> > > >> and at
>> > > >> > > the same time it's clogging the base interface. I think I would
>> > still
>> > > >> > vote
>> > > >> > > soft -1 on this option, but I wouldn't block it in case I am
>> > > >> out-voted.
>> > > >> > >
>> > > >> > > Best,
>> > > >> > > Piotrek
>> > > >> > >
>> > > >> > > On Wed, 11 May 2022 at 14:22, Sebastian Mattheis <
>> > sebastian@ververica.com>
>> > > >> > > wrote:
>> > > >> > >
>> > > >> > > > Hi Becket,
>> > > >> > > >
>> > > >> > > > Thanks a lot for your fast and detailed response. For me, it
>> > > >> converges
>> > > >> > > and
>> > > >> > > > dropping the supportsX method sounds very reasonable to me.
>> > (Side
>> > > >> note:
>> > > >> > > > With "pausable splits" enabled as "default" I think we
>> > > >> misunderstood.
>> > > >> > As
>> > > >> > > > you described now "default" I understand as that it should be
>> > the
>> > > >> new
>> > > >> > > > recommended way of implementation, and I think that is fully
>> > valid.
>> > > >> > > Before,
>> > > >> > > > I understood "default" here as the default implementation,
>> i.e.,
>> > > >> > throwing
>> > > >> > > > UnsupportedOperationException, which is the exact opposite.
>> :) )
>> > > >> > > >
>> > > >> > > > Nevertheless: As mentioned, an open question for me is if
>> > watermark
>> > > >> > > > alignment should enforce pausable splits. For clarification,
>> the
>> > > >> > current
>> > > >> > > > documentation [1] says:
>> > > >> > > >
>> > > >> > > > *Note:* As of 1.15, Flink supports aligning across tasks of
>> the
>> > same
>> > > >> > > >> source and/or different sources. It does not support
>> aligning
>> > > >> > > >> splits/partitions/shards in the same task.
>> > > >> > > >>
>> > > >> > > >> In a case where there are e.g. two Kafka partitions that
>> > produce
>> > > >> > > >> watermarks at different pace, that get assigned to the same
>> > task
>> > > >> > > watermark
>> > > >> > > >> might not behave as expected. Fortunately, worst case it
>> > should not
>> > > >> > > perform
>> > > >> > > >> worse than without alignment.
>> > > >> > > >>
>> > > >> > > >> Given the limitation above, we suggest applying watermark
>> > > >> alignment in
>> > > >> > > >> two situations:
>> > > >> > > >>
>> > > >> > > >>    1. You have two different sources (e.g. Kafka and File)
>> that
>> > > >> > produce
>> > > >> > > >>    watermarks at different speeds
>> > > >> > > >>    2. You run your source with parallelism equal to the
>> number
>> > of
>> > > >> > > >>    splits/shards/partitions, which results in every subtask
>> > being
>> > > >> > > assigned a
>> > > >> > > >>    single unit of work.
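
To illustrate the limitation quoted above, the following is a simplified sketch of what per-split alignment inside one task could look like. It is an assumption made for illustration, not the FLIP-217 implementation: `SplitAlignmentSketch`, `splitsToPause` and `maxAllowedDriftMs` are hypothetical names, and the minimum is taken locally within one task rather than across an alignment group.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: decides which splits of a single task run too far ahead. */
class SplitAlignmentSketch {

    /** Returns the ids of splits whose watermark exceeds (minimum watermark + allowed drift). */
    static List<String> splitsToPause(Map<String, Long> splitWatermarks, long maxAllowedDriftMs) {
        if (splitWatermarks.isEmpty()) {
            return Collections.emptyList();
        }
        long minWatermark = Collections.min(splitWatermarks.values());
        long maxAllowedWatermark = minWatermark + maxAllowedDriftMs;

        List<String> toPause = new ArrayList<>();
        for (Map.Entry<String, Long> entry : splitWatermarks.entrySet()) {
            if (entry.getValue() > maxAllowedWatermark) {
                toPause.add(entry.getKey());
            }
        }
        return toPause;
    }

    public static void main(String[] args) {
        // Two Kafka-like partitions assigned to the same task, advancing at different paces.
        Map<String, Long> watermarks = new LinkedHashMap<>();
        watermarks.put("partition-0", 10_000L);
        watermarks.put("partition-1", 60_000L);
        System.out.println(splitsToPause(watermarks, 20_000L));
    }
}
```

Without a way to pause `partition-1` individually, the task can only be paused or resumed as a whole, which is why the documentation recommends one split per subtask.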
>> > > >> > > >>
>> > > >> > > >> I personally see no issue in implementing and I see no
>> reason
>> > > >> against
>> > > >> > > > implementing this dependency of watermark alignment and
>> pausable
>> > > >> > splits.
>> > > >> > > (I
>> > > >> > > > think this would even be a good path towards shaping
>> watermark
>> > > >> > alignment
>> > > >> > > in
>> > > >> > > > 1.16.) However, "I don't see" means that I would be happy to
>> > hear
>> > > >> > Dawid's
>> > > >> > > > and Piotrek's opinions as they implemented watermark
>> alignment
>> > > >> based on
>> > > >> > > > FLIP-182 [2] and I don't want to miss relevant
>> > rationale/background
>> > > >> > info
>> > > >> > > > from their side.
>> > > >> > > >
>> > > >> > > > *@Piotrek* *@Dawid *What do you think?
>> > > >> > > >
>> > > >> > > > Regards,
>> > > >> > > > Sebastian
>> > > >> > > >
>> > > >> > > > [1]
>> > > >> > > >
>> > > >> > >
>> > > >> >
>> > > >>
>> >
>> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
>> > > >> > > > [2]
>> > > >> > > >
>> > > >> > >
>> > > >> >
>> > > >>
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
>> > > >> > > >
>> > > >> > > > On Wed, May 11, 2022 at 1:30 PM Becket Qin <
>> > becket.qin@gmail.com>
>> > > >> > wrote:
>> > > >> > > >
>> > > >> > > >> +dev
>> > > >> > > >>
>> > > >> > > >> Hi Sebastian,
>> > > >> > > >>
>> > > >> > > >> Thank you for the summary. Please see the detailed replies
>> > inline.
>> > > >> As
>> > > >> > a
>> > > >> > > >> recap of my suggestions.
>> > > >> > > >>
>> > > >> > > >> 1. Pausable splits API.
>> > > >> > > >>   a) Add default implementations to methods
>> > "pauseOrResumeSplits"
>> > > >> in
>> > > >> > > both
>> > > >> > > >> SourceReader and SplitReader where both default
>> implementations
>> > > >> throw
>> > > >> > > >>  UnsupportedOperationException.
>> > > >> > > >>
>> > > >> > > >> 2. User story.
>> > > >> > > >>     a) We tell users to enable the watermark alignment as
>> they
>> > > >> like.
>> > > >> > > This
>> > > >> > > >> is exactly what the current Flink API is.
>> > > >> > > >>     b) We tell the source developers, please implement
>> pausable
>> > > >> > splits,
>> > > >> > > >> otherwise bad things may happen. Think of it like you are
>> > expected
>> > > >> to
>> > > >> > > >> implement SourceReader#snapshotState() properly, otherwise
>> > > >> exceptions
>> > > >> > > will
>> > > >> > > >> be thrown when users enable checkpointing.
>> > > >> > > >>
>> > > >> > > >> Thanks,
>> > > >> > > >>
>> > > >> > > >> Jiangjie (Becket) Qin
>> > > >> > > >>
>> > > >> > > >> On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis <
>> > > >> > > >> sebastian@ververica.com> wrote:
>> > > >> > > >>
>> > > >> > > >>> Hi Becket, Hi everybody,
>> > > >> > > >>>
>> > > >> > > >>> I'm sorry if I misread the messages but I could not derive
>> an
>> > > >> > agreement
>> > > >> > > >>> from the mailing list. Nevertheless, if I understand you
>> > right the
>> > > >> > > >>> suggestion is:
>> > > >> > > >>>
>> > > >> > > >>> * Add default implementations to methods
>> > "pauseOrResumeSplits" in
>> > > >> > both
>> > > >> > > >>> SourceReader and SplitReader where both default
>> > implementations
>> > > >> throw
>> > > >> > > >>> UnsupportedOperationException.
>> > > >> > > >>>
>> > > >> > > >> Yes.
>> > > >> > > >>
>> > > >> > > >> * Add "supportsPauseOrResumeSplits" to the Source interface.
>> > (In
>> > > >> the
>> > > >> > > >>> following, I refer to supporting this as "pausable
>> splits".)
>> > > >> > > >>>
>> > > >> > > >> We may no longer need this if pausable splits are expected
>> to
>> > be
>> > > >> > > >> implemented by the source developers, i.e. non-optional.
>> Having
>> > > >> this
>> > > >> > > method
>> > > >> > > >> would then be somewhat misleading as it looks like the
>> sources
>> > > >> that do
>> > > >> > > not
>> > > >> > > >> support pausable splits are also acceptable in the long
>> term.
>> > So
>> > > >> API
>> > > >> > > wise,
>> > > >> > > >> I'd say maybe we should remove this for this FLIP, although
>> I
>> > > >> believe
>> > > >> > > this
>> > > >> > > >> supportXXX pattern itself is still attractive for optional
>> > > >> features.
>> > > >> > > >>
>> > > >> > > >>
>> > > >> > > >>>
>> > > >> > > >>> To make the conclusions explicit:
>> > > >> > > >>>
>> > > >> > > >>> 1. The implementation of pauseOrResumeSplits in both
>> > interfaces
>> > > >> > > >>> SourceReader and SplitReader are optional where the
>> default is
>> > > >> that
>> > > >> > it
>> > > >> > > >>> doesn't support it. (--> This means that the
>> implementation is
>> > > >> still
>> > > >> > > >>> optional for the source developer.)
>> > > >> > > >>>
>> > > >> > > >> It is optional for backwards compatibility with existing
>> > sources,
>> > > >> as
>> > > >> > > they
>> > > >> > > >> may still compile without code change. But starting from
>> this
>> > FLIP,
>> > > >> > > Flink
>> > > >> > > >> will always optimistically assume that all the sources
>> support
>> > > >> > pausable
>> > > >> > > >> splits. If a source does not support pausable splits, it
>> goes
>> > to an
>> > > >> > > error
>> > > >> > > >> handling path when watermark alignment is enabled on it.
>> This
>> > is
>> > > >> > > different
>> > > >> > > >> from a usual optional feature, where no error is expected.
>> > > >> > > >>
>> > > >> > > >>
>> > > >> > > >>> 2. If watermark alignment is enabled in the application
>> code
>> > by
>> > > >> > adding
>> > > >> > > >>> withWatermarkAlignment to the WatermarkStrategy while
>> > > >> SourceReader or
>> > > >> > > >>> SplitReader do not support pausableSplits, we throw an
>> > > >> > > >>> UnsupportedOperationException.
>> > > >> > > >>>
>> > > >> > > >> Yes.
>> > > >> > > >>
>> > > >> > > >>
>> > > >> > > >>> 3. With regard to your statement:
>> > > >> > > >>>
>> > > >> > > >>>> [...] basically means watermark alignment is a
>> non-optional
>> > > >> feature
>> > > >> > > to
>> > > >> > > >>>> the end users.
>> > > >> > > >>>
>> > > >> > > >>> You actually mean that "pausable splits" are non-optional
>> for
>> > the
>> > > >> app
>> > > >> > > >>> developer if watermark alignment is enabled. However,
>> > watermark
>> > > >> > > alignment
>> > > >> > > >>> is optional and can be enabled/disabled.
>> > > >> > > >>>
>> > > >> > > >> Yes, watermark alignment can be enabled/disabled in
>> individual
>> > > >> sources
>> > > >> > > in
>> > > >> > > >> Flink jobs, which basically means the code supporting
>> watermark
>> > > >> > > alignment
>> > > >> > > >> has to already be there. That again means the Source
>> > developers are
>> > > >> > also
>> > > >> > > >> expected to support pausable splits by default. So this way
>> we
>> > > >> > > essentially
>> > > >> > > >> tell the end users that you may enable / disable this
>> feature
>> > as
>> > > >> you
>> > > >> > > wish,
>> > > >> > > >> and tell the source developers that you SHOULD implement
>> this
>> > > >> because
>> > > >> > > the
>> > > >> > > >> end users may turn it on/off at will. And if the source does
>> > not
>> > > >> > support
>> > > >> > > >> pausable splits, that goes to an error handling path when
>> > watermark
>> > > >> > > >> alignment is enabled on it. So users know they have to
>> > explicitly
>> > > >> > > exclude
>> > > >> > > >> this source.
>> > > >> > > >>
>> > > >> > > >>
>> > > >> > > >>>
>> > > >> > > >>> So far it's totally clear to me and I hope this is what you
>> > mean.
>> > > >> I
>> > > >> > > also
>> > > >> > > >>> agree with both statements:
>> > > >> > > >>>
>> > > >> > > >>> So making that expectation aligned with the source
>> developers
>> > > >> seems
>> > > >> > > >>>> reasonable.
>> > > >> > > >>>>
>> > > >> > > >>>
>> > > >> > > >>> I think this is a simple and clean solution from both the
>> end
>> > user
>> > > >> > and
>> > > >> > > >>>> source developers' standpoint.
>> > > >> > > >>>>
>> > > >> > > >>>
>> > > >> > > >>> However, a last conclusion derives from 3. and is an open
>> > question
>> > > >> > for
>> > > >> > > >>> me:
>> > > >> > > >>>
>> > > >> > > >>> 4. The feature of "pausable splits" is now tightly bound to
>> > > >> watermark
>> > > >> > > >>> alignment, i.e., if sources do not support "pausable
>> splits"
>> > one
>> > > >> can
>> > > >> > > not
>> > > >> > > >>> enable watermark alignment for these sources. This
>> dependency
>> > is
>> > > >> not
>> > > >> > > the
>> > > >> > > >>> current status of watermark alignment implementation
>> because
>> > it
>> > > >> > is/was
>> > > >> > > >>> implemented without pausable splits. Do we want to
>> introduce
>> > this
>> > > >> > > >>> dependency? (This is an open question. I cannot judge
>> that.)
>> > > >> > > >>>
>> > > >> > > >> The watermark alignment basically relies on the pausable
>> > splits,
>> > > >> > right?
>> > > >> > > >> So personally I found it quite reasonable that if the source
>> > does
>> > > >> not
>> > > >> > > >> support pausable splits, end users cannot enable watermark
>> > > >> alignment
>> > > >> > on
>> > > >> > > it.
>> > > >> > > >>
>> > > >> > > >>
>> > > >> > > >>> If something is wrong, please correct me.
>> > > >> > > >>>
>> > > >> > > >>> Regards,
>> > > >> > > >>> Sebastian
>> > > >> > > >>>
>> > > >> > > >>> On Wed, May 11, 2022 at 9:05 AM Becket Qin <
>> > becket.qin@gmail.com>
>> > > >> > > wrote:
>> > > >> > > >>>
>> > > >> > > >>>> Hi Sebastian,
>> > > >> > > >>>>
>> > > >> > > >>>> Thanks for the reply and patient discussion. I agree this
>> is
>> > a
>> > > >> > tricky
>> > > >> > > >>>> decision.
>> > > >> > > >>>>
>> > > >> > > >>>>
>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about Option c)
>> > which I
>> > > >> see
>> > > >> > as
>> > > >> > > >>>>> follows:
>> > > >> > > >>>>> (1) An interface with default NOOP implementation makes
>> the
>> > > >> > > >>>>> implementation optional. And in my opinion, a default
>> > > >> > implementation
>> > > >> > > is and
>> > > >> > > >>>>> will remain a way of making implementation optional
>> because
>> > > >> even in
>> > > >> > > future
>> > > >> > > >>>>> a developer can decide to implement the "old flavor"
>> without
>> > > >> > support
>> > > >> > > for
>> > > >> > > >>>>> pausable splits.
>> > > >> > > >>>>> (2) It may not be too critical but I also find it
>> suboptimal
>> > > >> that
>> > > >> > > with
>> > > >> > > >>>>> a NOOP default implementation there is no way to check at
>> > > >> runtime
>> > > >> > if
>> > > >> > > >>>>> SourceReader or SplitReader actually support pausing. (To
>> > do so,
>> > > >> > one
>> > > >> > > would
>> > > >> > > >>>>> need a supportsX method which makes it again more
>> > complicated.)
>> > > >> > > >>>>
>> > > >> > > >>>>
>> > > >> > > >>>> Based on the last few messages on the mailing list, Piotr
>> > and I
>> > > >> > > agreed
>> > > >> > > >>>> that the default implementation should just throw an
>> > > >> > > >>>> UnsupportedOperationException if the source is
>> unpausable. So
>> > > >> this
>> > > >> > > >>>> basically tells the Source developers that this feature is
>> > > >> expected
>> > > >> > > to be
>> > > >> > > >>>> supported. Because we cannot prevent end users from
>> putting
>> > an
>> > > >> > > unpausable
>> > > >> > > >>>> source into the watermark alignment group, that basically
>> > means
>> > > >> > > watermark
>> > > >> > > >>>> alignment is a non-optional feature to the end users. So
>> > making
>> > > >> > that
>> > > >> > > >>>> expectation aligned with the source developers seems
>> > reasonable.
>> > > >> > And
>> > > >> > > if a
>> > > >> > > >>>> source does not support this feature, the end users should
>> > > >> > explicitly
>> > > >> > > >>>> remove that source from the watermark alignment group.
>> > > >> > > >>>>
>> > > >> > > >>>> Personally speaking I think this is a simple and clean
>> > solution
>> > > >> from
>> > > >> > > >>>> both the end user and source developers' standpoint.
>> > > >> > > >>>>
>> > > >> > > >>>> Does this address your concerns?
>> > > >> > > >>>>
>> > > >> > > >>>> Thanks,
>> > > >> > > >>>>
>> > > >> > > >>>> Jiangjie (Becket) Qin
>> > > >> > > >>>>
>> > > >> > > >>>> On Wed, May 11, 2022 at 2:52 PM Sebastian Mattheis <
>> > > >> > > >>>> sebastian@ververica.com> wrote:
>> > > >> > > >>>>
>> > > >> > > >>>>> Hi Piotr, Hi Becket, Hi everybody,
>> > > >> > > >>>>>
>> > > >> > > >>>>> we, Dawid and I, discussed the various
>> suggestions/options
>> > and
>> > > >> we
>> > > >> > > >>>>> would be okay either way because we find neither
>> solution is
>> > > >> > perfect
>> > > >> > > just
>> > > >> > > >>>>> because of the already present complexity.
>> > > >> > > >>>>>
>> > > >> > > >>>>> Option c) Adding methods to the interfaces of
>> SourceReader
>> > and
>> > > >> > > >>>>> SplitReader
>> > > >> > > >>>>> Option a) Adding decorative interfaces to be used by
>> > > >> SourceReader
>> > > >> > and
>> > > >> > > >>>>> SplitReader
>> > > >> > > >>>>>
>> > > >> > > >>>>> As of the current status (v. 12) of the FLIP [1], it is
>> > based on
>> > > >> > > >>>>> Option c) which we find acceptable because the complexity
>> > added
>> > > >> is
>> > > >> > > only a
>> > > >> > > >>>>> single method.
>> > > >> > > >>>>>
>> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about Option c)
>> > which I
>> > > >> see
>> > > >> > as
>> > > >> > > >>>>> follows:
>> > > >> > > >>>>> (1) An interface with default NOOP implementation makes
>> the
>> > > >> > > >>>>> implementation optional. And in my opinion, a default
>> > > >> > implementation
>> > > >> > > is and
>> > > >> > > >>>>> will remain a way of making implementation optional
>> because
>> > > >> even in
>> > > >> > > future
>> > > >> > > >>>>> a developer can decide to implement the "old flavor"
>> without
>> > > >> > support
>> > > >> > > for
>> > > >> > > >>>>> pausable splits.
>> > > >> > > >>>>> (2) It may not be too critical but I also find it
>> suboptimal
>> > > >> that
>> > > >> > > with
>> > > >> > > >>>>> a NOOP default implementation there is no way to check at
>> > > >> runtime
>> > > >> > if
>> > > >> > > >>>>> SourceReader or SplitReader actually support pausing. (To
>> > do so,
>> > > >> > one
>> > > >> > > would
>> > > >> > > >>>>> need a supportsX method which makes it again more
>> > complicated.)
>> > > >> > > >>>>>
>> > > >> > > >>>>> However, we haven't changed it because Option a) is also
>> not
>> > > >> > optimal
>> > > >> > > >>>>> or straight-forward:
>> > > >> > > >>>>> (1) We need to add two distinct yet similar decorative
>> > > >> interfaces
>> > > >> > > >>>>> since, as mentioned, the signatures of the methods are
>> > > >> different.
>> > > >> > For
>> > > >> > > >>>>> example, we would need decorative interfaces like
>> > > >> > > >>>>> `SplitReaderWithPausableSplits` and
>> > > >> > `SourceReaderWithPausableSplits`.
>> > > >> > > >>>>> (2) As a consequence, we would need to somehow document
>> > > >> how/where
>> > > >> > to
>> > > >> > > >>>>> implement both interfaces and how this relates to each
>> > other.
>> > > >> This
>> > > >> > > we could
>> > > >> > > >>>>> solve by adding a note in the interface of SourceReader
>> and
>> > > >> > > SplitReader and
>> > > >> > > >>>>> reference to the decorative interfaces but it still
>> > increases
>> > > >> > > complexity
>> > > >> > > >>>>> too.
>> > > >> > > >>>>>
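For comparison, the decorative-interface variant (Option a) might look like the sketch below. It is an illustration only: `SketchReader` and `WithPausableSplits` are hypothetical names, a single decorative interface is shown instead of the two the thread mentions, and the method signature is assumed. The point it shows is that support becomes checkable at runtime with `instanceof`.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Demo entry point. All types in this file are hypothetical stand-ins. */
class DecorativeInterfaceSketch {
    /** Runtime capability check that a NOOP default method cannot offer. */
    static boolean supportsPausing(SketchReader reader) {
        return reader instanceof WithPausableSplits;
    }

    /** Framework-side call: fail fast if the reader does not opt in. */
    static void pause(SketchReader reader, Collection<String> splitsToPause) {
        if (!supportsPausing(reader)) {
            throw new UnsupportedOperationException(
                    reader.name() + " does not implement WithPausableSplits");
        }
        ((WithPausableSplits) reader).pauseOrResumeSplits(splitsToPause, List.of());
    }

    public static void main(String[] args) {
        System.out.println(supportsPausing(new PlainReader()));
        AlignedReader aligned = new AlignedReader();
        pause(aligned, List.of("split-0"));
        System.out.println(aligned.paused);
    }
}

/** Base interface stays free of optional methods. */
interface SketchReader {
    String name();
}

/** Decorative interface: implementing it is the opt-in signal. */
interface WithPausableSplits {
    void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

/** Implements only the base interface. */
class PlainReader implements SketchReader {
    @Override
    public String name() {
        return "plain";
    }
}

/** Implements the base interface plus the decorative one. */
class AlignedReader implements SketchReader, WithPausableSplits {
    final Set<String> paused = new HashSet<>();

    @Override
    public String name() {
        return "aligned";
    }

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.addAll(splitsToPause);
        paused.removeAll(splitsToResume);
    }
}
```

The trade-off raised in the thread still applies: with two reader levels, two such interfaces would be needed, and their relationship would have to be documented somewhere.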
>> > > >> > > >>>>> In summary, we see both as acceptable and preferred over
>> > other
>> > > >> > > >>>>> options. The question is if we can find a solution or
>> > compromise
>> > > >> > > that is
>> > > >> > > >>>>> acceptable for everybody to reach consensus.
>> > > >> > > >>>>>
>> > > >> > > >>>>> Please let us know what you think because we would be
>> happy
>> > if
>> > > >> we
>> > > >> > can
>> > > >> > > >>>>> conclude the discussion to avoid dropping the initiative
>> on
>> > this
>> > > >> > > FLIP.
>> > > >> > > >>>>>
>> > > >> > > >>>>> Regards,
>> > > >> > > >>>>> Sebastian
>> > > >> > > >>>>>
>> > > >> > > >>>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
>> > > >> > > >>>>> (v. 12)
>> > > >> > > >>>>>
>> > > >> > > >>>>> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski <
>> > > >> > pnowojski@apache.org
>> > > >> > > >
>> > > >> > > >>>>> wrote:
>> > > >> > > >>>>>
>> > > >> > > >>>>>> Hi Guowei,
>> > > >> > > >>>>>>
>> > > >> > > >>>>>> as Dawid wrote a couple of messages back:
>> > > >> > > >>>>>>
>> > > >> > > >>>>>> > This is covered in the previous FLIP[1] which has been
>> > > >> already
>> > > >> > > >>>>>> implemented in 1.15. In short, it must be enabled with
>> the
>> > > >> > watermark
>> > > >> > > >>>>>> strategy which also configures drift and update interval
>> > > >> > > >>>>>>
>> > > >> > > >>>>>> So by default watermark alignment is disabled,
>> regardless
>> > if a
>> > > >> > > source
>> > > >> > > >>>>>> supports it or not.
>> > > >> > > >>>>>>
>> > > >> > > >>>>>> Best,
>> > > >> > > >>>>>> Piotrek
>> > > >> > > >>>>>>
>> > > >> > > >>>>>> czw., 5 maj 2022 o 09:56 Guowei Ma <
>> guowei.mgw@gmail.com>
>> > > >> > > napisał(a):
>> > > >> > > >>>>>>
>> > > >> > > >>>>>>> Hi,
>> > > >> > > >>>>>>>
>> > > >> > > >>>>>>> We know that in the case of Bounded input Flink
>> supports
>> > the
>> > > >> > Batch
>> > > >> > > >>>>>>> execution mode. Currently in Batch execution mode,
>> Flink
>> > is
>> > > >> > > executed
>> > > >> > > >>>>>>> on a
>> > > >> > > >>>>>>> stage-by-stage basis. In this way, perhaps watermark
>> > alignment
>> > > >> > > might
>> > > >> > > >>>>>>> not
>> > > >> > > >>>>>>> gain much.
>> > > >> > > >>>>>>>
>> > > >> > > >>>>>>> So my question is: Is watermark alignment the default
>> > > >> > behavior (for
>> > > >> > > >>>>>>> implemented source only)? If so, have you considered
>> > > >> evaluating
>> > > >> > the
>> > > >> > > >>>>>>> impact
>> > > >> > > >>>>>>> of this behavior on the Batch execution mode? Or do you think
>> > it is
>> > > >> not
>> > > >> > > >>>>>>> necessary.
>> > > >> > > >>>>>>>
>> > > >> > > >>>>>>> Correct me if I miss something.
>> > > >> > > >>>>>>>
>> > > >> > > >>>>>>> Best,
>> > > >> > > >>>>>>> Guowei
>> > > >> > > >>>>>>>
>> > > >> > > >>>>>>>
>> > > >> > > >>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <
>> > > >> > > >>>>>>> piotr.nowojski@gmail.com>
>> > > >> > > >>>>>>> wrote:
>> > > >> > > >>>>>>>
>> > > >> > > >>>>>>> > Hi Becket and Dawid,
>> > > >> > > >>>>>>> >
>> > > >> > > >>>>>>> > > I feel that no matter which option we choose this
>> can
>> > not
>> > > >> be
>> > > >> > > >>>>>>> solved
>> > > >> > > >>>>>>> > entirely in either of the options, because of the
>> point
>> > > >> above
>> > > >> > and
>> > > >> > > >>>>>>> because
>> > > >> > > >>>>>>> > the signature of SplitReader#pauseOrResumeSplits and
>> > > >> > > >>>>>>> > SourceReader#pauseOrResumeSplits are slightly
>> different
>> > (one
>> > > >> > > >>>>>>> identifies
>> > > >> > > >>>>>>> > splits with splitId the other one passes the splits
>> > > >> directly).
>> > > >> > > >>>>>>> >
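The signature difference mentioned above can be illustrated with a small Java sketch. It assumes, for the example only, that the outer reader receives split ids while the inner split reader receives split objects; all type names (`SketchSplit`, `SketchSplitReader`, `SketchBridgingSourceReader`) are hypothetical and not Flink classes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Demo entry point. All types in this file are hypothetical stand-ins. */
class SignatureBridgeSketch {
    public static void main(String[] args) {
        SketchSplitReader splitReader = new SketchSplitReader();
        SketchBridgingSourceReader reader = new SketchBridgingSourceReader(splitReader);
        reader.addSplit(new SketchSplit("split-0"));
        reader.addSplit(new SketchSplit("split-1"));
        reader.pauseOrResumeSplits(List.of("split-1"), List.of());
        System.out.println("paused in split reader: " + splitReader.pausedIds());
    }
}

/** Minimal split carrying only an id. */
final class SketchSplit {
    final String id;

    SketchSplit(String id) {
        this.id = id;
    }
}

/** Inner level: works on split objects directly. */
class SketchSplitReader {
    private final Set<String> paused = new TreeSet<>();

    void pauseOrResumeSplits(
            Collection<SketchSplit> splitsToPause, Collection<SketchSplit> splitsToResume) {
        for (SketchSplit split : splitsToPause) {
            paused.add(split.id);
        }
        for (SketchSplit split : splitsToResume) {
            paused.remove(split.id);
        }
    }

    Set<String> pausedIds() {
        return paused;
    }
}

/** Outer level: receives split ids and translates them to split objects. */
class SketchBridgingSourceReader {
    private final Map<String, SketchSplit> assigned = new HashMap<>();
    private final SketchSplitReader splitReader;

    SketchBridgingSourceReader(SketchSplitReader splitReader) {
        this.splitReader = splitReader;
    }

    void addSplit(SketchSplit split) {
        assigned.put(split.id, split);
    }

    void pauseOrResumeSplits(Collection<String> idsToPause, Collection<String> idsToResume) {
        splitReader.pauseOrResumeSplits(lookup(idsToPause), lookup(idsToResume));
    }

    /** Resolves ids to assigned splits, skipping ids that are not assigned. */
    private List<SketchSplit> lookup(Collection<String> ids) {
        List<SketchSplit> result = new ArrayList<>();
        for (String id : ids) {
            SketchSplit split = assigned.get(id);
            if (split != null) {
                result.add(split);
            }
        }
        return result;
    }
}
```

Because the two methods take different parameter types, a single shared decorative interface cannot cover both levels, which is the limitation being pointed out.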
>> > > >> > > >>>>>>> > Yes, that's a good point in this case and for
>> features
>> > that
>> > > >> > need
>> > > >> > > >>>>>>> to be
>> > > >> > > >>>>>>> > implemented in more than one place.
>> > > >> > > >>>>>>> >
>> > > >> > > >>>>>>> > > Is there any reason to make pausing reading from a
>> split
>> > an
>> > > >> > > optional
>> > > >> > > >>>>>>> feature,
>> > > >> > > >>>>>>> > > other than that this was not included in the
>> original
>> > > >> > > interface?
>> > > >> > > >>>>>>> >
>> > > >> > > >>>>>>> > An additional argument in favor of making it optional
>> > is to
>> > > >> > > >>>>>>> simplify source
>> > > >> > > >>>>>>> > implementation. But on its own I'm not sure if that
>> > would be
>> > > >> > > >>>>>>> enough to
>> > > >> > > >>>>>>> > justify making this feature optional. Maybe.
>> > > >> > > >>>>>>> >
>> > > >> > > >>>>>>> > > I think it would be way simpler and clearer to just
>> > let
>> > > >> end
>> > > >> > > >>>>>>> users and
>> > > >> > > >>>>>>> > Flink
>> > > >> > > >>>>>>> > > assume all the connectors will implement this
>> feature.
>> > > >> > > >>>>>>> >
>> > > >> > > >>>>>>> > As I wrote above that would be an interesting choice
>> to
>> > make
>> > > >> > > (ease
>> > > >> > > >>>>>>> of
>> > > >> > > >>>>>>> > implementation for new users, vs system consistency).
>> > > >> > Regardless
>> > > >> > > >>>>>>> of that,
>> > > >> > > >>>>>>> > yes, for me the main argument is the API backward
>> > > >> > compatibility.
>> > > >> > > >>>>>>> But let's
>> > > >> > > >>>>>>> > clear a couple of points:
>> > > >> > > >>>>>>> > - The current proposal adding methods to the base
>> > interface
>> > > >> > with
>> > > >> > > >>>>>>> default
>> > > >> > > >>>>>>> > implementations is an OPTIONAL feature. Same as the
>> > > >> decorative
>> > > >> > > >>>>>>> version
>> > > >> > > >>>>>>> > would be.
>> > > >> > > >>>>>>> > - Decorative version could implement "throw
>> > > >> > > >>>>>>> UnsupportedOperationException"
>> > > >> > > >>>>>>> > if user enabled watermark alignment just as well and
>> I
>> > agree
>> > > >> > > >>>>>>> that's a
>> > > >> > > >>>>>>> > better option compared to logging a warning.
>> > > >> > > >>>>>>> >
>> > > >> > > >>>>>>> > Best,
>> > > >> > > >>>>>>> > Piotrek
>> > > >> > > >>>>>>> >
>> > > >> > > >>>>>>> >
>> > > >> > > >>>>>>> > śr., 4 maj 2022 o 15:40 Becket Qin <
>> > becket.qin@gmail.com>
>> > > >> > > >>>>>>> napisał(a):
>> > > >> > > >>>>>>> >
>> > > >> > > >>>>>>> > > Thanks for the reply and patient discussion, Piotr
>> and
>> > > >> Dawid.
>> > > >> > > >>>>>>> > >
>> > > >> > > >>>>>>> > > Is there any reason to make pausing reading from a
>> split
>> > an
>> > > >> > > optional
>> > > >> > > >>>>>>> feature,
>> > > >> > > >>>>>>> > > other than that this was not included in the
>> original
>> > > >> > > interface?
>> > > >> > > >>>>>>> > >
>> > > >> > > >>>>>>> > > To be honest I am really worried about the
>> complexity
>> > of
>> > > >> the
>> > > >> > > >>>>>>> user story
>> > > >> > > >>>>>>> > > here. Optional features like this have a high
>> > overhead.
>> > > >> > Imagine
>> > > >> > > >>>>>>> this
>> > > >> > > >>>>>>> > > feature is optional, now a user enabled watermark
>> > > >> alignment
>> > > >> > and
>> > > >> > > >>>>>>> defined a
>> > > >> > > >>>>>>> > > few watermark groups. Would it work? Hmm, that
>> > depends on
>> > > >> > > >>>>>>> whether the
>> > > >> > > >>>>>>> > > involved Source has implemented this feature. If
>> the
>> > > >> Sources
>> > > >> > > >>>>>>> are well
>> > > >> > > >>>>>>> > > documented, good luck. Otherwise end users may
>> have to
>> > > >> look
>> > > >> > > into
>> > > >> > > >>>>>>> the code
>> > > >> > > >>>>>>> > > of the Source to see whether the feature is
>> supported.
>> > > >> Which
>> > > >> > is
>> > > >> > > >>>>>>> something
>> > > >> > > >>>>>>> > > they shouldn't have to do.
>> > > >> > > >>>>>>> > >
>> > > >> > > >>>>>>> > > I think it would be way simpler and clearer to just
>> > let
>> > > >> end
>> > > >> > > >>>>>>> users and
>> > > >> > > >>>>>>> > Flink
>> > > >> > > >>>>>>> > > assume all the connectors will implement this
>> feature.
>> > > >> After
>> > > >> > > all
>> > > >> > > >>>>>>> the
>> > > >> > > >>>>>>> > > watermark group is not optional to the end users.
>> If
>> > in
>> > > >> some
>> > > >> > > >>>>>>> rare cases,
>> > > >> > > >>>>>>> > > the feature cannot be supported, a clear
>> > > >> > > >>>>>>> UnsupportedOperationException
>> > > >> > > >>>>>>> > will
>> > > >> > > >>>>>>> > > be thrown to tell users to explicitly remove this
>> > Source
>> > > >> from
>> > > >> > > the
>> > > >> > > >>>>>>> > watermark
>> > > >> > > >>>>>>> > > group. I don't think we should have a warning
>> message
>> > > >> here,
>> > > >> > as
>> > > >> > > >>>>>>> they tend
>> > > >> > > >>>>>>> > to
>> > > >> > > >>>>>>> > > be ignored in many cases. If we do this, we don't
>> even
>> > > >> need
>> > > >> > the
>> > > >> > > >>>>>>> > supportXXX
>> > > >> > > >>>>>>> > > method in the Source for this feature. In fact
>> this is
>> > > >> > exactly
>> > > >> > > >>>>>>> how many
>> > > >> > > >>>>>>> > > interfaces work today. For example,
>> > > >> > > >>>>>>> SplitEnumerator#addSplitsBack() is
>> > > >> > > >>>>>>> > not
>> > > >> > > >>>>>>> > > supported by Pravega source because it does not
>> > support
>> > > >> > partial
>> > > >> > > >>>>>>> failover.
>> > > >> > > >>>>>>> > > In that case, it simply throws an exception to
>> > trigger a
>> > > >> > global
>> > > >> > > >>>>>>> recovery.
>> > > >> > > >>>>>>> > >
>> > > >> > > >>>>>>> > > The reason we add a default implementation in this
>> > case
>> > > >> would be
>> > > >> > > >>>>>>> just for
>> > > >> > > >>>>>>> > the
>> > > >> > > >>>>>>> > > sake of backwards compatibility so the old source
>> can
>> > > >> still
>> > > >> > > >>>>>>> compile.
>> > > >> > > >>>>>>> > Sure,
>> > > >> > > >>>>>>> > > in short term, this feature might not be supported
>> by
>> > many
>> > > >> > > >>>>>>> existing
>> > > >> > > >>>>>>> > > sources. That is OK, and it is quite visible to the
>> > source
>> > > >> > > >>>>>>> developers
>> > > >> > > >>>>>>> > that
>> > > >> > > >>>>>>> > > they did not override the default impl which
>> throws an
>> > > >> > > >>>>>>> > > UnsupportedOperationException.
>> > > >> > > >>>>>>> > >
>> > > >> > > >>>>>>> > > @Dawid,
>> > > >> > > >>>>>>> > >
>> > > >> > > >>>>>>> > > the Java doc of the SupportXXX() method in the
>> Source
>> > > >> would
>> > > >> > be
>> > > >> > > >>>>>>> the single
>> > > >> > > >>>>>>> > > >> source of truth regarding how to implement this
>> > > >> feature.
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > >
>> > > >> > > >>>>>>> > > I also don't find it entirely true. Half of the
>> > classes
>> > > >> are
>> > > >> > > >>>>>>> theoretically
>> > > >> > > >>>>>>> > > > optional and are utility classes from the point
>> of
>> > view
>> > > >> how
>> > > >> > > the
>> > > >> > > >>>>>>> > > interfaces
>> > > >> > > >>>>>>> > > > are organized. Theoretically users do not need to
>> > use
>> > > >> any
>> > > >> > of
>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to
>> > list
>> > > >> > their
>> > > >> > > >>>>>>> methods in
>> > > >> > > >>>>>>> > > the
>> > > >> > > >>>>>>> > > > Source interface.
>> > > >> > > >>>>>>> > >
>> > > >> > > >>>>>>> > > I think the ultimate goal of java docs is to guide
>> > users
>> > > >> to
>> > > >> > > >>>>>>> implement the
>> > > >> > > >>>>>>> > > Source. If SourceReaderBase is the preferred way to
>> > > >> > implement a
>> > > >> > > >>>>>>> > > SourceReader, it seems worth mentioning that. Even
>> the
>> > > >> Java
>> > > >> > > >>>>>>> language
>> > > >> > > >>>>>>> > > documentation for interfaces lists the known
>> > implementations
>> > > >> [1]
>> > > >> > so
>> > > >> > > >>>>>>> people
>> > > >> > > >>>>>>> > can
>> > > >> > > >>>>>>> > > leverage them. But for this particular case, if we
>> > make
>> > > >> the
>> > > >> > > >>>>>>> feature
>> > > >> > > >>>>>>> > > non-optional, we don't even need the supportXXX()
>> > method
>> > > >> for
>> > > >> > > now.
>> > > >> > > >>>>>>> > >
>> > > >> > > >>>>>>> > > Thanks,
>> > > >> > > >>>>>>> > >
>> > > >> > > >>>>>>> > > Jiangjie (Becket) Qin
>> > > >> > > >>>>>>> > >
>> > > >> > > >>>>>>> > >
>> > > >> > > >>>>>>> > >
>> > > >> > > >>>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <
>> > > >> > > >>>>>>> dwysakowicz@apache.org>
>> > > >> > > >>>>>>> > > wrote:
>> > > >> > > >>>>>>> > >
>> > > >> > > >>>>>>> > > > Hey Piotr and Becket,
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > First of all, let me say I am happy with
>> whichever
>> > > >> option
>> > > >> > is
>> > > >> > > >>>>>>> agreed in
>> > > >> > > >>>>>>> > > the
>> > > >> > > >>>>>>> > > > discussion.
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > I wanted to clarify a few points from the
>> discussion
>> > > >> > though:
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > @Becket:
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > The main argument for adding the methods to the
>> > > >> > SourceReader
>> > > >> > > >>>>>>> is that
>> > > >> > > >>>>>>> > > these
>> > > >> > > >>>>>>> > > > methods are effectively NON-OPTIONAL to
>> SourceReader
>> > > >> impl,
>> > > >> > > i.e.
>> > > >> > > >>>>>>> > starting
>> > > >> > > >>>>>>> > > > from this FLIP, all the SourceReaders impl are
>> > expected
>> > > >> to
>> > > >> > > >>>>>>> support this
>> > > >> > > >>>>>>> > > > method, although some old implementations may not
>> > have
>> > > >> > > >>>>>>> implemented this
>> > > >> > > >>>>>>> > > > feature. I think we should distinguish the new
>> > features
>> > > >> > from
>> > > >> > > >>>>>>> the
>> > > >> > > >>>>>>> > optional
>> > > >> > > >>>>>>> > > > features. While the public decorative interface
>> is a
>> > > >> > solution
>> > > >> > > >>>>>>> to the
>> > > >> > > >>>>>>> > > > optional features. We should not use it for the
>> > features
>> > > >> > that
>> > > >> > > >>>>>>> are
>> > > >> > > >>>>>>> > > > non-optional.
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > I don't think that this feature is NON-OPTIONAL.
>> > Even
>> > > >> > though
>> > > >> > > >>>>>>> > preferred, I
>> > > >> > > >>>>>>> > > > still think it can be simply optional.
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > the Java doc of the SupportXXX() method in the
>> > Source
>> > > >> would
>> > > >> > > be
>> > > >> > > >>>>>>> the
>> > > >> > > >>>>>>> > single
>> > > >> > > >>>>>>> > > > source of truth regarding how to implement this
>> > feature.
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > I also don't find it entirely true. Half of the
>> > classes
>> > > >> are
>> > > >> > > >>>>>>> > theoretically
>> > > >> > > >>>>>>> > > > optional and are utility classes from the point
>> of
>> > view
>> > > >> how
>> > > >> > > the
>> > > >> > > >>>>>>> > > interfaces
>> > > >> > > >>>>>>> > > > are organized. Theoretically users do not need to
>> > use
>> > > >> any
>> > > >> > of
>> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to
>> > list
>> > > >> > their
>> > > >> > > >>>>>>> methods in
>> > > >> > > >>>>>>> > > the
>> > > >> > > >>>>>>> > > > Source interface.
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > @Piotr
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > If we have all of the methods with default
>> > > >> implementation
>> > > >> > in
>> > > >> > > >>>>>>> the base
>> > > >> > > >>>>>>> > > > interface, the API doesn't give any clue to the
>> user
>> > > >> which
>> > > >> > > set
>> > > >> > > >>>>>>> of
>> > > >> > > >>>>>>> > methods
>> > > >> > > >>>>>>> > > > are required to be implemented at the same time.
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > I feel that no matter which option we choose this
>> > can
>> > > >> not
>> > > >> > be
>> > > >> > > >>>>>>> solved
>> > > >> > > >>>>>>> > > > entirely in either of the options, because of the
>> > point
>> > > >> > above
>> > > >> > > >>>>>>> and
>> > > >> > > >>>>>>> > because
>> > > >> > > >>>>>>> > > > the signature of SplitReader#pauseOrResumeSplits
>> and
>> > > >> > > >>>>>>> > > > SourceReader#pauseOrResumeSplits are slightly
>> > different
>> > > >> > (one
>> > > >> > > >>>>>>> identifies
>> > > >> > > >>>>>>> > > > splits with splitId the other one passes the
>> splits
>> > > >> > > directly).
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > Best,
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > Dawid
>> > > >> > > >>>>>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > Hi Piotr,
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > Thanks for the comment.
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > Just to clarify, I am not against the decorative
>> > > >> > interfaces,
>> > > >> > > >>>>>>> but I do
>> > > >> > > >>>>>>> > > > think we should use them with caution. The main
>> > argument
>> > > >> > for
>> > > >> > > >>>>>>> adding the
>> > > >> > > >>>>>>> > > > methods to the SourceReader is that these methods
>> > are
>> > > >> > > >>>>>>> > > > effectively NON-OPTIONAL to SourceReader impl,
>> i.e.
>> > > >> > starting
>> > > >> > > >>>>>>> from this
>> > > >> > > >>>>>>> > > > FLIP, all the SourceReaders impl are expected to
>> > support
>> > > >> > this
>> > > >> > > >>>>>>> > > > method, although some old implementations may not
>> > have
>> > > >> > > >>>>>>> implemented this
>> > > >> > > >>>>>>> > > > feature. I think we should distinguish the new
>> > features
>> > > >> > from
>> > > >> > > >>>>>>> the
>> > > >> > > >>>>>>> > optional
>> > > >> > > >>>>>>> > > > features. While the public decorative interface
>> is a
>> > > >> > solution
>> > > >> > > >>>>>>> to the
>> > > >> > > >>>>>>> > > > optional features. We should not use it for the
>> > features
>> > > >> > that
>> > > >> > > >>>>>>> are
>> > > >> > > >>>>>>> > > > non-optional.
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > That said, this feature is optional for
>> > SplitReaders.
>> > > >> > > Arguably
>> > > >> > > >>>>>>> we can
>> > > >> > > >>>>>>> > > have
>> > > >> > > >>>>>>> > > > a decorative interface for that, but for
>> simplicity
>> > and
>> > > >> > > >>>>>>> symmetry of the
>> > > >> > > >>>>>>> > > > interface, personally I prefer just adding a new
>> > method.
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > Regarding the advantages you mentioned about the
>> > > >> decorative
>> > > >> > > >>>>>>> interfaces,
>> > > >> > > >>>>>>> > > > they would make sense if:
>> > > >> > > >>>>>>> > > > 1. The feature is optional.
>> > > >> > > >>>>>>> > > > 2. There is only one decorative interface
>> involved
>> > for a
>> > > >> > > >>>>>>> feature.
>> > > >> > > >>>>>>> > > > Otherwise the argument that all the methods are
>> > grouped
>> > > >> > > >>>>>>> together will
>> > > >> > > >>>>>>> > not
>> > > >> > > >>>>>>> > > > stand.
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > Compared with that, I think the current solution
>> > works
>> > > >> fine
>> > > >> > > in
>> > > >> > > >>>>>>> all
>> > > >> > > >>>>>>> > cases,
>> > > >> > > >>>>>>> > > > i.e. "having supportXXX() method in Source, and
>> > default
>> > > >> > > >>>>>>> methods /
>> > > >> > > >>>>>>> > > > decorative interfaces in base interfaces.".
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > The advantages are:
>> > > >> > > >>>>>>> > > >> - clean and easy to implement base interface
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > In the current approach, the Java doc of the
>> > > >> SupportXXX()
>> > > >> > > >>>>>>> method in the
>> > > >> > > >>>>>>> > > > Source would be the single source of truth
>> regarding
>> > > >> how to
>> > > >> > > >>>>>>> implement
>> > > >> > > >>>>>>> > > this
>> > > >> > > >>>>>>> > > > feature. It lists the method that has to be
>> > implemented
>> > > >> to
>> > > >> > > >>>>>>> support this
>> > > >> > > >>>>>>> > > > feature, regardless of how many classes /
>> > interfaces are
>> > > >> > > >>>>>>> involved.
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > When implementing the base interface, users do
>> not
>> > need
>> > > >> to
>> > > >> > > >>>>>>> implement a
>> > > >> > > >>>>>>> > > > method with default implementation. If they are
>> > curious
>> > > >> > what
>> > > >> > > >>>>>>> the method
>> > > >> > > >>>>>>> > > is
>> > > >> > > >>>>>>> > > > for, the java doc of that method simply points
>> > users to
>> > > >> the
>> > > >> > > >>>>>>> > SupportXXX()
>> > > >> > > >>>>>>> > > > method in the Source. It seems not adding work to
>> > the
>> > > >> users
>> > > >> > > >>>>>>> compared
>> > > >> > > >>>>>>> > with
>> > > >> > > >>>>>>> > > > decorative interfaces, but gives much better
>> > > >> > discoverability.
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > - all of the methods from a single feature are
>> > grouped
>> > > >> in a
>> > > >> > > >>>>>>> single
>> > > >> > > >>>>>>> > > >> decorator interface, together with their
>> dedicated
>> > java
>> > > >> > doc.
>> > > >> > > >>>>>>> It's also
>> > > >> > > >>>>>>> > > >> easier to google search for help using the
>> > decorator
>> > > >> name
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > - if an optional feature requires two methods to
>> be
>> > > >> > > >>>>>>> implemented at
>> > > >> > > >>>>>>> > once,
>> > > >> > > >>>>>>> > > >> decorator can guarantee that
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > These two points are not true when multiple
>> > components
>> > > >> and
>> > > >> > > >>>>>>> classes are
>> > > >> > > >>>>>>> > > > involved collaboratively to provide a feature. In
>> > our
>> > > >> case,
>> > > >> > > we
>> > > >> > > >>>>>>> have
>> > > >> > > >>>>>>> > both
>> > > >> > > >>>>>>> > > > SourceReader and SplitReader involved. And there
>> > might
>> > > >> be
>> > > >> > > other
>> > > >> > > >>>>>>> > > interfaces
>> > > >> > > >>>>>>> > > > on the JM side involved for some future features.
>> > So the
>> > > >> > > >>>>>>> relevant
>> > > >> > > >>>>>>> > methods
>> > > >> > > >>>>>>> > > > can actually be scattered over the places. That
>> > said, we
>> > > >> > may
>> > > >> > > >>>>>>> still use
>> > > >> > > >>>>>>> > > > decorative interfaces for each component, if the
>> > > >> feature is
>> > > >> > > >>>>>>> optional,
>> > > >> > > >>>>>>> > > given
>> > > >> > > >>>>>>> > > > there is a single source of truth for the
>> feature.
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > Here I would strongly lean towards making life
>> > easier
>> > > >> for
>> > > >> > new
>> > > >> > > >>>>>>> users,
>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo) slight
>> > expense
>> > > >> > for
>> > > >> > > >>>>>>> the power
>> > > >> > > >>>>>>> > > >> users.
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > I actually think the current approach is simpler,
>> > more
>> > > >> > > >>>>>>> extensible and
>> > > >> > > >>>>>>> > > more
>> > > >> > > >>>>>>> > > > general for all the users. Can you articulate a
>> bit
>> > > >> more on
>> > > >> > > >>>>>>> which part
>> > > >> > > >>>>>>> > > you
>> > > >> > > >>>>>>> > > > think makes it harder for users to understand?
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > There is another benefit of the decorative
>> > interfaces
>> > > >> which
>> > > >> > > is
>> > > >> > > >>>>>>> not
>> > > >> > > >>>>>>> > > > mentioned, but might be worth considering here.
>> > Usually
>> > > >> the
>> > > >> > > >>>>>>> decorative
>> > > >> > > >>>>>>> > > > interfaces give slightly better backwards
>> > compatibility
>> > > >> > than
>> > > >> > > >>>>>>> the new
>> > > >> > > >>>>>>> > > > default method in the interfaces. That is when
>> > users are
>> > > >> > > using
>> > > >> > > >>>>>>> a jar
>> > > >> > > >>>>>>> > that
>> > > >> > > >>>>>>> > > > was compiled with an older version of Flink which
>> > does
>> > > >> not
>> > > >> > > >>>>>>> have the
>> > > >> > > >>>>>>> > > default
>> > > >> > > >>>>>>> > > > method in the interfaces in question. A
>> decorative
>> > > >> > interface
>> > > >> > > >>>>>>> may still
>> > > >> > > >>>>>>> > > > provide backwards compatibility in that case,
>> while
>> > > >> default
>> > > >> > > >>>>>>> method impl
>> > > >> > > >>>>>>> > > > cannot.
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > I think in Flink we in general do not guarantee
>> > custom
>> > > >> > > >>>>>>> components
>> > > >> > > >>>>>>> > > compiled
>> > > >> > > >>>>>>> > > > with an older version can run with a newer
>> version
>> > of
>> > > >> > Flink.
>> > > >> > > A
>> > > >> > > >>>>>>> > recompile
>> > > >> > > >>>>>>> > > > with a newer version would be required. That
>> said,
>> > if
>> > > >> we do
>> > > >> > > >>>>>>> care about
>> > > >> > > >>>>>>> > > > this, we can just change the "supportXXX()"
>> method
>> > in
>> > > >> the
>> > > >> > > >>>>>>> Source
>> > > >> > > >>>>>>> > > interface
>> > > >> > > >>>>>>> > > > to use decorative interfaces, and leave the other
>> > parts
>> > > >> > > >>>>>>> unchanged.
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > Thanks,
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > Jiangjie (Becket) Qin
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <
>> > > >> > > >>>>>>> pnowojski@apache.org>
>> > > >> > > >>>>>>> > > > wrote:
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > >> Hi,
>> > > >> > > >>>>>>> > > >>
>> > > >> > > >>>>>>> > > >> Sorry for chipping in so late, but I was OoO for
>> > the
>> > > >> last
>> > > >> > > two
>> > > >> > > >>>>>>> weeks.
>> > > >> > > >>>>>>> > > >> Regarding the interfaces, I would be actually
>> > against
>> > > >> > adding
>> > > >> > > >>>>>>> those
>> > > >> > > >>>>>>> > > methods
>> > > >> > > >>>>>>> > > >> to the base interfaces for the reasons mentioned
>> > above.
>> > > >> > > >>>>>>> Clogging the
>> > > >> > > >>>>>>> > > base
>> > > >> > > >>>>>>> > > >> interface for new users with tons of methods
>> that
>> > they
>> > > >> do
>> > > >> > > not
>> > > >> > > >>>>>>> need, do
>> > > >> > > >>>>>>> > > not
>> > > >> > > >>>>>>> > > >> understand and do not know what to do with them.
>> > > >> Moreover,
>> > > >> > > >>>>>>> such
>> > > >> > > >>>>>>> > > decorative
>> > > >> > > >>>>>>> > > >> interfaces are solving a problem if a feature
>> > requires
>> > > >> two
>> > > >> > > or
>> > > >> > > >>>>>>> more
>> > > >> > > >>>>>>> > > methods
>> > > >> > > >>>>>>> > > >> to be implemented at the same time. If we have
>> all
>> > of
>> > > >> the
>> > > >> > > >>>>>>> methods with
>> > > >> > > >>>>>>> > > >> default implementation in the base interface,
>> the
>> > API
>> > > >> > > doesn't
>> > > >> > > >>>>>>> give any
>> > > >> > > >>>>>>> > > >> clue
>> > > >> > > >>>>>>> > > >> to the user which set of methods are required
>> to be
>> > > >> > > >>>>>>> implemented at the
>> > > >> > > >>>>>>> > > >> same
>> > > >> > > >>>>>>> > > >> time.
>> > > >> > > >>>>>>> > > >>
>> > > >> > > >>>>>>> > > >> > a) I feel the biggest drawback of decorative
>> > > >> interfaces
>> > > >> > is
>> > > >> > > >>>>>>> which
>> > > >> > > >>>>>>> > > >> interface
>> > > >> > > >>>>>>> > > >> > they can decorate and which combinations of
>> > multiple
>> > > >> > > >>>>>>> decorative
>> > > >> > > >>>>>>> > > >> interfaces
>> > > >> > > >>>>>>> > > >> > are valid. (...)
>> > > >> > > >>>>>>> > > >> > In the future, if there is a new feature added
>> > > >> > > >>>>>>> > > >> > (e.g. sorted or pre-partitioned data aware),
>> are
>> > we
>> > > >> > going
>> > > >> > > >>>>>>> to create
>> > > >> > > >>>>>>> > > >> another
>> > > >> > > >>>>>>> > > >> > interface of SplitReader such as
>> > SortedSplitReader or
>> > > >> > > >>>>>>> > > >> PrePartitionedAware?
>> > > >> > > >>>>>>> > > >> > Can they be combined? So I think the
>> additional
>> > > >> > decorative
>> > > >> > > >>>>>>> interface
>> > > >> > > >>>>>>> > > >> like
>> > > >> > > >>>>>>> > > >> > withSplitsAlignment actually increases the
>> > > >> understanding
>> > > >> > > >>>>>>> cost of
>> > > >> > > >>>>>>> > users
>> > > >> > > >>>>>>> > > >> > because they have to know what decorative
>> > interfaces
>> > > >> are
>> > > >> > > >>>>>>> there,
>> > > >> > > >>>>>>> > which
>> > > >> > > >>>>>>> > > >> > interface they can decorate and which
>> > combinations of
>> > > >> > the
>> > > >> > > >>>>>>> decorative
>> > > >> > > >>>>>>> > > >> > interfaces are valid and which are not.
>> Ideally
>> > we
>> > > >> want
>> > > >> > to
>> > > >> > > >>>>>>> avoid
>> > > >> > > >>>>>>> > that.
>> > > >> > > >>>>>>> > > >>
>> > > >> > > >>>>>>> > > >> I'm not sure if I understand how embedding
>> default
>> > > >> methods
>> > > >> > > in
>> > > >> > > >>>>>>> the base
>> > > >> > > >>>>>>> > > >> interface is solving the problem: what can be
>> > combined
>> > > >> or
>> > > >> > > >>>>>>> not? If
>> > > >> > > >>>>>>> > there
>> > > >> > > >>>>>>> > > >> are
>> > > >> > > >>>>>>> > > >> two conflicting features, having decorative
>> > interfaces
>> > > >> > that
>> > > >> > > >>>>>>> can not be
>> > > >> > > >>>>>>> > > >> mixed together actually makes much more sense
>> to me
>> > > >> rather
>> > > >> > > >>>>>>> than having
>> > > >> > > >>>>>>> > > >> them
>> > > >> > > >>>>>>> > > >> all in one base class. How would you allow
>> users to
>> > > >> > > implement
>> > > >> > > >>>>>>> only one
>> > > >> > > >>>>>>> > > of
>> > > >> > > >>>>>>> > > >> those two features?
>> > > >> > > >>>>>>> > > >>
>> > > >> > > >>>>>>> > > >> To reiterate on the issue. Yes, there are
>> > drawbacks:
>> > > >> > > >>>>>>> > > >> - how can a user discover what decorators are
>> > there?
>> > > >> > > >>>>>>> > > >> - how can a user know where the decorator can be
>> > > >> applied
>> > > >> > to?
>> > > >> > > >>>>>>> > > >>
>> > > >> > > >>>>>>> > > >> However those are drawbacks for more power
>> users,
>> > that
>> > > >> can
>> > > >> > > be
>> > > >> > > >>>>>>> > mitigated
>> > > >> > > >>>>>>> > > by
>> > > >> > > >>>>>>> > > >> the documentation. For example listing all of
>> the
>> > > >> > decorators
>> > > >> > > >>>>>>> with
>> > > >> > > >>>>>>> > > >> detailed explanation both in the docs and in the
>> > java
>> > > >> > docs.
>> > > >> > > >>>>>>> More
>> > > >> > > >>>>>>> > > >> experienced users will be able to deal with
>> those
>> > > >> issues
>> > > >> > > >>>>>>> easier, as
>> > > >> > > >>>>>>> > they
>> > > >> > > >>>>>>> > > >> will already have some basic understanding of
>> > Flink.
>> > > >> Also
>> > > >> > if
>> > > >> > > >>>>>>> a user has
>> > > >> > > >>>>>>> > a
>> > > >> > > >>>>>>> > > >> problem that he wants to solve, he will google
>> > search a
>> > > >> > > >>>>>>> potential
>> > > >> > > >>>>>>> > > solution
>> > > >> > > >>>>>>> > > >> to his problem anyway, and while doing that he
>> is
>> > very
>> > > >> > > likely
>> > > >> > > >>>>>>> to
>> > > >> > > >>>>>>> > > discover
>> > > >> > > >>>>>>> > > >> the decorator that he needs anyway in the docs.
>> > > >> > > >>>>>>> > > >>
>> > > >> > > >>>>>>> > > >> The advantages are:
>> > > >> > > >>>>>>> > > >> - clean and easy to implement base interface
>> > > >> > > >>>>>>> > > >> - all of the methods from a single feature are
>> > grouped
>> > > >> in
>> > > >> > a
>> > > >> > > >>>>>>> single
>> > > >> > > >>>>>>> > > >> decorator interface, together with their
>> dedicated
>> > java
>> > > >> > doc.
>> > > >> > > >>>>>>> It's also
>> > > >> > > >>>>>>> > > >> easier to google search for help using the
>> > decorator
>> > > >> name
>> > > >> > > >>>>>>> > > >> - if an optional feature requires two methods
>> to be
>> > > >> > > >>>>>>> implemented at
>> > > >> > > >>>>>>> > once,
>> > > >> > > >>>>>>> > > >> decorator can guarantee that
>> > > >> > > >>>>>>> > > >>
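A minimal sketch of the last advantage above, assuming a simplified reader API. All names here (SimpleReader, PausableReader, BasicReader, PausingReader, DecoratorSketch) are hypothetical and are not Flink's actual interfaces. Because the decorator declares both methods as abstract, a class that opts into the feature will not compile unless it implements the pair together, while a basic reader only sees the small base interface.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified base interface (not Flink's real SourceReader).
interface SimpleReader {
    String poll();
}

// Hypothetical decorator: both methods are abstract, so an implementor
// of the optional feature must provide them together.
interface PausableReader {
    void pause(Collection<String> splitIds);

    void resume(Collection<String> splitIds);
}

// A basic reader only has to understand the base interface.
class BasicReader implements SimpleReader {
    @Override
    public String poll() {
        return "record";
    }
}

// A reader that opts into the optional feature via the decorator.
class PausingReader implements SimpleReader, PausableReader {
    private final List<String> paused = new ArrayList<>();

    @Override
    public String poll() {
        // Emits nothing while any split is paused (simplification for the sketch).
        return paused.isEmpty() ? "record" : null;
    }

    @Override
    public void pause(Collection<String> splitIds) {
        paused.addAll(splitIds);
    }

    @Override
    public void resume(Collection<String> splitIds) {
        paused.removeAll(splitIds);
    }
}

class DecoratorSketch {
    // The framework can discover the feature with a plain type check.
    static boolean supportsPausing(SimpleReader reader) {
        return reader instanceof PausableReader;
    }

    public static void main(String[] args) {
        System.out.println(supportsPausing(new BasicReader()));
        System.out.println(supportsPausing(new PausingReader()));
    }
}
```

The type check in supportsPausing is one possible way for a framework to detect the feature; the thread also discusses an explicit supportsXXX() method as an alternative.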
>> > > >> > > >>>>>>> > > >> Here I would strongly lean towards making life
>> > easier
>> > > >> for
>> > > >> > > new
>> > > >> > > >>>>>>> users,
>> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo) slight
>> > expense
>> > > >> > for
>> > > >> > > >>>>>>> the power
>> > > >> > > >>>>>>> > > >> users.
>> > > >> > > >>>>>>> > > >>
>> > > >> > > >>>>>>> > > >> Best,
>> > > >> > > >>>>>>> > > >> Piotrek
>> > > >> > > >>>>>>> > > >>
>> > > >> > > >>>>>>> > > >>
>> > > >> > > >>>>>>> > > >> On Tue, 26 Apr 2022 at 15:32, Becket Qin <
>> > > >> becket.qin@gmail.com
>> > > >> > >
>> > > >> > > >>>>>>> > wrote:
>> > > >> > > >>>>>>> > > >>
>> > > >> > > >>>>>>> > > >> > Thanks for the reply Sebastian and Dawid,
>> > > >> > > >>>>>>> > > >> >
>> > > >> > > >>>>>>> > > >> > I think Sebastian has a good summary. This is
>> a
>> > > >> really
>> > > >> > > >>>>>>> helpful
>> > > >> > > >>>>>>> > > >> discussion.
>> > > >> > > >>>>>>> > > >> >
>> > > >> > > >>>>>>> > > >> > Thinking a bit more, I feel that it might
>> still
>> > be
>> > > >> > better
>> > > >> > > >>>>>>> to add the
>> > > >> > > >>>>>>> > > >> > supportsXXX() method in the Source rather than
>> > > >> > > SourceReader.
>> > > >> > > >>>>>>> > > >> >
>> > > >> > > >>>>>>> > > >> > Generally speaking, what we are trying to do
>> > here is
>> > > >> to
>> > > >> > > let
>> > > >> > > >>>>>>> the
>> > > >> > > >>>>>>> > Flink
>> > > >> > > >>>>>>> > > >> > framework know what the Source is capable of.
>> In
>> > this
>> > > >> > > FLIP,
>> > > >> > > >>>>>>> it
>> > > >> > > >>>>>>> > happens
>> > > >> > > >>>>>>> > > >> to
>> > > >> > > >>>>>>> > > >> > be the capability that only involves
>> > SourceReader.
>> > > >> But
>> > > >> > in
>> > > >> > > >>>>>>> the
>> > > >> > > >>>>>>> > future,
>> > > >> > > >>>>>>> > > >> it is
>> > > >> > > >>>>>>> > > >> > possible that another functionality involves
>> > both the
>> > > >> > > >>>>>>> > SplitEnumerator
>> > > >> > > >>>>>>> > > >> and
>> > > >> > > >>>>>>> > > >> > SourceReader. In that case, following the
>> current
>> > > >> > > approach,
>> > > >> > > >>>>>>> we
>> > > >> > > >>>>>>> > should
>> > > >> > > >>>>>>> > > >> put
>> > > >> > > >>>>>>> > > >> > the "supportsXXX()" method in both
>> > SplitEnumerator
>> > > >> and
>> > > >> > > >>>>>>> SourceReader.
>> > > >> > > >>>>>>> > > >> > Because if we only put this in the
>> SourceReader,
>> > then
>> > > >> > the
>> > > >> > > >>>>>>> JM would
>> > > >> > > >>>>>>> > > have
>> > > >> > > >>>>>>> > > >> to
>> > > >> > > >>>>>>> > > >> > create a SourceReader in order to know whether
>> > this
>> > > >> > > feature
>> > > >> > > >>>>>>> is
>> > > >> > > >>>>>>> > > >> supported,
>> > > >> > > >>>>>>> > > >> > which is a little ugly. But if we put the
>> > > >> > "supportsXXX()"
>> > > >> > > >>>>>>> method in
>> > > >> > > >>>>>>> > > the
>> > > >> > > >>>>>>> > > >> > Source, we will break the "symmetric" design
>> > because
>> > > >> > this
>> > > >> > > >>>>>>> FLIP
>> > > >> > > >>>>>>> > chose a
>> > > >> > > >>>>>>> > > >> > different way.
>> > > >> > > >>>>>>> > > >> >
>> > > >> > > >>>>>>> > > >> > This is also why I think supportsXXX() method
>> > seems a
>> > > >> > good
>> > > >> > > >>>>>>> thing to
>> > > >> > > >>>>>>> > > >> have,
>> > > >> > > >>>>>>> > > >> > because when there are a few interfaces /
>> methods
>> > > >> that
>> > > >> > are
>> > > >> > > >>>>>>> expected
>> > > >> > > >>>>>>> > to
>> > > >> > > >>>>>>> > > >> be
>> > > >> > > >>>>>>> > > >> > implemented at the same time in order to
>> deliver
>> > a
>> > > >> > > feature,
>> > > >> > > >>>>>>> it is
>> > > >> > > >>>>>>> > > always
>> > > >> > > >>>>>>> > > >> > good to have a single source of truth to tell
>> the
>> > > >> > > framework
>> > > >> > > >>>>>>> what to
>> > > >> > > >>>>>>> > > do,
>> > > >> > > >>>>>>> > > >> so
>> > > >> > > >>>>>>> > > >> > the framework can do consistent things in
>> > different
>> > > >> > parts.
>> > > >> > > >>>>>>> > > >> >
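A rough sketch of the "single source of truth" idea from the paragraph above, under simplifying assumptions. CapabilitySource, CapabilityReader, PlainSource, PausingSource and CapabilitySketch are hypothetical stand-ins, not Flink's real classes; only the method names supportsPausingSplits and pauseOrResumeSplits are taken from this thread. The point is that the framework can decide from the source object alone, without instantiating a reader.

```java
import java.util.Collection;

// Hypothetical, simplified reader (not Flink's actual SourceReader).
interface CapabilityReader {
    // Default keeps existing readers compiling; readers that can pause override it.
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException("pausing splits is not supported");
    }
}

// Hypothetical, simplified source: the capability flag lives here.
interface CapabilitySource {
    default boolean supportsPausingSplits() {
        return false;
    }

    CapabilityReader createReader();
}

// A source that does not opt in; it inherits both defaults.
class PlainSource implements CapabilitySource {
    @Override
    public CapabilityReader createReader() {
        return new CapabilityReader() {};
    }
}

// A source that declares the capability and returns a matching reader.
class PausingSource implements CapabilitySource {
    @Override
    public boolean supportsPausingSplits() {
        return true;
    }

    @Override
    public CapabilityReader createReader() {
        return new CapabilityReader() {
            @Override
            public void pauseOrResumeSplits(
                    Collection<String> splitsToPause, Collection<String> splitsToResume) {
                // A real reader would stop or resume fetching for the given splits here.
            }
        };
    }
}

class CapabilitySketch {
    // The decision uses only the source; no reader is created.
    static String planAlignment(CapabilitySource source) {
        return source.supportsPausingSplits()
                ? "split-level alignment"
                : "cross-task alignment only";
    }
}
```

In this sketch nothing forces a source that returns true to also hand out a reader that overrides the default, which is the consistency concern raised elsewhere in the thread.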
>> > > >> > > >>>>>>> > > >> > @Sebastian Mattheis <se...@ververica.com>
>> > > >> > > >>>>>>> > > >> >
>> > > >> > > >>>>>>> > > >> > Regarding interface flavor b), i.e.
>> > > >> AlignedSourceReader
>> > > >> > +
>> > > >> > > >>>>>>> > > >> > AlignedSplitReader, what I feel awkward about
>> is
>> > > >> that we
>> > > >> > > are
>> > > >> > > >>>>>>> > > essentially
>> > > >> > > >>>>>>> > > >> > expecting almost all the SourceReader
>> > > >> implementations to
>> > > >> > > >>>>>>> extend
>> > > >> > > >>>>>>> > > >> > SourceReaderBase, which effectively makes the
>> > > >> > SourceReader
>> > > >> > > >>>>>>> interface
>> > > >> > > >>>>>>> > > >> > without the pausing support useless. So this
>> > > >> indicates
>> > > >> > > that
>> > > >> > > >>>>>>> public
>> > > >> > > >>>>>>> > > >> > decorative interfaces (or sub-interfaces for
>> the
>> > same
>> > > >> > > >>>>>>> purpose) only
>> > > >> > > >>>>>>> > > >> > make sense if the original interface is also
>> > > >> expected to
>> > > >> > > be
>> > > >> > > >>>>>>> used.
>> > > >> > > >>>>>>> > > >> > Otherwise, it seems to make more sense to add
>> the
>> > > >> method
>> > > >> > to
>> > > >> > > >>>>>>> the
>> > > >> > > >>>>>>> > original
>> > > >> > > >>>>>>> > > >> > interface itself.
>> > > >> > > >>>>>>> > > >> >
>> > > >> > > >>>>>>> > > >> > Cheers,
>> > > >> > > >>>>>>> > > >> >
>> > > >> > > >>>>>>> > > >> > Jiangjie (Becket) Qin
>> > > >> > > >>>>>>> > > >> >
>> > > >> > > >>>>>>> > > >> >
>> > > >> > > >>>>>>> > > >> >
>> > > >> > > >>>>>>> > > >> >
>> > > >> > > >>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid
>> Wysakowicz
>> > <
>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org>
>> > > >> > > >>>>>>> > > >> > wrote:
>> > > >> > > >>>>>>> > > >> >
>> > > >> > > >>>>>>> > > >> > > Thanks @Sebastian for the nice summary.
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > I think most of your points aligned with the
>> > > >> > suggestions
>> > > >> > > >>>>>>> I made to
>> > > >> > > >>>>>>> > > the
>> > > >> > > >>>>>>> > > >> > > FLIP, while you were writing your reply (I
>> > believe
>> > > >> we
>> > > >> > > hit
>> > > >> > > >>>>>>> enter
>> > > >> > > >>>>>>> > > >> nearly at
>> > > >> > > >>>>>>> > > >> > > the same time ;) )
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > Two points after we synced offline
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > 1. I also changed the
>> > > >> supportsWatermarksSplitAlignment
>> > > >> > > to
>> > > >> > > >>>>>>> > > >> > > supportsPausingSplits to express the general
>> > > >> > capability
>> > > >> > > of
>> > > >> > > >>>>>>> > pausing.
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > 2. As for whether we should add
>> > > >> > > >>>>>>> PausingSourceReader/PausingSplitReader
>> > > >> > > >>>>>>> > > (option
>> > > >> > > >>>>>>> > > >> b)
>> > > >> > > >>>>>>> > > >> > > or if we should just add the methods (option
>> > c), I
>> > > >> > > >>>>>>> suggest to
>> > > >> > > >>>>>>> > simply
>> > > >> > > >>>>>>> > > >> add
>> > > >> > > >>>>>>> > > >> > > the two methods as I felt this is the much
>> > preferred
>> > > >> > > approach of
>> > > >> > > >>>>>>> Becket,
>> > > >> > > >>>>>>> > > >> to which
>> > > >> > > >>>>>>> > > >> > > others do not object. Unless there is an
>> > opposition
>> > > >> > > let's
>> > > >> > > >>>>>>> go with
>> > > >> > > >>>>>>> > > this
>> > > >> > > >>>>>>> > > >> > > option c.
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > Best,
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > Dawid
>> > > >> > > >>>>>>> > > >> > > On 26/04/2022 10:06, Sebastian Mattheis
>> wrote:
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > Hi folks,
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > Sorry for being a bit silent. Many thanks
>> for
>> > all
>> > > >> the
>> > > >> > > >>>>>>> input and
>> > > >> > > >>>>>>> > > >> > > suggestions. As I'm a bit new, I needed some
>> > time
>> > > >> to
>> > > >> > > >>>>>>> catch up and
>> > > >> > > >>>>>>> > > >> > structure
>> > > >> > > >>>>>>> > > >> > > (for myself) the discussion and I wanted to
>> > find a
>> > > >> way
>> > > >> > > to
>> > > >> > > >>>>>>> > structure
>> > > >> > > >>>>>>> > > >> the
>> > > >> > > >>>>>>> > > >> > > conclusions. (Also because I had the feeling
>> > that
>> > > >> some
>> > > >> > > >>>>>>> concerns
>> > > >> > > >>>>>>> > got
>> > > >> > > >>>>>>> > > >> lost
>> > > >> > > >>>>>>> > > >> > in
>> > > >> > > >>>>>>> > > >> > > the discussion.) This is my attempt and
>> please
>> > > >> correct
>> > > >> > > me
>> > > >> > > >>>>>>> if
>> > > >> > > >>>>>>> > > >> something is
>> > > >> > > >>>>>>> > > >> > > wrong or misunderstood. I tried to collect
>> and
>> > > >> > assemble
>> > > >> > > >>>>>>> the
>> > > >> > > >>>>>>> > > opinions,
>> > > >> > > >>>>>>> > > >> > > suggestions, and conclusions (to the best
>> of my
>> > > >> > > >>>>>>> knowledge):
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > # Top A: Should split alignment
>> (pause/resume
>> > > >> > behavior)
>> > > >> > > >>>>>>> be a
>> > > >> > > >>>>>>> > general
>> > > >> > > >>>>>>> > > >> > > capability?
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > I personally don't see any reason not to have
>> > it a
>> > > >> > > general
>> > > >> > > >>>>>>> > capability
>> > > >> > > >>>>>>> > > >> > > because for the alignSplit method it is
>> > actually
>> > > >> > > >>>>>>> independent of
>> > > >> > > >>>>>>> > the
>> > > >> > > >>>>>>> > > >> > > watermarks. If we agree here to have it a
>> > general
>> > > >> > > >>>>>>> capability, we
>> > > >> > > >>>>>>> > > >> should
>> > > >> > > >>>>>>> > > >> > > also agree on the right wording. Does
>> > > >> > > >>>>>>> "alignSplits(splitsToResume,
>> > > >> > > >>>>>>> > > >> > > splitsToPause)" refer to what is then
>> actually
>> > > >> meant?
>> > > >> > (I
>> > > >> > > >>>>>>> see it as
>> > > >> > > >>>>>>> > > >> okay.
>> > > >> > > >>>>>>> > > >> > I
>> > > >> > > >>>>>>> > > >> > > don't have any better idea whilst Arvid
>> > suggested
>> > > >> > > >>>>>>> > > >> "pauseOrResumeSplits".)
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > # Top B: Should it be possible to
>> > enable/disable
>> > > >> split
>> > > >> > > >>>>>>> alignment?
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > I would personally not disable the split
>> > alignment
>> > > >> on
>> > > >> > > the
>> > > >> > > >>>>>>> source
>> > > >> > > >>>>>>> > > >> reader
>> > > >> > > >>>>>>> > > >> > > side because if split alignment is used for
>> > some
>> > > >> other
>> > > >> > > >>>>>>> use case
>> > > >> > > >>>>>>> > (see
>> > > >> > > >>>>>>> > > >> A)
>> > > >> > > >>>>>>> > > >> > it
>> > > >> > > >>>>>>> > > >> > > could have nasty side effects on
>> other/future
>> > use
>> > > >> > cases.
>> > > >> > > >>>>>>> Instead,
>> > > >> > > >>>>>>> > I
>> > > >> > > >>>>>>> > > >> would
>> > > >> > > >>>>>>> > > >> > > disable "watermark split alignment" where I
>> > think
>> > > >> it
>> > > >> > > >>>>>>> should
>> > > >> > > >>>>>>> > disable
>> > > >> > > >>>>>>> > > >> the
>> > > >> > > >>>>>>> > > >> > > watermark-dependent trigger for split
>> > alignment.
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > # Top C: Should we add a supportsX method?
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > I find it difficult to define the scope of a
>> > > >> supportsX
>> > > >> > > >>>>>>> method
>> > > >> > > >>>>>>> > w.r.t.
>> > > >> > > >>>>>>> > > >> > > the following questions: a) Where is it
>> used?
>> > and
>> > > >> b)
>> > > >> > > What
>> > > >> > > >>>>>>> is the
>> > > >> > > >>>>>>> > > >> expected
>> > > >> > > >>>>>>> > > >> > > output? To b), it's not straight-forward to
>> > > >> provide a
>> > > >> > > >>>>>>> meaningful
>> > > >> > > >>>>>>> > > >> output,
>> > > >> > > >>>>>>> > > >> > > e.g., if SourceReader supports split
>> alignment
>> > but
>> > > >> > > >>>>>>> SplitReader
>> > > >> > > >>>>>>> > not.
>> > > >> > > >>>>>>> > > >> This
>> > > >> > > >>>>>>> > > >> > is
>> > > >> > > >>>>>>> > > >> > > because with the current implementation, we
>> can
>> > > >> > > determine
>> > > >> > > >>>>>>> whether
>> > > >> > > >>>>>>> > > >> split
>> > > >> > > >>>>>>> > > >> > > alignment is fully supported only during
>> > runtime
>> > > >> and
>> > > >> > > >>>>>>> specifically
>> > > >> > > >>>>>>> > > >> > actually
>> > > >> > > >>>>>>> > > >> > > only when calling alignSplits down the call
>> > > >> hierarchy
>> > > >> > up
>> > > >> > > >>>>>>> to the
>> > > >> > > >>>>>>> > > actual
>> > > >> > > >>>>>>> > > >> > > SplitReaders.
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > Therefore, I would suggest to either raise
>> an
>> > > >> error or
>> > > >> > > >>>>>>> warning if
>> > > >> > > >>>>>>> > > the
>> > > >> > > >>>>>>> > > >> > > alignment is called but not supported at
>> some
>> > > >> point. I
>> > > >> > > >>>>>>> know we
>> > > >> > > >>>>>>> > > should
>> > > >> > > >>>>>>> > > >> > > carefully think about when this could be the
>> > case
>> > > >> > > because
>> > > >> > > >>>>>>> we don't
>> > > >> > > >>>>>>> > > >> want
>> > > >> > > >>>>>>> > > >> > to
>> > > >> > > >>>>>>> > > >> > > flood anybody with such warnings. However,
>> > warnings
>> > > >> > > could
>> > > >> > > >>>>>>> be an
>> > > >> > > >>>>>>> > > >> indicator
>> > > >> > > >>>>>>> > > >> > > for the user that for watermark split
>> > alignment use
>> > > >> > case
>> > > >> > > >>>>>>> split
>> > > >> > > >>>>>>> > > >> reading is
>> > > >> > > >>>>>>> > > >> > > imbalanced with the conclusion to either
>> > disable
>> > > >> the
>> > > >> > > >>>>>>> trigger for
>> > > >> > > >>>>>>> > > >> > watermark
>> > > >> > > >>>>>>> > > >> > > split alignment (see Top B) or to
>> > use/implement a
>> > > >> > source
>> > > >> > > >>>>>>> and
>> > > >> > > >>>>>>> > reader
>> > > >> > > >>>>>>> > > >> that
>> > > >> > > >>>>>>> > > >> > > fully supports split alignment.
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > # Top D: How to design interfaces?
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > Thanks for structuring the discussion with
>> the
>> > > >> > > various
>> > > >> > > >>>>>>> > > >> possibilities
>> > > >> > > >>>>>>> > > >> > > (a-d). From the discussion and emails, I
>> would
>> > > >> like to
>> > > >> > > >>>>>>> summarize
>> > > >> > > >>>>>>> > the
>> > > >> > > >>>>>>> > > >> > > following requirements:
>> > > >> > > >>>>>>> > > >> > > - Interfaces should be consistent
>> > ("symmetric"),
>> > > >> i.e.,
>> > > >> > > >>>>>>> similar
>> > > >> > > >>>>>>> > > >> semantics
>> > > >> > > >>>>>>> > > >> > > should have similar interfaces with similar
>> > usage.
>> > > >> > > >>>>>>> > > >> > > - Make explicit which implementations
>> implement
>> > > >> > > >>>>>>> interfaces/support
>> > > >> > > >>>>>>> > > >> > > behavior.
>> > > >> > > >>>>>>> > > >> > > - Make clear what are default
>> implementations
>> > and
>> > > >> how
>> > > >> > to
>> > > >> > > >>>>>>> implement
>> > > >> > > >>>>>>> > > >> > > interfaces with desired behavior.
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > This is a simplified view of the relations
>> > between
>> > > >> > > >>>>>>> relevant
>> > > >> > > >>>>>>> > classes
>> > > >> > > >>>>>>> > > of
>> > > >> > > >>>>>>> > > >> > the
>> > > >> > > >>>>>>> > > >> > > PoC implementation:
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > SourceReader (Public) <|-- SourceReaderBase
>> > > >> (Internal)
>> > > >> > > >>>>>>> <|-- ..
>> > > >> > > >>>>>>> > <|--
>> > > >> > > >>>>>>> > > >> > > MySourceReader
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > MySourceReader <>-- SplitFetcherManager
>> > (Internal)
>> > > >> > <>--
>> > > >> > > >>>>>>> > SplitFetcher
>> > > >> > > >>>>>>> > > >> > > (Internal) <>-- SplitReader (Public) <|--
>> > > >> > MySplitReader
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > (A <|-- B: B inherits from A; A <>-- B: A
>> "has
>> > a"
>> > > >> B)
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > Note that SourceReaderBase and
>> > SplitFetcherManager
>> > > >> > > >>>>>>> implement most
>> > > >> > > >>>>>>> > of
>> > > >> > > >>>>>>> > > >> the
>> > > >> > > >>>>>>> > > >> > > "logic" for split alignment just because we
>> > wanted
>> > > >> to
>> > > >> > > >>>>>>> implement
>> > > >> > > >>>>>>> > > split
>> > > >> > > >>>>>>> > > >> > > alignment and wanted it to be available as
>> > kind of
>> > > >> a
>> > > >> > > >>>>>>> default. As a
>> > > >> > > >>>>>>> > > >> > > consequence, we have a "default
>> > implementation" for
>> > > >> > > >>>>>>> SourceReader
>> > > >> > > >>>>>>> > > that
>> > > >> > > >>>>>>> > > >> > > implements the actual logic for split
>> > alignment.
>> > > >> For
>> > > >> > > that
>> > > >> > > >>>>>>> reason,
>> > > >> > > >>>>>>> > I
>> > > >> > > >>>>>>> > > >> find
>> > > >> > > >>>>>>> > > >> > it
>> > > >> > > >>>>>>> > > >> > > very confusing to have a NOOP default
>> > > >> implementation
>> > > >> > in
>> > > >> > > >>>>>>> the
>> > > >> > > >>>>>>> > > interface
>> > > >> > > >>>>>>> > > >> for
>> > > >> > > >>>>>>> > > >> > > the SourceReader. As a consequence,
>> interface
>> > > >> strategy
>> > > >> > > c)
>> > > >> > > >>>>>>> is
>> > > >> > > >>>>>>> > > difficult
>> > > >> > > >>>>>>> > > >> > > because this would require NOOP default
>> > > >> > implementations
>> > > >> > > >>>>>>> in the
>> > > >> > > >>>>>>> > > public
>> > > >> > > >>>>>>> > > >> > > interfaces of SourceReader and SplitReader.
>> > This is
>> > > >> > the
>> > > >> > > >>>>>>> same for
>> > > >> > > >>>>>>> > > >> strategy
>> > > >> > > >>>>>>> > > >> > > d) because it would require NOOP default
>> > > >> > implementation
>> > > >> > > >>>>>>> in the
>> > > >> > > >>>>>>> > > >> > > SourceReader. Further, as Dawid described
>> > method
>> > > >> > > >>>>>>> signatures of
>> > > >> > > >>>>>>> > > >> alignSplit
>> > > >> > > >>>>>>> > > >> > > for SourceReader and SplitReader differ and
>> it
>> > > >> would
>> > > >> > be
>> > > >> > > >>>>>>> extremely
>> > > >> > > >>>>>>> > > >> > difficult
>> > > >> > > >>>>>>> > > >> > > to make the signatures the same (with even
>> > > >> potential
>> > > >> > > >>>>>>> performance
>> > > >> > > >>>>>>> > > >> impact
>> > > >> > > >>>>>>> > > >> > > because of additional look-ups of split
>> ids).
>> > > >> > Therefore,
>> > > >> > > >>>>>>> having a
>> > > >> > > >>>>>>> > > >> > symmetric
>> > > >> > > >>>>>>> > > >> > > decorative interface as of strategy a) is
>> > actually
>> > > >> not
>> > > >> > > >>>>>>> possible
>> > > >> > > >>>>>>> > and
>> > > >> > > >>>>>>> > > >> > having
>> > > >> > > >>>>>>> > > >> > > two decorative interfaces with different
>> method
>> > > >> > > >>>>>>> signatures is
>> > > >> > > >>>>>>> > > >> confusing.
>> > > >> > > >>>>>>> > > >> > My
>> > > >> > > >>>>>>> > > >> > > conclusion is that we are best with
>> strategy b)
>> > > >> which
>> > > >> > > >>>>>>> means to
>> > > >> > > >>>>>>> > have
>> > > >> > > >>>>>>> > > >> > > specializing sub-interfaces that inherit
>> from
>> > the
>> > > >> > parent
>> > > >> > > >>>>>>> > interface:
>> > > >> > > >>>>>>> > > >> > > SourceReader <|-- AlignedSourceReader,
>> > SplitReader
>> > > >> > <|--
>> > > >> > > >>>>>>> > > >> > AlignedSplitReader
>> > > >> > > >>>>>>> > > >> > > With this option, I'm not 100% sure what the
>> > > >> > > implications
>> > > >> > > >>>>>>> are and
>> > > >> > > >>>>>>> > if
>> > > >> > > >>>>>>> > > >> this
>> > > >> > > >>>>>>> > > >> > > could get nasty. I would suggest that Dawid
>> > and I
>> > > >> just
>> > > >> > > >>>>>>> try to
>> > > >> > > >>>>>>> > > >> implement
>> > > >> > > >>>>>>> > > >> > and
>> > > >> > > >>>>>>> > > >> > > see if we like it. :)
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > # Summary
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > In conclusion, please let me know your
>> > > >> perspectives.
>> > > >> > > >>>>>>> Please
>> > > >> > > >>>>>>> > correct
>> > > >> > > >>>>>>> > > >> me,
>> > > >> > > >>>>>>> > > >> > if
>> > > >> > > >>>>>>> > > >> > > something is wrong or if I misunderstood
>> > > >> something. My
>> > > >> > > >>>>>>> perspective
>> > > >> > > >>>>>>> > > >> would
>> > > >> > > >>>>>>> > > >> > be:
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > Top A: Yes
>> > > >> > > >>>>>>> > > >> > > Top B: Yes (but disable watermark trigger
>> for
>> > split
>> > > >> > > >>>>>>> alignment)
>> > > >> > > >>>>>>> > > >> > > Top C: No
>> > > >> > > >>>>>>> > > >> > > Top D: b)
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > Best,
>> > > >> > > >>>>>>> > > >> > > Sebastian
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid
>> > Wysakowicz <
>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > > wrote:
>> > > >> > > >>>>>>> > > >> > >
>> > > >> > > >>>>>>> > > >> > >> @Arvid:
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's capability
>> > approach, I
>> > > >> > fear
>> > > >> > > >>>>>>> that it
>> > > >> > > >>>>>>> > > >> doesn't
>> > > >> > > >>>>>>> > > >> > work
>> > > >> > > >>>>>>> > > >> > >> for this particular use case: Sources can
>> > always
>> > > >> be
>> > > >> > > >>>>>>> aligned
>> > > >> > > >>>>>>> > > >> cross-task
>> > > >> > > >>>>>>> > > >> > and
>> > > >> > > >>>>>>> > > >> > >> this is just about intra-task alignment. So
>> > it's
>> > > >> > > >>>>>>> plausible to put
>> > > >> > > >>>>>>> > > >> > sources
>> > > >> > > >>>>>>> > > >> > >> into an alignment group even though they do
>> > not
>> > > >> use
>> > > >> > any
>> > > >> > > >>>>>>> of the
>> > > >> > > >>>>>>> > > >> presented
>> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They should just issue a
>> > > >> warning, if
>> > > >> > > >>>>>>> they handle
>> > > >> > > >>>>>>> > > >> > multiple
>> > > >> > > >>>>>>> > > >> > >> splits (see motivation section).
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Yes, but the "supportXXX" method would be
>> for
>> > > >> telling
>> > > >> > > if
>> > > >> > > >>>>>>> it
>> > > >> > > >>>>>>> > > supports
>> > > >> > > >>>>>>> > > >> > that
>> > > >> > > >>>>>>> > > >> > >> intra-task alignment. Cross-task alignment
>> > would
>> > > >> > always
>> > > >> > > >>>>>>> be
>> > > >> > > >>>>>>> > > supported.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> I updated interfaces to what I believe to
>> be
>> > > >> closest
>> > > >> > > to a
>> > > >> > > >>>>>>> > consensus
>> > > >> > > >>>>>>> > > >> > >> between all participants. Do you mind
>> taking a
>> > > >> look?
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> @Sebastian Do you mind addressing the nits?
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Best,
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Dawid
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Thanks for pushing this effort.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> I'd actually be in favor of 1b). I fully
>> agree
>> > > >> that
>> > > >> > > >>>>>>> decorator
>> > > >> > > >>>>>>> > > >> interfaces
>> > > >> > > >>>>>>> > > >> > >> should be avoided but I'm also not a big
>> fan
>> > of
>> > > >> > > >>>>>>> overloading the
>> > > >> > > >>>>>>> > > base
>> > > >> > > >>>>>>> > > >> > >> interfaces (they are hard to implement as
>> > is). The
>> > > >> > > usual
>> > > >> > > >>>>>>> feedback
>> > > >> > > >>>>>>> > > to
>> > > >> > > >>>>>>> > > >> > >> Source-related interfaces is always that
>> > they are
>> > > >> > > >>>>>>> overwhelming
>> > > >> > > >>>>>>> > and
>> > > >> > > >>>>>>> > > >> too
>> > > >> > > >>>>>>> > > >> > >> hard to implement. However, I'd also not
>> > oppose
>> > > >> 1c)
>> > > >> > as
>> > > >> > > >>>>>>> scattered
>> > > >> > > >>>>>>> > > >> > interfaces
>> > > >> > > >>>>>>> > > >> > >> also have drawbacks. I'd just dislike 1a)
>> and
>> > 1d).
>> > > >> > > >>>>>>> > > >> > >> While I also like Becket's capability
>> > approach, I
>> > > >> > fear
>> > > >> > > >>>>>>> that it
>> > > >> > > >>>>>>> > > >> doesn't
>> > > >> > > >>>>>>> > > >> > work
>> > > >> > > >>>>>>> > > >> > >> for this particular use case: Sources can
>> > always
>> > > >> be
>> > > >> > > >>>>>>> aligned
>> > > >> > > >>>>>>> > > >> cross-task
>> > > >> > > >>>>>>> > > >> > and
>> > > >> > > >>>>>>> > > >> > >> this is just about intra-task alignment. So
>> > it's
>> > > >> > > >>>>>>> plausible to put
>> > > >> > > >>>>>>> > > >> > sources
>> > > >> > > >>>>>>> > > >> > >> into an alignment group even though they do
>> > not
>> > > >> use
>> > > >> > any
>> > > >> > > >>>>>>> of the
>> > > >> > > >>>>>>> > > >> presented
>> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They should just issue a
>> > > >> warning, if
>> > > >> > > >>>>>>> they handle
>> > > >> > > >>>>>>> > > >> > multiple
>> > > >> > > >>>>>>> > > >> > >> splits (see motivation section).
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> I think renaming alignSplits to facilitate
>> > future
>> > > >> use
>> > > >> > > >>>>>>> cases makes
>> > > >> > > >>>>>>> > > >> sense
>> > > >> > > >>>>>>> > > >> > but
>> > > >> > > >>>>>>> > > >> > >> then all interfaces (if 1c) is chosen)
>> should
>> > be
>> > > >> > > adjusted
>> > > >> > > >>>>>>> > > >> accordingly.
>> > > >> > > >>>>>>> > > >> > >> AlignedSourceReader could be
>> > PausingSourceReader
>> > > >> and
>> > > >> > > I'd
>> > > >> > > >>>>>>> go for
>> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits (Becket's proposal
>> > afaik). We
>> > > >> > could
>> > > >> > > >>>>>>> also
>> > > >> > > >>>>>>> > split
>> > > >> > > >>>>>>> > > it
>> > > >> > > >>>>>>> > > >> > into
>> > > >> > > >>>>>>> > > >> > >> pauseSplit and resumeSplit. While
>> > > >> pauseOrResumeSplits
>> > > >> > > >>>>>>> may allow
>> > > >> > > >>>>>>> > > >> Sources
>> > > >> > > >>>>>>> > > >> > to
>> > > >> > > >>>>>>> > > >> > >> just use 1 instead of 2 library calls (as
>> > written
>> > > >> in
>> > > >> > > the
>> > > >> > > >>>>>>> > Javadoc),
>> > > >> > > >>>>>>> > > >> both
>> > > >> > > >>>>>>> > > >> > >> Kafka and Pulsar can't use it and I'm not
>> > sure if
>> > > >> > there
>> > > >> > > >>>>>>> is a
>> > > >> > > >>>>>>> > system
>> > > >> > > >>>>>>> > > >> that
>> > > >> > > >>>>>>> > > >> > >> can.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Some nits for the FLIP:
>> > > >> > > >>>>>>> > > >> > >> - Please replace "stop" with "pause".
>> > > >> > > >>>>>>> > > >> > >> - Not sure if it's worth it in the
>> capability
>> > > >> > section:
>> > > >> > > >>>>>>> Sources
>> > > >> > > >>>>>>> > that
>> > > >> > > >>>>>>> > > >> > adopt
>> > > >> > > >>>>>>> > > >> > >> this interface cannot be used in earlier
>> > > >> versions. So
>> > > >> > > it
>> > > >> > > >>>>>>> feels
>> > > >> > > >>>>>>> > like
>> > > >> > > >>>>>>> > > >> we
>> > > >> > > >>>>>>> > > >> > are
>> > > >> > > >>>>>>> > > >> > >> only forward compatible (old sources can be
>> > used
>> > > >> > after
>> > > >> > > >>>>>>> the
>> > > >> > > >>>>>>> > change);
>> > > >> > > >>>>>>> > > >> but
>> > > >> > > >>>>>>> > > >> > I
>> > > >> > > >>>>>>> > > >> > >> guess this holds for any API addition.
>> > > >> > > >>>>>>> > > >> > >> - You might want to add what happens when
>> all
>> > > >> splits
>> > > >> > > are
>> > > >> > > >>>>>>> paused.
>> > > >> > > >>>>>>> > > >> > >> - You may want to describe how the 3
>> flavors
>> > of
>> > > >> > > >>>>>>> SourceReaderBase
>> > > >> > > >>>>>>> > > >> > interact
>> > > >> > > >>>>>>> > > >> > >> with the interface.
>> > > >> > > >>>>>>> > > >> > >> - I'm not sure if it makes sense to include
>> > Kafka
>> > > >> and
>> > > >> > > >>>>>>> Pulsar in
>> > > >> > > >>>>>>> > the
>> > > >> > > >>>>>>> > > >> > FLIP.
>> > > >> > > >>>>>>> > > >> > >> For me, this is rather immediate follow-up
>> > work.
>> > > >> > (could
>> > > >> > > >>>>>>> be in the
>> > > >> > > >>>>>>> > > >> same
>> > > >> > > >>>>>>> > > >> > >> umbrella ticket)
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Best,
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Arvid
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid
>> > Wysakowicz
>> > > >> <
>> > > >> > > >>>>>>> > > >> > dwysakowicz@apache.org> <
>> dwysakowicz@apache.org>
>> > > >> > > >>>>>>> > > >> > >> wrote:
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
>> > > >> > > >>>>>>> WithSplitsAlignment",
>> > > >> > > >>>>>>> > > >> along
>> > > >> > > >>>>>>> > > >> > >> with "MySplitReader implements SplitReader,
>> > > >> > > >>>>>>> WithSplitsAlignment",
>> > > >> > > >>>>>>> > > or
>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements
>> > AlignedSourceReader"
>> > > >> > and
>> > > >> > > >>>>>>> > > "MySplitReader
>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader"
>> > and
>> > > >> > > >>>>>>> "MySplitReader
>> > > >> > > >>>>>>> > > >> > implements
>> > > >> > > >>>>>>> > > >> > >> SplitReader".
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according to
>> Dawid
>> > > >> would
>> > > >> > > be:
>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader"
>> > and
>> > > >> > > >>>>>>> "MySplitReader
>> > > >> > > >>>>>>> > > >> > implements
>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although
>> personally
>> > > >> > speaking I
>> > > >> > > >>>>>>> think it
>> > > >> > > >>>>>>> > is
>> > > >> > > >>>>>>> > > >> > simpler
>> > > >> > > >>>>>>> > > >> > >> to just add a new method to the split
>> reader
>> > with
>> > > >> > > >>>>>>> default impl.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> I think that is a good idea to have it
>> > aligned as
>> > > >> > much
>> > > >> > > as
>> > > >> > > >>>>>>> > possible.
>> > > >> > > >>>>>>> > > >> I'd
>> > > >> > > >>>>>>> > > >> > be
>> > > >> > > >>>>>>> > > >> > >> +1 for your option c). We can merge
>> > > >> > AlignedSplitReader
>> > > >> > > >>>>>>> with
>> > > >> > > >>>>>>> > > >> > SplitReader. We
>> > > >> > > >>>>>>> > > >> > >> will update the FLIP shortly.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Best,
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Dawid
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Thanks for the comment, Jark.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align other
>> > things in
>> > > >> > the
>> > > >> > > >>>>>>> future?
>> > > >> > > >>>>>>> > For
>> > > >> > > >>>>>>> > > >> > example,
>> > > >> > > >>>>>>> > > >> > >> align read speed, I have
>> > > >> > > >>>>>>> > > >> > >> seen users requesting global rate limits.
>> This
>> > > >> > feature
>> > > >> > > >>>>>>> may also
>> > > >> > > >>>>>>> > > need
>> > > >> > > >>>>>>> > > >> an
>> > > >> > > >>>>>>> > > >> > >> interface like this.
>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface
>> to
>> > > >> support
>> > > >> > > >>>>>>> aligning other
>> > > >> > > >>>>>>> > > >> > things, I
>> > > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
>> > > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
>> > > >> > > >>>>>>> `alignWatermarksForSplits`
>> > > >> > > >>>>>>> > > >> instead
>> > > >> > > >>>>>>> > > >> > of
>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> This is a good point. Naming wise, it would
>> > > >> usually
>> > > >> > be
>> > > >> > > >>>>>>> more
>> > > >> > > >>>>>>> > > >> extensible
>> > > >> > > >>>>>>> > > >> > to
>> > > >> > > >>>>>>> > > >> > >> just describe what the method actually
>> does,
>> > > >> instead
>> > > >> > of
>> > > >> > > >>>>>>> assuming
>> > > >> > > >>>>>>> > > the
>> > > >> > > >>>>>>> > > >> > >> purpose of doing this. For example, in this
>> > case,
>> > > >> > > >>>>>>> > > >> pauseOrResumeSplits()
>> > > >> > > >>>>>>> > > >> > >> would be more extensible because this can
>> be
>> > used
>> > > >> for
>> > > >> > > >>>>>>> any kind of
>> > > >> > > >>>>>>> > > >> flow
>> > > >> > > >>>>>>> > > >> > >> control, be it watermark alignment or
>> simple
>> > rate
>> > > >> > > >>>>>>> limiting.
>> > > >> > > >>>>>>> > > >> > >>
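
The naming argument above can be illustrated with a minimal sketch. The interface and class names here (FlowControlledReader, TrackingReader) are hypothetical stand-ins, not the Flink interfaces under discussion, and splits are identified by plain strings for simplicity. The point is only that a method named after what it does can serve both watermark alignment and rate limiting:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a reader interface; not the actual Flink API.
interface FlowControlledReader {
    // Default no-op: readers that cannot pause simply ignore the request.
    default void pauseOrResumeSplits(Collection<String> splitsToPause,
                                     Collection<String> splitsToResume) {
    }
}

// Illustrative reader that only tracks which splits are currently paused.
class TrackingReader implements FlowControlledReader {
    private final Set<String> paused = new HashSet<>();

    @Override
    public void pauseOrResumeSplits(Collection<String> splitsToPause,
                                    Collection<String> splitsToResume) {
        // Resume first, then pause, so a split named in both ends up paused.
        paused.removeAll(splitsToResume);
        paused.addAll(splitsToPause);
    }

    boolean isPaused(String splitId) {
        return paused.contains(splitId);
    }
}

class FlowControlDemo {
    public static void main(String[] args) {
        TrackingReader reader = new TrackingReader();
        // A watermark-alignment caller pauses a split that is too far ahead.
        reader.pauseOrResumeSplits(List.of("split-1"), List.of());
        // A rate-limiting caller could use the very same method.
        reader.pauseOrResumeSplits(List.of("split-2"), List.of("split-1"));
        System.out.println("split-1 paused: " + reader.isPaused("split-1"));
        System.out.println("split-2 paused: " + reader.isPaused("split-2"));
    }
}
```

Neither caller needs to tell the reader why a split is paused, which is what makes the neutral name extensible.
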
>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I
>> think
>> > > >> they
>> > > >> > > have
>> > > >> > > >>>>>>> their
>> > > >> > > >>>>>>> > own
>> > > >> > > >>>>>>> > > >> > >> advantages.
>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for
>> > > >> extending
>> > > >> > > >>>>>>> abilities
>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
>> > > >> > > >>>>>>> > > >> > >> and I prefer Interfaces rather than
>> methods in
>> > > >> this
>> > > >> > > >>>>>>> case. When
>> > > >> > > >>>>>>> > you
>> > > >> > > >>>>>>> > > >> have
>> > > >> > > >>>>>>> > > >> > a
>> > > >> > > >>>>>>> > > >> > >> bunch of abilities and each ability
>> > > >> > > >>>>>>> > > >> > >> has more than one method, Interfaces can
>> help
>> > to
>> > > >> > > >>>>>>> organize them
>> > > >> > > >>>>>>> > and
>> > > >> > > >>>>>>> > > >> make
>> > > >> > > >>>>>>> > > >> > >> it clear to users which methods
>> > > >> > > >>>>>>> > > >> > >> they need to implement when they want to have an
>> > > >> ability.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> I am OK with decorative interfaces if this
>> is
>> > a
>> > > >> > general
>> > > >> > > >>>>>>> design
>> > > >> > > >>>>>>> > > >> pattern
>> > > >> > > >>>>>>> > > >> > in
>> > > >> > > >>>>>>> > > >> > >> the other components in Flink. But it looks
>> > like
>> > > >> the
>> > > >> > > >>>>>>> current API
>> > > >> > > >>>>>>> > > >> > proposal
>> > > >> > > >>>>>>> > > >> > >> is not symmetric.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> The current proposal is essentially
>> > > >> "MySourceReader
>> > > >> > > >>>>>>> implements
>> > > >> > > >>>>>>> > > >> > >> SourceReader, WithSplitsAlignment", along
>> with
>> > > >> > > >>>>>>> "MySplitReader
>> > > >> > > >>>>>>> > > >> implements
>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Should we make the API symmetric? I'd
>> > consider any
>> > > >> > one
>> > > >> > > >>>>>>> of the
>> > > >> > > >>>>>>> > > >> following
>> > > >> > > >>>>>>> > > >> > as
>> > > >> > > >>>>>>> > > >> > >> symmetric.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
>> > > >> > > >>>>>>> WithSplitsAlignment",
>> > > >> > > >>>>>>> > > >> along
>> > > >> > > >>>>>>> > > >> > >> with "MySplitReader implements SplitReader,
>> > > >> > > >>>>>>> WithSplitsAlignment",
>> > > >> > > >>>>>>> > > or
>> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements
>> > AlignedSourceReader"
>> > > >> > and
>> > > >> > > >>>>>>> > > "MySplitReader
>> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
>> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader"
>> > and
>> > > >> > > >>>>>>> "MySplitReader
>> > > >> > > >>>>>>> > > >> > implements
>> > > >> > > >>>>>>> > > >> > >> SplitReader".
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according to
>> Dawid
>> > > >> would
>> > > >> > > be:
>> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader"
>> > and
>> > > >> > > >>>>>>> "MySplitReader
>> > > >> > > >>>>>>> > > >> > implements
>> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
>> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although
>> personally
>> > > >> > speaking I
>> > > >> > > >>>>>>> think it
>> > > >> > > >>>>>>> > is
>> > > >> > > >>>>>>> > > >> > simpler
>> > > >> > > >>>>>>> > > >> > >> to just add a new method to the split
>> reader
>> > with
>> > > >> > > >>>>>>> default impl.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> @Dawid Wysakowicz <dw...@apache.org>
>> <
>> > > >> > > >>>>>>> > dwysakowicz@apache.org
>> > > >> > > >>>>>>> > > >
>> > > >> > > >>>>>>> > > >> <
>> > > >> > > >>>>>>> > > >> > dwysakowicz@apache.org> <
>> dwysakowicz@apache.org
>> > >,
>> > > >> > thanks
>> > > >> > > >>>>>>> for the
>> > > >> > > >>>>>>> > > reply.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a
>> preference
>> > > >> and I
>> > > >> > > >>>>>>> agree most
>> > > >> > > >>>>>>> > of
>> > > >> > > >>>>>>> > > >> the
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> sources will support the alignment I am
>> fine
>> > > >> > following
>> > > >> > > >>>>>>> your
>> > > >> > > >>>>>>> > > >> suggestion
>> > > >> > > >>>>>>> > > >> > to
>> > > >> > > >>>>>>> > > >> > >> have the SourceReader extending from
>> > > >> > > >>>>>>> > WithWatermarksSplitsAlignment,
>> > > >> > > >>>>>>> > > >> but
>> > > >> > > >>>>>>> > > >> > >> would put the "supportsXXX" there, not in
>> the
>> > > >> Source
>> > > >> > to
>> > > >> > > >>>>>>> keep the
>> > > >> > > >>>>>>> > > two
>> > > >> > > >>>>>>> > > >> > >> methods together.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> One benefit of having the "supportsXXX" in
>> > Source
>> > > >> is
>> > > >> > > >>>>>>> that this
>> > > >> > > >>>>>>> > > allows
>> > > >> > > >>>>>>> > > >> > some
>> > > >> > > >>>>>>> > > >> > >> compile time check. For example, if a user
>> > enabled
>> > > >> > > >>>>>>> watermark
>> > > >> > > >>>>>>> > > >> alignment
>> > > >> > > >>>>>>> > > >> > >> while it is not supported by the Source, an
>> > > >> exception
>> > > >> > > >>>>>>> can be
>> > > >> > > >>>>>>> > thrown
>> > > >> > > >>>>>>> > > >> at
>> > > >> > > >>>>>>> > > >> > >> compile time. It seems in general useful.
>> That
>> > > >> said,
>> > > >> > I
>> > > >> > > >>>>>>> agree that
>> > > >> > > >>>>>>> > > API
>> > > >> > > >>>>>>> > > >> > >> cleanliness wise it is better to put the
>> two
>> > > >> methods
>> > > >> > > >>>>>>> together.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Thanks,
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <
>> > > >> > > >>>>>>> imjark@gmail.com> <
>> > > >> > > >>>>>>> > > >> > imjark@gmail.com> <im...@gmail.com> <
>> > > >> imjark@gmail.com>
>> > > >> > > >>>>>>> wrote:
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Thanks, Dawid, for the reminder on FLIP-182.
>> > Sorry I
>> > > >> did
>> > > >> > > >>>>>>> miss it.
>> > > >> > > >>>>>>> > > >> > >> I don't have other concerns then.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Best,
>> > > >> > > >>>>>>> > > >> > >> Jark
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid
>> > Wysakowicz <
>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org>
>> > > >> > > >>>>>>> > > >> > <dw...@apache.org> <
>> dwysakowicz@apache.org>
>> > <
>> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org>
>> > > >> > > >>>>>>> > > >> > >> wrote:
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> @Jark:
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with
>> > watermarks
>> > > >> > when
>> > > >> > > >>>>>>> the
>> > > >> > > >>>>>>> > source
>> > > >> > > >>>>>>> > > >> > >> implements the interface?
>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark
>> > > >> alignment
>> > > >> > > even
>> > > >> > > >>>>>>> if Kafka
>> > > >> > > >>>>>>> > > >> > >> implements the interface,
>> > > >> > > >>>>>>> > > >> > >> and this will affect the throughput
>> somehow. I
>> > > >> agree
>> > > >> > > >>>>>>> with Becket
>> > > >> > > >>>>>>> > > >> > >> we may need a
>> > > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for
>> users to
>> > > >> > > configure
>> > > >> > > >>>>>>> the
>> > > >> > > >>>>>>> > source
>> > > >> > > >>>>>>> > > >> to
>> > > >> > > >>>>>>> > > >> > >> enable/disable the alignment.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate
>> > > >> > > maxDesiredWatermark?
>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of
>> maxDesiredWatermark
>> > will
>> > > >> > > >>>>>>> greatly affect
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> throughput
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> if the reader is constantly
>> > > >> > > >>>>>>> > > >> > >>  switching between pause and resume. Can
>> users
>> > > >> > > configure
>> > > >> > > >>>>>>> the
>> > > >> > > >>>>>>> > > >> alignment
>> > > >> > > >>>>>>> > > >> > >> offset?
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> This is covered in the previous FLIP[1]
>> which
>> > has
>> > > >> > been
>> > > >> > > >>>>>>> already
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> implemented
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> in 1.15. In short, it must be enabled with
>> the
>> > > >> > > watermark
>> > > >> > > >>>>>>> strategy
>> > > >> > > >>>>>>> > > >> which
>> > > >> > > >>>>>>> > > >> > >> also configures drift and update interval.
>> > > >> > > >>>>>>> > > >> > >>
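
As a rough illustration of how a configured drift could turn into pause decisions, here is a simplified sketch. It assumes the maximum desired watermark is the minimum watermark in the group plus the allowed drift, and that every split is compared against that limit. The class and method names (AlignmentSketch, splitsToPause) are hypothetical, and the real coordination between the source coordinator and the readers is left out entirely:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the alignment decision; not Flink code.
class AlignmentSketch {

    // Assumed rule: the slowest watermark plus the allowed drift is the limit.
    // Callers must pass a non-empty map.
    static long maxDesiredWatermark(Map<String, Long> watermarks, long maxDriftMillis) {
        long min = Collections.min(watermarks.values());
        return min + maxDriftMillis;
    }

    // Returns the ids of splits whose watermark is ahead of the limit, sorted by id.
    static List<String> splitsToPause(Map<String, Long> watermarks, long maxDriftMillis) {
        List<String> toPause = new ArrayList<>();
        if (watermarks.isEmpty()) {
            return toPause;
        }
        long limit = maxDesiredWatermark(watermarks, maxDriftMillis);
        for (Map.Entry<String, Long> entry : watermarks.entrySet()) {
            if (entry.getValue() > limit) {
                toPause.add(entry.getKey());
            }
        }
        Collections.sort(toPause);
        return toPause;
    }

    public static void main(String[] args) {
        Map<String, Long> watermarks =
                Map.of("split-a", 1_000L, "split-b", 1_500L, "split-c", 9_000L);
        // With a drift of 2000 ms the limit is 3000, so only split-c is ahead.
        System.out.println(splitsToPause(watermarks, 2_000L));
    }
}
```

A larger drift means fewer pause/resume switches at the cost of looser alignment, which is the throughput trade-off raised in question 2.
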
>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface
>> to
>> > > >> support
>> > > >> > > >>>>>>> aligning other
>> > > >> > > >>>>>>> > > >> things,
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> I
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
>> > > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
>> > > >> > > >>>>>>> `alignWatermarksForSplits`
>> > > >> > > >>>>>>> > > >> instead
>> > > >> > > >>>>>>> > > >> > of
>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Sure let's rename it.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> @Becket:
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> I understand your point. On the other hand,
>> > putting
>> > > >> > all
>> > > >> > > >>>>>>> methods,
>> > > >> > > >>>>>>> > > even
>> > > >> > > >>>>>>> > > >> > with
>> > > >> > > >>>>>>> > > >> > >> "supportsXXX" methods for enabling certain
>> > > >> features,
>> > > >> > > >>>>>>> makes the
>> > > >> > > >>>>>>> > > entry
>> > > >> > > >>>>>>> > > >> > >> threshold for writing a new source higher.
>> > > >> Instead of
>> > > >> > > >>>>>>> focusing on
>> > > >> > > >>>>>>> > > the
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> basic
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> and required properties of the Source, the
>> > person
>> > > >> > > >>>>>>> implementing a
>> > > >> > > >>>>>>> > > >> source
>> > > >> > > >>>>>>> > > >> > >> must bother with and need to figure out
>> what
>> > all
>> > > >> of
>> > > >> > the
>> > > >> > > >>>>>>> extra
>> > > >> > > >>>>>>> > > >> features
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> are
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> about and how to deal with them. It also
>> makes
>> > it
>> > > >> > > harder
>> > > >> > > >>>>>>> to
>> > > >> > > >>>>>>> > > organize
>> > > >> > > >>>>>>> > > >> > >> methods in coupled groups as Jark said.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a
>> preference
>> > > >> and I
>> > > >> > > >>>>>>> agree most
>> > > >> > > >>>>>>> > of
>> > > >> > > >>>>>>> > > >> the
>> > > >> > > >>>>>>> > > >> > >> sources will support the alignment I am
>> fine
>> > > >> > following
>> > > >> > > >>>>>>> your
>> > > >> > > >>>>>>> > > >> suggestion
>> > > >> > > >>>>>>> > > >> > to
>> > > >> > > >>>>>>> > > >> > >> have the SourceReader extending from
>> > > >> > > >>>>>>> > WithWatermarksSplitsAlignment,
>> > > >> > > >>>>>>> > > >> but
>> > > >> > > >>>>>>> > > >> > >> would put the "supportsXXX" there, not in
>> the
>> > > >> Source
>> > > >> > to
>> > > >> > > >>>>>>> keep the
>> > > >> > > >>>>>>> > > two
>> > > >> > > >>>>>>> > > >> > >> methods together.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Lastly, I agree it is really unfortunate
>> the
>> > > >> > > >>>>>>> "alignSplits"
>> > > >> > > >>>>>>> > methods
>> > > >> > > >>>>>>> > > >> > differ
>> > > >> > > >>>>>>> > > >> > >> slightly for SourceReader and SplitReader.
>> The
>> > > >> reason
>> > > >> > > for
>> > > >> > > >>>>>>> that is
>> > > >> > > >>>>>>> > > >> > >> SourceReaderBase deals only with SplitIds,
>> > whereas
>> > > >> > > >>>>>>> SplitReader
>> > > >> > > >>>>>>> > > needs
>> > > >> > > >>>>>>> > > >> the
>> > > >> > > >>>>>>> > > >> > >> actual splits to pause them. I found the
>> > > >> discrepancy
>> > > >> > > >>>>>>> acceptable
>> > > >> > > >>>>>>> > for
>> > > >> > > >>>>>>> > > >> the
>> > > >> > > >>>>>>> > > >> > >> sake of simplifying changes significantly,
>> > > >> especially
>> > > >> > > as
>> > > >> > > >>>>>>> they
>> > > >> > > >>>>>>> > would
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> highly
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> likely impact performance as we would have
>> to
>> > > >> perform
>> > > >> > > >>>>>>> additional
>> > > >> > > >>>>>>> > > >> > lookups.
>> > > >> > > >>>>>>> > > >> > >> Moreover the SplitReader is a secondary
>> > interface.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Best,
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Dawid
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> [1]
>> > https://cwiki.apache.org/confluence/x/hQYBCw
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Thanks for the effort, Dawid and Sebastian!
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> I just have some minor questions (maybe I
>> > missed
>> > > >> > > >>>>>>> something).
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with
>> > watermarks
>> > > >> > when
>> > > >> > > >>>>>>> the
>> > > >> > > >>>>>>> > source
>> > > >> > > >>>>>>> > > >> > >> implements the interface?
>> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark
>> > > >> alignment
>> > > >> > > even
>> > > >> > > >>>>>>> if Kafka
>> > > >> > > >>>>>>> > > >> > >> implements the interface,
>> > > >> > > >>>>>>> > > >> > >> and this will affect the throughput
>> somehow. I
>> > > >> agree
>> > > >> > > >>>>>>> with Becket
>> > > >> > > >>>>>>> > > >> > >> we may need a
>> > > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for
>> users to
>> > > >> > > configure
>> > > >> > > >>>>>>> the
>> > > >> > > >>>>>>> > source
>> > > >> > > >>>>>>> > > >> to
>> > > >> > > >>>>>>> > > >> > >> enable/disable the alignment.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate
>> > > >> > > maxDesiredWatermark?
>> > > >> > > >>>>>>> > > >> > >> I think the algorithm of
>> maxDesiredWatermark
>> > will
>> > > >> > > >>>>>>> greatly affect
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> throughput
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> if the reader is constantly
>> > > >> > > >>>>>>> > > >> > >>  switching between pause and resume. Can
>> users
>> > > >> > > configure
>> > > >> > > >>>>>>> the
>> > > >> > > >>>>>>> > > >> alignment
>> > > >> > > >>>>>>> > > >> > >> offset?
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
>> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align other
>> > things in
>> > > >> > the
>> > > >> > > >>>>>>> future?
>> > > >> > > >>>>>>> > For
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> example,
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> align read speed, I have
>> > > >> > > >>>>>>> > > >> > >> seen users requesting global rate limits.
>> This
>> > > >> > feature
>> > > >> > > >>>>>>> may also
>> > > >> > > >>>>>>> > > need
>> > > >> > > >>>>>>> > > >> an
>> > > >> > > >>>>>>> > > >> > >> interface like this.
>> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface
>> to
>> > > >> support
>> > > >> > > >>>>>>> aligning other
>> > > >> > > >>>>>>> > > >> things,
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> I
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
>> > > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
>> > > >> > > >>>>>>> `alignWatermarksForSplits`
>> > > >> > > >>>>>>> > > >> instead
>> > > >> > > >>>>>>> > > >> > of
>> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
>> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I
>> think
>> > > >> they
>> > > >> > > have
>> > > >> > > >>>>>>> their
>> > > >> > > >>>>>>> > own
>> > > >> > > >>>>>>> > > >> > >> advantages.
>> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for
>> > > >> extending
>> > > >> > > >>>>>>> abilities
>> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
>> > > >> > > >>>>>>> > > >> > >> and I prefer Interfaces rather than
>> methods in
>> > > >> this
>> > > >> > > >>>>>>> case. When
>> > > >> > > >>>>>>> > you
>> > > >> > > >>>>>>> > > >> have
>> > > >> > > >>>>>>> > > >> > a
>> > > >> > > >>>>>>> > > >> > >> bunch of abilities and each ability
>> > > >> > > >>>>>>> > > >> > >> has more than one method, Interfaces can
>> help
>> > to
>> > > >> > > >>>>>>> organize them
>> > > >> > > >>>>>>> > and
>> > > >> > > >>>>>>> > > >> make
>> > > >> > > >>>>>>> > > >> > >> it clear to users which methods
>> > > >> > > >>>>>>> > > >> > >> they need to implement when they want to have an
>> > > >> ability.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Best,
>> > > >> > > >>>>>>> > > >> > >> Jark
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <
>> > > >> > > >>>>>>> becket.qin@gmail.com>
>> > > >> > > >>>>>>> > <
>> > > >> > > >>>>>>> > > >> > becket.qin@gmail.com> <be...@gmail.com>
>> <
>> > > >> > > >>>>>>> becket.qin@gmail.com>
>> > > >> > > >>>>>>> > <
>> > > >> > > >>>>>>> > > >> > >> becket.qin@gmail.com> wrote:
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Hi Dawid,
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Thanks for the explanation. Apologies that
>> I
>> > > >> somehow
>> > > >> > > >>>>>>> misread a
>> > > >> > > >>>>>>> > > bunch
>> > > >> > > >>>>>>> > > >> of
>> > > >> > > >>>>>>> > > >> > >> "align" and thought they were "assign".
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Regarding 1, by default implementation, I
>> was
>> > > >> > thinking
>> > > >> > > >>>>>>> of the
>> > > >> > > >>>>>>> > > default
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> no-op
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> implementation. I am a little worried about
>> > the
>> > > >> > > >>>>>>> proliferation of
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> decorative
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> interfaces. I think the most important
>> thing
>> > about
>> > > >> > > >>>>>>> interfaces is
>> > > >> > > >>>>>>> > > that
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> they
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> are easy to understand. In this case, I
>> prefer
>> > > >> adding
>> > > >> > > >>>>>>> a new method
>> > > >> > > >>>>>>> > to
>> > > >> > > >>>>>>> > > >> the
>> > > >> > > >>>>>>> > > >> > >> existing interface for the following
>> reasons:
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> a) I feel the biggest drawback of
>> decorative
>> > > >> > interfaces
>> > > >> > > >>>>>>> is which
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> interface
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> they can decorate and which combinations of
>> > > >> multiple
>> > > >> > > >>>>>>> decorative
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> interfaces
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> are valid. In the current FLIP, the
>> > > >> > withSplitsAlignment
>> > > >> > > >>>>>>> interface
>> > > >> > > >>>>>>> > > is
>> > > >> > > >>>>>>> > > >> > only
>> > > >> > > >>>>>>> > > >> > >> applicable to the SourceReader which means
>> it
>> > > >> can't
>> > > >> > > >>>>>>> decorate any
>> > > >> > > >>>>>>> > > >> other
>> > > >> > > >>>>>>> > > >> > >> interface. From an interface design
>> > perspective, a
>> > > >> > > >>>>>>> natural
>> > > >> > > >>>>>>> > question
>> > > >> > > >>>>>>> > > >> is
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> why
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> not let "AlignedSplitReader" extend
>> > > >> > > >>>>>>> "withSplitsAlignment"? And it
>> > > >> > > >>>>>>> > > is
>> > > >> > > >>>>>>> > > >> > also
>> > > >> > > >>>>>>> > > >> > >> natural to assume that a split reader
>> > implementing
>> > > >> > both
>> > > >> > > >>>>>>> > SplitReader
>> > > >> > > >>>>>>> > > >> and
>> > > >> > > >>>>>>> > > >> > >> WithSplitsAlignment would work, because a
>> > source
>> > > >> > reader
>> > > >> > > >>>>>>> > implementing
>> > > >> > > >>>>>>> > > >> > >> SourceReader and withSplitsAlignment works.
>> > So why
>> > > >> > > isn't
>> > > >> > > >>>>>>> there an
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> interface
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> of AlignedSourceReader? In the future, if
>> > there
>> > > >> is a
>> > > >> > > new
>> > > >> > > >>>>>>> feature
>> > > >> > > >>>>>>> > > >> added
>> > > >> > > >>>>>>> > > >> > >> (e.g. sorted or pre-partitioned data
>> aware),
>> > are
>> > > >> we
>> > > >> > > >>>>>>> going to
>> > > >> > > >>>>>>> > create
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> another
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> interface of SplitReader such as
>> > > >> SortedSplitReader or
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> PrePartitionedAware?
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> Can they be combined? So I think the
>> > additional
>> > > >> > > >>>>>>> decorative
>> > > >> > > >>>>>>> > > interface
>> > > >> > > >>>>>>> > > >> > like
>> > > >> > > >>>>>>> > > >> > >> withSplitsAlignment actually increases the
>> > > >> > > understanding
>> > > >> > > >>>>>>> cost of
>> > > >> > > >>>>>>> > > >> users
>> > > >> > > >>>>>>> > > >> > >> because they have to know what decorative
>> > > >> interfaces
>> > > >> > > are
>> > > >> > > >>>>>>> there,
>> > > >> > > >>>>>>> > > which
>> > > >> > > >>>>>>> > > >> > >> interface they can decorate and which
>> > > >> combinations of
>> > > >> > > the
>> > > >> > > >>>>>>> > > decorative
>> > > >> > > >>>>>>> > > >> > >> interfaces are valid and which are not.
>> > Ideally we
>> > > >> > want
>> > > >> > > >>>>>>> to avoid
>> > > >> > > >>>>>>> > > >> that.
>> > > >> > > >>>>>>> > > >> > To
>> > > >> > > >>>>>>> > > >> > >> be clear, I am not opposing having an
>> > interface of
>> > > >> > > >>>>>>> > > >> withSplitsAlignment,
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> it
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> is completely OK to have it as an internal
>> > > >> interface
>> > > >> > > and
>> > > >> > > >>>>>>> let
>> > > >> > > >>>>>>> > > >> > SourceReader
>> > > >> > > >>>>>>> > > >> > >> and SplitReader both extend it.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> b) Adding a new method to the SourceReader
>> > with a
>> > > >> > > default
>> > > >> > > >>>>>>> > > >> implementation
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> of
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> no-op would help avoid logic branching in
>> the
>> > > >> source
>> > > >> > > >>>>>>> logic,
>> > > >> > > >>>>>>> > > >> especially
>> > > >> > > >>>>>>> > > >> > >> given that we agree that the vast majority
>> of
>> > the
>> > > >> > > >>>>>>> SourceReader
>> > > >> > > >>>>>>> > > >> > >> implementations, if not all, would just
>> extend
>> > > >> from
>> > > >> > the
>> > > >> > > >>>>>>> > > >> > SourceReaderBase.
>> > > >> > > >>>>>>> > > >> > >> That means adding a new method to the
>> > interface
>> > > >> would
>> > > >> > > >>>>>>> effectively
>> > > >> > > >>>>>>> > > >> give
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> the
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> same user experience, but simpler.
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> c) A related design principle that may be
>> > worth
>> > > >> > > >>>>>>> discussing is how
>> > > >> > > >>>>>>> > > do
>> > > >> > > >>>>>>> > > >> we
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> let
>> > > >> > > >>>>>>> > > >> > >>
>> > > >> > > >>>>>>> > > >> > >> the Source implementations tell Flink what
> capability is supported and what is not. Personally speaking I feel the
> most intuitive place to me is in the Source itself, because that is the
> entrance of the entire Source connector logic.
>
> Based on the above thoughts, I am wondering if the following interface
> would be easier to understand by the users.
>
> - Change "withSplitsAlignment" to internal interface, let both
>   SourceReader and SplitReader extend from it, with a default no-op
>   implementation.
> - Add a new method "boolean supportSplitsAlignment()" to the Source
>   interface, with a default implementation returning false. Sources that
>   have implemented the alignment logic can change this to return true, and
>   override the alignSplits() methods in the SourceReader / SplitReader if
>   needed.
> - In the future, if a new optional feature is going to be added to the
>   Source, and that feature requires the awareness from Flink, we can add
>   more such methods to the Source.
>
> What do you think?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
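The capability-flag proposal quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: the interfaces are simplified stand-ins rather than the real Flink types, splits are plain strings, and the parameters of alignSplits() as well as the PlainSource, AligningSource and maybeAlign names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Simplified stand-ins for illustration only; not the real Flink interfaces.
class CapabilityFlagSketch {

    /** Reader with the alignment hook as a no-op default, as the proposal suggests. */
    interface SourceReader {
        default void alignSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume) {
            // no-op unless a reader overrides it
        }
    }

    /** The Source advertises the capability; the default is "not supported". */
    interface Source {
        SourceReader createReader();

        default boolean supportSplitsAlignment() {
            return false;
        }
    }

    /** Hypothetical source that never opted in. */
    static class PlainSource implements Source {
        @Override
        public SourceReader createReader() {
            return new SourceReader() {};
        }
    }

    /** Hypothetical source that opts in and records which splits are paused. */
    static class AligningSource implements Source {
        final List<String> paused = new ArrayList<>();

        @Override
        public SourceReader createReader() {
            return new SourceReader() {
                @Override
                public void alignSplits(
                        Collection<String> splitsToPause, Collection<String> splitsToResume) {
                    paused.removeAll(splitsToResume);
                    paused.addAll(splitsToPause);
                }
            };
        }

        @Override
        public boolean supportSplitsAlignment() {
            return true;
        }
    }

    /** Framework side: only ask the reader to align if the source declares support. */
    static boolean maybeAlign(Source source, SourceReader reader, String splitToPause) {
        if (!source.supportSplitsAlignment()) {
            return false;
        }
        reader.alignSplits(
                Collections.singletonList(splitToPause), Collections.<String>emptyList());
        return true;
    }

    public static void main(String[] args) {
        Source plain = new PlainSource();
        System.out.println(maybeAlign(plain, plain.createReader(), "split-0"));

        AligningSource aligning = new AligningSource();
        System.out.println(maybeAlign(aligning, aligning.createReader(), "split-0"));
        System.out.println(aligning.paused);
    }
}
```

In this sketch the framework can tell from the Source alone whether alignment is available, which is the property the proposal is after; whether a flag plus no-op default is preferable to an opt-in interface is exactly what the rest of the thread debates.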
> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dwysakowicz@apache.org>
> wrote:
>
> @Konstantin:
>
> > As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> > resume behavior) will be implemented for Kafka and Pulsar only, correct?
>
> Correct, as far as I know though, those are the only sources which consume
> concurrently from multiple splits and thus alignment applies.
>
> @Thomas:
>
> > I wonder if "supporting" split alignment in SourceReaderBase and then
> > doing nothing if the split reader does not implement AlignedSplitReader
> > could be misleading? Perhaps WithSplitsAlignment can instead be added to
> > the specific source reader (i.e. KafkaSourceReader) to make it explicit
> > that the source actually supports it.
>
> I understand your concern. Hmm, I think we could actually do that. Given
> the actual implementation of the SourceReaderBase#alignSplits is rather
> short (just a forward to the corresponding method of SplitFetcher), we
> could reimplement it in the actual source implementations. This solution
> has a downside though. Authors of new sources would have to do two
> things: extend from AlignedSplitReader and implement WithSplitsAssignment,
> instead of just extending AlignedSplitReader. I would be fine with such a
> tradeoff though. What do others think?
>
> @Steven:
>
> > For this part from the motivation section, is it accurate? Let's assume
> > one source task consumes from 3 partitions and one of the partitions is
> > significantly slower. In this situation, watermark for this source task
> > won't hold back as it is reading recent data from the other two Kafka
> > partitions. As a result, it won't hold back the overall watermark. I
> > thought the problem is that we may have late data for this slow
> > partition.
>
> It will hold back the watermark. Watermark of an operator is the minimum
> of watermarks of all splits [1].
>
> > I have another question about the restart. Say split alignment is
> > triggered. checkpoint is completed. job failed and restored from the last
> > checkpoint. because alignment decision is not checkpointed, initially
> > alignment won't be enforced until we get a cycle of watermark aggregation
> > and propagation, right? Not saying this corner is a problem. Just want to
> > understand it more.
>
> Your understanding is correct.
>
> @Becket:
>
> > 1. I think watermark alignment is sort of a general use case, so should
> > we just add the related methods to SourceReader directly instead of
> > introducing the new interface of WithSplitAssignment? We can provide
> > default implementations, so backwards compatibility won't be an issue.
>
> I don't think we can provide a default implementation. How would we do
> that? Would it be just a no-op? Is it better than having an opt-in
> interface? The default implementation would have to be added exclusively
> in a *Public* SourceReader interface. By the way notice SourceReaderBase
> does extend from WithSplitsAlignment, so effectively all implementations
> do handle the alignment case. To be honest I think it is impossible to
> implement the SourceReader interface directly by end users.
>
> > 2. As you mentioned, the SplitReader interface probably also needs some
> > change to support throttling at the split granularity. Can you add that
> > interface change into the public interface section as well?
>
> It has been added from the beginning. See *AlignedSplitReader.*
>
> > 3. Nit, can we avoid using the method name assignSplits here, given that
> > it is not actually changing the split assignments? It seems something
> > like pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
>
> The method's called *alignSplits*, not assign. Do you still prefer a
> different name for that? Personally, I am open for suggestions here.
>
> Best,
>
> Dawid
>
> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
>
> On 22/04/2022 05:59, Becket Qin wrote:
>
> Thanks for driving the effort, Sebastian. I think the motivation makes a
> lot of sense. Just a few suggestions / questions.
>
> 1. I think watermark alignment is sort of a general use case, so should
> we just add the related methods to SourceReader directly instead of
> introducing the new interface of WithSplitAssignment? We can provide
> default implementations, so backwards compatibility won't be an issue.
>
> 2. As you mentioned, the SplitReader interface probably also needs some
> change to support throttling at the split granularity. Can you add that
> interface change into the public interface section as well?
>
> 3. Nit, can we avoid using the method name assignSplits here, given that
> it is not actually changing the split assignments? It seems something
> like pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <stevenz3wu@gmail.com> wrote:
>
> > However, a single source operator may read data from multiple
> > splits/partitions, e.g., multiple Kafka partitions, such that even with
> > watermark alignment the source operator may need to buffer excessive
> > amount of data if one split emits data faster than another.
>
> For this part from the motivation section, is it accurate? Let's assume
> one source task consumes from 3 partitions and one of the partitions is
> significantly slower. In this situation, watermark for this source task
> won't hold back as it is reading recent data from the other two Kafka
> partitions. As a result, it won't hold back the overall watermark. I
> thought the problem is that we may have late data for this slow
> partition.
>
> I have another question about the restart. Say split alignment is
> triggered. checkpoint is completed. job failed and restored from the last
> checkpoint. because alignment decision is not checkpointed, initially
> alignment won't be enforced until we get a cycle of watermark aggregation
> and propagation, right? Not saying this corner is a problem. Just want to
> understand it more.
>
> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <thw@apache.org> wrote:
>
> Thanks for working on this!
>
> I wonder if "supporting" split alignment in SourceReaderBase and then
> doing nothing if the split reader does not implement AlignedSplitReader
> could be misleading? Perhaps WithSplitsAlignment can instead be added to
> the specific source reader (i.e. KafkaSourceReader) to make it explicit
> that the source actually supports it.
>
> Thanks,
> Thomas
>
> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <knaufk@apache.org>
> wrote:
>
> Hi Sebastian, Hi Dawid,
>
> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> resume behavior) will be implemented for Kafka and Pulsar only, correct?
>
> +1 in general. I believe it is valuable to complete the watermark aligned
> story with this FLIP.
>
> Cheers,
>
> Konstantin
>
> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dwysakowicz@apache.org>
> wrote:
>
> To be explicit, having worked on it, I support it ;) I think we can
> start a vote thread soonish, as there are no concerns so far.
>
> Best,
>
> Dawid
>
> On 13/04/2022 11:27, Sebastian Mattheis wrote:
>
> Dear Flink developers,
>
> I would like to open a discussion on FLIP 217 [1] for an extension of
> Watermark Alignment to perform alignment also in SplitReaders. To do so,
> SplitReaders must be able to suspend and resume reading from split
> sources where the SourceOperator coordinates and controls suspend and
> resume. To gather information about current watermarks of the
> SplitReaders, we extend the internal WatermarkOutputMultiplexer and
> report watermarks to the SourceOperator.
>
> There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised
> and reworked by Dawid Wysakowicz (He did most of the work.) and me. The
> changes are backwards compatible in a way that if affected components do
> not support split alignment the behavior is as before.
>
> Best,
> Sebastian
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
>
> [2] https://github.com/dawidwys/flink/tree/aligned-splits
>
> --
>
> Konstantin Knauf
> https://twitter.com/snntrable
> https://github.com/knaufk
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Piotr Nowojski <pn...@apache.org>.
Hi,

It looks like option 1 wins overall? So let's go with that.

Best,
Piotrek

On Wed, 15 Jun 2022 at 04:13, Steven Wu <st...@gmail.com> wrote:

> Both option 1 (default impl in base interface) and option 2 (decorative
> interface) are pretty common patterns. I would also be fine with either.
> The important thing is that an exception is thrown if a source doesn't
> support the alignment capability.
>
> The other point is that we can validate the source capability if alignment
> is enabled in WatermarkStrategy. I believe either option can achieve this
> goal too.
> public interface WatermarkStrategy<T> {
>     WatermarkStrategy<T> withWatermarkAlignment(
>             String watermarkGroup, Duration maxAllowedWatermarkDrift);
> }
>
> If I have to pick one, I am slightly favoring option 1 (base interface). As
> watermark is already an essential concept of source, maybe watermark
> alignment capability can also be a property of the base source/reader
> interface.
>
> On Tue, Jun 14, 2022 at 3:44 PM Thomas Weise <th...@apache.org> wrote:
>
> > Hi everyone,
> >
> > Thank you for all the effort that went into this discussion. The split
> > level watermark alignment will be an important feature for Flink that
> > will address operational problems for various use cases. From reading
> > through this thread it appears that not too much remains to bring this
> > FLIP to acceptance and allow development to move forward. I would like
> > to contribute if possible.
> >
> > Regarding option 1 vs. option 2: I don't have a strong preference,
> > perhaps slightly leaning towards option 1.
> >
> > However, from a user perspective, should the split level alignment be
> > an opt-in feature, at least for a few releases? If yes, then we would
> > require a knob similar to supportsPausingSplits(), which I understand
> > won't be part of the revised FLIP. Such control may be beneficial:
> >
> > * Compare runtime behavior with split level alignment on/off
> > * Allow use of sources that don't implement pausing splits yet
> >
> > The second point would, from the user's perspective, be necessary for
> > backward compatibility? While the interface aspect and source
> > compatibility have been discussed in great detail, I don't think it
> > would be desirable if an application that already uses alignment fails
> > after upgrading to the new Flink version, forcing users to modify
> > sources in lockstep for the new non-optional split level alignment.
> >
> > So I think clarification of the compatibility aspect on the FLIP page
> > would be necessary.
> >
> > Thanks,
> > Thomas
> >
> > On Mon, May 30, 2022 at 3:29 AM Piotr Nowojski <piotr.nowojski@gmail.com>
> > wrote:
> > >
> > > Hi Becket,
> > >
> > > Thanks for summing this up. Just one correction:
> > >
> > > > Piotr prefers option 2, his opinions are:
> > > >   e) It is OK that the code itself in option 2 indicates to the
> > > > developers that a feature is optional. We will rely on the
> > > > documentation to correct that and clarify that the feature is actually
> > > > obligatory.
> > >
> > > I would say based on a) and b) that the feature would still be optional.
> > > So both the implementation and the documentation would be saying that.
> > > We could add a mention to the docs and release notes that this feature
> > > will be obligatory in the next major release and plan such a release
> > > accordingly.
> > >
> > > Re the option 1., as you mentioned:
> > > > As for option 1: For developers, the feature is still optional due to
> > > > the default implementation in the interface, regardless of what the
> > > > default implementation does, because the code compiles without
> > > > overriding these methods
> > >
> > > Also importantly, the code will work in most cases.
> > >
> > > > Obligatory: Jobs may fail if these methods are not implemented
> > > > properly, e.g. SourceReader#pauseOrResumeSplits(). This is a common
> > > > pattern in Java, e.g. Iterator.remove() by default throws
> > > > "UnsupportedOperationException", informing the implementation that
> > > > things may go wrong if this method is not implemented.
> > >
> > > For me `Iterator#remove()` is an optional feature. Personally, I don't
> > > remember if I have ever implemented it.
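To make the `Iterator#remove()` comparison concrete, here is a small sketch. CountingIterator and supportsRemove are hypothetical names used only for illustration; the one fact relied on is that java.util.Iterator declares remove() as a default method that throws UnsupportedOperationException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

class IteratorRemoveSketch {

    /** Yields 0, 1, ..., limit - 1 and deliberately does not override remove(). */
    static class CountingIterator implements Iterator<Integer> {
        private final int limit;
        private int next = 0;

        CountingIterator(int limit) {
            this.limit = limit;
        }

        @Override
        public boolean hasNext() {
            return next < limit;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return next++;
        }
    }

    /** Advances the iterator once, then reports whether remove() is supported. */
    static boolean supportsRemove(Iterator<?> it) {
        it.next();
        try {
            it.remove();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Compiles and iterates fine without remove(); only a remove() call fails.
        System.out.println(supportsRemove(new CountingIterator(3)));
        // ArrayList's iterator overrides remove(), so the same call succeeds.
        System.out.println(
                supportsRemove(new ArrayList<Integer>(Arrays.asList(1, 2, 3)).iterator()));
    }
}
```

The iterator works for every caller that never removes elements, which is the sense in which a throwing default reads as "optional" here.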
> > >
> > > Best,
> > > Piotrek
> > >
> > > On Fri, 27 May 2022 at 10:14, Becket Qin <be...@gmail.com> wrote:
> > >
> > > > I had an offline discussion with Piotr and here is the summary.
> > > > Please correct me if I miss something, Piotr.
> > > >
> > > > There are two things we would like to seek more opinions from the
> > > > community, so we can make progress on this FLIP.
> > > >
> > > > 1. The General pattern to add obligatory features to existing
> > > > interfaces.
> > > >
> > > > ***********************************************************************************
> > > > For interfaces exposed to the developers for implementation, they are
> > > > > either intended to be *optional* or *obligatory*. While it is quite
> > clear
> > > > about how to convey that intention when creating the interfaces, it
> is
> > not
> > > > as commonly agreed when we are adding new features to an existing
> > > > interface. In general, Flink uses decorative interfaces when adding
> > > > optional features to existing interfaces. Both Piotr and I agree that
> > looks
> > > > good.
> > > >
> > > > Different opinions are mainly about how to add obligatory features to
> > the
> > > > existing interfaces, probably due to different understandings of
> > > > "obligatory".
> > > >
> > > > > We have discussed four options:
> > > >
> > > > *Option 1:*
> > > >
> > > >    - Just add a new method to the existing interface.
> > > >    - For backwards compatibility, the method would have a default
> > > >    implementation throwing "UnsupportedOperationException".
> > > >    - In the next major version, remove the default implementation.
> > > >    - For the developers, any method with a default implementation
> > > >    throwing an "UnsupportedOperationException" should be taken as
> > obligatory.
> > > >
> > > > *Option 2:*
> > > >
> > > >    - Always make the features optional by adding a decorative
> > interface,
> > > >    just like ordinary optional features.
> > > >    - Inform the developers via documentation that this feature is
> > > >    obligatory, although it looks like optional from the code.
> > > >    - In case the developers did not implement the decorative
> interface,
> > > >    throw an exception
> > > >    - In the next major version, move the methods in the decorative
> > > >    interface to the base interface, and deprecate the decorative
> > interface.
> > > >
> > > > *Option 3:*
> > > >
> > > >    - Always bump the major version when a new obligatory feature is
> > > >    added, even if we may have to do it frequently.
> > > >
> > > > *Option 4:*
> > > >
> > > >    - Add a V2, V3... of the interface affected by the new obligatory
> > > >    feature.
> > > >    - In the next major versions, deprecate old versions of the
> > interfaces.
> > > >
> > > > > Both Piotr and I agreed that option 3 and option 4 have a big side effect
> > > > > and should be avoided. We have different preferences between option 1 and
> > > > > option 2.
> > > >
> > > > Personally I prefer option 1, the reasons are:
> > > >   a) simple and intuitive. Java 8 introduced the default impl in
> > > > interfaces exactly for interface evolving, and this is a common
> > pattern in
> > > > many projects.
> > > >   b) prominent to the developers that the feature is expected to be
> > > > implemented, because it explicitly throws an exception in the default
> > impl.
> > > >   c) low maintenance overhead - the Flink framework can always assume
> > the
> > > > method exists, so no special handling logic is needed.
> > > > >   d) communicate a clear semantic boundary between optional and obligatory
> > > > > features in Flink to the developers.
> > > >       - Optional: Jobs still run without exception if these methods
> are
> > > > not implemented. e.g. all the SupportsXXXPushDown interfaces.
> > > >       - Obligatory: Jobs may fail if these methods are not
> implemented
> > > > > properly. e.g. SourceReader#pauseOrResumeSplits(). This is a common
> > pattern
> > > > in Java, e.g. Iterator.remove() by default throws
> > > > "UnsupportedOperationException", informing the implementation that
> > things
> > > > may go wrong if this method is not implemented.
> > > >
> > > > > As for option 2, although the API itself sounds clean, it misleads
> > people
> > > > to think of an obligatory feature to be optional - from the code the
> > > > feature is optional, but the documents say it is obligatory. We
> > probably
> > > > should avoid such code-doc inconsistency, as people will be confused.
> > And I
> > > > would actually be bewildered that sometimes not implementing an
> > "optional"
> > > > feature is fine, but sometimes it causes the jobs to fail.
> > > >
> > > > In response to the argument that the method with a default
> > implementation
> > > > is always optional, if that is true, it actually means all the
> > interfaces
> > > > should be immutable once they are created. If we want to add a method
> > to an
> > > > existing interface, for backwards compatibility, we will have to
> > provide a
> > > > default implementation. And the fact it has a default implementation
> > > > indicates the method is optional. If that method is optional, it
> should
> > > > reside in a separate decorative interface, otherwise it clogs that
> > existing
> > > > interface. Therefore, people should never add a method to an existing
> > > > interface. I find this conclusion a bit extreme.
> > > >
> > > > Piotr prefers option 2, his opinions are:
> > > >     a) Obligatory methods are the methods that fail the code
> > compilation
> > > > if not implemented.
> > > >     b) All obligatory methods should reside in the base interface,
> > without
> > > > a default implementation. And all the optional methods should be in
> > > > decorative interfaces. This is a clean API.
> > > >     c) due to b), there isn't a viable solution to add an obligatory
> > > > method to an existing interface in a backwards compatible way. Unless
> > we
> > > > are OK with breaking backwards compatibility, all the interfaces
> > should be
> > > > treated as immutable. As a compromise, we might as well just treat
> all
> > the
> > > > features added later as optional features. This way we keep the API
> > clean.
> > > >     d) based on b) and c), option 2 has a clean API, while option 1
> > does
> > > > not.
> > > >     e) It is OK that the code itself in option 2 indicates the
> > developers
> > > > that a feature is optional. We will rely on the documentation to
> > correct
> > > > that and clarify that the feature is actually obligatory.
> > > >     f) Regarding the effectiveness of making people aware that the
> > feature
> > > > is obligatory, Option 1 and Option 2 are similar. For people that do
> > not
> > > > read the release note / documentation, they will mistake the feature
> > to be
> > > > optional anyways.
> > > >
> > > > As for option 1: For developers, the feature is still optional due to
> > the
> > > > default implementation in the interface, regardless of what the
> default
> > > > implementation does, because the code compiles without overriding
> these
> > > > methods. Also, another problem of this option is that for users that
> > do not
> > > > know about the history of the interface, they may be confused by the
> > > > default implementation throwing an exception.
> > > >
> > > >
> > > > 2. For this particular FLIP, should it be optional or not?
> > > >
> > > >
> >
> ***********************************************************************************
> > > > As mentioned in the previous email, I feel this FLIP should be
> > obligatory,
> > > > for the following reasons:
> > > > 1. The Flink framework exposes the watermark alignment API to the end
> > > > users. From the end users' perspective, the feature should be
> available
> > > > regardless of the implementation details in the pluggables. This is
> > true
> > > > for any other methods exposed as the Flink API.
> > > > > 2. If a Source is not pausable, the end user should receive an exception
> > > > > when enabling watermark alignment (both Piotr and I agree on this). In
> > > > > that case, it meets my criteria of an obligatory feature, because not
> > > > > implementing the feature causes a framework API to throw an exception
> > > > > and fail the job.
> > > >
> > > > On the other hand, Piotr does not have a strong opinion regarding
> > whether
> > > > this feature should be optional or not.
> > > >
> > > >
> > > > Thanks for reading through this long email. So basically in order to
> > make
> > > > > progress on this FLIP, we want to see how people feel about the above
> > > > > two topics.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > >
> > > > On Thu, May 26, 2022 at 3:06 PM Piotr Nowojski <pnowojski@apache.org
> >
> > > > wrote:
> > > >
> > > >> Hi Becket,
> > > >>
> > > >> I still sustain what I wrote before:
> > > >> > I think I would still vote soft -1 on this option, but I wouldn't
> > block
> > > >> it in case I am out-voted.
> > > >>
> > > >> > I think it might be helpful to agree on the definition of optional
> > in
> > > >> our
> > > >> case.
> > > >>
> > > >> For me it doesn't matter whether a default method throwing an
> > exception we
> > > >> call optional or non-optional. As long as we keep it this way, the
> > effect
> > > >> is the same. It's effectively a method that a user doesn't have to
> > > >> implement. If interface/system allows some methods to be not
> > implemented,
> > > > >> some users will do just that, regardless of whether we call and document it
> > > > >> as non-optional. And at the same time it's clogging the base interface.
> > > >>
> > > >> By the way, just the need for a java-doc/documentation explaining
> the
> > > >> existence of some construct is a bad smell (code should be
> > > >> self-documenting
> > > >> and default method throwing an UnsupportedOperationException is
> not).
> > > >>
> > > >> > Please note that so far we do not assume whether the feature is in
> > > >> > the original API or it is added later. A newly added feature can
> > also be
> > > >> > non-optional, although it might take some time for all the
> pluggable
> > > >> > developers to catch up, and they should still work if the new
> > feature is
> > > >> > not used until they catch up. In contrast, we may never expect an
> > > >> optional
> > > >> > feature to catch up, because leaving it unimplemented is also
> > blessed.
> > > >> >
> > > >> > Let's take the checkpointing as an example. Imagine Flink did not
> > > >> support
> > > >> > checkpointing before release 1.16. And now we are trying to add
> > > >> > checkpointing to Flink. So we exposed the checkpoint configuration
> > to
> > > >> the
> > > >> > end users. In the meantime, will we tell the pluggable (e.g.
> > operators,
> > > > >> > connectors) developers that methods like "snapshotState()" are
> > optional?
> > > >> If
> > > >> > we do that, the availability of checkpointing in Flink would be
> > severely
> > > >> > weakened. But apparently we should still allow the existing
> > > >> implementations
> > > >> > to work without checkpointing. It looks to me that adding the
> > method to
> > > >> the
> > > >> > pluggable interfaces with a default implementation throwing
> > > >> > "UnsupportedOperationException" would be the solution here. Please
> > note
> > > >> > that in this case, having the default implementation does not mean
> > this
> > > >> is
> > > >> > optional. It is just the technique to support backwards
> > compatibility in
> > > >> > the feature evolution. The fact that this method is in the base
> > > >> interface
> > > >> > suggests it is not optional, so the developers SHOULD implement
> it.
> > > >>
> > > > >> I would soft vote -1 for having the default method throwing
> > > > >> UnsupportedOperationException as a one-off thing for these (FLIP-217)
> > > > >> special circumstances.
> > > >>
> > > >> At the moment, without thinking this over too much, I would vote
> > harder -1
> > > >> for having this as a general rule when adding new features. If we
> > ever end
> > > >> up with an API that is littered with default methods throwing
> > > >> UnsupportedOperationException that are documented as "non optional"
> it
> > > >> would be IMO a big design failure. I would be ok-ish with that, only
> > if
> > > >> that was a temporary thing and we had an aggressive plan to release
> > more
> > > >> often new major Flink versions (2.x.y, 3.x.y, ...) breaking API
> > > >> compatibility, that would get rid of those default methods. Adding
> > > >> checkpointing and methods like "snapshotState()" would IMO easily
> > justify
> > > >> a
> > > >> new major Flink release. In that case we could add those methods
> with
> > > > >> default implementation for some transition period, one or two minor
> > > > >> releases, followed by a cleanup in a major release. However I would
> > still
> > > >> argue that it would be cleaner/better to add a decorative interface
> > like
> > > >> `CheckpointedOperator` interface instead of adding those default
> > methods
> > > >> to
> > > >> the base `Operator` interface.
> > > >>
> > > > >> I think I can sum up our disagreement like this: I would like to keep the
> > > > >> interfaces simpler, with only obligatory methods/features on one
> side
> > and
> > > >> clearly optional features on the other. While you would like to add
> an
> > > >> extra third state in between those two?
> > > >>
> > > >> Best,
> > > >> Piotrek
> > > >>
> > > >>
> > > >>
> > > > >> On Thu, May 12, 2022 at 04:25 Becket Qin <be...@gmail.com> wrote:
> > > >>
> > > >> > Thanks for the clarification, Piotr and Sebastian.
> > > >> >
> > > >> > It looks like the key problem is still whether the implementation
> of
> > > >> > pausable splits in the Sources should be optional or not.
> > > >> >
> > > >> > I think it might be helpful to agree on the definition of optional
> > in
> > > >> our
> > > >> > case. To me:
> > > >> > Optional = "You CAN leave the method unimplemented, and that is
> > fine."
> > > >> > Non-Optional = "You CAN leave the method unimplemented, but you
> > SHOULD
> > > >> NOT,
> > > >> > because people assume this works."
> > > >> >
> > > >> > I think one sufficient condition of a Non-Optional feature is that
> > if
> > > >> the
> > > >> > feature is exposed through the framework API, Flink should expect
> > the
> > > >> > pluggables to support this feature by default. Otherwise the
> > > >> availability
> > > >> > of that feature becomes undefined.
> > > >> >
> > > >> > Please note that so far we do not assume whether the feature is in
> > > >> > the original API or it is added later. A newly added feature can
> > also be
> > > >> > non-optional, although it might take some time for all the
> pluggable
> > > >> > developers to catch up, and they should still work if the new
> > feature is
> > > >> > not used until they catch up. In contrast, we may never expect an
> > > >> optional
> > > >> > feature to catch up, because leaving it unimplemented is also
> > blessed.
> > > >> >
> > > >> > Let's take the checkpointing as an example. Imagine Flink did not
> > > >> support
> > > >> > checkpointing before release 1.16. And now we are trying to add
> > > >> > checkpointing to Flink. So we exposed the checkpoint configuration
> > to
> > > >> the
> > > >> > end users. In the meantime, will we tell the pluggable (e.g.
> > operators,
> > > > >> > connectors) developers that methods like "snapshotState()" are
> > optional?
> > > >> If
> > > >> > we do that, the availability of checkpointing in Flink would be
> > severely
> > > >> > weakened. But apparently we should still allow the existing
> > > >> implementations
> > > >> > to work without checkpointing. It looks to me that adding the
> > method to
> > > >> the
> > > >> > pluggable interfaces with a default implementation throwing
> > > >> > "UnsupportedOperationException" would be the solution here. Please
> > note
> > > >> > that in this case, having the default implementation does not mean
> > this
> > > >> is
> > > >> > optional. It is just the technique to support backwards
> > compatibility in
> > > >> > the feature evolution. The fact that this method is in the base
> > > >> interface
> > > >> > suggests it is not optional, so the developers SHOULD implement
> it.
> > > >> >
> > > >> > When it comes to this FLIP, I think it meets the criteria of
> > > >> non-optional
> > > >> > features, so we should just use the evolution path of non-optional
> > > >> > features.
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > Jiangjie (Becket) Qin
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski <
> > pnowojski@apache.org>
> > > >> > wrote:
> > > >> >
> > > >> > > Hi,
> > > >> > >
> > > >> > > Actually previously I thought about having a decorative
> interface
> > and
> > > >> > > whenever watermark alignment is enabled, checking that the
> source
> > > >> > > implements the decorative interface. If not, throwing an
> > exception.
> > > >> > >
> > > >> > > The option with default methods in the source interfaces
> throwing
> > > >> > > `UnsupportedOperationException` I think still suffers from the
> > same
> > > >> > > problems I mentioned before. It's still an optional
> implementation
> > > >> and at
> > > >> > > the same time it's clogging the base interface. I think I would
> > still
> > > >> > vote
> > > >> > > soft -1 on this option, but I wouldn't block it in case I am
> > > >> out-voted.
> > > >> > >
> > > >> > > Best,
> > > >> > > Piotrek
> > > >> > >
> > > > >> > > On Wed, May 11, 2022 at 14:22 Sebastian Mattheis <sebastian@ververica.com>
> > > > >> > > wrote:
> > > >> > >
> > > >> > > > Hi Becket,
> > > >> > > >
> > > >> > > > Thanks a lot for your fast and detailed response. For me, it
> > > >> converges
> > > >> > > and
> > > >> > > > dropping the supportsX method sounds very reasonable to me.
> > (Side
> > > >> note:
> > > >> > > > With "pausable splits" enabled as "default" I think we
> > > >> misunderstood.
> > > >> > As
> > > >> > > > you described now "default" I understand as that it should be
> > the
> > > >> new
> > > >> > > > recommended way of implementation, and I think that is fully
> > valid.
> > > >> > > Before,
> > > >> > > > I understood "default" here as the default implementation,
> i.e.,
> > > >> > throwing
> > > >> > > > UnsupportedOperationException, which is the exact opposite.
> :) )
> > > >> > > >
> > > >> > > > Nevertheless: As mentioned, an open question for me is if
> > watermark
> > > >> > > > alignment should enforce pausable splits. For clarification,
> the
> > > >> > current
> > > >> > > > documentation [1] says:
> > > >> > > >
> > > >> > > > *Note:* As of 1.15, Flink supports aligning across tasks of
> the
> > same
> > > >> > > >> source and/or different sources. It does not support aligning
> > > >> > > >> splits/partitions/shards in the same task.
> > > >> > > >>
> > > >> > > >> In a case where there are e.g. two Kafka partitions that
> > produce
> > > >> > > >> watermarks at different pace, that get assigned to the same
> > task
> > > >> > > watermark
> > > >> > > >> might not behave as expected. Fortunately, worst case it
> > should not
> > > >> > > perform
> > > >> > > >> worse than without alignment.
> > > >> > > >>
> > > >> > > >> Given the limitation above, we suggest applying watermark
> > > >> alignment in
> > > >> > > >> two situations:
> > > >> > > >>
> > > >> > > >>    1. You have two different sources (e.g. Kafka and File)
> that
> > > >> > produce
> > > >> > > >>    watermarks at different speeds
> > > >> > > >>    2. You run your source with parallelism equal to the
> number
> > of
> > > >> > > >>    splits/shards/partitions, which results in every subtask
> > being
> > > >> > > assigned a
> > > >> > > >>    single unit of work.
> > > >> > > >>
> > > >> > > >> I personally see no issue in implementing and I see no reason
> > > >> against
> > > >> > > > implementing this dependency of watermark alignment and
> pausable
> > > >> > splits.
> > > >> > > (I
> > > >> > > > think this would even be a good path towards shaping watermark
> > > >> > alignment
> > > >> > > in
> > > >> > > > 1.16.) However, "I don't see" means that I would be happy to
> > hear
> > > >> > Dawid's
> > > >> > > > and Piotrek's opinions as they implemented watermark alignment
> > > >> based on
> > > >> > > > FLIP-182 [2] and I don't want to miss relevant
> > rationale/background
> > > >> > info
> > > >> > > > from their side.
> > > >> > > >
> > > >> > > > *@Piotrek* *@Dawid *What do you think?
> > > >> > > >
> > > >> > > > Regards,
> > > >> > > > Sebastian
> > > >> > > >
> > > >> > > > [1]
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
> > > >> > > > [2]
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
> > > >> > > >
> > > >> > > > On Wed, May 11, 2022 at 1:30 PM Becket Qin <
> > becket.qin@gmail.com>
> > > >> > wrote:
> > > >> > > >
> > > >> > > >> +dev
> > > >> > > >>
> > > >> > > >> Hi Sebastian,
> > > >> > > >>
> > > >> > > >> Thank you for the summary. Please see the detailed replies
> > inline.
> > > >> As
> > > >> > a
> > > >> > > >> recap of my suggestions.
> > > >> > > >>
> > > >> > > >> 1. Pausable splits API.
> > > >> > > >>   a) Add default implementations to methods
> > "pauseOrResumeSplits"
> > > >> in
> > > >> > > both
> > > >> > > >> SourceReader and SplitReader where both default
> implementations
> > > >> throw
> > > >> > > >>  UnsupportedOperationException.
> > > >> > > >>
> > > >> > > >> 2. User story.
> > > >> > > >>     a) We tell users to enable the watermark alignment as
> they
> > > >> like.
> > > >> > > This
> > > >> > > >> is exactly what the current Flink API is.
> > > >> > > >>     b) We tell the source developers, please implement
> pausable
> > > >> > splits,
> > > >> > > >> otherwise bad things may happen. Think of it like you are
> > expected
> > > >> to
> > > >> > > >> implement SourceReader#snapshotState() properly, otherwise
> > > >> exceptions
> > > >> > > will
> > > >> > > >> be thrown when users enable checkpointing.
> > > >> > > >>
> > > >> > > >> Thanks,
> > > >> > > >>
> > > >> > > >> Jiangjie (Becket) Qin
> > > >> > > >>
> > > >> > > >> On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis <
> > > >> > > >> sebastian@ververica.com> wrote:
> > > >> > > >>
> > > >> > > >>> Hi Becket, Hi everybody,
> > > >> > > >>>
> > > >> > > >>> I'm sorry if I misread the messages but I could not derive
> an
> > > >> > agreement
> > > >> > > >>> from the mailing list. Nevertheless, if I understand you
> > right the
> > > >> > > >>> suggestion is:
> > > >> > > >>>
> > > >> > > >>> * Add default implementations to methods
> > "pauseOrResumeSplits" in
> > > >> > both
> > > >> > > >>> SourceReader and SplitReader where both default
> > implementations
> > > >> throw
> > > >> > > >>> UnsupportedOperationException.
> > > >> > > >>>
> > > >> > > >> Yes.
> > > >> > > >>
> > > >> > > >> * Add "supportsPauseOrResumeSplits" to the Source interface.
> > (In
> > > >> the
> > > >> > > >>> following, I refer to supporting this as "pausable splits".)
> > > >> > > >>>
> > > >> > > >> We may no longer need this if pausable splits are expected to
> > be
> > > >> > > >> implemented by the source developers, i.e. non-optional.
> Having
> > > >> this
> > > >> > > method
> > > >> > > >> would then be somewhat misleading as it looks like the
> sources
> > > >> that do
> > > >> > > not
> > > >> > > >> support pausable splits are also acceptable in the long term.
> > So
> > > >> API
> > > >> > > wise,
> > > >> > > >> I'd say maybe we should remove this for this FLIP, although I
> > > >> believe
> > > >> > > this
> > > >> > > >> supportXXX pattern itself is still attractive for optional
> > > >> features.
> > > >> > > >>
> > > >> > > >>
> > > >> > > >>>
> > > >> > > >>> To make the conclusions explicit:
> > > >> > > >>>
> > > >> > > >>> 1. The implementation of pauseOrResumeSplits in both
> > interfaces
> > > >> > > >>> SourceReader and SplitReader are optional where the default
> is
> > > >> that
> > > >> > it
> > > >> > > >>> doesn't support it. (--> This means that the implementation
> is
> > > >> still
> > > >> > > >>> optional for the source developer.)
> > > >> > > >>>
> > > >> > > >> It is optional for backwards compatibility with existing
> > sources,
> > > >> as
> > > >> > > they
> > > >> > > >> may still compile without code change. But starting from this
> > FLIP,
> > > >> > > Flink
> > > >> > > >> will always optimistically assume that all the sources
> support
> > > >> > pausable
> > > >> > > >> splits. If a source does not support pausable splits, it goes
> > to an
> > > >> > > error
> > > >> > > >> handling path when watermark alignment is enabled on it. This
> > is
> > > >> > > different
> > > >> > > >> from a usual optional feature, where no error is expected.
> > > >> > > >>
> > > >> > > >>
> > > >> > > >>> 2. If watermark alignment is enabled in the application code
> > by
> > > >> > adding
> > > >> > > >>> withWatermarkAlignment to the WatermarkStrategy while
> > > >> SourceReader or
> > > >> > > >>> SplitReader do not support pausableSplits, we throw an
> > > >> > > >>> UnsupportedOperationException.
> > > >> > > >>>
> > > >> > > >> Yes.
> > > >> > > >>
> > > >> > > >>
> > > >> > > >>> 3. With regard to your statement:
> > > >> > > >>>
> > > > >> > > >>>> [...] basically means watermark alignment is a
> non-optional
> > > >> feature
> > > >> > > to
> > > >> > > >>>> the end users.
> > > >> > > >>>
> > > >> > > >>> You actually mean that "pausable splits" are non-optional
> for
> > the
> > > >> app
> > > >> > > >>> developer if watermark alignment is enabled. However,
> > watermark
> > > >> > > alignment
> > > >> > > >>> is optional and can be enabled/disabled.
> > > >> > > >>>
> > > >> > > >> Yes, watermark alignment can be enabled/disabled in
> individual
> > > >> sources
> > > >> > > in
> > > >> > > >> Flink jobs, which basically means the code supporting
> watermark
> > > >> > > alignment
> > > >> > > >> has to already be there. That again means the Source
> > developers are
> > > >> > also
> > > >> > > >> expected to support pausable splits by default. So this way
> we
> > > >> > > essentially
> > > >> > > >> tell the end users that you may enable / disable this feature
> > as
> > > >> you
> > > >> > > wish,
> > > >> > > >> and tell the source developers that you SHOULD implement this
> > > >> because
> > > >> > > the
> > > >> > > >> end users may turn it on/off at will. And if the source does
> > not
> > > >> > support
> > > >> > > >> pausable splits, that goes to an error handling path when
> > watermark
> > > >> > > >> alignment is enabled on it. So users know they have to
> > explicitly
> > > >> > > exclude
> > > >> > > >> this source.
> > > >> > > >>
> > > >> > > >>
> > > >> > > >>>
> > > >> > > >>> So far it's totally clear to me and I hope this is what you
> > mean.
> > > >> I
> > > >> > > also
> > > >> > > >>> agree with both statements:
> > > >> > > >>>
> > > >> > > >>> So making that expectation aligned with the source
> developers
> > > >> seems
> > > >> > > >>>> reasonable.
> > > >> > > >>>>
> > > >> > > >>>
> > > >> > > >>> I think this is a simple and clean solution from both the
> end
> > user
> > > >> > and
> > > >> > > >>>> source developers' standpoint.
> > > >> > > >>>>
> > > >> > > >>>
> > > >> > > >>> However, a last conclusion derives from 3. and is an open
> > question
> > > >> > for
> > > >> > > >>> me:
> > > >> > > >>>
> > > >> > > >>> 4. The feature of "pausable splits" is now tightly bound to
> > > >> watermark
> > > >> > > >>> alignment, i.e., if sources do not support "pausable splits"
> > one
> > > >> can
> > > >> > > not
> > > >> > > >>> enable watermark alignment for these sources. This
> dependency
> > is
> > > >> not
> > > >> > > the
> > > >> > > >>> current status of watermark alignment implementation because
> > it
> > > >> > is/was
> > > >> > > >>> implemented without pausable splits. Do we want to introduce
> > this
> > > >> > > >>> dependency? (This is an open question. I cannot judge that.)
> > > >> > > >>>
> > > >> > > >> The watermark alignment basically relies on the pausable
> > splits,
> > > >> > right?
> > > >> > > >> So personally I found it quite reasonable that if the source
> > does
> > > >> not
> > > >> > > >> support pausable splits, end users cannot enable watermark
> > > >> alignment
> > > >> > on
> > > >> > > it.
> > > >> > > >>
> > > >> > > >>
> > > >> > > >>> If something is wrong, please correct me.
> > > >> > > >>>
> > > >> > > >>> Regards,
> > > >> > > >>> Sebastian
> > > >> > > >>>
> > > >> > > >>> On Wed, May 11, 2022 at 9:05 AM Becket Qin <
> > becket.qin@gmail.com>
> > > >> > > wrote:
> > > >> > > >>>
> > > >> > > >>>> Hi Sebastian,
> > > >> > > >>>>
> > > >> > > >>>> Thanks for the reply and patient discussion. I agree this
> is
> > a
> > > >> > tricky
> > > >> > > >>>> decision.
> > > >> > > >>>>
> > > >> > > >>>>
> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about Option c)
> > which I
> > > >> see
> > > >> > as
> > > >> > > >>>>> follows:
> > > >> > > >>>>> (1) An interface with default NOOP implementation makes
> the
> > > >> > > >>>>> implementation optional. And in my opinion, a default
> > > >> > implementation
> > > >> > > is and
> > > >> > > >>>>> will remain a way of making implementation optional
> because
> > > >> even in
> > > >> > > future
> > > >> > > >>>>> a developer can decide to implement the "old flavor"
> without
> > > >> > support
> > > >> > > for
> > > >> > > >>>>> pausable splits.
> > > >> > > >>>>> (2) It may not be too critical but I also find it
> suboptimal
> > > >> that
> > > >> > > with
> > > >> > > >>>>> a NOOP default implementation there is no way to check at
> > > >> runtime
> > > >> > if
> > > >> > > >>>>> SourceReader or SplitReader actually support pausing. (To
> > do so,
> > > >> > one
> > > >> > > would
> > > >> > > >>>>> need a supportsX method which makes it again more
> > complicated.)
> > > >> > > >>>>
> > > >> > > >>>>
> > > > >> > > >>>> Based on the last few messages in the mailing list, Piotr and I agreed
> > > >> > > >>>> that the default implementation should just throw an
> > > >> > > >>>> UnsupportedOperationException if the source is unpausable.
> So
> > > >> this
> > > >> > > >>>> basically tells the Source developers that this feature is
> > > >> expected
> > > >> > > to be
> > > >> > > >>>> supported. Because we cannot prevent end users from putting
> > an
> > > >> > > unpausable
> > > >> > > >>>> source into the watermark alignment group, that basically
> > means
> > > >> > > watermark
> > > > >> > > >>>> alignment is a non-optional feature to the end users. So
> > making
> > > >> > that
> > > >> > > >>>> expectation aligned with the source developers seems
> > reasonable.
> > > >> > And
> > > >> > > if a
> > > >> > > >>>> source does not support this feature, the end users should
> > > >> > explicitly
> > > >> > > >>>> remove that source from the watermark alignment group.
> > > >> > > >>>>
> > > >> > > >>>> Personally speaking I think this is a simple and clean
> > solution
> > > >> from
> > > >> > > >>>> both the end user and source developers' standpoint.
> > > >> > > >>>>
> > > >> > > >>>> Does this address your concerns?
> > > >> > > >>>>
> > > >> > > >>>> Thanks,
> > > >> > > >>>>
> > > >> > > >>>> Jiangjie (Becket) Qin
> > > >> > > >>>>
> > > >> > > >>>> On Wed, May 11, 2022 at 2:52 PM Sebastian Mattheis <
> > > >> > > >>>> sebastian@ververica.com> wrote:
> > > >> > > >>>>
> > > >> > > >>>>> Hi Piotr, Hi Becket, Hi everybody,
> > > >> > > >>>>>
> > > >> > > >>>>> we, Dawid and I, discussed the various suggestions/options
> > and
> > > >> we
> > > >> > > >>>>> would be okay either way because we find neither solution
> is
> > > >> > perfect
> > > >> > > just
> > > >> > > >>>>> because of the already present complexity.
> > > >> > > >>>>>
> > > >> > > >>>>> Option c) Adding methods to the interfaces of SourceReader
> > and
> > > >> > > >>>>> SplitReader
> > > >> > > >>>>> Option a) Adding decorative interfaces to be used by
> > > >> SourceReader
> > > >> > and
> > > >> > > >>>>> SplitReader
> > > >> > > >>>>>
> > > >> > > >>>>> As of the current status (v. 12) of the FLIP [1], it is
> > based on
> > > >> > > >>>>> Option c) which we find acceptable because the complexity
> > added
> > > >> is
> > > >> > > only a
> > > >> > > >>>>> single method.
> > > >> > > >>>>>
> > > >> > > >>>>> Nevertheless, Piotr has valid concerns about Option c)
> > which I
> > > >> see
> > > >> > as
> > > >> > > >>>>> follows:
> > > >> > > >>>>> (1) An interface with default NOOP implementation makes
> the
> > > >> > > >>>>> implementation optional. And in my opinion, a default
> > > >> > implementation
> > > >> > > is and
> > > >> > > >>>>> will remain a way of making implementation optional
> because
> > > >> even in
> > > >> > > future
> > > >> > > >>>>> a developer can decide to implement the "old flavor"
> without
> > > >> > support
> > > >> > > for
> > > >> > > >>>>> pausable splits.
> > > >> > > >>>>> (2) It may not be too critical but I also find it
> suboptimal
> > > >> that
> > > >> > > with
> > > >> > > >>>>> a NOOP default implementation there is no way to check at
> > > >> runtime
> > > >> > if
> > > >> > > >>>>> SourceReader or SplitReader actually support pausing. (To
> > do so,
> > > >> > one
> > > >> > > would
> > > >> > > >>>>> need a supportsX method which makes it again more
> > complicated.)
> > > >> > > >>>>>
> > > >> > > >>>>> However, we haven't changed it because Option a) is also
> not
> > > >> > optimal
> > > >> > > >>>>> or straight-forward:
> > > >> > > >>>>> (1) We need to add two distinct yet similar decorative
> > > >> interfaces
> > > >> > > >>>>> since, as mentioned, the signatures of the methods are
> > > >> different.
> > > >> > For
> > > >> > > >>>>> example, we would need decorative interfaces like
> > > >> > > >>>>> `SplitReaderWithPausableSplits` and
> > > >> > `SourceReaderWithPausableSplits`.
> > > >> > > >>>>> (2) As a consequence, we would need to somehow document
> > > >> how/where
> > > >> > to
> > > >> > > >>>>> implement both interfaces and how this relates to each
> > other.
> > > >> This
> > > >> > > we could
> > > >> > > >>>>> solve by adding a note in the interface of SourceReader
> and
> > > >> > > SplitReader and
> > > >> > > >>>>> reference to the decorative interfaces but it still
> > increases
> > > >> > > complexity
> > > >> > > >>>>> too.
> > > >> > > >>>>>
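For comparison, a minimal sketch of Option a), assuming simplified stand-in types. The two decorative interface names are taken from the message above; their method signatures, the base interfaces and the demo classes are assumptions made for illustration and are not Flink's actual API. It shows the two similar-but-different interfaces and how a runtime check becomes a plain `instanceof` test:

```java
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified base interfaces (not Flink's real API).
interface BaseSourceReader {}

interface BaseSplitReader<SplitT> {}

// Decorative interface for the source reader side: splits identified by id (assumed signature).
interface SourceReaderWithPausableSplits {
    void pauseOrResumeSplits(
            Collection<String> splitIdsToPause, Collection<String> splitIdsToResume);
}

// Decorative interface for the split reader side: splits passed directly (assumed signature).
interface SplitReaderWithPausableSplits<SplitT> {
    void pauseOrResumeSplits(
            Collection<SplitT> splitsToPause, Collection<SplitT> splitsToResume);
}

// A reader that opts out simply by not implementing the decorative interface.
final class PlainDemoReader implements BaseSourceReader {}

// A reader that opts in by implementing the decorative interface.
final class AlignedDemoReader implements BaseSourceReader, SourceReaderWithPausableSplits {
    int pauseCalls = 0;

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitIdsToPause, Collection<String> splitIdsToResume) {
        pauseCalls++;
    }
}

final class DecoratorSketch {
    // With decorative interfaces, support can be checked at runtime without a supportsX method.
    static boolean supportsPausing(BaseSourceReader reader) {
        return reader instanceof SourceReaderWithPausableSplits;
    }

    // Pauses the given split ids if supported; otherwise reports that nothing was done.
    static boolean pauseIfSupported(BaseSourceReader reader, List<String> splitIds) {
        if (reader instanceof SourceReaderWithPausableSplits) {
            ((SourceReaderWithPausableSplits) reader).pauseOrResumeSplits(splitIds, List.of());
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(supportsPausing(new PlainDemoReader()));
        System.out.println(supportsPausing(new AlignedDemoReader()));
    }
}
```

The sketch also makes the documentation concern visible: nothing in the type system links the two decorative interfaces, so the relation between them has to be explained in the Javadoc.
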
> > > >> > > >>>>> In summary, we see both as acceptable and preferred over
> > other
> > > >> > > >>>>> options. The question is if we can find a solution or
> > compromise
> > > >> > > that is
> > > >> > > >>>>> acceptable for everybody to reach consensus.
> > > >> > > >>>>>
> > > >> > > >>>>> Please let us know what you think because we would be
> happy
> > if
> > > >> we
> > > >> > can
> > > >> > > >>>>> conclude the discussion to avoid dropping the initiative
> on
> > this
> > > >> > > FLIP.
> > > >> > > >>>>>
> > > >> > > >>>>> Regards,
> > > >> > > >>>>> Sebastian
> > > >> > > >>>>>
> > > >> > > >>>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
> > > >> > > >>>>> (v. 12)
> > > >> > > >>>>>
> > > >> > > >>>>> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski <
> > > >> > pnowojski@apache.org
> > > >> > > >
> > > >> > > >>>>> wrote:
> > > >> > > >>>>>
> > > >> > > >>>>>> Hi Guowei,
> > > >> > > >>>>>>
> > > >> > > >>>>>> as Dawid wrote a couple of messages back:
> > > >> > > >>>>>>
> > > >> > > >>>>>> > This is covered in the previous FLIP[1] which has been
> > > >> already
> > > >> > > >>>>>> implemented in 1.15. In short, it must be enabled with
> the
> > > >> > watermark
> > > >> > > >>>>>> strategy which also configures drift and update interval
> > > >> > > >>>>>>
> > > >> > > >>>>>> So by default watermark alignment is disabled, regardless
> > if a
> > > >> > > source
> > > >> > > >>>>>> supports it or not.
> > > >> > > >>>>>>
> > > >> > > >>>>>> Best,
> > > >> > > >>>>>> Piotrek
> > > >> > > >>>>>>
> > > >> > > >>>>>> czw., 5 maj 2022 o 09:56 Guowei Ma <guowei.mgw@gmail.com
> >
> > > >> > > napisał(a):
> > > >> > > >>>>>>
> > > >> > > >>>>>>> Hi,
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> We know that in the case of Bounded input Flink supports
> > the
> > > >> > Batch
> > > >> > > >>>>>>> execution mode. Currently in Batch execution mode, Flink
> > is
> > > >> > > executed
> > > >> > > >>>>>>> on a
> > > >> > > >>>>>>> stage-by-stage basis. In this way, perhaps watermark
> > alignment
> > > >> > > might
> > > >> > > >>>>>>> not
> > > >> > > >>>>>>> gain much.
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> So my question is: Is watermark alignment the default
> > > >> > behavior(for
> > > >> > > >>>>>>> implemented source only)? If so, have you considered
> > > >> evaluating
> > > >> > the
> > > >> > > >>>>>>> impact
> > > >> > > >>>>>>> of this behavior on the Batch execution mode? Or do you think it is not
> > > >> > > >>>>>>> necessary?
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> Correct me if I miss something.
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> Best,
> > > >> > > >>>>>>> Guowei
> > > >> > > >>>>>>>
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <
> > > >> > > >>>>>>> piotr.nowojski@gmail.com>
> > > >> > > >>>>>>> wrote:
> > > >> > > >>>>>>>
> > > >> > > >>>>>>> > Hi Becket and Dawid,
> > > >> > > >>>>>>> >
> > > >> > > >>>>>>> > > I feel that no matter which option we choose this
> can
> > not
> > > >> be
> > > >> > > >>>>>>> solved
> > > >> > > >>>>>>> > entirely in either of the options, because of the
> point
> > > >> above
> > > >> > and
> > > >> > > >>>>>>> because
> > > >> > > >>>>>>> > the signature of SplitReader#pauseOrResumeSplits and
> > > >> > > >>>>>>> > SourceReader#pauseOrResumeSplits are slightly
> different
> > (one
> > > >> > > >>>>>>> identifies
> > > >> > > >>>>>>> > splits with splitId the other one passes the splits
> > > >> directly).
> > > >> > > >>>>>>> >
> > > >> > > >>>>>>> > Yes, that's a good point in this case and for features
> > that
> > > >> > need
> > > >> > > >>>>>>> to be
> > > >> > > >>>>>>> > implemented in more than one place.
> > > >> > > >>>>>>> >
> > > >> > > >>>>>>> > > Is there any reason for pausing reading from a split
> > an
> > > >> > > optional
> > > >> > > >>>>>>> feature,
> > > >> > > >>>>>>> > > other than that this was not included in the
> original
> > > >> > > interface?
> > > >> > > >>>>>>> >
> > > >> > > >>>>>>> > An additional argument in favor of making it optional
> > is to
> > > >> > > >>>>>>> simplify source
> > > >> > > >>>>>>> > implementation. But on its own I'm not sure if that
> > would be
> > > >> > > >>>>>>> enough to
> > > >> > > >>>>>>> > justify making this feature optional. Maybe.
> > > >> > > >>>>>>> >
> > > >> > > >>>>>>> > > I think it would be way simpler and clearer to just
> > let
> > > >> end
> > > >> > > >>>>>>> users and
> > > >> > > >>>>>>> > Flink
> > > >> > > >>>>>>> > > assume all the connectors will implement this
> feature.
> > > >> > > >>>>>>> >
> > > >> > > >>>>>>> > As I wrote above that would be an interesting choice
> to
> > make
> > > >> > > (ease
> > > >> > > >>>>>>> of
> > > >> > > >>>>>>> > implementation for new users, vs system consistency).
> > > >> > Regardless
> > > >> > > >>>>>>> of that,
> > > >> > > >>>>>>> > yes, for me the main argument is the API backward
> > > >> > compatibility.
> > > >> > > >>>>>>> But let's
> > > >> > > >>>>>>> > clear a couple of points:
> > > >> > > >>>>>>> > - The current proposal adding methods to the base
> > interface
> > > >> > with
> > > >> > > >>>>>>> default
> > > >> > > >>>>>>> > implementations is an OPTIONAL feature. Same as the
> > > >> decorative
> > > >> > > >>>>>>> version
> > > >> > > >>>>>>> > would be.
> > > >> > > >>>>>>> > - Decorative version could implement "throw
> > > >> > > >>>>>>> UnsupportedOperationException"
> > > >> > > >>>>>>> > if user enabled watermark alignment just as well and I
> > agree
> > > >> > > >>>>>>> that's a
> > > >> > > >>>>>>> > better option compared to logging a warning.
> > > >> > > >>>>>>> >
> > > >> > > >>>>>>> > Best,
> > > >> > > >>>>>>> > Piotrek
> > > >> > > >>>>>>> >
> > > >> > > >>>>>>> >
> > > >> > > >>>>>>> > śr., 4 maj 2022 o 15:40 Becket Qin <
> > becket.qin@gmail.com>
> > > >> > > >>>>>>> napisał(a):
> > > >> > > >>>>>>> >
> > > >> > > >>>>>>> > > Thanks for the reply and patient discussion, Piotr
> and
> > > >> Dawid.
> > > >> > > >>>>>>> > >
> > > >> > > >>>>>>> > > Is there any reason for pausing reading from a split
> > an
> > > >> > > optional
> > > >> > > >>>>>>> feature,
> > > >> > > >>>>>>> > > other than that this was not included in the
> original
> > > >> > > interface?
> > > >> > > >>>>>>> > >
> > > >> > > >>>>>>> > > To be honest I am really worried about the
> complexity
> > of
> > > >> the
> > > >> > > >>>>>>> user story
> > > >> > > >>>>>>> > > here. Optional features like this have a high
> > overhead.
> > > >> > Imagine
> > > >> > > >>>>>>> this
> > > >> > > >>>>>>> > > feature is optional, now a user enabled watermark
> > > >> alignment
> > > >> > and
> > > >> > > >>>>>>> defined a
> > > >> > > >>>>>>> > > few watermark groups. Would it work? Hmm, that
> > depends on
> > > >> > > >>>>>>> whether the
> > > >> > > >>>>>>> > > involved Source has implemented this feature. If
> the
> > > >> Sources
> > > >> > > >>>>>>> are well
> > > >> > > >>>>>>> > > documented, good luck. Otherwise end users may have
> to
> > > >> look
> > > >> > > into
> > > >> > > >>>>>>> the code
> > > >> > > >>>>>>> > > of the Source to see whether the feature is
> supported.
> > > >> Which
> > > >> > is
> > > >> > > >>>>>>> something
> > > >> > > >>>>>>> > > they shouldn't have to do.
> > > >> > > >>>>>>> > >
> > > >> > > >>>>>>> > > I think it would be way simpler and clearer to just
> > let
> > > >> end
> > > >> > > >>>>>>> users and
> > > >> > > >>>>>>> > Flink
> > > >> > > >>>>>>> > > assume all the connectors will implement this
> feature.
> > > >> After
> > > >> > > all
> > > >> > > >>>>>>> the
> > > >> > > >>>>>>> > > watermark group is not optional to the end users. If
> > in
> > > >> some
> > > >> > > >>>>>>> rare cases,
> > > >> > > >>>>>>> > > the feature cannot be supported, a clear
> > > >> > > >>>>>>> UnsupportedOperationException
> > > >> > > >>>>>>> > will
> > > >> > > >>>>>>> > > be thrown to tell users to explicitly remove this
> > Source
> > > >> from
> > > >> > > the
> > > >> > > >>>>>>> > watermark
> > > >> > > >>>>>>> > > group. I don't think we should have a warning
> message
> > > >> here,
> > > >> > as
> > > >> > > >>>>>>> they tend
> > > >> > > >>>>>>> > to
> > > >> > > >>>>>>> > > be ignored in many cases. If we do this, we don't
> even
> > > >> need
> > > >> > the
> > > >> > > >>>>>>> > supportXXX
> > > >> > > >>>>>>> > > method in the Source for this feature. In fact this
> is
> > > >> > exactly
> > > >> > > >>>>>>> how many
> > > >> > > >>>>>>> > > interfaces work today. For example,
> > > >> > > >>>>>>> SplitEnumerator#addSplitsBack() is
> > > >> > > >>>>>>> > not
> > > >> > > >>>>>>> > > supported by Pravega source because it does not
> > support
> > > >> > partial
> > > >> > > >>>>>>> failover.
> > > >> > > >>>>>>> > > In that case, it simply throws an exception to
> > trigger a
> > > >> > global
> > > >> > > >>>>>>> recovery.
> > > >> > > >>>>>>> > >
> > > >> > > >>>>>>> > > The reason we add a default implementation in this
> > case
> > > >> would
> > > >> > > >>>>>>> just be for
> > > >> > > >>>>>>> > the
> > > >> > > >>>>>>> > > sake of backwards compatibility so the old source
> can
> > > >> still
> > > >> > > >>>>>>> compile.
> > > >> > > >>>>>>> > Sure,
> > > >> > > >>>>>>> > > in short term, this feature might not be supported
> by
> > many
> > > >> > > >>>>>>> existing
> > > >> > > >>>>>>> > > sources. That is OK, and it is quite visible to the
> > source
> > > >> > > >>>>>>> developers
> > > >> > > >>>>>>> > that
> > > >> > > >>>>>>> > > they did not override the default impl which throws
> an
> > > >> > > >>>>>>> > > UnsupportedOperationException.
> > > >> > > >>>>>>> > >
> > > >> > > >>>>>>> > > @Dawid,
> > > >> > > >>>>>>> > >
> > > >> > > >>>>>>> > > the Java doc of the SupportXXX() method in the
> Source
> > > >> would
> > > >> > be
> > > >> > > >>>>>>> the single
> > > >> > > >>>>>>> > > >> source of truth regarding how to implement this
> > > >> feature.
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > >
> > > >> > > >>>>>>> > > I also don't find it entirely true. Half of the
> > classes
> > > >> are
> > > >> > > >>>>>>> theoretically
> > > >> > > >>>>>>> > > > optional and are utility classes from the point of
> > view
> > > >> how
> > > >> > > the
> > > >> > > >>>>>>> > > interfaces
> > > >> > > >>>>>>> > > > are organized. Theoretically users do not need to
> > use
> > > >> any
> > > >> > of
> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to
> > list
> > > >> > their
> > > >> > > >>>>>>> methods in
> > > >> > > >>>>>>> > > the
> > > >> > > >>>>>>> > > > Source interface.
> > > >> > > >>>>>>> > >
> > > >> > > >>>>>>> > > I think the ultimate goal of java docs is to guide
> > users
> > > >> to
> > > >> > > >>>>>>> implement the
> > > >> > > >>>>>>> > > Source. If SourceReaderBase is the preferred way to
> > > >> > implement a
> > > >> > > >>>>>>> > > SourceReader, it seems worth mentioning that. Even
> the
> > > >> Java
> > > >> > > >>>>>>> language
> > > >> > > >>>>>>> > > documentation for interfaces lists the known
> > implementations
> > > >> [1]
> > > >> > so
> > > >> > > >>>>>>> people
> > > >> > > >>>>>>> > can
> > > >> > > >>>>>>> > > leverage them. But for this particular case, if we
> > make
> > > >> the
> > > >> > > >>>>>>> feature
> > > >> > > >>>>>>> > > non-optional, we don't even need the supportXXX()
> > method
> > > >> for
> > > >> > > now.
> > > >> > > >>>>>>> > >
> > > >> > > >>>>>>> > > Thanks,
> > > >> > > >>>>>>> > >
> > > >> > > >>>>>>> > > Jiangjie (Becket) Qin
> > > >> > > >>>>>>> > >
> > > >> > > >>>>>>> > >
> > > >> > > >>>>>>> > >
> > > >> > > >>>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <
> > > >> > > >>>>>>> dwysakowicz@apache.org>
> > > >> > > >>>>>>> > > wrote:
> > > >> > > >>>>>>> > >
> > > >> > > >>>>>>> > > > Hey Piotr and Becket,
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > First of all, let me say I am happy with whichever
> > > >> option
> > > >> > is
> > > >> > > >>>>>>> agreed in
> > > >> > > >>>>>>> > > the
> > > >> > > >>>>>>> > > > discussion.
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > I wanted to clarify a few points from the
> discussion
> > > >> > though:
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > @Becket:
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > The main argument for adding the methods to the
> > > >> > SourceReader
> > > >> > > >>>>>>> is that
> > > >> > > >>>>>>> > > these
> > > >> > > >>>>>>> > > > methods are effectively NON-OPTIONAL to
> SourceReader
> > > >> impl,
> > > >> > > i.e.
> > > >> > > >>>>>>> > starting
> > > >> > > >>>>>>> > > > from this FLIP, all the SourceReaders impl are
> > expected
> > > >> to
> > > >> > > >>>>>>> support this
> > > >> > > >>>>>>> > > > method, although some old implementations may not
> > have
> > > >> > > >>>>>>> implemented this
> > > >> > > >>>>>>> > > > feature. I think we should distinguish the new
> > features
> > > >> > from
> > > >> > > >>>>>>> the
> > > >> > > >>>>>>> > optional
> > > >> > > >>>>>>> > > > features. While the public decorative interface
> is a
> > > >> > solution
> > > >> > > >>>>>>> to the
> > > >> > > >>>>>>> > > > optional features. We should not use it for the
> > features
> > > >> > that
> > > >> > > >>>>>>> are
> > > >> > > >>>>>>> > > > non-optional.
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > I don't think that this feature is NON-OPTIONAL.
> > Even
> > > >> > though
> > > >> > > >>>>>>> > preferred, I
> > > >> > > >>>>>>> > > > still think it can be simply optional.
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > the Java doc of the SupportXXX() method in the
> > Source
> > > >> would
> > > >> > > be
> > > >> > > >>>>>>> the
> > > >> > > >>>>>>> > single
> > > >> > > >>>>>>> > > > source of truth regarding how to implement this
> > feature.
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > I also don't find it entirely true. Half of the
> > classes
> > > >> are
> > > >> > > >>>>>>> > theoretically
> > > >> > > >>>>>>> > > > optional and are utility classes from the point of
> > view
> > > >> how
> > > >> > > the
> > > >> > > >>>>>>> > > interfaces
> > > >> > > >>>>>>> > > > are organized. Theoretically users do not need to
> > use
> > > >> any
> > > >> > of
> > > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to
> > list
> > > >> > their
> > > >> > > >>>>>>> methods in
> > > >> > > >>>>>>> > > the
> > > >> > > >>>>>>> > > > Source interface.
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > @Piotr
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > If we have all of the methods with default
> > > >> implementation
> > > >> > in
> > > >> > > >>>>>>> the base
> > > >> > > >>>>>>> > > > interface, the API doesn't give any clue to the
> user
> > > >> which
> > > >> > > set
> > > >> > > >>>>>>> of
> > > >> > > >>>>>>> > methods
> > > >> > > >>>>>>> > > > are required to be implemented at the same time.
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > I feel that no matter which option we choose this
> > can
> > > >> not
> > > >> > be
> > > >> > > >>>>>>> solved
> > > >> > > >>>>>>> > > > entirely in either of the options, because of the
> > point
> > > >> > above
> > > >> > > >>>>>>> and
> > > >> > > >>>>>>> > because
> > > >> > > >>>>>>> > > > the signature of SplitReader#pauseOrResumeSplits
> and
> > > >> > > >>>>>>> > > > SourceReader#pauseOrResumeSplits are slightly
> > different
> > > >> > (one
> > > >> > > >>>>>>> identifies
> > > >> > > >>>>>>> > > > splits with splitId the other one passes the
> splits
> > > >> > > directly).
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > Best,
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > Dawid
> > > >> > > >>>>>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > Hi Piotr,
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > Thanks for the comment.
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > Just to clarify, I am not against the decorative
> > > >> > interfaces,
> > > >> > > >>>>>>> but I do
> > > >> > > >>>>>>> > > > think we should use them with caution. The main
> > argument
> > > >> > for
> > > >> > > >>>>>>> adding the
> > > >> > > >>>>>>> > > > methods to the SourceReader is that these methods
> > are
> > > >> > > >>>>>>> > > > effectively NON-OPTIONAL to SourceReader impl,
> i.e.
> > > >> > starting
> > > >> > > >>>>>>> from this
> > > >> > > >>>>>>> > > > FLIP, all the SourceReaders impl are expected to
> > support
> > > >> > this
> > > >> > > >>>>>>> > > > method, although some old implementations may not
> > have
> > > >> > > >>>>>>> implemented this
> > > >> > > >>>>>>> > > > feature. I think we should distinguish the new
> > features
> > > >> > from
> > > >> > > >>>>>>> the
> > > >> > > >>>>>>> > optional
> > > >> > > >>>>>>> > > > features. While the public decorative interface
> is a
> > > >> > solution
> > > >> > > >>>>>>> to the
> > > >> > > >>>>>>> > > > optional features. We should not use it for the
> > features
> > > >> > that
> > > >> > > >>>>>>> are
> > > >> > > >>>>>>> > > > non-optional.
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > That said, this feature is optional for
> > SplitReaders.
> > > >> > > Arguably
> > > >> > > >>>>>>> we can
> > > >> > > >>>>>>> > > have
> > > >> > > >>>>>>> > > > a decorative interface for that, but for
> simplicity
> > and
> > > >> > > >>>>>>> symmetry of the
> > > >> > > >>>>>>> > > > interface, personally I prefer just adding a new
> > method.
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > Regarding the advantages you mentioned about the
> > > >> decorative
> > > >> > > >>>>>>> interfaces,
> > > >> > > >>>>>>> > > > they would make sense if:
> > > >> > > >>>>>>> > > > 1. The feature is optional.
> > > >> > > >>>>>>> > > > 2. There is only one decorative interface involved
> > for a
> > > >> > > >>>>>>> feature.
> > > >> > > >>>>>>> > > > Otherwise the argument that all the methods are
> > grouped
> > > >> > > >>>>>>> together will
> > > >> > > >>>>>>> > not
> > > >> > > >>>>>>> > > > stand.
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > Compared with that, I think the current solution
> > works
> > > >> fine
> > > >> > > in
> > > >> > > >>>>>>> all
> > > >> > > >>>>>>> > cases,
> > > >> > > >>>>>>> > > > i.e. "having supportXXX() method in Source, and
> > default
> > > >> > > >>>>>>> methods /
> > > >> > > >>>>>>> > > > decorative interfaces in base interfaces.".
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > The advantages are:
> > > >> > > >>>>>>> > > >> - clean and easy to implement base interface
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > In the current approach, the Java doc of the
> > > >> SupportXXX()
> > > >> > > >>>>>>> method in the
> > > >> > > >>>>>>> > > > Source would be the single source of truth
> regarding
> > > >> how to
> > > >> > > >>>>>>> implement
> > > >> > > >>>>>>> > > this
> > > >> > > >>>>>>> > > > feature. It lists the method that has to be
> > implemented
> > > >> to
> > > >> > > >>>>>>> support this
> > > >> > > >>>>>>> > > > feature, regardless of how many classes /
> > interfaces are
> > > >> > > >>>>>>> involved.
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > When implementing the base interface, users do not
> > need
> > > >> to
> > > >> > > >>>>>>> implement a
> > > >> > > >>>>>>> > > > method with default implementation. If they are
> > curious
> > > >> > what
> > > >> > > >>>>>>> the method
> > > >> > > >>>>>>> > > is
> > > >> > > >>>>>>> > > > for, the java doc of that method simply points
> > users to
> > > >> the
> > > >> > > >>>>>>> > SupportXXX()
> > > >> > > >>>>>>> > > > method in the Source. It seems not adding work to
> > the
> > > >> users
> > > >> > > >>>>>>> compared
> > > >> > > >>>>>>> > with
> > > >> > > >>>>>>> > > > decorative interfaces, but gives much better
> > > >> > discoverability.
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > - all of the methods from a single feature are
> > grouped
> > > >> in a
> > > >> > > >>>>>>> single
> > > >> > > >>>>>>> > > >> decorator interface, together with their
> dedicated
> > java
> > > >> > doc.
> > > >> > > >>>>>>> It's also
> > > >> > > >>>>>>> > > >> easier to google search for help using the
> > decorator
> > > >> name
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > - if an optional feature requires two methods to
> be
> > > >> > > >>>>>>> implemented at
> > > >> > > >>>>>>> > once,
> > > >> > > >>>>>>> > > >> decorator can guarantee that
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > These two points are not true when multiple
> > components
> > > >> and
> > > >> > > >>>>>>> classes are
> > > >> > > >>>>>>> > > > involved collaboratively to provide a feature. In
> > our
> > > >> case,
> > > >> > > we
> > > >> > > >>>>>>> have
> > > >> > > >>>>>>> > both
> > > >> > > >>>>>>> > > > SourceReader and SplitReader involved. And there
> > might
> > > >> be
> > > >> > > other
> > > >> > > >>>>>>> > > interfaces
> > > >> > > >>>>>>> > > > on the JM side involved for some future features.
> > So the
> > > >> > > >>>>>>> relevant
> > > >> > > >>>>>>> > methods
> > > >> > > >>>>>>> > > > can actually be scattered over the places. That
> > said, we
> > > >> > may
> > > >> > > >>>>>>> still use
> > > >> > > >>>>>>> > > > decorative interfaces for each component, if the
> > > >> feature is
> > > >> > > >>>>>>> optional,
> > > >> > > >>>>>>> > > given
> > > >> > > >>>>>>> > > > there is a single source of truth for the feature.
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > Here I would strongly lean towards making life
> > easier
> > > >> for
> > > >> > new
> > > >> > > >>>>>>> users,
> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo) slight
> > expense
> > > >> > for
> > > >> > > >>>>>>> the power
> > > >> > > >>>>>>> > > >> users.
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > I actually think the current approach is simpler,
> > more
> > > >> > > >>>>>>> extensible and
> > > >> > > >>>>>>> > > more
> > > >> > > >>>>>>> > > > general for all the users. Can you articulate a
> bit
> > > >> more on
> > > >> > > >>>>>>> which part
> > > >> > > >>>>>>> > > you
> > > >> > > >>>>>>> > > > think is harder for users to understand?
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > There is another benefit of the decorative
> > interfaces
> > > >> which
> > > >> > > is
> > > >> > > >>>>>>> not
> > > >> > > >>>>>>> > > > mentioned, but might be worth considering here.
> > Usually
> > > >> the
> > > >> > > >>>>>>> decorative
> > > >> > > >>>>>>> > > > interfaces give slightly better backwards
> > compatibility
> > > >> > than
> > > >> > > >>>>>>> the new
> > > >> > > >>>>>>> > > > default method in the interfaces. That is when
> > users are
> > > >> > > using
> > > >> > > >>>>>>> a jar
> > > >> > > >>>>>>> > that
> > > >> > > >>>>>>> > > > was compiled with an older version of Flink which
> > does
> > > >> not
> > > >> > > >>>>>>> have the
> > > >> > > >>>>>>> > > default
> > > >> > > >>>>>>> > > > method in the interfaces in question. A decorative
> > > >> > interface
> > > >> > > >>>>>>> may still
> > > >> > > >>>>>>> > > > provide backwards compatibility in that case,
> while
> > > >> default
> > > >> > > >>>>>>> method impl
> > > >> > > >>>>>>> > > > cannot.
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > I think in Flink we in general do not guarantee
> > custom
> > > >> > > >>>>>>> components
> > > >> > > >>>>>>> > > compiled
> > > >> > > >>>>>>> > > > with an older version can run with a newer version
> > of
> > > >> > Flink.
> > > >> > > A
> > > >> > > >>>>>>> > recompile
> > > >> > > >>>>>>> > > > with a newer version would be required. That said,
> > if
> > > >> we do
> > > >> > > >>>>>>> care about
> > > >> > > >>>>>>> > > > this, we can just change the "supportXXX()" method
> > in
> > > >> the
> > > >> > > >>>>>>> Source
> > > >> > > >>>>>>> > > interface
> > > >> > > >>>>>>> > > > to use decorative interfaces, and leave the other
> > parts
> > > >> > > >>>>>>> unchanged.
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > Thanks,
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > Jiangjie (Becket) Qin
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <
> > > >> > > >>>>>>> pnowojski@apache.org>
> > > >> > > >>>>>>> > > > wrote:
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > > >> Hi,
> > > >> > > >>>>>>> > > >>
> > > >> > > >>>>>>> > > >> Sorry for chipping in so late, but I was OoO for
> > the
> > > >> last
> > > >> > > two
> > > >> > > >>>>>>> weeks.
> > > >> > > >>>>>>> > > >> Regarding the interfaces, I would be actually
> > against
> > > >> > adding
> > > >> > > >>>>>>> those
> > > >> > > >>>>>>> > > methods
> > > >> > > >>>>>>> > > >> to the base interfaces for the reasons mentioned
> > above.
> > > >> > > >>>>>>> Clogging the
> > > >> > > >>>>>>> > > base
> > > >> > > >>>>>>> > > >> interface for new users with tons of methods that
> > they
> > > >> do
> > > >> > > not
> > > >> > > >>>>>>> need, do
> > > >> > > >>>>>>> > > not
> > > >> > > >>>>>>> > > >> understand and do not know what to do with them.
> > > >> Moreover,
> > > >> > > >>>>>>> such
> > > >> > > >>>>>>> > > decorative
> > > >> > > >>>>>>> > > >> interfaces are solving a problem if a feature
> > requires
> > > >> two
> > > >> > > or
> > > >> > > >>>>>>> more
> > > >> > > >>>>>>> > > methods
> > > >> > > >>>>>>> > > >> to be implemented at the same time. If we have
> all
> > of
> > > >> the
> > > >> > > >>>>>>> methods with
> > > >> > > >>>>>>> > > >> default implementation in the base interface, the
> > API
> > > >> > > doesn't
> > > >> > > >>>>>>> give any
> > > >> > > >>>>>>> > > >> clue
> > > >> > > >>>>>>> > > >> to the user which set of methods are required to
> be
> > > >> > > >>>>>>> implemented at the
> > > >> > > >>>>>>> > > >> same
> > > >> > > >>>>>>> > > >> time.
> > > >> > > >>>>>>> > > >>
> > > >> > > >>>>>>> > > >> > a) I feel the biggest drawback of decorative
> > > >> interfaces
> > > >> > is
> > > >> > > >>>>>>> which
> > > >> > > >>>>>>> > > >> interface
> > > >> > > >>>>>>> > > >> > they can decorate and which combinations of
> > multiple
> > > >> > > >>>>>>> decorative
> > > >> > > >>>>>>> > > >> interfaces
> > > >> > > >>>>>>> > > >> > are valid. (...)
> > > >> > > >>>>>>> > > >> > In the future, if there is a new feature added
> > > >> > > >>>>>>> > > >> > (e.g. sorted or pre-partitioned data aware),
> are
> > we
> > > >> > going
> > > >> > > >>>>>>> to create
> > > >> > > >>>>>>> > > >> another
> > > >> > > >>>>>>> > > >> > interface of SplitReader such as
> > SortedSplitReader or
> > > >> > > >>>>>>> > > >> PrePartitionedAware?
> > > >> > > >>>>>>> > > >> > Can they be combined? So I think the additional
> > > >> > decorative
> > > >> > > >>>>>>> interface
> > > >> > > >>>>>>> > > >> like
> > > >> > > >>>>>>> > > >> > withSplitsAlignment actually increases the
> > > >> understanding
> > > >> > > >>>>>>> cost of
> > > >> > > >>>>>>> > users
> > > >> > > >>>>>>> > > >> > because they have to know what decorative
> > interfaces
> > > >> are
> > > >> > > >>>>>>> there,
> > > >> > > >>>>>>> > which
> > > >> > > >>>>>>> > > >> > interface they can decorate and which
> > combinations of
> > > >> > the
> > > >> > > >>>>>>> decorative
> > > >> > > >>>>>>> > > >> > interfaces are valid and which are not. Ideally
> > we
> > > >> want
> > > >> > to
> > > >> > > >>>>>>> avoid
> > > >> > > >>>>>>> > that.
> > > >> > > >>>>>>> > > >>
> > > >> > > >>>>>>> > > >> I'm not sure if I understand how embedding
> default
> > > >> methods
> > > >> > > in
> > > >> > > >>>>>>> the base
> > > >> > > >>>>>>> > > >> interface is solving the problem: what can be
> > combined
> > > >> or
> > > >> > > >>>>>>> not? If
> > > >> > > >>>>>>> > there
> > > >> > > >>>>>>> > > >> are
> > > >> > > >>>>>>> > > >> two conflicting features, having decorative
> > interfaces
> > > >> > that
> > > >> > > >>>>>>> can not be
> > > >> > > >>>>>>> > > >> mixed together actually makes much more sense to
> me
> > > >> rather
> > > >> > > >>>>>>> than having
> > > >> > > >>>>>>> > > >> them
> > > >> > > >>>>>>> > > >> all in one base class. How would you allow users
> to
> > > >> > > implement
> > > >> > > >>>>>>> only one
> > > >> > > >>>>>>> > > of
> > > >> > > >>>>>>> > > >> those two features?
> > > >> > > >>>>>>> > > >>
> > > >> > > >>>>>>> > > >> To reiterate on the issue. Yes, there are
> > drawbacks:
> > > >> > > >>>>>>> > > >> - how can a user discover what decorators are
> > there?
> > > >> > > >>>>>>> > > >> - how can a user know where the decorator can be
> > > >> applied
> > > >> > to?
> > > >> > > >>>>>>> > > >>
> > > >> > > >>>>>>> > > >> However those are drawbacks for more power users,
> > that
> > > >> can
> > > >> > > be
> > > >> > > >>>>>>> > mitigated
> > > >> > > >>>>>>> > > by
> > > >> > > >>>>>>> > > >> the documentation. For example listing all of the
> > > >> > decorators
> > > >> > > >>>>>>> with
> > > >> > > >>>>>>> > > >> detailed explanation both in the docs and in the
> > java
> > > >> > docs.
> > > >> > > >>>>>>> More
> > > >> > > >>>>>>> > > >> experienced users will be able to deal with those
> > > >> issues
> > > >> > > >>>>>>> easier, as
> > > >> > > >>>>>>> > they
> > > >> > > >>>>>>> > > >> will already have some basic understanding of
> > Flink.
> > > >> Also
> > > >> > if
> > > >> > > >>>>>>> a user has
> > > >> > > >>>>>>> > a
> > > >> > > >>>>>>> > > >> problem that he wants to solve, he will google
> > search a
> > > >> > > >>>>>>> potential
> > > >> > > >>>>>>> > > solution
> > > >> > > >>>>>>> > > >> to his problem anyway, and while doing that he is
> > very
> > > >> > > likely
> > > >> > > >>>>>>> to
> > > >> > > >>>>>>> > > discover
> > > >> > > >>>>>>> > > >> the decorator that he needs anyway in the docs.
> > > >> > > >>>>>>> > > >>
> > > >> > > >>>>>>> > > >> The advantages are:
> > > >> > > >>>>>>> > > >> - clean and easy to implement base interface
> > > >> > > >>>>>>> > > >> - all of the methods from a single feature are
> > grouped
> > > >> in
> > > >> > a
> > > >> > > >>>>>>> single
> > > >> > > >>>>>>> > > >> decorator interface, together with their
> dedicated
> > java
> > > >> > doc.
> > > >> > > >>>>>>> It's also
> > > >> > > >>>>>>> > > >> easier to google search for help using the
> > decorator
> > > >> name
> > > >> > > >>>>>>> > > >> - if an optional feature requires two methods to
> be
> > > >> > > >>>>>>> implemented at
> > > >> > > >>>>>>> > once,
> > > >> > > >>>>>>> > > >> decorator can guarantee that
> > > >> > > >>>>>>> > > >>
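A minimal sketch of the last advantage, assuming hypothetical, simplified stand-ins (BaseReader, PausableReader) rather than Flink's real interfaces: because both methods of the optional feature are abstract in the decorator, a class that opts in has to implement the pair together or it does not compile, while a plain reader stays small.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified base interface (not Flink's SourceReader).
interface BaseReader {
    String poll();
}

// Hypothetical decorative interface: both methods are abstract, so opting in
// to the feature forces an implementation of the whole pair.
interface PausableReader {
    void pause(Collection<String> splitIds);
    void resume(Collection<String> splitIds);
}

// A reader written by a new user: only the base contract.
final class SimpleReader implements BaseReader {
    public String poll() { return "record"; }
}

// A reader written by a power user: base contract plus the decorator.
final class PausingReader implements BaseReader, PausableReader {
    private final List<String> paused = new ArrayList<>();

    // Returns null while any split is paused, otherwise a dummy record.
    public String poll() { return paused.isEmpty() ? "record" : null; }
    public void pause(Collection<String> splitIds) { paused.addAll(splitIds); }
    public void resume(Collection<String> splitIds) { paused.removeAll(splitIds); }
}

final class DecoratorSketch {
    // Framework-side discovery of the optional feature via the type system.
    static boolean supportsPausing(BaseReader reader) {
        return reader instanceof PausableReader;
    }
}
```

The instanceof check is only one possible way for a framework to detect the decorator; the thread also discusses an explicit supportsXXX() method for the same purpose.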
> > > >> > > >>>>>>> > > >> Here I would strongly lean towards making life
> > easier
> > > >> for
> > > >> > > new
> > > >> > > >>>>>>> users,
> > > >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo) slight
> > expense
> > > >> > for
> > > >> > > >>>>>>> the power
> > > >> > > >>>>>>> > > >> users.
> > > >> > > >>>>>>> > > >>
> > > >> > > >>>>>>> > > >> Best,
> > > >> > > >>>>>>> > > >> Piotrek
> > > >> > > >>>>>>> > > >>
> > > >> > > >>>>>>> > > >>
> > > >> > > >>>>>>> > > >> Tue, 26 Apr 2022 at 15:32 Becket Qin <
> > > >> becket.qin@gmail.com
> > > >> > >
> > > >> > > >>>>>>> > wrote:
> > > >> > > >>>>>>> > > >>
> > > >> > > >>>>>>> > > >> > Thanks for the reply Sebastian and Dawid,
> > > >> > > >>>>>>> > > >> >
> > > >> > > >>>>>>> > > >> > I think Sebastian has a good summary. This is a
> > > >> really
> > > >> > > >>>>>>> helpful
> > > >> > > >>>>>>> > > >> discussion.
> > > >> > > >>>>>>> > > >> >
> > > >> > > >>>>>>> > > >> > Thinking a bit more, I feel that it might still
> > be
> > > >> > better
> > > >> > > >>>>>>> to add the
> > > >> > > >>>>>>> > > >> > supportsXXX() method in the Source rather than
> > > >> > > SourceReader.
> > > >> > > >>>>>>> > > >> >
> > > >> > > >>>>>>> > > >> > Generally speaking, what we are trying to do
> > here is
> > > >> to
> > > >> > > let
> > > >> > > >>>>>>> the
> > > >> > > >>>>>>> > Flink
> > > >> > > >>>>>>> > > >> > framework know what the Source is capable of.
> In
> > this
> > > >> > > FLIP,
> > > >> > > >>>>>>> it
> > > >> > > >>>>>>> > happens
> > > >> > > >>>>>>> > > >> to
> > > >> > > >>>>>>> > > >> > be the capability that only involves
> > SourceReader.
> > > >> But
> > > >> > in
> > > >> > > >>>>>>> the
> > > >> > > >>>>>>> > future,
> > > >> > > >>>>>>> > > >> it is
> > > >> > > >>>>>>> > > >> > possible that another functionality involves
> > both the
> > > >> > > >>>>>>> > SplitEnumerator
> > > >> > > >>>>>>> > > >> and
> > > >> > > >>>>>>> > > >> > SourceReader. In that case, following the
> current
> > > >> > > approach,
> > > >> > > >>>>>>> we
> > > >> > > >>>>>>> > should
> > > >> > > >>>>>>> > > >> put
> > > >> > > >>>>>>> > > >> > the "supportsXXX()" method in both
> > SplitEnumerator
> > > >> and
> > > >> > > >>>>>>> SourceReader.
> > > >> > > >>>>>>> > > >> > Because if we only put this in the
> SourceReader,
> > then
> > > >> > the
> > > >> > > >>>>>>> JM would
> > > >> > > >>>>>>> > > have
> > > >> > > >>>>>>> > > >> to
> > > >> > > >>>>>>> > > >> > create a SourceReader in order to know whether
> > this
> > > >> > > feature
> > > >> > > >>>>>>> is
> > > >> > > >>>>>>> > > >> supported,
> > > >> > > >>>>>>> > > >> > which is a little ugly. But if we put the
> > > >> > "supportsXXX()"
> > > >> > > >>>>>>> method in
> > > >> > > >>>>>>> > > the
> > > >> > > >>>>>>> > > >> > Source, we will break the "symmetric" design
> > because
> > > >> > this
> > > >> > > >>>>>>> FLIP
> > > >> > > >>>>>>> > chose a
> > > >> > > >>>>>>> > > >> > different way.
> > > >> > > >>>>>>> > > >> >
> > > >> > > >>>>>>> > > >> > This is also why I think supportsXXX() method
> > seems a
> > > >> > good
> > > >> > > >>>>>>> thing to
> > > >> > > >>>>>>> > > >> have,
> > > >> > > >>>>>>> > > >> > because when there are a few interfaces /
> methods
> > > >> that
> > > >> > are
> > > >> > > >>>>>>> expected
> > > >> > > >>>>>>> > to
> > > >> > > >>>>>>> > > >> be
> > > >> > > >>>>>>> > > >> > implemented at the same time in order to
> deliver
> > a
> > > >> > > feature,
> > > >> > > >>>>>>> it is
> > > >> > > >>>>>>> > > always
> > > >> > > >>>>>>> > > >> > good to have a single source of truth to tell
> the
> > > >> > > framework
> > > >> > > >>>>>>> what to
> > > >> > > >>>>>>> > > do,
> > > >> > > >>>>>>> > > >> so
> > > >> > > >>>>>>> > > >> > the framework can do consistent things in
> > different
> > > >> > parts.
> > > >> > > >>>>>>> > > >> >
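A rough sketch of the argument above, using hypothetical simplified types (SourceSketch, ReaderSketch) and a made-up planning function: if the capability flag lives on the Source, the framework can decide what to do without instantiating a reader first.

```java
// Hypothetical, simplified reader; stands in for a SourceReader.
interface ReaderSketch {
    String poll();
}

// Hypothetical, simplified source with a capability flag that defaults to false.
interface SourceSketch {
    ReaderSketch createReader();

    default boolean supportsPausingSplits() {
        return false;
    }
}

// A source that declares the capability and counts how many readers it created.
final class CountingSource implements SourceSketch {
    int readersCreated = 0;

    public ReaderSketch createReader() {
        readersCreated++;
        return () -> "record";
    }

    @Override
    public boolean supportsPausingSplits() {
        return true;
    }
}

final class CapabilitySketch {
    // Made-up planning step: consults only the Source, never a reader instance.
    static String plan(SourceSketch source) {
        return source.supportsPausingSplits() ? "align-splits" : "warn-only";
    }
}
```

The return values "align-splits" and "warn-only" are illustrative labels only; the point is that plan() answers the question while readersCreated stays at zero.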
> > > >> > > >>>>>>> > > >> > @Sebastian Mattheis <se...@ververica.com>
> > > >> > > >>>>>>> > > >> >
> > > >> > > >>>>>>> > > >> > Regarding interface flavor b), i.e.
> > > >> AlignedSourceReader
> > > >> > +
> > > >> > > >>>>>>> > > >> > AlignedSplitReader, what I feel awkward about
> is
> > > >> that we
> > > >> > > are
> > > >> > > >>>>>>> > > essentially
> > > >> > > >>>>>>> > > >> > expecting almost all the SourceReader
> > > >> implementations to
> > > >> > > >>>>>>> extend
> > > >> > > >>>>>>> > > >> > SourceReaderBase, which effectively makes the
> > > >> > SourceReader
> > > >> > > >>>>>>> interface
> > > >> > > >>>>>>> > > >> > without the pausing support useless. So this
> > > >> indicates
> > > >> > > that
> > > >> > > >>>>>>> public
> > > >> > > >>>>>>> > > >> > decorative interfaces (or sub-interfaces for
> the
> > same
> > > >> > > >>>>>>> purpose) only
> > > >> > > >>>>>>> > > >> > make sense if the original interface is also
> > > >> expected to
> > > >> > > be
> > > >> > > >>>>>>> used.
> > > >> > > >>>>>>> > > >> > Otherwise, it seems to make more sense to add the
> > > >> method
> > > >> > to
> > > >> > > >>>>>>> the
> > > >> > > >>>>>>> > original
> > > >> > > >>>>>>> > > >> > interface itself.
> > > >> > > >>>>>>> > > >> >
> > > >> > > >>>>>>> > > >> > Cheers,
> > > >> > > >>>>>>> > > >> >
> > > >> > > >>>>>>> > > >> > Jiangjie (Becket) Qin
> > > >> > > >>>>>>> > > >> >
> > > >> > > >>>>>>> > > >> >
> > > >> > > >>>>>>> > > >> >
> > > >> > > >>>>>>> > > >> >
> > > >> > > >>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid
> Wysakowicz
> > <
> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org>
> > > >> > > >>>>>>> > > >> > wrote:
> > > >> > > >>>>>>> > > >> >
> > > >> > > >>>>>>> > > >> > > Thanks @Sebastian for the nice summary.
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > I think most of your points aligned with the
> > > >> > suggestions
> > > >> > > >>>>>>> I made to
> > > >> > > >>>>>>> > > the
> > > >> > > >>>>>>> > > >> > > FLIP, while you were writing your reply (I
> > believe
> > > >> we
> > > >> > > hit
> > > >> > > >>>>>>> enter
> > > >> > > >>>>>>> > > >> nearly at
> > > >> > > >>>>>>> > > >> > > the same time ;) )
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > Two points after we synced offline
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > 1. I changed also the
> > > >> supportsWatermarksSplitAlignment
> > > >> > > to
> > > >> > > >>>>>>> > > >> > > supportsPausingSplits to express the general
> > > >> > capability
> > > >> > > of
> > > >> > > >>>>>>> > pausing.
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > 2. As for whether we should add
> > > >> > > >>>>>>> PausingSourceReader/PausingSplitReader
> > > >> > > >>>>>>> > > (option
> > > >> > > >>>>>>> > > >> b)
> > > >> > > >>>>>>> > > >> > > or if we should just add the methods (option
> > c), I
> > > >> > > >>>>>>> suggest to
> > > >> > > >>>>>>> > simply
> > > >> > > >>>>>>> > > >> add
> > > >> > > >>>>>>> > > >> > > the two methods, as I felt this is the approach Becket much prefers and to
> > > >> > > >>>>>>> > > >> > > which others do not object. Unless there is an
> > opposition
> > > >> > > let's
> > > >> > > >>>>>>> go with
> > > >> > > >>>>>>> > > this
> > > >> > > >>>>>>> > > >> > > option c.
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > Best,
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > Dawid
> > > >> > > >>>>>>> > > >> > > On 26/04/2022 10:06, Sebastian Mattheis
> wrote:
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > Hi folks,
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > Sorry for being a bit silent. Many thanks for
> > all
> > > >> the
> > > >> > > >>>>>>> input and
> > > >> > > >>>>>>> > > >> > > suggestions. As I'm a bit new, I needed some
> > time
> > > >> to
> > > >> > > >>>>>>> catch up and
> > > >> > > >>>>>>> > > >> > structure
> > > >> > > >>>>>>> > > >> > > (for myself) the discussion and I wanted to
> > find a
> > > >> way
> > > >> > > to
> > > >> > > >>>>>>> > structure
> > > >> > > >>>>>>> > > >> the
> > > >> > > >>>>>>> > > >> > > conclusions. (Also because I had the feeling
> > that
> > > >> some
> > > >> > > >>>>>>> concerns
> > > >> > > >>>>>>> > got
> > > >> > > >>>>>>> > > >> lost
> > > >> > > >>>>>>> > > >> > in
> > > >> > > >>>>>>> > > >> > > the discussion.) This is my attempt and
> please
> > > >> correct
> > > >> > > me
> > > >> > > >>>>>>> if
> > > >> > > >>>>>>> > > >> something is
> > > >> > > >>>>>>> > > >> > > wrong or misunderstood. I tried to collect
> and
> > > >> > assemble
> > > >> > > >>>>>>> the
> > > >> > > >>>>>>> > > opinions,
> > > >> > > >>>>>>> > > >> > > suggestions, and conclusions (to the best of
> my
> > > >> > > >>>>>>> knowledge):
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > # Top A: Should split alignment (pause/resume
> > > >> > behavior)
> > > >> > > >>>>>>> be a
> > > >> > > >>>>>>> > general
> > > >> > > >>>>>>> > > >> > > capability?
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > I personally don't see any reason not to have
> > it a
> > > >> > > general
> > > >> > > >>>>>>> > capability
> > > >> > > >>>>>>> > > >> > > because for the alignSplit method it is
> > actually
> > > >> > > >>>>>>> independent of
> > > >> > > >>>>>>> > the
> > > >> > > >>>>>>> > > >> > > watermarks. If we agree here to have it a
> > general
> > > >> > > >>>>>>> capability, we
> > > >> > > >>>>>>> > > >> should
> > > >> > > >>>>>>> > > >> > > also agree on the right wording. Does
> > > >> > > >>>>>>> "alignSplits(splitsToResume,
> > > >> > > >>>>>>> > > >> > > splitsToPause)" refer to what is then
> actually
> > > >> meant?
> > > >> > (I
> > > >> > > >>>>>>> see it as
> > > >> > > >>>>>>> > > >> okay.
> > > >> > > >>>>>>> > > >> > I
> > > >> > > >>>>>>> > > >> > > don't have any better idea whilst Arvid
> > suggested
> > > >> > > >>>>>>> > > >> "pauseOrResumeSplits".)
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > # Top B: Should it be possible to
> > enable/disable
> > > >> split
> > > >> > > >>>>>>> alignment?
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > I would personally not disable the split
> > alignment
> > > >> on
> > > >> > > the
> > > >> > > >>>>>>> source
> > > >> > > >>>>>>> > > >> reader
> > > >> > > >>>>>>> > > >> > > side because if split alignment is used for
> > some
> > > >> other
> > > >> > > >>>>>>> use case
> > > >> > > >>>>>>> > (see
> > > >> > > >>>>>>> > > >> A)
> > > >> > > >>>>>>> > > >> > it
> > > >> > > >>>>>>> > > >> > > could have nasty side effects on other/future
> > use
> > > >> > cases.
> > > >> > > >>>>>>> Instead,
> > > >> > > >>>>>>> > I
> > > >> > > >>>>>>> > > >> would
> > > >> > > >>>>>>> > > >> > > disable "watermark split alignment" where I
> > think
> > > >> it
> > > >> > > >>>>>>> should
> > > >> > > >>>>>>> > disable
> > > >> > > >>>>>>> > > >> the
> > > >> > > >>>>>>> > > >> > > watermark-dependent trigger for split
> > alignment.
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > # Top C: Should we add a supportsX method?
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > I find it difficult to define the scope of a
> > > >> supportsX
> > > >> > > >>>>>>> method
> > > >> > > >>>>>>> > w.r.t.
> > > >> > > >>>>>>> > > >> > > the following questions: a) Where is it used?
> > and
> > > >> b)
> > > >> > > What
> > > >> > > >>>>>>> is the
> > > >> > > >>>>>>> > > >> expected
> > > >> > > >>>>>>> > > >> > > output? To b), it's not straight-forward to
> > > >> provide a
> > > >> > > >>>>>>> meaningful
> > > >> > > >>>>>>> > > >> output,
> > > >> > > >>>>>>> > > >> > > e.g., if SourceReader supports split
> alignment
> > but
> > > >> > > >>>>>>> SplitReader
> > > >> > > >>>>>>> > not.
> > > >> > > >>>>>>> > > >> This
> > > >> > > >>>>>>> > > >> > is
> > > >> > > >>>>>>> > > >> > > because with the current implementation, we
> can
> > > >> > > determine
> > > >> > > >>>>>>> whether
> > > >> > > >>>>>>> > > >> split
> > > >> > > >>>>>>> > > >> > > alignment is fully supported only during
> > runtime
> > > >> and
> > > >> > > >>>>>>> specifically
> > > >> > > >>>>>>> > > >> > actually
> > > >> > > >>>>>>> > > >> > > only when calling alignSplits down the call
> > > >> hierarchy
> > > >> > up
> > > >> > > >>>>>>> to the
> > > >> > > >>>>>>> > > actual
> > > >> > > >>>>>>> > > >> > > SplitReaders.
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > Therefore, I would suggest to either raise an
> > > >> error or
> > > >> > > >>>>>>> warning if
> > > >> > > >>>>>>> > > the
> > > >> > > >>>>>>> > > >> > > alignment is called but not supported at some
> > > >> point. I
> > > >> > > >>>>>>> know we
> > > >> > > >>>>>>> > > should
> > > >> > > >>>>>>> > > >> > > carefully think about when this could be the
> > case
> > > >> > > because
> > > >> > > >>>>>>> we don't
> > > >> > > >>>>>>> > > >> want
> > > >> > > >>>>>>> > > >> > to
> > > >> > > >>>>>>> > > >> > > flood anybody with such warnings. However,
> > warnings
> > > >> > > could
> > > >> > > >>>>>>> be an
> > > >> > > >>>>>>> > > >> indicator
> > > >> > > >>>>>>> > > >> > > for the user that for watermark split
> > alignment use
> > > >> > case
> > > >> > > >>>>>>> split
> > > >> > > >>>>>>> > > >> reading is
> > > >> > > >>>>>>> > > >> > > imbalanced with the conclusion to either
> > disable
> > > >> the
> > > >> > > >>>>>>> trigger for
> > > >> > > >>>>>>> > > >> > watermark
> > > >> > > >>>>>>> > > >> > > split alignment (see Top B) or to
> > use/implement a
> > > >> > source
> > > >> > > >>>>>>> and
> > > >> > > >>>>>>> > reader
> > > >> > > >>>>>>> > > >> that
> > > >> > > >>>>>>> > > >> > > fully supports split alignment.
> > > >> > > >>>>>>> > > >> > >
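One way to read the "warn without flooding" suggestion, as a sketch with hypothetical names (UnalignedSplitReader, WarnOnceFetcher) and a plain list standing in for a logger: the unsupported case is only discovered when alignSplits is actually called, and the fetcher reports it once.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified split reader: alignment is unsupported unless overridden.
interface UnalignedSplitReader {
    default void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause) {
        throw new UnsupportedOperationException("split alignment is not supported");
    }
}

// Hypothetical fetcher that forwards alignment requests and warns at most once.
final class WarnOnceFetcher {
    private final UnalignedSplitReader reader;
    private boolean warned = false;

    // Stands in for a logger so the behavior can be inspected.
    final List<String> warnings = new ArrayList<>();

    WarnOnceFetcher(UnalignedSplitReader reader) {
        this.reader = reader;
    }

    void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause) {
        try {
            reader.alignSplits(splitsToResume, splitsToPause);
        } catch (UnsupportedOperationException e) {
            // Support is only known at call time; report it once, not per call.
            if (!warned) {
                warned = true;
                warnings.add("Split alignment requested but not supported: " + e.getMessage());
            }
        }
    }
}
```

Whether the first occurrence should be a warning or a hard error is exactly the open question in the paragraph above; the sketch picks the warning only for illustration.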
> > > >> > > >>>>>>> > > >> > > # Top D: How to design interfaces?
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > Thanks for structuring the discussion with
> the
> > > >> > > various
> > > >> > > >>>>>>> > > >> possibilities
> > > >> > > >>>>>>> > > >> > > (a-d). From the discussion and emails, I
> would
> > > >> like to
> > > >> > > >>>>>>> summarize
> > > >> > > >>>>>>> > the
> > > >> > > >>>>>>> > > >> > > following requirements:
> > > >> > > >>>>>>> > > >> > > - Interfaces should be consistent
> > ("symmetric"),
> > > >> i.e.,
> > > >> > > >>>>>>> similar
> > > >> > > >>>>>>> > > >> semantics
> > > >> > > >>>>>>> > > >> > > should have similar interfaces with similar
> > usage.
> > > >> > > >>>>>>> > > >> > > - Make explicit which implementations
> implement
> > > >> > > >>>>>>> interfaces/support
> > > >> > > >>>>>>> > > >> > > behavior.
> > > >> > > >>>>>>> > > >> > > - Make clear what are default implementations
> > and
> > > >> how
> > > >> > to
> > > >> > > >>>>>>> implement
> > > >> > > >>>>>>> > > >> > > interfaces with desired behavior.
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > This is a simplified view of the relations
> > between
> > > >> > > >>>>>>> relevant
> > > >> > > >>>>>>> > classes
> > > >> > > >>>>>>> > > of
> > > >> > > >>>>>>> > > >> > the
> > > >> > > >>>>>>> > > >> > > PoC implementation:
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > SourceReader (Public) <|-- SourceReaderBase
> > > >> (Internal)
> > > >> > > >>>>>>> <|-- ..
> > > >> > > >>>>>>> > <|--
> > > >> > > >>>>>>> > > >> > > MySourceReader
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > MySourceReader <>-- SplitFetcherManager
> > (Internal)
> > > >> > <>--
> > > >> > > >>>>>>> > SplitFetcher
> > > >> > > >>>>>>> > > >> > > (Internal) <>-- SplitReader (Public) <|--
> > > >> > MySplitReader
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > (A <|-- B: B inherits from A; A <>-- B: A
> "has
> > a"
> > > >> B)
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > Note that SourceReaderBase and
> > SplitFetcherManager
> > > >> > > >>>>>>> implement most
> > > >> > > >>>>>>> > of
> > > >> > > >>>>>>> > > >> the
> > > >> > > >>>>>>> > > >> > > "logic" for split alignment just because we
> > wanted
> > > >> to
> > > >> > > >>>>>>> implement
> > > >> > > >>>>>>> > > split
> > > >> > > >>>>>>> > > >> > > alignment and wanted it to be available as
> > kind of
> > > >> a
> > > >> > > >>>>>>> default. As a
> > > >> > > >>>>>>> > > >> > > consequence, we have a "default
> > implementation" for
> > > >> > > >>>>>>> SourceReader
> > > >> > > >>>>>>> > > that
> > > >> > > >>>>>>> > > >> > > implements the actual logic for split
> > alignment.
> > > >> For
> > > >> > > that
> > > >> > > >>>>>>> reason,
> > > >> > > >>>>>>> > I
> > > >> > > >>>>>>> > > >> find
> > > >> > > >>>>>>> > > >> > it
> > > >> > > >>>>>>> > > >> > > very confusing to have a NOOP default
> > > >> implementation
> > > >> > in
> > > >> > > >>>>>>> the
> > > >> > > >>>>>>> > > interface
> > > >> > > >>>>>>> > > >> for
> > > >> > > >>>>>>> > > >> > > the SourceReader. As a consequence, interface
> > > >> strategy
> > > >> > > c)
> > > >> > > >>>>>>> is
> > > >> > > >>>>>>> > > difficult
> > > >> > > >>>>>>> > > >> > > because this would require NOOP default
> > > >> > implementations
> > > >> > > >>>>>>> in the
> > > >> > > >>>>>>> > > public
> > > >> > > >>>>>>> > > >> > > interfaces of SourceReader and SplitReader.
> > This is
> > > >> > the
> > > >> > > >>>>>>> same for
> > > >> > > >>>>>>> > > >> strategy
> > > >> > > >>>>>>> > > >> > > d) because it would require NOOP default
> > > >> > implementation
> > > >> > > >>>>>>> in the
> > > >> > > >>>>>>> > > >> > > SourceReader. Further, as Dawid described
> > method
> > > >> > > >>>>>>> signatures of
> > > >> > > >>>>>>> > > >> alignSplit
> > > >> > > >>>>>>> > > >> > > for SourceReader and SplitReader differ and
> it
> > > >> would
> > > >> > be
> > > >> > > >>>>>>> extremely
> > > >> > > >>>>>>> > > >> > difficult
> > > >> > > >>>>>>> > > >> > > to make the signatures the same (with even
> > > >> potential
> > > >> > > >>>>>>> performance
> > > >> > > >>>>>>> > > >> impact
> > > >> > > >>>>>>> > > >> > > because of additional look-ups of split ids).
> > > >> > Therefore,
> > > >> > > >>>>>>> having a
> > > >> > > >>>>>>> > > >> > symmetric
> > > >> > > >>>>>>> > > >> > > decorative interface as of strategy a) is
> > actually
> > > >> not
> > > >> > > >>>>>>> possible
> > > >> > > >>>>>>> > and
> > > >> > > >>>>>>> > > >> > having
> > > >> > > >>>>>>> > > >> > > two decorative interfaces with different
> method
> > > >> > > >>>>>>> signatures is
> > > >> > > >>>>>>> > > >> confusing.
> > > >> > > >>>>>>> > > >> > My
> > > >> > > >>>>>>> > > >> > > conclusion is that we are best with strategy
> b)
> > > >> which
> > > >> > > >>>>>>> means to
> > > >> > > >>>>>>> > have
> > > >> > > >>>>>>> > > >> > > specializing sub-interfaces that inherit from
> > the
> > > >> > parent
> > > >> > > >>>>>>> > interface:
> > > >> > > >>>>>>> > > >> > > SourceReader <|-- AlignedSourceReader,
> > SplitReader
> > > >> > <|--
> > > >> > > >>>>>>> > > >> > AlignedSplitReader
> > > >> > > >>>>>>> > > >> > > With this option, I'm not 100% sure what the
> > > >> > > implications
> > > >> > > >>>>>>> are and
> > > >> > > >>>>>>> > if
> > > >> > > >>>>>>> > > >> this
> > > >> > > >>>>>>> > > >> > > could get nasty. I would suggest that Dawid
> > and I
> > > >> just
> > > >> > > >>>>>>> try to
> > > >> > > >>>>>>> > > >> implement
> > > >> > > >>>>>>> > > >> > and
> > > >> > > >>>>>>> > > >> > > see if we like it. :)
> > > >> > > >>>>>>> > > >> > >
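Strategy b) could look roughly like the following sketch. All types are hypothetical simplifications, and the assumption that the source-reader level works on split ids while the split-reader level works on split objects is inferred from the remark about look-ups of split ids, not taken from the FLIP text.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified parent interfaces.
interface BasicSourceReader { }
interface BasicSplitReader<SplitT> { }

// Strategy b): specializing sub-interfaces, each with its own signature.
interface AlignedSourceReaderSketch extends BasicSourceReader {
    void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause);
}

interface AlignedSplitReaderSketch<SplitT> extends BasicSplitReader<SplitT> {
    void alignSplits(Collection<SplitT> splitsToResume, Collection<SplitT> splitsToPause);
}

// Hypothetical split type identified by a string id.
final class FileSplit {
    final String id;
    FileSplit(String id) { this.id = id; }
}

final class FileSplitReader implements AlignedSplitReaderSketch<FileSplit> {
    final Set<String> paused = new HashSet<>();

    public void alignSplits(Collection<FileSplit> splitsToResume, Collection<FileSplit> splitsToPause) {
        for (FileSplit s : splitsToResume) paused.remove(s.id);
        for (FileSplit s : splitsToPause) paused.add(s.id);
    }
}

final class FileSourceReader implements AlignedSourceReaderSketch {
    private final Map<String, FileSplit> assigned = new HashMap<>();
    final FileSplitReader splitReader = new FileSplitReader();

    void addSplit(FileSplit split) { assigned.put(split.id, split); }

    public void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause) {
        // The id-to-split look-up is the extra step that makes identical signatures costly.
        splitReader.alignSplits(lookUp(splitsToResume), lookUp(splitsToPause));
    }

    private List<FileSplit> lookUp(Collection<String> ids) {
        List<FileSplit> result = new ArrayList<>();
        for (String id : ids) {
            FileSplit split = assigned.get(id);
            if (split != null) result.add(split);
        }
        return result;
    }
}
```

In the PoC described above, SourceReaderBase and SplitFetcherManager sit between these two levels; the sketch collapses them into a direct call to keep the signature difference visible.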
> > > >> > > >>>>>>> > > >> > > # Summary
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > In conclusion, please let me know your
> > > >> perspectives.
> > > >> > > >>>>>>> Please
> > > >> > > >>>>>>> > correct
> > > >> > > >>>>>>> > > >> me,
> > > >> > > >>>>>>> > > >> > if
> > > >> > > >>>>>>> > > >> > > something is wrong or if I misunderstood
> > > >> something. My
> > > >> > > >>>>>>> perspective
> > > >> > > >>>>>>> > > >> would
> > > >> > > >>>>>>> > > >> > be:
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > Top A: Yes
> > > >> > > >>>>>>> > > >> > > Top B: Yes (but disable watermark trigger for
> > split
> > > >> > > >>>>>>> alignment)
> > > >> > > >>>>>>> > > >> > > Top C: No
> > > >> > > >>>>>>> > > >> > > Top D: b)
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > Best,
> > > >> > > >>>>>>> > > >> > > Sebastian
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid
> > Wysakowicz <
> > > >> > > >>>>>>> > > >> dwysakowicz@apache.org
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > > wrote:
> > > >> > > >>>>>>> > > >> > >
> > > >> > > >>>>>>> > > >> > >> @Arvid:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> While I also like Becket's capability
> > approach, I
> > > >> > fear
> > > >> > > >>>>>>> that it
> > > >> > > >>>>>>> > > >> doesn't
> > > >> > > >>>>>>> > > >> > work
> > > >> > > >>>>>>> > > >> > >> for this particular use case: Sources can
> > always
> > > >> be
> > > >> > > >>>>>>> aligned
> > > >> > > >>>>>>> > > >> cross-task
> > > >> > > >>>>>>> > > >> > and
> > > >> > > >>>>>>> > > >> > >> this is just about intra-task alignment. So
> > it's
> > > >> > > >>>>>>> plausible to put
> > > >> > > >>>>>>> > > >> > sources
> > > >> > > >>>>>>> > > >> > >> into an alignment group even though they do
> > not
> > > >> use
> > > >> > any
> > > >> > > >>>>>>> of the
> > > >> > > >>>>>>> > > >> presented
> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They should just issue a
> > > >> warning, if
> > > >> > > >>>>>>> they handle
> > > >> > > >>>>>>> > > >> > multiple
> > > >> > > >>>>>>> > > >> > >> splits (see motivation section).
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Yes, but the "supportXXX" method would be
> for
> > > >> telling
> > > >> > > if
> > > >> > > >>>>>>> it
> > > >> > > >>>>>>> > > supports
> > > >> > > >>>>>>> > > >> > that
> > > >> > > >>>>>>> > > >> > >> intra-task alignment. Cross-task alignment
> > would
> > > >> > always
> > > >> > > >>>>>>> be
> > > >> > > >>>>>>> > > supported.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I updated interfaces to what I believe to be
> > > >> closest
> > > >> > > to a
> > > >> > > >>>>>>> > consensus
> > > >> > > >>>>>>> > > >> > >> between all participants. Do you mind
> taking a
> > > >> look?
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> @Sebastian Do you mind addressing the nits?
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Best,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Dawid
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Thanks for pushing this effort.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I'd actually be in favor of 1b). I fully
> agree
> > > >> that
> > > >> > > >>>>>>> decorator
> > > >> > > >>>>>>> > > >> interfaces
> > > >> > > >>>>>>> > > >> > >> should be avoided but I'm also not a big fan
> > of
> > > >> > > >>>>>>> overloading the
> > > >> > > >>>>>>> > > base
> > > >> > > >>>>>>> > > >> > >> interfaces (they are hard to implement as
> > is). The
> > > >> > > usual
> > > >> > > >>>>>>> feedback
> > > >> > > >>>>>>> > > to
> > > >> > > >>>>>>> > > >> > >> Source-related interfaces are always that
> > they are
> > > >> > > >>>>>>> overwhelming
> > > >> > > >>>>>>> > and
> > > >> > > >>>>>>> > > >> too
> > > >> > > >>>>>>> > > >> > >> hard to implement. However, I'd also not
> > oppose
> > > >> 1c)
> > > >> > as
> > > >> > > >>>>>>> scattered
> > > >> > > >>>>>>> > > >> > interfaces
> > > >> > > >>>>>>> > > >> > >> also have drawbacks. I'd just dislike 1a)
> and
> > 1d).
> > > >> > > >>>>>>> > > >> > >> While I also like Becket's capability
> > approach, I
> > > >> > fear
> > > >> > > >>>>>>> that it
> > > >> > > >>>>>>> > > >> doesn't
> > > >> > > >>>>>>> > > >> > work
> > > >> > > >>>>>>> > > >> > >> for this particular use case: Sources can
> > always
> > > >> be
> > > >> > > >>>>>>> aligned
> > > >> > > >>>>>>> > > >> cross-task
> > > >> > > >>>>>>> > > >> > and
> > > >> > > >>>>>>> > > >> > >> this is just about intra-task alignment. So
> > it's
> > > >> > > >>>>>>> plausible to put
> > > >> > > >>>>>>> > > >> > sources
> > > >> > > >>>>>>> > > >> > >> into an alignment group even though they do
> > not
> > > >> use
> > > >> > any
> > > >> > > >>>>>>> of the
> > > >> > > >>>>>>> > > >> presented
> > > >> > > >>>>>>> > > >> > >> API of FLIP-217. They should just issue a
> > > >> warning, if
> > > >> > > >>>>>>> they handle
> > > >> > > >>>>>>> > > >> > multiple
> > > >> > > >>>>>>> > > >> > >> splits (see motivation section).
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I think renaming alignSplits to facilitate
> > future
> > > >> use
> > > >> > > >>>>>>> cases makes
> > > >> > > >>>>>>> > > >> sense
> > > >> > > >>>>>>> > > >> > but
> > > >> > > >>>>>>> > > >> > >> then all interfaces (if 1c) is chosen)
> should
> > be
> > > >> > > adjusted
> > > >> > > >>>>>>> > > >> accordingly.
> > > >> > > >>>>>>> > > >> > >> AlignedSourceReader could be
> > PausingSourceReader
> > > >> and
> > > >> > > I'd
> > > >> > > >>>>>>> go for
> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits (Becket's proposal
> > afaik). We
> > > >> > could
> > > >> > > >>>>>>> also
> > > >> > > >>>>>>> > split
> > > >> > > >>>>>>> > > it
> > > >> > > >>>>>>> > > >> > into
> > > >> > > >>>>>>> > > >> > >> pauseSplit and resumeSplit. While
> > > >> pauseOrResumeSplits
> > > >> > > >>>>>>> may allow
> > > >> > > >>>>>>> > > >> Sources
> > > >> > > >>>>>>> > > >> > to
> > > >> > > >>>>>>> > > >> > >> just use 1 instead of 2 library calls (as
> > written
> > > >> in
> > > >> > > the
> > > >> > > >>>>>>> > Javadoc),
> > > >> > > >>>>>>> > > >> both
> > > >> > > >>>>>>> > > >> > >> Kafka and Pulsar can't use it and I'm not
> > sure if
> > > >> > there
> > > >> > > >>>>>>> is a
> > > >> > > >>>>>>> > system
> > > >> > > >>>>>>> > > >> that
> > > >> > > >>>>>>> > > >> > >> can.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Some nits for the FLIP:
> > > >> > > >>>>>>> > > >> > >> - Please replace "stop" with "pause".
> > > >> > > >>>>>>> > > >> > >> - Not sure if it's worth it in the
> capability
> > > >> > section:
> > > >> > > >>>>>>> Sources
> > > >> > > >>>>>>> > that
> > > >> > > >>>>>>> > > >> > adopt
> > > >> > > >>>>>>> > > >> > >> this interface cannot be used in earlier
> > > >> versions. So
> > > >> > > it
> > > >> > > >>>>>>> feels
> > > >> > > >>>>>>> > like
> > > >> > > >>>>>>> > > >> we
> > > >> > > >>>>>>> > > >> > are
> > > >> > > >>>>>>> > > >> > >> only forward compatible (old sources can be
> > used
> > > >> > after
> > > >> > > >>>>>>> the
> > > >> > > >>>>>>> > change);
> > > >> > > >>>>>>> > > >> but
> > > >> > > >>>>>>> > > >> > I
> > > >> > > >>>>>>> > > >> > >> guess this holds for any API addition.
> > > >> > > >>>>>>> > > >> > >> - You might want to add what happens when
> all
> > > >> splits
> > > >> > > are
> > > >> > > >>>>>>> paused.
> > > >> > > >>>>>>> > > >> > >> - You may want to describe how the 3 flavors
> > of
> > > >> > > >>>>>>> SourceReaderBase
> > > >> > > >>>>>>> > > >> > interact
> > > >> > > >>>>>>> > > >> > >> with the interface.
> > > >> > > >>>>>>> > > >> > >> - I'm not sure if it makes sense to include
> > Kafka
> > > >> and
> > > >> > > >>>>>>> Pulsar in
> > > >> > > >>>>>>> > the
> > > >> > > >>>>>>> > > >> > FLIP.
> > > >> > > >>>>>>> > > >> > >> For me, this is rather immediate follow-up
> > work.
> > > >> > (could
> > > >> > > >>>>>>> be in the
> > > >> > > >>>>>>> > > >> same
> > > >> > > >>>>>>> > > >> > >> umbrella ticket)
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Best,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Arvid
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid
> > Wysakowicz
> > > >> <
> > > >> > > >>>>>>> > > >> > dwysakowicz@apache.org> <
> dwysakowicz@apache.org>
> > > >> > > >>>>>>> > > >> > >> wrote:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
> > > >> > > >>>>>>> WithSplitsAlignment",
> > > >> > > >>>>>>> > > >> along
> > > >> > > >>>>>>> > > >> > >> with "MySplitReader implements SplitReader,
> > > >> > > >>>>>>> WithSplitsAlignment",
> > > >> > > >>>>>>> > > or
> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements
> > AlignedSourceReader"
> > > >> > and
> > > >> > > >>>>>>> > > "MySplitReader
> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader"
> > and
> > > >> > > >>>>>>> "MySplitReader
> > > >> > > >>>>>>> > > >> > implements
> > > >> > > >>>>>>> > > >> > >> SplitReader".
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according to
> Dawid
> > > >> would
> > > >> > > be:
> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader"
> > and
> > > >> > > >>>>>>> "MySplitReader
> > > >> > > >>>>>>> > > >> > implements
> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although personally
> > > >> > speaking I
> > > >> > > >>>>>>> think it
> > > >> > > >>>>>>> > is
> > > >> > > >>>>>>> > > >> > simpler
> > > >> > > >>>>>>> > > >> > >> to just add a new method to the split reader
> > with
> > > >> > > >>>>>>> default impl.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I think that is a good idea to have it
> > aligned as
> > > >> > much
> > > >> > > as
> > > >> > > >>>>>>> > possible.
> > > >> > > >>>>>>> > > >> I'd
> > > >> > > >>>>>>> > > >> > be
> > > >> > > >>>>>>> > > >> > >> +1 for your option c). We can merge
> > > >> > AlignedSplitReader
> > > >> > > >>>>>>> with
> > > >> > > >>>>>>> > > >> > SplitReader. We
> > > >> > > >>>>>>> > > >> > >> will update the FLIP shortly.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Best,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Dawid
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Thanks for the comment, Jark.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align other
> > things in
> > > >> > the
> > > >> > > >>>>>>> future?
> > > >> > > >>>>>>> > For
> > > >> > > >>>>>>> > > >> > example,
> > > >> > > >>>>>>> > > >> > >> align read speed, I have
> > > >> > > >>>>>>> > > >> > >> seen users requesting global rate limits.
> This
> > > >> > feature
> > > >> > > >>>>>>> may also
> > > >> > > >>>>>>> > > need
> > > >> > > >>>>>>> > > >> an
> > > >> > > >>>>>>> > > >> > >> interface like this.
> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to
> > > >> support
> > > >> > > >>>>>>> align other
> > > >> > > >>>>>>> > > >> > things, I
> > > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
> > > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
> > > >> > > >>>>>>> `alignWatermarksForSplits`
> > > >> > > >>>>>>> > > >> instead
> > > >> > > >>>>>>> > > >> > of
> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> This is a good point. Naming wise, it would
> > > >> usually
> > > >> > be
> > > >> > > >>>>>>> more
> > > >> > > >>>>>>> > > >> extensible
> > > >> > > >>>>>>> > > >> > to
> > > >> > > >>>>>>> > > >> > >> just describe what the method actually does,
> > > >> instead
> > > >> > of
> > > >> > > >>>>>>> assuming
> > > >> > > >>>>>>> > > the
> > > >> > > >>>>>>> > > >> > >> purpose of doing this. For example, in this
> > case,
> > > >> > > >>>>>>> > > >> pauseOrResumeSplits()
> > > >> > > >>>>>>> > > >> > >> would be more extensible because this can be
> > used
> > > >> for
> > > >> > > >>>>>>> any kind of
> > > >> > > >>>>>>> > > >> flow
> > > >> > > >>>>>>> > > >> > >> control, be it watermark alignment or simple
> > rate
> > > >> > > >>>>>>> limiting.
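
The naming argument above can be illustrated with a minimal sketch. It assumes a hypothetical `FlowControlled` interface whose single method is named after what it does (pause or resume splits) rather than why. The interface, the `RecordingReader` class, the `FlowControllers` helper and all parameter names are illustrative assumptions, not the actual Flink API. The same method is then driven by two unrelated policies: watermark alignment and a simple rate limit.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical interface, named after the action rather than its purpose.
interface FlowControlled {
    void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

// Toy reader that only records which splits are currently paused.
final class RecordingReader implements FlowControlled {
    final Set<String> paused = new TreeSet<>();

    @Override
    public void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.removeAll(splitsToResume);
        paused.addAll(splitsToPause);
    }
}

final class FlowControllers {
    // Policy 1: pause splits whose watermark is ahead of the maximum desired watermark.
    static void alignWatermarks(FlowControlled reader, Map<String, Long> splitWatermarks, long maxDesiredWatermark) {
        List<String> toPause = new ArrayList<>();
        List<String> toResume = new ArrayList<>();
        for (Map.Entry<String, Long> entry : splitWatermarks.entrySet()) {
            if (entry.getValue() > maxDesiredWatermark) {
                toPause.add(entry.getKey());
            } else {
                toResume.add(entry.getKey());
            }
        }
        reader.pauseOrResumeSplits(toPause, toResume);
    }

    // Policy 2: pause every split while the reader is over its record budget.
    static void limitRate(FlowControlled reader, Collection<String> allSplits, long recordsThisSecond, long maxRecordsPerSecond) {
        if (recordsThisSecond > maxRecordsPerSecond) {
            reader.pauseOrResumeSplits(allSplits, Collections.<String>emptyList());
        } else {
            reader.pauseOrResumeSplits(Collections.<String>emptyList(), allSplits);
        }
    }
}
```

Neither policy needs to know about the other, which is the extensibility point being made; a method named after watermarks would read oddly when called from the rate limiter.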
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I
> think
> > > >> they
> > > >> > > have
> > > >> > > >>>>>>> their
> > > >> > > >>>>>>> > own
> > > >> > > >>>>>>> > > >> > >> advantages.
> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for
> > > >> extending
> > > >> > > >>>>>>> abilities
> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
> > > >> > > >>>>>>> > > >> > >> and I prefer Interfaces rather than methods
> in
> > > >> this
> > > >> > > >>>>>>> case. When
> > > >> > > >>>>>>> > you
> > > >> > > >>>>>>> > > >> have
> > > >> > > >>>>>>> > > >> > a
> > > >> > > >>>>>>> > > >> > >> bunch of abilities and each ability
> > > >> > > >>>>>>> > > >> > >> has more than one method, Interfaces can
> help
> > to
> > > >> > > >>>>>>> organize them
> > > >> > > >>>>>>> > and
> > > >> > > >>>>>>> > > >> make
> > > >> > > >>>>>>> > > >> > >> it clear to users which methods they
> > > >> > > >>>>>>> > > >> > >> need to implement when they want to have an
> > > >> ability.
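
A minimal sketch of the "ability interface" style described above, under stated assumptions: `SketchReader`, `SupportsPausing`, `BasicReader`, `PausableReader` and `SketchFramework` are hypothetical names invented for illustration and do not correspond to Flink classes. The point it shows is that an ability with more than one method is declared as a unit, so the compiler forces an implementor to provide all of its methods, and the framework detects the ability with a type check.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical base interface: the minimum every reader must provide.
interface SketchReader {
    String poll();
}

// Hypothetical ability interface: pausing is only useful if both methods
// exist, so they are declared together and must be implemented together.
interface SupportsPausing {
    void pause(String splitId);

    void resume(String splitId);
}

// A reader without the ability only has to implement the base interface.
final class BasicReader implements SketchReader {
    @Override
    public String poll() {
        return "record";
    }
}

// A reader with the ability opts in by implementing the extra interface.
final class PausableReader implements SketchReader, SupportsPausing {
    private final Set<String> paused = new HashSet<>();

    @Override
    public String poll() {
        return "record";
    }

    @Override
    public void pause(String splitId) {
        paused.add(splitId);
    }

    @Override
    public void resume(String splitId) {
        paused.remove(splitId);
    }

    boolean isPaused(String splitId) {
        return paused.contains(splitId);
    }
}

final class SketchFramework {
    // Returns true if the reader has the ability and the pause was applied.
    static boolean tryPause(SketchReader reader, String splitId) {
        if (reader instanceof SupportsPausing) {
            ((SupportsPausing) reader).pause(splitId);
            return true;
        }
        return false;
    }
}
```

The cost of this style, as raised elsewhere in the thread, is that the framework needs a type check per ability and users must learn which interfaces may be combined.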
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I am OK with decorative interfaces if this
> is
> > a
> > > >> > general
> > > >> > > >>>>>>> design
> > > >> > > >>>>>>> > > >> pattern
> > > >> > > >>>>>>> > > >> > in
> > > >> > > >>>>>>> > > >> > >> the other components in Flink. But it looks
> > like
> > > >> the
> > > >> > > >>>>>>> current API
> > > >> > > >>>>>>> > > >> > proposal
> > > >> > > >>>>>>> > > >> > >> is not symmetric.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> The current proposal is essentially
> > > >> "MySourceReader
> > > >> > > >>>>>>> implements
> > > >> > > >>>>>>> > > >> > >> SourceReader, WithSplitsAlignment", along
> with
> > > >> > > >>>>>>> "MySplitReader
> > > >> > > >>>>>>> > > >> implements
> > > >> > > >>>>>>> > > >> > >> AlignedSplitsReader".
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Should we make the API symmetric? I'd
> > consider any
> > > >> > one
> > > >> > > >>>>>>> of the
> > > >> > > >>>>>>> > > >> following
> > > >> > > >>>>>>> > > >> > as
> > > >> > > >>>>>>> > > >> > >> symmetric.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
> > > >> > > >>>>>>> WithSplitsAlignment",
> > > >> > > >>>>>>> > > >> along
> > > >> > > >>>>>>> > > >> > >> with "MySplitReader implements SplitReader,
> > > >> > > >>>>>>> WithSplitsAlignment",
> > > >> > > >>>>>>> > > or
> > > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements
> > AlignedSourceReader"
> > > >> > and
> > > >> > > >>>>>>> > > "MySplitReader
> > > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
> > > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader"
> > and
> > > >> > > >>>>>>> "MySplitReader
> > > >> > > >>>>>>> > > >> > implements
> > > >> > > >>>>>>> > > >> > >> SplitReader".
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I think the latest proposal according to
> Dawid
> > > >> would
> > > >> > > be:
> > > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader"
> > and
> > > >> > > >>>>>>> "MySplitReader
> > > >> > > >>>>>>> > > >> > implements
> > > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
> > > >> > > >>>>>>> > > >> > >> I am fine with this API, although personally
> > > >> > speaking I
> > > >> > > >>>>>>> think it
> > > >> > > >>>>>>> > is
> > > >> > > >>>>>>> > > >> > simpler
> > > >> > > >>>>>>> > > >> > >> to just add a new method to the split reader
> > with
> > > >> > > >>>>>>> default impl.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> @Dawid Wysakowicz <dw...@apache.org>, thanks for the reply.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a
> preference
> > > >> and I
> > > >> > > >>>>>>> agree most
> > > >> > > >>>>>>> > of
> > > >> > > >>>>>>> > > >> the
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> sources will support the alignment I am fine
> > > >> > following
> > > >> > > >>>>>>> your
> > > >> > > >>>>>>> > > >> suggestion
> > > >> > > >>>>>>> > > >> > to
> > > >> > > >>>>>>> > > >> > >> have the SourceReader extending from
> > > >> > > >>>>>>> > WithWatermarksSplitsAlignment,
> > > >> > > >>>>>>> > > >> but
> > > >> > > >>>>>>> > > >> > >> would put the "supportsXXX" there, not in
> the
> > > >> Source
> > > >> > to
> > > >> > > >>>>>>> keep the
> > > >> > > >>>>>>> > > two
> > > >> > > >>>>>>> > > >> > >> methods together.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> One benefit of having the "supportsXXX" in
> > Source
> > > >> is
> > > >> > > >>>>>>> that this
> > > >> > > >>>>>>> > > allows
> > > >> > > >>>>>>> > > >> > some
> > > >> > > >>>>>>> > > >> > >> compile time check. For example, if a user
> > enabled
> > > >> > > >>>>>>> watermark
> > > >> > > >>>>>>> > > >> alignment
> > > >> > > >>>>>>> > > >> > >> while it is not supported by the Source, an
> > > >> exception
> > > >> > > >>>>>>> can be
> > > >> > > >>>>>>> > thrown
> > > >> > > >>>>>>> > > >> at
> > > >> > > >>>>>>> > > >> > >> compile time. It seems in general useful.
> That
> > > >> said,
> > > >> > I
> > > >> > > >>>>>>> agree that
> > > >> > > >>>>>>> > > API
> > > >> > > >>>>>>> > > >> > >> cleanliness wise it is better to put the two
> > > >> methods
> > > >> > > >>>>>>> together.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <imjark@gmail.com> wrote:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Thanks Dawid for the reminder on FLIP-182.
> > Sorry I
> > > >> did
> > > >> > > >>>>>>> miss it.
> > > >> > > >>>>>>> > > >> > >> I don't have other concerns then.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Best,
> > > >> > > >>>>>>> > > >> > >> Jark
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> @Jark:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with
> > watermarks
> > > >> > when
> > > >> > > >>>>>>> the
> > > >> > > >>>>>>> > source
> > > >> > > >>>>>>> > > >> > >> implements the interface?
> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark
> > > >> alignment
> > > >> > > even
> > > >> > > >>>>>>> if Kafka
> > > >> > > >>>>>>> > > >> > >> implements the interface,
> > > >> > > >>>>>>> > > >> > >> and this will affect the throughput
> somehow. I
> > > >> agree
> > > >> > > >>>>>>> with Becket
> > > >> > > >>>>>>> > > >> > >> we may need a
> > > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for users
> to
> > > >> > > configure
> > > >> > > >>>>>>> the
> > > >> > > >>>>>>> > source
> > > >> > > >>>>>>> > > >> to
> > > >> > > >>>>>>> > > >> > >> enable/disable the alignment.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate
> > > >> > > maxDesiredWatermark?
> > > >> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark
> > will
> > > >> > > >>>>>>> greatly affect
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> throughput
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> if the reader is constantly
> > > >> > > >>>>>>> > > >> > >>  switching between pause and resume. Can
> users
> > > >> > > configure
> > > >> > > >>>>>>> the
> > > >> > > >>>>>>> > > >> alignment
> > > >> > > >>>>>>> > > >> > >> offset?
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> This is covered in the previous FLIP[1]
> which
> > has
> > > >> > been
> > > >> > > >>>>>>> already
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> implemented
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> in 1.15. In short, it must be enabled with
> the
> > > >> > > watermark
> > > >> > > >>>>>>> strategy
> > > >> > > >>>>>>> > > >> which
> > > >> > > >>>>>>> > > >> > >> also configures drift and update interval.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to
> > > >> support
> > > >> > > >>>>>>> align other
> > > >> > > >>>>>>> > > >> things,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
> > > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
> > > >> > > >>>>>>> `alignWatermarksForSplits`
> > > >> > > >>>>>>> > > >> instead
> > > >> > > >>>>>>> > > >> > of
> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Sure let's rename it.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> @Becket:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I understand your point. On the other hand
> > putting
> > > >> > all
> > > >> > > >>>>>>> methods,
> > > >> > > >>>>>>> > > even
> > > >> > > >>>>>>> > > >> > with
> > > >> > > >>>>>>> > > >> > >> "supportsXXX" methods for enabling certain
> > > >> features,
> > > >> > > >>>>>>> makes the
> > > >> > > >>>>>>> > > entry
> > > >> > > >>>>>>> > > >> > >> threshold for writing a new source higher.
> > > >> Instead of
> > > >> > > >>>>>>> focusing on
> > > >> > > >>>>>>> > > the
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> basic
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> and required properties of the Source, the
> > person
> > > >> > > >>>>>>> implementing a
> > > >> > > >>>>>>> > > >> source
> > > >> > > >>>>>>> > > >> > >> must bother with and figure out what
> > all
> > > >> of
> > > >> > the
> > > >> > > >>>>>>> extra
> > > >> > > >>>>>>> > > >> features
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> are
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> about and how to deal with them. It also
> > makes it
> > > >> > > harder
> > > >> > > >>>>>>> to
> > > >> > > >>>>>>> > > organize
> > > >> > > >>>>>>> > > >> > >> methods in coupled groups as Jark said.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a
> preference
> > > >> and I
> > > >> > > >>>>>>> agree most
> > > >> > > >>>>>>> > of
> > > >> > > >>>>>>> > > >> the
> > > >> > > >>>>>>> > > >> > >> sources will support the alignment I am fine
> > > >> > following
> > > >> > > >>>>>>> your
> > > >> > > >>>>>>> > > >> suggestion
> > > >> > > >>>>>>> > > >> > to
> > > >> > > >>>>>>> > > >> > >> have the SourceReader extending from
> > > >> > > >>>>>>> > WithWatermarksSplitsAlignment,
> > > >> > > >>>>>>> > > >> but
> > > >> > > >>>>>>> > > >> > >> would put the "supportsXXX" there, not in
> the
> > > >> Source
> > > >> > to
> > > >> > > >>>>>>> keep the
> > > >> > > >>>>>>> > > two
> > > >> > > >>>>>>> > > >> > >> methods together.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Lastly, I agree it is really unfortunate the
> > > >> > > >>>>>>> "alignSplits"
> > > >> > > >>>>>>> > methods
> > > >> > > >>>>>>> > > >> > differ
> > > >> > > >>>>>>> > > >> > >> slightly for SourceReader and SplitReader.
> The
> > > >> reason
> > > >> > > for
> > > >> > > >>>>>>> that is
> > > >> > > >>>>>>> > > >> > >> SourceReaderBase deals only with SplitIds,
> > whereas
> > > >> > > >>>>>>> SplitReader
> > > >> > > >>>>>>> > > needs
> > > >> > > >>>>>>> > > >> the
> > > >> > > >>>>>>> > > >> > >> actual splits to pause them. I found the
> > > >> discrepancy
> > > >> > > >>>>>>> acceptable
> > > >> > > >>>>>>> > for
> > > >> > > >>>>>>> > > >> the
> > > >> > > >>>>>>> > > >> > >> sake of simplifying changes significantly,
> > > >> especially
> > > >> > > as
> > > >> > > >>>>>>> they
> > > >> > > >>>>>>> > would
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> highly
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> likely impact performance as we would have
> to
> > > >> perform
> > > >> > > >>>>>>> additional
> > > >> > > >>>>>>> > > >> > lookups.
> > > >> > > >>>>>>> > > >> > >> Moreover the SplitReader is a secondary
> > interface.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Best,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Dawid
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> [1]
> > https://cwiki.apache.org/confluence/x/hQYBCw
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Thanks for the effort, Dawid and Sebastian!
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I just have some minor questions (maybe I
> > missed
> > > >> > > >>>>>>> something).
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with
> > watermarks
> > > >> > when
> > > >> > > >>>>>>> the
> > > >> > > >>>>>>> > source
> > > >> > > >>>>>>> > > >> > >> implements the interface?
> > > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark
> > > >> alignment
> > > >> > > even
> > > >> > > >>>>>>> if Kafka
> > > >> > > >>>>>>> > > >> > >> implements the interface,
> > > >> > > >>>>>>> > > >> > >> and this will affect the throughput
> somehow. I
> > > >> agree
> > > >> > > >>>>>>> with Becket
> > > >> > > >>>>>>> > > >> > >> we may need a
> > > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for users
> to
> > > >> > > configure
> > > >> > > >>>>>>> the
> > > >> > > >>>>>>> > source
> > > >> > > >>>>>>> > > >> to
> > > >> > > >>>>>>> > > >> > >> enable/disable the alignment.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate
> > > >> > > maxDesiredWatermark?
> > > >> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark
> > will
> > > >> > > >>>>>>> greatly affect
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> throughput
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> if the reader is constantly
> > > >> > > >>>>>>> > > >> > >>  switching between pause and resume. Can
> users
> > > >> > > configure
> > > >> > > >>>>>>> the
> > > >> > > >>>>>>> > > >> alignment
> > > >> > > >>>>>>> > > >> > >> offset?
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > > >> > > >>>>>>> > > >> > >> Can the interface be used to align other
> > things in
> > > >> > the
> > > >> > > >>>>>>> future?
> > > >> > > >>>>>>> > For
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> example,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> align read speed, I have
> > > >> > > >>>>>>> > > >> > >> seen users requesting global rate limits.
> This
> > > >> > feature
> > > >> > > >>>>>>> may also
> > > >> > > >>>>>>> > > need
> > > >> > > >>>>>>> > > >> an
> > > >> > > >>>>>>> > > >> > >> interface like this.
> > > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to
> > > >> support
> > > >> > > >>>>>>> align other
> > > >> > > >>>>>>> > > >> things,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
> > > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
> > > >> > > >>>>>>> `alignWatermarksForSplits`
> > > >> > > >>>>>>> > > >> instead
> > > >> > > >>>>>>> > > >> > of
> > > >> > > >>>>>>> > > >> > >> `alignSplits`.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I
> think
> > > >> they
> > > >> > > have
> > > >> > > >>>>>>> their
> > > >> > > >>>>>>> > own
> > > >> > > >>>>>>> > > >> > >> advantages.
> > > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for
> > > >> extending
> > > >> > > >>>>>>> abilities
> > > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
> > > >> > > >>>>>>> > > >> > >> and I prefer Interfaces rather than methods
> in
> > > >> this
> > > >> > > >>>>>>> case. When
> > > >> > > >>>>>>> > you
> > > >> > > >>>>>>> > > >> have
> > > >> > > >>>>>>> > > >> > a
> > > >> > > >>>>>>> > > >> > >> bunch of abilities and each ability
> > > >> > > >>>>>>> > > >> > >> has more than one method, Interfaces can
> help
> > to
> > > >> > > >>>>>>> organize them
> > > >> > > >>>>>>> > and
> > > >> > > >>>>>>> > > >> make
> > > >> > > >>>>>>> > > >> > >> it clear to users which methods they
> > > >> > > >>>>>>> > > >> > >> need to implement when they want to have an
> > > >> ability.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Best,
> > > >> > > >>>>>>> > > >> > >> Jark
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <becket.qin@gmail.com> wrote:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Hi Dawid,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Thanks for the explanation. Apologies that I
> > > >> somehow
> > > >> > > >>>>>>> misread a
> > > >> > > >>>>>>> > > bunch
> > > >> > > >>>>>>> > > >> of
> > > >> > > >>>>>>> > > >> > >> "align" and thought they were "assign".
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Regarding 1, by default implementation, I
> was
> > > >> > thinking
> > > >> > > >>>>>>> of the
> > > >> > > >>>>>>> > > default
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> no-op
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> implementation. I am a little worried about
> > the
> > > >> > > >>>>>>> proliferation of
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> decorative
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> interfaces. I think the most important thing
> > about
> > > >> > > >>>>>>> interfaces is
> > > >> > > >>>>>>> > > that
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> they
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> are easy to understand. In this case, I
> prefer
> > > >> adding
> > > >> > > >>>>>>> a new method
> > > >> > > >>>>>>> > to
> > > >> > > >>>>>>> > > >> the
> > > >> > > >>>>>>> > > >> > >> existing interface for the following
> reasons:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> a) I feel the biggest drawback of decorative
> > > >> > interfaces
> > > >> > > >>>>>>> is which
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> interface
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> they can decorate and which combinations of
> > > >> multiple
> > > >> > > >>>>>>> decorative
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> interfaces
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> are valid. In the current FLIP, the
> > > >> > withSplitsAlignment
> > > >> > > >>>>>>> interface
> > > >> > > >>>>>>> > > is
> > > >> > > >>>>>>> > > >> > only
> > > >> > > >>>>>>> > > >> > >> applicable to the SourceReader which means
> it
> > > >> can't
> > > >> > > >>>>>>> decorate any
> > > >> > > >>>>>>> > > >> other
> > > >> > > >>>>>>> > > >> > >> interface. From an interface design
> > perspective, a
> > > >> > > >>>>>>> natural
> > > >> > > >>>>>>> > question
> > > >> > > >>>>>>> > > >> is
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> why
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> not let "AlignedSplitReader" extend
> > > >> > > >>>>>>> "withSplitsAlignment"? And it
> > > >> > > >>>>>>> > > is
> > > >> > > >>>>>>> > > >> > also
> > > >> > > >>>>>>> > > >> > >> natural to assume that a split reader
> > implementing
> > > >> > both
> > > >> > > >>>>>>> > SplitReader
> > > >> > > >>>>>>> > > >> and
> > > >> > > >>>>>>> > > >> > >> WithSplitAlignment would work, because a
> > source
> > > >> > reader
> > > >> > > >>>>>>> > implementing
> > > >> > > >>>>>>> > > >> > >> SourceReader and withSplitsAlignment works.
> > So why
> > > >> > > isn't
> > > >> > > >>>>>>> there an
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> interface
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> of AlignedSourceReader? In the future, if
> > there
> > > >> is a
> > > >> > > new
> > > >> > > >>>>>>> feature
> > > >> > > >>>>>>> > > >> added
> > > >> > > >>>>>>> > > >> > >> (e.g. sorted or pre-partitioned data aware),
> > are
> > > >> we
> > > >> > > >>>>>>> going to
> > > >> > > >>>>>>> > create
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> another
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> interface of SplitReader such as
> > > >> SortedSplitReader or
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> PrePartitionedAware?
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Can they be combined? So I think the
> > additional
> > > >> > > >>>>>>> decorative
> > > >> > > >>>>>>> > > interface
> > > >> > > >>>>>>> > > >> > like
> > > >> > > >>>>>>> > > >> > >> withSplitsAlignment actually increases the
> > > >> > > understanding
> > > >> > > >>>>>>> cost of
> > > >> > > >>>>>>> > > >> users
> > > >> > > >>>>>>> > > >> > >> because they have to know what decorative
> > > >> interfaces
> > > >> > > are
> > > >> > > >>>>>>> there,
> > > >> > > >>>>>>> > > which
> > > >> > > >>>>>>> > > >> > >> interface they can decorate and which
> > > >> combinations of
> > > >> > > the
> > > >> > > >>>>>>> > > decorative
> > > >> > > >>>>>>> > > >> > >> interfaces are valid and which are not.
> > Ideally we
> > > >> > want
> > > >> > > >>>>>>> to avoid
> > > >> > > >>>>>>> > > >> that.
> > > >> > > >>>>>>> > > >> > To
> > > >> > > >>>>>>> > > >> > >> be clear, I am not opposing having an
> > interface of
> > > >> > > >>>>>>> > > >> withSplitsAlignment,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> it
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> is completely OK to have it as an internal
> > > >> interface
> > > >> > > and
> > > >> > > >>>>>>> let
> > > >> > > >>>>>>> > > >> > SourceReader
> > > >> > > >>>>>>> > > >> > >> and SplitReader both extend it.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> b) Adding a new method to the SourceReader
> > with a
> > > >> > > default
> > > >> > > >>>>>>> > > >> implementation
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> of
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> no-op would help avoid logic branching in
> the
> > > >> source
> > > >> > > >>>>>>> logic,
> > > >> > > >>>>>>> > > >> especially
> > > >> > > >>>>>>> > > >> > >> given that we agree that the vast majority
> of
> > the
> > > >> > > >>>>>>> SourceReader
> > > >> > > >>>>>>> > > >> > >> implementations, if not all, would just
> extend
> > > >> from
> > > >> > the
> > > >> > > >>>>>>> > > >> > SourceReaderBase.
> > > >> > > >>>>>>> > > >> > >> That means adding a new method to the
> > interface
> > > >> would
> > > >> > > >>>>>>> effectively
> > > >> > > >>>>>>> > > >> give
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> the
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> same user experience, but simpler.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> c) A related design principle that may be
> > worth
> > > >> > > >>>>>>> discussing is how
> > > >> > > >>>>>>> > > do
> > > >> > > >>>>>>> > > >> we
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> let
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> the Source implementations tell Flink what
> > > >> capability
> > > >> > > is
> > > >> > > >>>>>>> > supported
> > > >> > > >>>>>>> > > >> and
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> what
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> is not. Personally speaking I feel the most
> > > >> intuitive
> > > >> > > >>>>>>> place to me
> > > >> > > >>>>>>> > > is
> > > >> > > >>>>>>> > > >> in
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> the
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Source itself, because that is the entrance
> > of the
> > > >> > > >>>>>>> entire Source
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> connector
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> logic.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Based on the above thoughts, I am wondering
> > if the
> > > >> > > >>>>>>> following
> > > >> > > >>>>>>> > > >> interface
> > > >> > > >>>>>>> > > >> > >> would be easier for users to understand.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> - Change "withSplitsAlignment" to internal
> > > >> interface,
> > > >> > > >>>>>>> let both
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> SourceReader
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> and SplitReader extend from it, with a
> default
> > > >> no-op
> > > >> > > >>>>>>> > > implementation.
> > > >> > > >>>>>>> > > >> > >> - Add a new method "boolean
> > > >> supportSplitsAlignment()"
> > > >> > > to
> > > >> > > >>>>>>> the
> > > >> > > >>>>>>> > Source
> > > >> > > >>>>>>> > > >> > >> interface, with a default implementation
> > returning
> > > >> > > >>>>>>> false. Sources
> > > >> > > >>>>>>> > > >> that
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> have
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> implemented the alignment logic can change
> > this to
> > > >> > > >>>>>>> return true,
> > > >> > > >>>>>>> > and
> > > >> > > >>>>>>> > > >> > >> override the alignSplits() methods in the
> > > >> > SourceReader
> > > >> > > /
> > > >> > > >>>>>>> > > SplitReader
> > > >> > > >>>>>>> > > >> if
> > > >> > > >>>>>>> > > >> > >> needed.
> > > >> > > >>>>>>> > > >> > >> - In the future, if a new optional feature
> is
> > > >> going
> > > >> > to
> > > >> > > >>>>>>> be added
> > > >> > > >>>>>>> > to
> > > >> > > >>>>>>> > > >> the
> > > >> > > >>>>>>> > > >> > >> Source, and that feature requires the
> > awareness
> > > >> from
> > > >> > > >>>>>>> Flink, we
> > > >> > > >>>>>>> > can
> > > >> > > >>>>>>> > > >> add
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> more
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> such methods to the Source.
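
The proposal in the bullets above can be sketched roughly as follows. This is a hedged illustration only: `SketchSource`, `SketchSourceReader`, `AligningReader`, `AlignmentCheck` and the `alignSplits` parameter list are assumptions made for the example, while `supportSplitsAlignment()` and `WithSplitsAlignment` are the names used in this thread. The sketch shows a default no-op alignment method inherited by every reader, an opt-in flag on the source that defaults to false, and an early failure when alignment is requested for a source that does not support it.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Internal interface with a default no-op, as suggested in the first bullet.
interface WithSplitsAlignment {
    default void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        // no-op: readers that do not support alignment inherit this behavior
    }
}

// Hypothetical stand-in for the reader interface.
interface SketchSourceReader extends WithSplitsAlignment {}

// Hypothetical stand-in for the Source interface.
interface SketchSource {
    SketchSourceReader createReader();

    // Second bullet: sources that implement alignment override this to return true.
    default boolean supportSplitsAlignment() {
        return false;
    }
}

// A source that ignores alignment entirely; nothing extra to implement.
final class PlainSource implements SketchSource {
    @Override
    public SketchSourceReader createReader() {
        return new SketchSourceReader() {};
    }
}

// A reader that actually tracks paused splits.
final class AligningReader implements SketchSourceReader {
    final Set<String> paused = new LinkedHashSet<>();

    @Override
    public void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.removeAll(splitsToResume);
        paused.addAll(splitsToPause);
    }
}

// A source that opts in to alignment.
final class AligningSource implements SketchSource {
    @Override
    public AligningReader createReader() {
        return new AligningReader();
    }

    @Override
    public boolean supportSplitsAlignment() {
        return true;
    }
}

final class AlignmentCheck {
    // Fails early if alignment is enabled for a source that did not opt in.
    static void requireAlignmentSupport(SketchSource source) {
        if (!source.supportSplitsAlignment()) {
            throw new IllegalStateException(
                    "Watermark alignment is enabled, but the source does not support it");
        }
    }
}
```

With this shape the framework can call `alignSplits` unconditionally, which avoids the branching mentioned in point b), and the flag on the source gives a single place to validate the job before it runs.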
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> What do you think?
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> @Konstantin:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the
> `AlignedSplitReader`
> > > >> > > interface
> > > >> > > >>>>>>> (aka the
> > > >> > > >>>>>>> > > >> stop &
> > > >> > > >>>>>>> > > >> > >> resume behavior) will be implemented for
> > Kafka and
> > > >> > > >>>>>>> Pulsar only,
> > > >> > > >>>>>>> > > >> correct?
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Correct, as far as I know though, those are
> > the
> > > >> only
> > > >> > > >>>>>>> sources
> > > >> > > >>>>>>> > which
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> consume
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> concurrently from multiple splits and thus
> > > >> alignment
> > > >> > > >>>>>>> applies.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> @Thomas:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in
> > > >> > > >>>>>>> SourceReaderBase and
> > > >> > > >>>>>>> > > then
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> doing
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> nothing if the split reader does not
> implement
> > > >> > > >>>>>>> AlignedSplitReader
> > > >> > > >>>>>>> > > >> could
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> be
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can
> > > >> instead
> > > >> > be
> > > >> > > >>>>>>> added to
> > > >> > > >>>>>>> > the
> > > >> > > >>>>>>> > > >> > >> specific source reader (i.e.
> > KafkaSourceReader) to
> > > >> > make
> > > >> > > >>>>>>> it
> > > >> > > >>>>>>> > explicit
> > > >> > > >>>>>>> > > >> that
> > > >> > > >>>>>>> > > >> > >> the source actually supports it.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I understand your concern. Hmm, I think we
> > could
> > > >> > > >>>>>>> actually do
> > > >> > > >>>>>>> > that.
> > > >> > > >>>>>>> > > >> Given
> > > >> > > >>>>>>> > > >> > >> the actual implementation of the
> > > >> > > >>>>>>> SourceReaderBase#alignSplits is
> > > >> > > >>>>>>> > > >> rather
> > > >> > > >>>>>>> > > >> > >> short (just a forward to the corresponding
> > method
> > > >> of
> > > >> > > >>>>>>> > SplitFetcher),
> > > >> > > >>>>>>> > > >> we
> > > >> > > >>>>>>> > > >> > >> could reimplement it in the actual source
> > > >> > > >>>>>>> implementations. This
> > > >> > > >>>>>>> > > >> solution
> > > >> > > >>>>>>> > > >> > >> has the downside though. Authors of new
> > sources
> > > >> would
> > > >> > > >>>>>>> have to do
> > > >> > > >>>>>>> > > two
> > > >> > > >>>>>>> > > >> > >> things: extend from AlignedSplitReader and
> > > >> implement
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> WithSplitsAssignment,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> instead of just extending
> AlignedSplitReader.
> > I
> > > >> would
> > > >> > > be
> > > >> > > >>>>>>> fine
> > > >> > > >>>>>>> > with
> > > >> > > >>>>>>> > > >> such
> > > >> > > >>>>>>> > > >> > a
> > > >> > > >>>>>>> > > >> > >> tradeoff though. What others think?
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> @Steven:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> For this part from the motivation section,
> is
> > it
> > > >> > > >>>>>>> accurate? Let's
> > > >> > > >>>>>>> > > >> assume
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> one
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> source task consumes from 3 partitions and
> > one of
> > > >> the
> > > >> > > >>>>>>> partition
> > > >> > > >>>>>>> > is
> > > >> > > >>>>>>> > > >> > >> significantly slower. In this situation,
> > watermark
> > > >> > for
> > > >> > > >>>>>>> this
> > > >> > > >>>>>>> > source
> > > >> > > >>>>>>> > > >> task
> > > >> > > >>>>>>> > > >> > >> won't hold back as it is reading recent data
> > from
> > > >> > other
> > > >> > > >>>>>>> two Kafka
> > > >> > > >>>>>>> > > >> > >> partitions. As a result, it won't hold back
> > the
> > > >> > overall
> > > >> > > >>>>>>> > watermark.
> > > >> > > >>>>>>> > > I
> > > >> > > >>>>>>> > > >> > >> thought the problem is that we may have late
> > data
> > > >> for
> > > >> > > >>>>>>> this slow
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> partition.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> It will hold back the watermark. Watermark
> of
> > an
> > > >> > > >>>>>>> operator is the
> > > >> > > >>>>>>> > > >> minimum
> > > >> > > >>>>>>> > > >> > >> of watermarks of all splits[1]
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I have another question about the restart.
> Say
> > > >> split
> > > >> > > >>>>>>> alignment is
> > > >> > > >>>>>>> > > >> > >> triggered. checkpoint is completed. job
> > failed and
> > > >> > > >>>>>>> restored from
> > > >> > > >>>>>>> > > the
> > > >> > > >>>>>>> > > >> > last
> > > >> > > >>>>>>> > > >> > >> checkpoint. because alignment decision is
> not
> > > >> > > >>>>>>> checkpointed,
> > > >> > > >>>>>>> > > initially
> > > >> > > >>>>>>> > > >> > >> alignment won't be enforced until we get a
> > cycle
> > > >> of
> > > >> > > >>>>>>> watermark
> > > >> > > >>>>>>> > > >> > aggregation
> > > >> > > >>>>>>> > > >> > >> and propagation, right? Not saying this
> > corner is
> > > >> a
> > > >> > > >>>>>>> problem. Just
> > > >> > > >>>>>>> > > >> want
> > > >> > > >>>>>>> > > >> > to
> > > >> > > >>>>>>> > > >> > >> understand it more.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Your understanding is correct.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> @Becket:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a
> > > >> general
> > > >> > use
> > > >> > > >>>>>>> case, so
> > > >> > > >>>>>>> > > >> should
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> we
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> just add the related methods to SourceReader
> > > >> directly
> > > >> > > >>>>>>> instead of
> > > >> > > >>>>>>> > > >> > >> introducing the new interface of
> > > >> WithSplitAssignment?
> > > >> > > We
> > > >> > > >>>>>>> can
> > > >> > > >>>>>>> > > provide
> > > >> > > >>>>>>> > > >> > >> default implementations, so backwards
> > > >> compatibility
> > > >> > > >>>>>>> won't be an
> > > >> > > >>>>>>> > > >> issue.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I don't think we can provide a default
> > > >> > implementation.
> > > >> > > >>>>>>> How would
> > > >> > > >>>>>>> > we
> > > >> > > >>>>>>> > > >> do
> > > >> > > >>>>>>> > > >> > >> that? Would it be just a no-op? Is it better
> > than
> > > >> > > having
> > > >> > > >>>>>>> an
> > > >> > > >>>>>>> > opt-in
> > > >> > > >>>>>>> > > >> > >> interface? The default implementation would
> > have
> > > >> to
> > > >> > be
> > > >> > > >>>>>>> added
> > > >> > > >>>>>>> > > >> exclusively
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> in
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> a *Public* SourceReader interface. By the
> way
> > > >> notice
> > > >> > > >>>>>>> > > SourceReaderBase
> > > >> > > >>>>>>> > > >> > >> does extend from WithSplitsAlignment, so
> > > >> effectively
> > > >> > > all
> > > >> > > >>>>>>> > > >> implementations
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> do
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> handle the alignment case. To be honest I
> > think
> > > >> it is
> > > >> > > >>>>>>> impossible
> > > >> > > >>>>>>> > to
> > > >> > > >>>>>>> > > >> > >> implement the SourceReader interface
> directly
> > by
> > > >> end
> > > >> > > >>>>>>> users.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader
> interface
> > > >> > probably
> > > >> > > >>>>>>> also
> > > >> > > >>>>>>> > needs
> > > >> > > >>>>>>> > > >> some
> > > >> > > >>>>>>> > > >> > >> change to support throttling at the split
> > > >> > granularity.
> > > >> > > >>>>>>> Can you
> > > >> > > >>>>>>> > add
> > > >> > > >>>>>>> > > >> that
> > > >> > > >>>>>>> > > >> > >> interface change into the public interface
> > > >> section as
> > > >> > > >>>>>>> well?
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> It has been added from the beginning. See
> > > >> > > >>>>>>> *AlignedSplitReader.*
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name
> > > >> > assignSplits
> > > >> > > >>>>>>> here,
> > > >> > > >>>>>>> > given
> > > >> > > >>>>>>> > > >> that
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> it
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> is not actually changing the split
> > assignments? It
> > > >> > > seems
> > > >> > > >>>>>>> > something
> > > >> > > >>>>>>> > > >> like
> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or
> > > >> adjustSplitsThrottling() is
> > > >> > > >>>>>>> more
> > > >> > > >>>>>>> > > accurate.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> The method's called *alignSplits*, not
> > assign. Do
> > > >> you
> > > >> > > >>>>>>> still
> > > >> > > >>>>>>> > prefer
> > > >> > > >>>>>>> > > a
> > > >> > > >>>>>>> > > >> > >> different name for that? Personally, I am
> > open for
> > > >> > > >>>>>>> suggestions
> > > >> > > >>>>>>> > > here.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Best,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Dawid
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> [1]
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> >
> > > >> > > >>>>>>> > > >>
> > > >> > > >>>>>>> > >
> > > >> > > >>>>>>> >
> > > >> > > >>>>>>>
> > > >> > >
> > > >> >
> > > >>
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Thanks for driving the effort, Sebastian. I
> > think
> > > >> the
> > > >> > > >>>>>>> motivation
> > > >> > > >>>>>>> > > >> makes a
> > > >> > > >>>>>>> > > >> > >> lot of sense. Just a few suggestions /
> > questions.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a
> > > >> general
> > > >> > use
> > > >> > > >>>>>>> case, so
> > > >> > > >>>>>>> > > >> should
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> we
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> just add the related methods to SourceReader
> > > >> directly
> > > >> > > >>>>>>> instead of
> > > >> > > >>>>>>> > > >> > >> introducing the new interface of
> > > >> WithSplitAssignment?
> > > >> > > We
> > > >> > > >>>>>>> can
> > > >> > > >>>>>>> > > provide
> > > >> > > >>>>>>> > > >> > >> default implementations, so backwards
> > > >> compatibility
> > > >> > > >>>>>>> won't be an
> > > >> > > >>>>>>> > > >> issue.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader
> interface
> > > >> > probably
> > > >> > > >>>>>>> also
> > > >> > > >>>>>>> > needs
> > > >> > > >>>>>>> > > >> some
> > > >> > > >>>>>>> > > >> > >> change to support throttling at the split
> > > >> > granularity.
> > > >> > > >>>>>>> Can you
> > > >> > > >>>>>>> > add
> > > >> > > >>>>>>> > > >> that
> > > >> > > >>>>>>> > > >> > >> interface change into the public interface
> > > >> section as
> > > >> > > >>>>>>> well?
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name
> > > >> > assignSplits
> > > >> > > >>>>>>> here,
> > > >> > > >>>>>>> > given
> > > >> > > >>>>>>> > > >> that
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> it
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> is not actually changing the split
> > assignments? It
> > > >> > > seems
> > > >> > > >>>>>>> > something
> > > >> > > >>>>>>> > > >> like
> > > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or
> > > >> adjustSplitsThrottling() is
> > > >> > > >>>>>>> more
> > > >> > > >>>>>>> > > accurate.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <
> > > >> > > >>>>>>> stevenz3wu@gmail.com
> > > >> > > >>>>>>> > >
> > > >> > > >>>>>>> > > <
> > > >> > > >>>>>>> > > >> > stevenz3wu@gmail.com> <st...@gmail.com> <
> > > >> > > >>>>>>> stevenz3wu@gmail.com>
> > > >> > > >>>>>>> > <
> > > >> > > >>>>>>> > > >> > >> stevenz3wu@gmail.com> <
> > > >> > > >>>>>>> > > >> > >> stevenz3wu@gmail.com> wrote:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> However, a single source operator may read
> > data
> > > >> from
> > > >> > > >>>>>>> multiple
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> splits/partitions, e.g., multiple Kafka
> > > >> partitions,
> > > >> > > such
> > > >> > > >>>>>>> that
> > > >> > > >>>>>>> > even
> > > >> > > >>>>>>> > > >> with
> > > >> > > >>>>>>> > > >> > >> watermark alignment the source operator may
> > need
> > > >> to
> > > >> > > >>>>>>> buffer
> > > >> > > >>>>>>> > > excessive
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> amount
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> of data if one split emits data faster than
> > > >> another.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> For this part from the motivation section,
> is
> > it
> > > >> > > >>>>>>> accurate? Let's
> > > >> > > >>>>>>> > > >> assume
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> one
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> source task consumes from 3 partitions and
> > one of
> > > >> the
> > > >> > > >>>>>>> partition
> > > >> > > >>>>>>> > is
> > > >> > > >>>>>>> > > >> > >> significantly slower. In this situation,
> > watermark
> > > >> > for
> > > >> > > >>>>>>> this
> > > >> > > >>>>>>> > source
> > > >> > > >>>>>>> > > >> task
> > > >> > > >>>>>>> > > >> > >> won't hold back as it is reading recent data
> > from
> > > >> > other
> > > >> > > >>>>>>> two Kafka
> > > >> > > >>>>>>> > > >> > >> partitions. As a result, it won't hold back
> > the
> > > >> > overall
> > > >> > > >>>>>>> > watermark.
> > > >> > > >>>>>>> > > I
> > > >> > > >>>>>>> > > >> > >> thought the problem is that we may have late
> > data
> > > >> for
> > > >> > > >>>>>>> this slow
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> partition.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I have another question about the restart.
> Say
> > > >> split
> > > >> > > >>>>>>> alignment is
> > > >> > > >>>>>>> > > >> > >> triggered. checkpoint is completed. job
> > failed and
> > > >> > > >>>>>>> restored from
> > > >> > > >>>>>>> > > the
> > > >> > > >>>>>>> > > >> > last
> > > >> > > >>>>>>> > > >> > >> checkpoint. because alignment decision is
> not
> > > >> > > >>>>>>> checkpointed,
> > > >> > > >>>>>>> > > initially
> > > >> > > >>>>>>> > > >> > >> alignment won't be enforced until we get a
> > cycle
> > > >> of
> > > >> > > >>>>>>> watermark
> > > >> > > >>>>>>> > > >> > aggregation
> > > >> > > >>>>>>> > > >> > >> and propagation, right? Not saying this
> > corner is
> > > >> a
> > > >> > > >>>>>>> problem. Just
> > > >> > > >>>>>>> > > >> want
> > > >> > > >>>>>>> > > >> > to
> > > >> > > >>>>>>> > > >> > >> understand it more.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas
> Weise <
> > > >> > > >>>>>>> thw@apache.org> <
> > > >> > > >>>>>>> > > >> > thw@apache.org> <th...@apache.org> <
> thw@apache.org>
> > <
> > > >> > > >>>>>>> > > >> > >> thw@apache.org> <
> > > >> > > >>>>>>> > > >> > >> thw@apache.org> wrote:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Thanks for working on this!
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in
> > > >> > > >>>>>>> SourceReaderBase and
> > > >> > > >>>>>>> > > then
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> doing
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> nothing if the split reader does not
> implement
> > > >> > > >>>>>>> AlignedSplitReader
> > > >> > > >>>>>>> > > >> could
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> be
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can
> > > >> instead
> > > >> > be
> > > >> > > >>>>>>> added to
> > > >> > > >>>>>>> > the
> > > >> > > >>>>>>> > > >> > >> specific source reader (i.e.
> > KafkaSourceReader) to
> > > >> > make
> > > >> > > >>>>>>> it
> > > >> > > >>>>>>> > explicit
> > > >> > > >>>>>>> > > >> that
> > > >> > > >>>>>>> > > >> > >> the source actually supports it.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Thanks,
> > > >> > > >>>>>>> > > >> > >> Thomas
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin
> > Knauf <
> > > >> > > >>>>>>> > > knaufk@apache.org>
> > > >> > > >>>>>>> > > >> <
> > > >> > > >>>>>>> > > >> > knaufk@apache.org> <kn...@apache.org> <
> > > >> > knaufk@apache.org
> > > >> > > >
> > > >> > > >>>>>>> <
> > > >> > > >>>>>>> > > >> > >> knaufk@apache.org> <
> > > >> > > >>>>>>> > > >> > >> knaufk@apache.org>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> wrote:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> As part of this FLIP, the
> `AlignedSplitReader`
> > > >> > > interface
> > > >> > > >>>>>>> (aka the
> > > >> > > >>>>>>> > > >> stop
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> &
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> resume behavior) will be implemented for
> > Kafka and
> > > >> > > >>>>>>> Pulsar only,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> correct?
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> +1 in general. I believe it is valuable to
> > > >> complete
> > > >> > the
> > > >> > > >>>>>>> watermark
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> aligned
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> story with this FLIP.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Cheers,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Konstantin
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid
> > Wysakowicz
> > > >> <
> > > >> > > >>>>>>> > > >> > dwysakowicz@apache.org> <
> dwysakowicz@apache.org>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> wrote:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> To be explicit, having worked on it, I
> > support it
> > > >> ;)
> > > >> > I
> > > >> > > >>>>>>> think we
> > > >> > > >>>>>>> > can
> > > >> > > >>>>>>> > > >> > >> start a vote thread soonish, as there are no
> > > >> concerns
> > > >> > > so
> > > >> > > >>>>>>> far.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Best,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Dawid
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis
> wrote:
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Dear Flink developers,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> I would like to open a discussion on FLIP
> 217
> > [1]
> > > >> for
> > > >> > > an
> > > >> > > >>>>>>> > extension
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> of
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Watermark Alignment to perform alignment
> also
> > in
> > > >> > > >>>>>>> SplitReaders. To
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> do
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> so,
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> SplitReaders must be able to suspend and
> > resume
> > > >> > reading
> > > >> > > >>>>>>> from
> > > >> > > >>>>>>> > split
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> sources
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> where the SourceOperator coordinates and
> > controls
> > > >> > > >>>>>>> suspend and
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> resume.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> To
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> gather information about current watermarks
> > of the
> > > >> > > >>>>>>> SplitReaders,
> > > >> > > >>>>>>> > we
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> extend
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> the internal WatermarkOutputMulitplexer and
> > report
> > > >> > > >>>>>>> watermarks to
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> the
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> SourceOperator.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> There is a PoC for this FLIP [2], prototyped
> > by
> > > >> Arvid
> > > >> > > >>>>>>> Heise and
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> revised
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> and
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> reworked by Dawid Wysakowicz (He did most of
> > the
> > > >> > work.)
> > > >> > > >>>>>>> and me.
> > > >> > > >>>>>>> > The
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> changes
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> are backwards compatible in a way that if
> > affected
> > > >> > > >>>>>>> components do
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> not
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> support split alignment the behavior is as
> > before.
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Best,
> > > >> > > >>>>>>> > > >> > >> Sebastian
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> [1]
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> >
> > > >> > > >>>>>>> > > >>
> > > >> > > >>>>>>> > >
> > > >> > > >>>>>>> >
> > > >> > > >>>>>>>
> > > >> > >
> > > >> >
> > > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> [2]
> > > >> > > >>>>>>> https://github.com/dawidwys/flink/tree/aligned-splits
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> --
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >> Konstantin Knaufhttps://
> > > >> > > >>>>>>> > > >> twitter.com/snntrablehttps://github.com/knaufk
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> > >>
> > > >> > > >>>>>>> > > >> >
> > > >> > > >>>>>>> > > >>
> > > >> > > >>>>>>> > > >
> > > >> > > >>>>>>> > >
> > > >> > > >>>>>>> >
> > > >> > > >>>>>>>
> > > >> > > >>>>>>
> > > >> > >
> > > >> >
> > > >>
> > > >
> >
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Steven Wu <st...@gmail.com>.
Both option 1 (default impl in base interface) and option 2 (decorative
interface) are pretty common patterns. I would also be fine with either.
The important thing is that an exception is thrown if a source doesn't
support the alignment capability.
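
To make the two patterns concrete, here is a minimal sketch of how each one
could end up throwing for a source without the capability. All types below
are hypothetical, simplified stand-ins and not the actual Flink interfaces;
only the method name pauseOrResumeSplits is taken from this thread, and its
parameter list here is an assumption.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Option 1 (hypothetical, simplified): the method lives in the base reader
// interface and the default implementation throws.
interface ReaderWithDefault {
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException(
                "This reader does not support pausing or resuming splits.");
    }
}

// Option 2 (hypothetical, simplified): the base interface stays untouched and
// the capability is declared through a separate decorative interface.
interface PlainReader {}

interface SupportsSplitPausing {
    void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume);
}

// A reader that opts in under option 2 and simply tracks its paused splits.
final class PausableReader implements PlainReader, SupportsSplitPausing {
    final Set<String> paused = new HashSet<>();

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.addAll(splitsToPause);
        paused.removeAll(splitsToResume);
    }
}

final class AlignmentCaller {
    // Option 1: always call; readers without an override throw by default.
    static void alignWithDefault(
            ReaderWithDefault reader,
            Collection<String> splitsToPause,
            Collection<String> splitsToResume) {
        reader.pauseOrResumeSplits(splitsToPause, splitsToResume);
    }

    // Option 2: the caller checks for the decorative interface and throws
    // itself when the reader did not opt in.
    static void alignWithDecorator(
            PlainReader reader,
            Collection<String> splitsToPause,
            Collection<String> splitsToResume) {
        if (!(reader instanceof SupportsSplitPausing)) {
            throw new UnsupportedOperationException(
                    "This reader does not implement SupportsSplitPausing.");
        }
        ((SupportsSplitPausing) reader).pauseOrResumeSplits(splitsToPause, splitsToResume);
    }
}
```

In both variants a reader without the capability ends in an
UnsupportedOperationException; the difference is only whether the check sits
in the interface default or in the calling framework code.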

The other point is that we can validate the source capability if alignment
is enabled in WatermarkStrategy. I believe either option can achieve this
goal too.

public interface WatermarkStrategy<T> {
    WatermarkStrategy<T> withWatermarkAlignment(
            String watermarkGroup, Duration maxAllowedWatermarkDrift);
}
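
A rough sketch of such a validation step, assuming it runs once when the
source is set up: AlignmentParams and AlignmentValidator are hypothetical
names invented for illustration, and how the framework would actually learn
whether a source supports split alignment is left open here (it is passed in
as a plain boolean).

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical holder for the arguments given to withWatermarkAlignment(...).
final class AlignmentParams {
    final String watermarkGroup;
    final Duration maxAllowedWatermarkDrift;

    AlignmentParams(String watermarkGroup, Duration maxAllowedWatermarkDrift) {
        this.watermarkGroup = watermarkGroup;
        this.maxAllowedWatermarkDrift = maxAllowedWatermarkDrift;
    }
}

final class AlignmentValidator {
    // Fails fast if alignment is requested but the source cannot honor it.
    // Sources without alignment enabled are never rejected.
    static void validate(
            Optional<AlignmentParams> alignment, boolean sourceSupportsSplitAlignment) {
        if (alignment.isPresent() && !sourceSupportsSplitAlignment) {
            throw new IllegalStateException(
                    "Watermark alignment is enabled for group '"
                            + alignment.get().watermarkGroup
                            + "', but the source cannot pause or resume splits.");
        }
    }
}
```

With a check like this, a job that never enables alignment keeps working with
any source, while a job that does enable it fails at setup rather than
silently running without split level alignment.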

If I have to pick one, I am slightly favoring option 1 (base interface). As
watermarks are already an essential concept of a source, maybe the watermark
alignment capability can also be a property of the base source/reader
interface.

On Tue, Jun 14, 2022 at 3:44 PM Thomas Weise <th...@apache.org> wrote:

> Hi everyone,
>
> Thank you for all the effort that went into this discussion. The split
> level watermark alignment will be an important feature for Flink that
> will address operational problems for various use cases. From reading
> through this thread it appears that not too much remains to bring this
> FLIP to acceptance and allow development to move forward. I would like
> to contribute if possible.
>
> Regarding option 1 vs. option 2: I don't have a strong preference,
> perhaps slightly leaning towards option 1.
>
> However, from a user perspective, should the split level alignment be
> an opt-in feature, at least for a few releases? If yes, then we would
> require a knob similar to supportsPausingSplits(), which I understand
> won't be part of the revised FLIP. Such control may be beneficial:
>
> * Compare runtime behavior with split level alignment on/off
> * Allow use of sources that don't implement pausing splits yet
>
> The second point would, from the user's perspective, be necessary for
> backward compatibility? While the interface aspect and source
> compatibility have been discussed in great detail, I don't think it
> would be desirable if an application that already uses alignment fails
> after upgrading to the new Flink version, forcing users to modify their
> sources in lockstep for the new non-optional split level alignment.
>
> So I think clarification of the compatibility aspect on the FLIP page
> would be necessary.
>
> Thanks,
> Thomas
>
> On Mon, May 30, 2022 at 3:29 AM Piotr Nowojski <pi...@gmail.com>
> wrote:
> >
> > Hi Becket,
> >
> > Thanks for summing this up. Just one correction:
> >
> > > Piotr prefers option 2, his opinions are:
> > >   e) It is OK that the code itself in option 2 indicates to the developers
> > > that a feature is optional. We will rely on the documentation to correct
> > > that and clarify that the feature is actually obligatory.
> >
> > I would say based on a) and b) that the feature would still be optional. So
> > both the implementation and the documentation would be saying that. We
> > could add a mention to the docs and release notes that this feature will
> > be obligatory in the next major release and plan such a release
> > accordingly.
> >
> > Re option 1, as you mentioned:
> > > As for option 1: For developers, the feature is still optional due to
> > > the default implementation in the interface, regardless of what the
> > > default implementation does, because the code compiles without overriding
> > > these methods
> >
> > Also importantly, the code will work in most cases.
> >
> > > Obligatory: Jobs may fail if these methods are not implemented
> > > properly, e.g. SourceReader#pauseOrResumeSplits(). This is a common pattern
> > > in Java, e.g. Iterator.remove() by default throws
> > > "UnsupportedOperationException", informing the implementation that things
> > > may go wrong if this method is not implemented.
> >
> > For me `Iterator#remove()` is an optional feature. Personally, I don't
> > remember if I have ever implemented it.
> >
> > Best,
> > Piotrek
> >
> > On Fri, 27 May 2022 at 10:14, Becket Qin <be...@gmail.com> wrote:
> >
> > > I had an offline discussion with Piotr and here is the summary. Please
> > > correct me if I miss something, Piotr.
> > >
> > > There are two things on which we would like to seek more opinions from
> > > the community, so we can make progress on this FLIP.
> > >
> > > 1. The general pattern to add obligatory features to existing
> > > interfaces.
> > >
> > > ***********************************************************************************
> > > Interfaces exposed to the developers for implementation are
> > > either intended to be *optional* or *obligatory*. While it is quite clear
> > > how to convey that intention when creating the interfaces, it is not
> > > as commonly agreed when we are adding new features to an existing
> > > interface. In general, Flink uses decorative interfaces when adding
> > > optional features to existing interfaces. Both Piotr and I agree that
> > > looks good.
> > >
> > > Different opinions are mainly about how to add obligatory features to
> > > the existing interfaces, probably due to different understandings of
> > > "obligatory".
> > >
> > > We have discussed four options:
> > >
> > > *Option 1:*
> > >
> > >    - Just add a new method to the existing interface.
> > >    - For backwards compatibility, the method would have a default
> > >    implementation throwing "UnsupportedOperationException".
> > >    - In the next major version, remove the default implementation.
> > >    - For the developers, any method with a default implementation
> > >    throwing an "UnsupportedOperationException" should be taken as
> > >    obligatory.
> > >
> > > *Option 2:*
> > >
> > >    - Always make the features optional by adding a decorative
> > >    interface, just like ordinary optional features.
> > >    - Inform the developers via documentation that this feature is
> > >    obligatory, although it looks optional from the code.
> > >    - In case the developers did not implement the decorative interface,
> > >    throw an exception.
> > >    - In the next major version, move the methods in the decorative
> > >    interface to the base interface, and deprecate the decorative
> > >    interface.
> > >
> > > *Option 3:*
> > >
> > >    - Always bump the major version when a new obligatory feature is
> > >    added, even if we may have to do it frequently.
> > >
> > > *Option 4:*
> > >
> > >    - Add a V2, V3... of the interface affected by the new obligatory
> > >    feature.
> > >    - In the next major versions, deprecate old versions of the
> > >    interfaces.
> > >
> > > Both Piotr and I agreed that option 3 and option 4 have a big side
> > > effect and should be avoided. We have different preferences between
> > > option 1 and option 2.
> > >
> > > Personally I prefer option 1, the reasons are:
> > >   a) simple and intuitive. Java 8 introduced the default impl in
> > > interfaces exactly for interface evolution, and this is a common pattern
> > > in many projects.
> > >   b) prominent to the developers that the feature is expected to be
> > > implemented, because it explicitly throws an exception in the default impl.
> > >   c) low maintenance overhead - the Flink framework can always assume the
> > > method exists, so no special handling logic is needed.
> > >   d) communicate a clear semantic boundary between optional and obligatory
> > > features in Flink to the developers.
> > >       - Optional: Jobs still run without exception if these methods are
> > > not implemented, e.g. all the SupportsXXXPushDown interfaces.
> > >       - Obligatory: Jobs may fail if these methods are not implemented
> > > properly, e.g. SourceReader#pauseOrResumeSplits(). This is a common pattern
> > > in Java, e.g. Iterator.remove() by default throws
> > > "UnsupportedOperationException", informing the implementation that things
> > > may go wrong if this method is not implemented.
> > >
> > > As for option 2, although the API itself sounds clean, it misleads people
> > > into thinking of an obligatory feature as optional - from the code the
> > > feature is optional, but the documents say it is obligatory. We probably
> > > should avoid such code-doc inconsistency, as people will be confused. And I
> > > would actually be bewildered that sometimes not implementing an "optional"
> > > feature is fine, but sometimes it causes the jobs to fail.
> > >
> > > In response to the argument that the method with a default implementation
> > > is always optional, if that is true, it actually means all the interfaces
> > > should be immutable once they are created. If we want to add a method to an
> > > existing interface, for backwards compatibility, we will have to provide a
> > > default implementation. And the fact it has a default implementation
> > > indicates the method is optional. If that method is optional, it should
> > > reside in a separate decorative interface, otherwise it clogs that existing
> > > interface. Therefore, people should never add a method to an existing
> > > interface. I find this conclusion a bit extreme.
> > >
> > > Piotr prefers option 2, his opinions are:
> > >     a) Obligatory methods are the methods that fail the code
> compilation
> > > if not implemented.
> > >     b) All obligatory methods should reside in the base interface,
> without
> > > a default implementation. And all the optional methods should be in
> > > decorative interfaces. This is a clean API.
> > >     c) due to b), there isn't a viable solution to add an obligatory
> > > method to an existing interface in a backwards compatible way. Unless
> we
> > > are OK with breaking backwards compatibility, all the interfaces
> should be
> > > treated as immutable. As a compromise, we might as well just treat all
> the
> > > features added later as optional features. This way we keep the API
> clean.
> > >     d) based on b) and c), option 2 has a clean API, while option 1
> does
> > > not.
> > >     e) It is OK that the code itself in option 2 indicates to the
> developers
> > > that a feature is optional. We will rely on the documentation to
> correct
> > > that and clarify that the feature is actually obligatory.
> > >     f) Regarding the effectiveness of making people aware that the
> feature
> > > is obligatory, Option 1 and Option 2 are similar. For people that do
> not
> > > read the release note / documentation, they will mistake the feature
> to be
> > > optional anyways.
> > >
> > > As for option 1: For developers, the feature is still optional due to
> the
> > > default implementation in the interface, regardless of what the default
> > > implementation does, because the code compiles without overriding these
> > > methods. Also, another problem of this option is that for users that
> do not
> > > know about the history of the interface, they may be confused by the
> > > default implementation throwing an exception.
> > >
> > >
> > > 2. For this particular FLIP, should it be optional or not?
> > >
> > >
> ***********************************************************************************
> > > As mentioned in the previous email, I feel this FLIP should be
> obligatory,
> > > for the following reasons:
> > > 1. The Flink framework exposes the watermark alignment API to the end
> > > users. From the end users' perspective, the feature should be available
> > > regardless of the implementation details in the pluggables. This is
> true
> > > for any other methods exposed as the Flink API.
> > > 2. If a Source is not pausable, the end user should receive an
> exception
> > > when enabling watermark alignment (both Piotr and I agree on
> this). In
> > > that case, it meets my criteria for an obligatory feature because not
> > > implementing the feature causes a framework API to throw an exception and
> > > fail the job.
> > >
> > > On the other hand, Piotr does not have a strong opinion regarding
> whether
> > > this feature should be optional or not.
> > >
> > >
> > > Thanks for reading through this long email. So basically in order to
> make
> > > progress on this FLIP, we want to see how people feel about the
> above
> > > two topics.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > > On Thu, May 26, 2022 at 3:06 PM Piotr Nowojski <pn...@apache.org>
> > > wrote:
> > >
> > >> Hi Becket,
> > >>
> > >> I still stand by what I wrote before:
> > >> > I think I would still vote soft -1 on this option, but I wouldn't
> block
> > >> it in case I am out-voted.
> > >>
> > >> > I think it might be helpful to agree on the definition of optional
> in
> > >> our
> > >> case.
> > >>
> > >> For me it doesn't matter whether we call a default method throwing an
> exception
> > >> optional or non-optional. As long as we keep it this way, the
> effect
> > >> is the same. It's effectively a method that a user doesn't have to
> > >> implement. If the interface/system allows some methods to be left
> unimplemented,
> > >> some users will do just that, regardless of whether we call and document
> > >> it as non-optional. And at the same time it's clogging the base interface.
> > >>
> > >> By the way, just the need for a java-doc/documentation explaining the
> > >> existence of some construct is a bad smell (code should be
> > >> self-documenting
> > >> and a default method throwing an UnsupportedOperationException is not).
> > >>
> > >> > Please note that so far we do not assume whether the feature is in
> > >> > the original API or it is added later. A newly added feature can
> also be
> > >> > non-optional, although it might take some time for all the pluggable
> > >> > developers to catch up, and they should still work if the new
> feature is
> > >> > not used until they catch up. In contrast, we may never expect an
> > >> optional
> > >> > feature to catch up, because leaving it unimplemented is also
> blessed.
> > >> >
> > >> > Let's take the checkpointing as an example. Imagine Flink did not
> > >> support
> > >> > checkpointing before release 1.16. And now we are trying to add
> > >> > checkpointing to Flink. So we exposed the checkpoint configuration
> to
> > >> the
> > >> > end users. In the meantime, will we tell the pluggable (e.g.
> operators,
> > >> > connectors) developers that methods like "snapshotState()" are
> optional?
> > >> If
> > >> > we do that, the availability of checkpointing in Flink would be
> severely
> > >> > weakened. But apparently we should still allow the existing
> > >> implementations
> > >> > to work without checkpointing. It looks to me that adding the
> method to
> > >> the
> > >> > pluggable interfaces with a default implementation throwing
> > >> > "UnsupportedOperationException" would be the solution here. Please
> note
> > >> > that in this case, having the default implementation does not mean
> this
> > >> is
> > >> > optional. It is just the technique to support backwards
> compatibility in
> > >> > the feature evolution. The fact that this method is in the base
> > >> interface
> > >> > suggests it is not optional, so the developers SHOULD implement it.
> > >>
> > >> I would soft vote -1 for having the default method throwing
> > >> UnsupportedOperationException as a one-off thing for these (FLIP-217)
> special
> > >> circumstances.
> > >>
> > >> At the moment, without thinking this over too much, I would vote
> harder -1
> > >> for having this as a general rule when adding new features. If we
> ever end
> > >> up with an API that is littered with default methods throwing
> > >> UnsupportedOperationException that are documented as "non optional" it
> > >> would be IMO a big design failure. I would be ok-ish with that, only
> if
> > >> that was a temporary thing and we had an aggressive plan to release
> more
> > >> often new major Flink versions (2.x.y, 3.x.y, ...) breaking API
> > >> compatibility, that would get rid of those default methods. Adding
> > >> checkpointing and methods like "snapshotState()" would IMO easily
> justify
> > >> a
> > >> new major Flink release. In that case we could add those methods with
> > >> default implementation for some transition period, one or two minor
> > >> releases, followed by a clean up in a major release. However I would
> still
> > >> argue that it would be cleaner/better to add a decorative interface
> like
> > >> `CheckpointedOperator` interface instead of adding those default
> methods
> > >> to
> > >> the base `Operator` interface.
> > >>
> > >> I think I can sum up our disagreement here: I would like to
> keep
> > >> the
> > >> interfaces simpler, with only obligatory methods/features on one side
> and
> > >> clearly optional features on the other. While you would like to add an
> > >> extra third state in between those two?
> > >>
> > >> Best,
> > >> Piotrek
> > >>
> > >>
> > >>
> > >> On Thu, 12 May 2022 at 04:25, Becket Qin <be...@gmail.com>
> wrote:
> > >>
> > >> > Thanks for the clarification, Piotr and Sebastian.
> > >> >
> > >> > It looks like the key problem is still whether the implementation of
> > >> > pausable splits in the Sources should be optional or not.
> > >> >
> > >> > I think it might be helpful to agree on the definition of optional
> in
> > >> our
> > >> > case. To me:
> > >> > Optional = "You CAN leave the method unimplemented, and that is
> fine."
> > >> > Non-Optional = "You CAN leave the method unimplemented, but you
> SHOULD
> > >> NOT,
> > >> > because people assume this works."
> > >> >
> > >> > I think one sufficient condition of a Non-Optional feature is that
> if
> > >> the
> > >> > feature is exposed through the framework API, Flink should expect
> the
> > >> > pluggables to support this feature by default. Otherwise the
> > >> availability
> > >> > of that feature becomes undefined.
> > >> >
> > >> > Please note that so far we do not assume whether the feature is in
> > >> > the original API or it is added later. A newly added feature can
> also be
> > >> > non-optional, although it might take some time for all the pluggable
> > >> > developers to catch up, and they should still work if the new
> feature is
> > >> > not used until they catch up. In contrast, we may never expect an
> > >> optional
> > >> > feature to catch up, because leaving it unimplemented is also
> blessed.
> > >> >
> > >> > Let's take the checkpointing as an example. Imagine Flink did not
> > >> support
> > >> > checkpointing before release 1.16. And now we are trying to add
> > >> > checkpointing to Flink. So we exposed the checkpoint configuration
> to
> > >> the
> > >> > end users. In the meantime, will we tell the pluggable (e.g.
> operators,
> > >> > connectors) developers that methods like "snapshotState()" are
> optional?
> > >> If
> > >> > we do that, the availability of checkpointing in Flink would be
> severely
> > >> > weakened. But apparently we should still allow the existing
> > >> implementations
> > >> > to work without checkpointing. It looks to me that adding the
> method to
> > >> the
> > >> > pluggable interfaces with a default implementation throwing
> > >> > "UnsupportedOperationException" would be the solution here. Please
> note
> > >> > that in this case, having the default implementation does not mean
> this
> > >> is
> > >> > optional. It is just the technique to support backwards
> compatibility in
> > >> > the feature evolution. The fact that this method is in the base
> > >> interface
> > >> > suggests it is not optional, so the developers SHOULD implement it.
> > >> >
> > >> > When it comes to this FLIP, I think it meets the criteria of
> > >> non-optional
> > >> > features, so we should just use the evolution path of non-optional
> > >> > features.
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Jiangjie (Becket) Qin
> > >> >
> > >> >
> > >> >
> > >> > On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski <
> pnowojski@apache.org>
> > >> > wrote:
> > >> >
> > >> > > Hi,
> > >> > >
> > >> > > Actually previously I thought about having a decorative interface
> and
> > >> > > whenever watermark alignment is enabled, checking that the source
> > >> > > implements the decorative interface. If not, throwing an
> exception.
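For comparison, the decorative-interface variant described in this paragraph could look roughly like the following self-contained Java sketch. All names here (BaseReader, PausableSplits, PlainReader, AlignedReader, AlignmentValidator) are hypothetical and chosen for illustration; this is not how Flink implements the check, only the shape of "check the decorative interface, otherwise throw".

```java
import java.util.Collection;

// Hypothetical base interface: contains only what every reader must implement.
interface BaseReader {
    String pollNext();
}

// Hypothetical decorative interface: implemented only by readers that can pause splits.
interface PausableSplits {
    void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume);
}

// A reader without the optional capability.
class PlainReader implements BaseReader {
    @Override
    public String pollNext() {
        return "record";
    }
}

// A reader that opts in by implementing the decorative interface as well.
class AlignedReader implements BaseReader, PausableSplits {
    @Override
    public String pollNext() {
        return "record";
    }

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        // A real reader would stop or restart fetching here; omitted in this sketch.
    }
}

final class AlignmentValidator {
    private AlignmentValidator() {}

    // Fails fast when alignment is requested for a reader lacking the decorative interface.
    static void validate(BaseReader reader, boolean watermarkAlignmentEnabled) {
        if (watermarkAlignmentEnabled && !(reader instanceof PausableSplits)) {
            throw new UnsupportedOperationException(
                    reader.getClass().getSimpleName()
                            + " does not support pausable splits, so watermark alignment"
                            + " cannot be enabled for it.");
        }
    }
}
```

With this shape the base interface stays free of the extra method, and the capability can be checked up front with instanceof instead of being discovered when a default method throws.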
> > >> > >
> > >> > > The option with default methods in the source interfaces throwing
> > >> > > `UnsupportedOperationException` I think still suffers from the
> same
> > >> > > problems I mentioned before. It's still an optional implementation
> > >> and at
> > >> > > the same time it's clogging the base interface. I think I would
> still
> > >> > vote
> > >> > > soft -1 on this option, but I wouldn't block it in case I am
> > >> out-voted.
> > >> > >
> > >> > > Best,
> > >> > > Piotrek
> > >> > >
> > >> > > On Wed, 11 May 2022 at 14:22, Sebastian Mattheis <
> sebastian@ververica.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Hi Becket,
> > >> > > >
> > >> > > > Thanks a lot for your fast and detailed response. For me, it
> > >> converges
> > >> > > and
> > >> > > > dropping the supportsX method sounds very reasonable to me.
> (Side
> > >> note:
> > >> > > > With "pausable splits" enabled as "default" I think we
> > >> misunderstood.
> > >> > As
> > >> > > > you described now "default" I understand as that it should be
> the
> > >> new
> > >> > > > recommended way of implementation, and I think that is fully
> valid.
> > >> > > Before,
> > >> > > > I understood "default" here as the default implementation, i.e.,
> > >> > throwing
> > >> > > > UnsupportedOperationException, which is the exact opposite. :) )
> > >> > > >
> > >> > > > Nevertheless: As mentioned, an open question for me is if
> watermark
> > >> > > > alignment should enforce pausable splits. For clarification, the
> > >> > current
> > >> > > > documentation [1] says:
> > >> > > >
> > >> > > >> *Note:* As of 1.15, Flink supports aligning across tasks of the
> same
> > >> > > >> source and/or different sources. It does not support aligning
> > >> > > >> splits/partitions/shards in the same task.
> > >> > > >>
> > >> > > >> In a case where there are e.g. two Kafka partitions that
> produce
> > >> > > >> watermarks at different pace, that get assigned to the same
> task
> > >> > > watermark
> > >> > > >> might not behave as expected. Fortunately, worst case it
> should not
> > >> > > perform
> > >> > > >> worse than without alignment.
> > >> > > >>
> > >> > > >> Given the limitation above, we suggest applying watermark
> > >> alignment in
> > >> > > >> two situations:
> > >> > > >>
> > >> > > >>    1. You have two different sources (e.g. Kafka and File) that
> > >> > produce
> > >> > > >>    watermarks at different speeds
> > >> > > >>    2. You run your source with parallelism equal to the number
> of
> > >> > > >>    splits/shards/partitions, which results in every subtask
> being
> > >> > > assigned a
> > >> > > >>    single unit of work.
> > >> > > >>
> > >> > > > I personally see no issue in implementing and I see no reason
> > >> against
> > >> > > > implementing this dependency of watermark alignment and pausable
> > >> > splits.
> > >> > > (I
> > >> > > > think this would even be a good path towards shaping watermark
> > >> > alignment
> > >> > > in
> > >> > > > 1.16.) However, "I don't see" means that I would be happy to
> hear
> > >> > Dawid's
> > >> > > > and Piotrek's opinions as they implemented watermark alignment
> > >> based on
> > >> > > > FLIP-182 [2] and I don't want to miss relevant
> rationale/background
> > >> > info
> > >> > > > from their side.
> > >> > > >
> > >> > > > *@Piotrek* *@Dawid *What do you think?
> > >> > > >
> > >> > > > Regards,
> > >> > > > Sebastian
> > >> > > >
> > >> > > > [1]
> > >> > > >
> > >> > >
> > >> >
> > >>
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
> > >> > > > [2]
> > >> > > >
> > >> > >
> > >> >
> > >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
> > >> > > >
> > >> > > > On Wed, May 11, 2022 at 1:30 PM Becket Qin <
> becket.qin@gmail.com>
> > >> > wrote:
> > >> > > >
> > >> > > >> +dev
> > >> > > >>
> > >> > > >> Hi Sebastian,
> > >> > > >>
> > >> > > >> Thank you for the summary. Please see the detailed replies
> inline.
> > >> As
> > >> > a
> > >> > > >> recap of my suggestions.
> > >> > > >>
> > >> > > >> 1. Pausable splits API.
> > >> > > >>   a) Add default implementations to methods
> "pauseOrResumeSplits"
> > >> in
> > >> > > both
> > >> > > >> SourceReader and SplitReader where both default implementations
> > >> throw
> > >> > > >>  UnsupportedOperationException.
> > >> > > >>
> > >> > > >> 2. User story.
> > >> > > >>     a) We tell users to enable the watermark alignment as they
> > >> like.
> > >> > > This
> > >> > > >> is exactly what the current Flink API is.
> > >> > > >>     b) We tell the source developers, please implement pausable
> > >> > splits,
> > >> > > >> otherwise bad things may happen. Think of it like you are
> expected
> > >> to
> > >> > > >> implement SourceReader#snapshotState() properly, otherwise
> > >> exceptions
> > >> > > will
> > >> > > >> be thrown when users enable checkpointing.
> > >> > > >>
> > >> > > >> Thanks,
> > >> > > >>
> > >> > > >> Jiangjie (Becket) Qin
> > >> > > >>
> > >> > > >> On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis <
> > >> > > >> sebastian@ververica.com> wrote:
> > >> > > >>
> > >> > > >>> Hi Becket, Hi everybody,
> > >> > > >>>
> > >> > > >>> I'm sorry if I misread the messages but I could not derive an
> > >> > agreement
> > >> > > >>> from the mailing list. Nevertheless, if I understand you
> right the
> > >> > > >>> suggestion is:
> > >> > > >>>
> > >> > > >>> * Add default implementations to methods
> "pauseOrResumeSplits" in
> > >> > both
> > >> > > >>> SourceReader and SplitReader where both default
> implementations
> > >> throw
> > >> > > >>> UnsupportedOperationException.
> > >> > > >>>
> > >> > > >> Yes.
> > >> > > >>
> > >> > > >> * Add "supportsPauseOrResumeSplits" to the Source interface.
> (In
> > >> the
> > >> > > >>> following, I refer to supporting this as "pausable splits".)
> > >> > > >>>
> > >> > > >> We may no longer need this if pausable splits are expected to
> be
> > >> > > >> implemented by the source developers, i.e. non-optional. Having
> > >> this
> > >> > > method
> > >> > > >> would then be somewhat misleading as it looks like the sources
> > >> that do
> > >> > > not
> > >> > > >> support pausable splits are also acceptable in the long term.
> So
> > >> API
> > >> > > wise,
> > >> > > >> I'd say maybe we should remove this for this FLIP, although I
> > >> believe
> > >> > > this
> > >> > > >> supportXXX pattern itself is still attractive for optional
> > >> features.
> > >> > > >>
> > >> > > >>
> > >> > > >>>
> > >> > > >>> To make the conclusions explicit:
> > >> > > >>>
> > >> > > >>> 1. The implementation of pauseOrResumeSplits in both
> interfaces
> > >> > > >>> SourceReader and SplitReader is optional where the default is
> > >> that
> > >> > it
> > >> > > >>> doesn't support it. (--> This means that the implementation is
> > >> still
> > >> > > >>> optional for the source developer.)
> > >> > > >>>
> > >> > > >> It is optional for backwards compatibility with existing
> sources,
> > >> as
> > >> > > they
> > >> > > >> may still compile without code change. But starting from this
> FLIP,
> > >> > > Flink
> > >> > > >> will always optimistically assume that all the sources support
> > >> > pausable
> > >> > > >> splits. If a source does not support pausable splits, it goes
> to an
> > >> > > error
> > >> > > >> handling path when watermark alignment is enabled on it. This
> is
> > >> > > different
> > >> > > >> from a usual optional feature, where no error is expected.
> > >> > > >>
> > >> > > >>
> > >> > > >>> 2. If watermark alignment is enabled in the application code
> by
> > >> > adding
> > >> > > >>> withWatermarkAlignment to the WatermarkStrategy while
> > >> SourceReader or
> > >> > > >>> SplitReader do not support pausableSplits, we throw an
> > >> > > >>> UnsupportedOperationException.
> > >> > > >>>
> > >> > > >> Yes.
> > >> > > >>
> > >> > > >>
> > >> > > >>> 3. With regard to your statement:
> > >> > > >>>
> > >> > > >>>> [...] basically means watermark alignment is a non-optional
> > >> feature
> > >> > > to
> > >> > > >>>> the end users.
> > >> > > >>>
> > >> > > >>> You actually mean that "pausable splits" are non-optional for
> the
> > >> app
> > >> > > >>> developer if watermark alignment is enabled. However,
> watermark
> > >> > > alignment
> > >> > > >>> is optional and can be enabled/disabled.
> > >> > > >>>
> > >> > > >> Yes, watermark alignment can be enabled/disabled in individual
> > >> sources
> > >> > > in
> > >> > > >> Flink jobs, which basically means the code supporting watermark
> > >> > > alignment
> > >> > > >> has to already be there. That again means the Source
> developers are
> > >> > also
> > >> > > >> expected to support pausable splits by default. So this way we
> > >> > > essentially
> > >> > > >> tell the end users that you may enable / disable this feature
> as
> > >> you
> > >> > > wish,
> > >> > > >> and tell the source developers that you SHOULD implement this
> > >> because
> > >> > > the
> > >> > > >> end users may turn it on/off at will. And if the source does
> not
> > >> > support
> > >> > > >> pausable splits, that goes to an error handling path when
> watermark
> > >> > > >> alignment is enabled on it. So users know they have to
> explicitly
> > >> > > exclude
> > >> > > >> this source.
> > >> > > >>
> > >> > > >>
> > >> > > >>>
> > >> > > >>> So far it's totally clear to me and I hope this is what you
> mean.
> > >> I
> > >> > > also
> > >> > > >>> agree with both statements:
> > >> > > >>>
> > >> > > >>> So making that expectation aligned with the source developers
> > >> seems
> > >> > > >>>> reasonable.
> > >> > > >>>>
> > >> > > >>>
> > >> > > >>> I think this is a simple and clean solution from both the end
> user
> > >> > and
> > >> > > >>>> source developers' standpoint.
> > >> > > >>>>
> > >> > > >>>
> > >> > > >>> However, a last conclusion derives from 3. and is an open
> question
> > >> > for
> > >> > > >>> me:
> > >> > > >>>
> > >> > > >>> 4. The feature of "pausable splits" is now tightly bound to
> > >> watermark
> > >> > > >>> alignment, i.e., if sources do not support "pausable splits"
> one
> > >> can
> > >> > > not
> > >> > > >>> enable watermark alignment for these sources. This dependency
> is
> > >> not
> > >> > > the
> > >> > > >>> current status of watermark alignment implementation because
> it
> > >> > is/was
> > >> > > >>> implemented without pausable splits. Do we want to introduce
> this
> > >> > > >>> dependency? (This is an open question. I cannot judge that.)
> > >> > > >>>
> > >> > > >> The watermark alignment basically relies on the pausable
> splits,
> > >> > right?
> > >> > > >> So personally I found it quite reasonable that if the source
> does
> > >> not
> > >> > > >> support pausable splits, end users cannot enable watermark
> > >> alignment
> > >> > on
> > >> > > it.
> > >> > > >>
> > >> > > >>
> > >> > > >>> If something is wrong, please correct me.
> > >> > > >>>
> > >> > > >>> Regards,
> > >> > > >>> Sebastian
> > >> > > >>>
> > >> > > >>> On Wed, May 11, 2022 at 9:05 AM Becket Qin <
> becket.qin@gmail.com>
> > >> > > wrote:
> > >> > > >>>
> > >> > > >>>> Hi Sebastian,
> > >> > > >>>>
> > >> > > >>>> Thanks for the reply and patient discussion. I agree this is
> a
> > >> > tricky
> > >> > > >>>> decision.
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>>> Nevertheless, Piotr has valid concerns about Option c)
> which I
> > >> see
> > >> > as
> > >> > > >>>>> follows:
> > >> > > >>>>> (1) An interface with default NOOP implementation makes the
> > >> > > >>>>> implementation optional. And in my opinion, a default
> > >> > implementation
> > >> > > is and
> > >> > > >>>>> will remain a way of making implementation optional because
> > >> even in
> > >> > > future
> > >> > > >>>>> a developer can decide to implement the "old flavor" without
> > >> > support
> > >> > > for
> > >> > > >>>>> pausable splits.
> > >> > > >>>>> (2) It may not be too critical but I also find it suboptimal
> > >> that
> > >> > > with
> > >> > > >>>>> a NOOP default implementation there is no way to check at
> > >> runtime
> > >> > if
> > >> > > >>>>> SourceReader or SplitReader actually support pausing. (To
> do so,
> > >> > one
> > >> > > would
> > >> > > >>>>> need a supportsX method which makes it again more
> complicated.)
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>> Based on the last few messages in the mailing list, Piotr
> and I
> > >> > > agreed
> > >> > > >>>> that the default implementation should just throw an
> > >> > > >>>> UnsupportedOperationException if the source is unpausable. So
> > >> this
> > >> > > >>>> basically tells the Source developers that this feature is
> > >> expected
> > >> > > to be
> > >> > > >>>> supported. Because we cannot prevent end users from putting
> an
> > >> > > unpausable
> > >> > > >>>> source into the watermark alignment group, that basically
> means
> > >> > > watermark
> > >> > > >>>> alignment is a non-optional feature to the end users. So
> making
> > >> > that
> > >> > > >>>> expectation aligned with the source developers seems
> reasonable.
> > >> > And
> > >> > > if a
> > >> > > >>>> source does not support this feature, the end users should
> > >> > explicitly
> > >> > > >>>> remove that source from the watermark alignment group.
> > >> > > >>>>
> > >> > > >>>> Personally speaking I think this is a simple and clean
> solution
> > >> from
> > >> > > >>>> both the end user and source developers' standpoint.
> > >> > > >>>>
> > >> > > >>>> Does this address your concerns?
> > >> > > >>>>
> > >> > > >>>> Thanks,
> > >> > > >>>>
> > >> > > >>>> Jiangjie (Becket) Qin
> > >> > > >>>>
> > >> > > >>>> On Wed, May 11, 2022 at 2:52 PM Sebastian Mattheis <
> > >> > > >>>> sebastian@ververica.com> wrote:
> > >> > > >>>>
> > >> > > >>>>> Hi Piotr, Hi Becket, Hi everybody,
> > >> > > >>>>>
> > >> > > >>>>> we, Dawid and I, discussed the various suggestions/options
> and
> > >> we
> > >> > > >>>>> would be okay either way because we find neither solution is
> > >> > perfect
> > >> > > just
> > >> > > >>>>> because of the already present complexity.
> > >> > > >>>>>
> > >> > > >>>>> Option c) Adding methods to the interfaces of SourceReader
> and
> > >> > > >>>>> SplitReader
> > >> > > >>>>> Option a) Adding decorative interfaces to be used by
> > >> SourceReader
> > >> > and
> > >> > > >>>>> SplitReader
> > >> > > >>>>>
> > >> > > >>>>> As of the current status (v. 12) of the FLIP [1], it is
> based on
> > >> > > >>>>> Option c) which we find acceptable because the complexity
> added
> > >> is
> > >> > > only a
> > >> > > >>>>> single method.
> > >> > > >>>>>
> > >> > > >>>>> Nevertheless, Piotr has valid concerns about Option c)
> which I
> > >> see
> > >> > as
> > >> > > >>>>> follows:
> > >> > > >>>>> (1) An interface with default NOOP implementation makes the
> > >> > > >>>>> implementation optional. And in my opinion, a default
> > >> > implementation
> > >> > > is and
> > >> > > >>>>> will remain a way of making implementation optional because
> > >> even in
> > >> > > future
> > >> > > >>>>> a developer can decide to implement the "old flavor" without
> > >> > support
> > >> > > for
> > >> > > >>>>> pausable splits.
> > >> > > >>>>> (2) It may not be too critical but I also find it suboptimal
> > >> that
> > >> > > with
> > >> > > >>>>> a NOOP default implementation there is no way to check at
> > >> runtime
> > >> > if
> > >> > > >>>>> SourceReader or SplitReader actually support pausing. (To
> do so,
> > >> > one
> > >> > > would
> > >> > > >>>>> need a supportsX method which makes it again more
> complicated.)
> > >> > > >>>>>
> > >> > > >>>>> However, we haven't changed it because Option a) is also not
> > >> > optimal
> > >> > > >>>>> or straight-forward:
> > >> > > >>>>> (1) We need to add two distinct yet similar decorative
> > >> interfaces
> > >> > > >>>>> since, as mentioned, the signatures of the methods are
> > >> different.
> > >> > For
> > >> > > >>>>> example, we would need decorative interfaces like
> > >> > > >>>>> `SplitReaderWithPausableSplits` and
> > >> > `SourceReaderWithPausableSplits`.
> > >> > > >>>>> (2) As a consequence, we would need to somehow document
> > >> how/where
> > >> > to
> > >> > > >>>>> implement both interfaces and how this relates to each
> other.
> > >> This
> > >> > > we could
> > >> > > >>>>> solve by adding a note in the interface of SourceReader and
> > >> > > SplitReader and
> > >> > > >>>>> reference to the decorative interfaces but it still
> increases
> > >> > > complexity
> > >> > > >>>>> too.
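A rough Java sketch of what these two decorative interfaces might look like is given below. The interface names come from the message above; everything else is an assumption made for illustration. In particular, the thread only says that one method identifies splits by splitId and the other receives the splits directly, so which side takes ids is assumed here, and DemoSplit, DemoSplitReader and DemoSourceReader are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Assumption: the source-reader side identifies splits by their ids.
interface SourceReaderWithPausableSplits {
    void pauseOrResumeSplits(
            Collection<String> splitIdsToPause, Collection<String> splitIdsToResume);
}

// Assumption: the split-reader side receives the split objects directly.
interface SplitReaderWithPausableSplits<SplitT> {
    void pauseOrResumeSplits(
            Collection<SplitT> splitsToPause, Collection<SplitT> splitsToResume);
}

// Hypothetical split type carrying only an id.
final class DemoSplit {
    final String id;

    DemoSplit(String id) {
        this.id = id;
    }
}

// Records which splits are paused; a real reader would stop fetching from them.
class DemoSplitReader implements SplitReaderWithPausableSplits<DemoSplit> {
    final Set<String> pausedIds = new HashSet<>();

    @Override
    public void pauseOrResumeSplits(
            Collection<DemoSplit> splitsToPause, Collection<DemoSplit> splitsToResume) {
        for (DemoSplit split : splitsToPause) {
            pausedIds.add(split.id);
        }
        for (DemoSplit split : splitsToResume) {
            pausedIds.remove(split.id);
        }
    }
}

// Translates split ids into split objects and delegates to the split reader.
class DemoSourceReader implements SourceReaderWithPausableSplits {
    private final Map<String, DemoSplit> assigned = new HashMap<>();
    private final DemoSplitReader splitReader;

    DemoSourceReader(DemoSplitReader splitReader, Collection<DemoSplit> splits) {
        this.splitReader = splitReader;
        for (DemoSplit split : splits) {
            assigned.put(split.id, split);
        }
    }

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitIdsToPause, Collection<String> splitIdsToResume) {
        splitReader.pauseOrResumeSplits(resolve(splitIdsToPause), resolve(splitIdsToResume));
    }

    private List<DemoSplit> resolve(Collection<String> ids) {
        List<DemoSplit> result = new ArrayList<>();
        for (String id : ids) {
            DemoSplit split = assigned.get(id);
            if (split == null) {
                throw new IllegalArgumentException("Unknown split id: " + id);
            }
            result.add(split);
        }
        return result;
    }
}
```

The sketch shows why the two interfaces cannot simply be merged into one: the method shapes differ, and an implementer has to know that both pieces belong together, which is the documentation burden mentioned in (2).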
> > >> > > >>>>>
> > >> > > >>>>> In summary, we see both as acceptable and preferred over
> other
> > >> > > >>>>> options. The question is if we can find a solution or
> compromise
> > >> > > that is
> > >> > > >>>>> acceptable for everybody to reach consensus.
> > >> > > >>>>>
> > >> > > >>>>> Please let us know what you think because we would be happy
> if
> > >> we
> > >> > can
> > >> > > >>>>> conclude the discussion to avoid dropping the initiative on
> this
> > >> > > FLIP.
> > >> > > >>>>>
> > >> > > >>>>> Regards,
> > >> > > >>>>> Sebastian
> > >> > > >>>>>
> > >> > > >>>>> [1]
> > >> > > >>>>>
> > >> > >
> > >> >
> > >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
> > >> > > >>>>> (v. 12)
> > >> > > >>>>>
> > >> > > >>>>> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski <
> > >> > pnowojski@apache.org
> > >> > > >
> > >> > > >>>>> wrote:
> > >> > > >>>>>
> > >> > > >>>>>> Hi Guowei,
> > >> > > >>>>>>
> > >> > > >>>>>> as Dawid wrote a couple of messages back:
> > >> > > >>>>>>
> > >> > > >>>>>> > This is covered in the previous FLIP[1] which has been
> > >> already
> > >> > > >>>>>> implemented in 1.15. In short, it must be enabled with the
> > >> > watermark
> > >> > > >>>>>> strategy which also configures drift and update interval
> > >> > > >>>>>>
> > >> > > >>>>>> So by default watermark alignment is disabled, regardless
> if a
> > >> > > source
> > >> > > >>>>>> supports it or not.
> > >> > > >>>>>>
> > >> > > >>>>>> Best,
> > >> > > >>>>>> Piotrek
> > >> > > >>>>>>
> > >> > > >>>>>> On Thu, 5 May 2022 at 09:56, Guowei Ma <gu...@gmail.com>
> > >> > > wrote:
> > >> > > >>>>>>
> > >> > > >>>>>>> Hi,
> > >> > > >>>>>>>
> > >> > > >>>>>>> We know that in the case of Bounded input Flink supports
> the
> > >> > Batch
> > >> > > >>>>>>> execution mode. Currently in Batch execution mode, Flink
> is
> > >> > > executed
> > >> > > >>>>>>> on a
> > >> > > >>>>>>> stage-by-stage basis. In this way, perhaps watermark
> alignment
> > >> > > might
> > >> > > >>>>>>> not
> > >> > > >>>>>>> gain much.
> > >> > > >>>>>>>
> > >> > > >>>>>>> So my question is: Is watermark alignment the default
> > >> > behavior (for
> > >> > > >>>>>>> implemented source only)? If so, have you considered
> > >> evaluating
> > >> > the
> > >> > > >>>>>>> impact
> > >> > > >>>>>>> of this behavior on the Batch execution mode? Or do you think
> it is
> > >> not
> > >> > > >>>>>>> necessary?
> > >> > > >>>>>>>
> > >> > > >>>>>>> Correct me if I miss something.
> > >> > > >>>>>>>
> > >> > > >>>>>>> Best,
> > >> > > >>>>>>> Guowei
> > >> > > >>>>>>>
> > >> > > >>>>>>>
> > >> > > >>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <
> > >> > > >>>>>>> piotr.nowojski@gmail.com>
> > >> > > >>>>>>> wrote:
> > >> > > >>>>>>>
> > >> > > >>>>>>> > Hi Becket and Dawid,
> > >> > > >>>>>>> >
> > >> > > >>>>>>> > > I feel that no matter which option we choose this can
> not
> > >> be
> > >> > > >>>>>>> solved
> > >> > > >>>>>>> > entirely in either of the options, because of the point
> > >> above
> > >> > and
> > >> > > >>>>>>> because
> > >> > > >>>>>>> > the signature of SplitReader#pauseOrResumeSplits and
> > >> > > >>>>>>> > SourceReader#pauseOrResumeSplits are slightly different
> (one
> > >> > > >>>>>>> identifies
> > >> > > >>>>>>> > splits with splitId the other one passes the splits
> > >> directly).
> > >> > > >>>>>>> >
> > >> > > >>>>>>> > Yes, that's a good point in this case and for features
> that
> > >> > need
> > >> > > >>>>>>> to be
> > >> > > >>>>>>> > implemented in more than one place.
> > >> > > >>>>>>> >
> > >> > > >>>>>>> > > Is there any reason for pausing reading from a split to be
> an
> > >> > > optional
> > >> > > >>>>>>> feature,
> > >> > > >>>>>>> > > other than that this was not included in the original
> > >> > > interface?
> > >> > > >>>>>>> >
> > >> > > >>>>>>> > An additional argument in favor of making it optional
> is to
> > >> > > >>>>>>> simplify source
> > >> > > >>>>>>> > implementation. But on its own I'm not sure if that
> would be
> > >> > > >>>>>>> enough to
> > >> > > >>>>>>> > justify making this feature optional. Maybe.
> > >> > > >>>>>>> >
> > >> > > >>>>>>> > > I think it would be way simpler and clearer to just
> let
> > >> end
> > >> > > >>>>>>> users and
> > >> > > >>>>>>> > Flink
> > >> > > >>>>>>> > > assume all the connectors will implement this feature.
> > >> > > >>>>>>> >
> > >> > > >>>>>>> > As I wrote above that would be an interesting choice to
> make
> > >> > > (ease
> > >> > > >>>>>>> of
> > >> > > >>>>>>> > implementation for new users, vs system consistency).
> > >> > Regardless
> > >> > > >>>>>>> of that,
> > >> > > >>>>>>> > yes, for me the main argument is the API backward
> > >> > compatibility.
> > >> > > >>>>>>> But let's
> > >> > > >>>>>>> > clear a couple of points:
> > >> > > >>>>>>> > - The current proposal adding methods to the base
> interface
> > >> > with
> > >> > > >>>>>>> default
> > >> > > >>>>>>> > implementations is an OPTIONAL feature. Same as the
> > >> decorative
> > >> > > >>>>>>> version
> > >> > > >>>>>>> > would be.
> > >> > > >>>>>>> > - Decorative version could implement "throw
> > >> > > >>>>>>> UnsupportedOperationException"
> > >> > > >>>>>>> > if user enabled watermark alignment just as well and I
> agree
> > >> > > >>>>>>> that's a
> > >> > > >>>>>>> > better option compared to logging a warning.
> > >> > > >>>>>>> >
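[Illustrative sketch, not part of the original message.] The two bullets above describe a base-interface method whose default implementation rejects the call, so that old readers still compile but fail loudly once alignment is enabled. A minimal Java sketch of that idea follows. All type names (SketchSourceReader, LegacySketchReader, PausableSketchReader, DefaultMethodSketch) are hypothetical stand-ins and not the Flink API; only the method name pauseOrResumeSplits is taken from the thread, and its parameter types here are an assumption.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for a reader interface (NOT the Flink API).
interface SketchSourceReader {
    String pollNext();

    // The default keeps old implementations compiling. It throws instead of
    // logging a warning, so a misconfiguration cannot go unnoticed.
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException(
                "This reader does not support pausing or resuming splits");
    }
}

// An "old" reader written before the method existed: it inherits the default.
class LegacySketchReader implements SketchSourceReader {
    @Override
    public String pollNext() {
        return "record";
    }
}

// A reader that opts in by overriding the default.
class PausableSketchReader implements SketchSourceReader {
    final Set<String> paused = new HashSet<>();

    @Override
    public String pollNext() {
        return "record";
    }

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.addAll(splitsToPause);
        paused.removeAll(splitsToResume);
    }
}

class DefaultMethodSketch {
    // Returns true if the reader accepted the pause request, false if it rejected it.
    static boolean tryPause(SketchSourceReader reader, String splitId) {
        try {
            reader.pauseOrResumeSplits(
                    Collections.singleton(splitId), Collections.<String>emptySet());
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }
}
```

Note that nothing in the type of LegacySketchReader reveals the missing support; it only shows up when the method is called, which is the discoverability concern raised in the thread.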
> > >> > > >>>>>>> > Best,
> > >> > > >>>>>>> > Piotrek
> > >> > > >>>>>>> >
> > >> > > >>>>>>> >
> > >> > > >>>>>>> > On Wed, May 4, 2022 at 3:40 PM Becket Qin <becket.qin@gmail.com> wrote:
> > >> > > >>>>>>> >
> > >> > > >>>>>>> > > Thanks for the reply and patient discussion, Piotr and
> > >> Dawid.
> > >> > > >>>>>>> > >
> > >> > > >>>>>>> > > Is there any reason for pausing reading from a split
> an
> > >> > > optional
> > >> > > >>>>>>> feature,
> > >> > > >>>>>>> > > other than that this was not included in the original
> > >> > > interface?
> > >> > > >>>>>>> > >
> > >> > > >>>>>>> > > To be honest I am really worried about the complexity
> of
> > >> the
> > >> > > >>>>>>> user story
> > >> > > >>>>>>> > > here. Optional features like this have a high
> overhead.
> > >> > Imagine
> > >> > > >>>>>>> this
> > >> > > >>>>>>> > > feature is optional, now a user enabled watermark
> > >> alignment
> > >> > and
> > >> > > >>>>>>> defined a
> > >> > > >>>>>>> > > few watermark groups. Would it work? Hmm, that
> depends on
> > >> > > >>>>>>> whether the
> > >> > > >>>>>>> > > involved Source has implemented this feature. If the
> > >> Sources
> > >> > > >>>>>>> are well
> > >> > > >>>>>>> > > documented, good luck. Otherwise end users may have to
> > >> look
> > >> > > into
> > >> > > >>>>>>> the code
> > >> > > >>>>>>> > > of the Source to see whether the feature is supported.
> > >> Which
> > >> > is
> > >> > > >>>>>>> something
> > >> > > >>>>>>> > > they shouldn't have to do.
> > >> > > >>>>>>> > >
> > >> > > >>>>>>> > > I think it would be way simpler and clearer to just
> let
> > >> end
> > >> > > >>>>>>> users and
> > >> > > >>>>>>> > Flink
> > >> > > >>>>>>> > > assume all the connectors will implement this feature.
> > >> After
> > >> > > all
> > >> > > >>>>>>> the
> > >> > > >>>>>>> > > watermark group is not optional to the end users. If
> in
> > >> some
> > >> > > >>>>>>> rare cases,
> > >> > > >>>>>>> > > the feature cannot be supported, a clear
> > >> > > >>>>>>> UnsupportedOperationException
> > >> > > >>>>>>> > will
> > >> > > >>>>>>> > > be thrown to tell users to explicitly remove this
> Source
> > >> from
> > >> > > the
> > >> > > >>>>>>> > watermark
> > >> > > >>>>>>> > > group. I don't think we should have a warning message
> > >> here,
> > >> > as
> > >> > > >>>>>>> they tend
> > >> > > >>>>>>> > to
> > >> > > >>>>>>> > > be ignored in many cases. If we do this, we don't even
> > >> need
> > >> > the
> > >> > > >>>>>>> > supportXXX
> > >> > > >>>>>>> > > method in the Source for this feature. In fact this is
> > >> > exactly
> > >> > > >>>>>>> how many
> > >> > > >>>>>>> > > interfaces work today. For example,
> > >> > > >>>>>>> SplitEnumerator#addSplitsBack() is
> > >> > > >>>>>>> > not
> > >> > > >>>>>>> > > supported by Pravega source because it does not
> support
> > >> > partial
> > >> > > >>>>>>> failover.
> > >> > > >>>>>>> > > In that case, it simply throws an exception to
> trigger a
> > >> > global
> > >> > > >>>>>>> recovery.
> > >> > > >>>>>>> > >
> > >> > > >>>>>>> > > The reason we add a default implementation in this
> case
> > >> would
> > >> > > >>>>>>> be just for
> > >> > > >>>>>>> > the
> > >> > > >>>>>>> > > sake of backwards compatibility so the old source can
> > >> still
> > >> > > >>>>>>> compile.
> > >> > > >>>>>>> > Sure,
> > >> > > >>>>>>> > > in short term, this feature might not be supported by
> many
> > >> > > >>>>>>> existing
> > >> > > >>>>>>> > > sources. That is OK, and it is quite visible to the
> source
> > >> > > >>>>>>> developers
> > >> > > >>>>>>> > that
> > >> > > >>>>>>> > > they did not override the default impl which throws an
> > >> > > >>>>>>> > > UnsupportedOperationException.
> > >> > > >>>>>>> > >
> > >> > > >>>>>>> > > @Dawid,
> > >> > > >>>>>>> > >
> > >> > > >>>>>>> > > the Java doc of the SupportXXX() method in the Source
> > >> would
> > >> > be
> > >> > > >>>>>>> the single
> > >> > > >>>>>>> > > >> source of truth regarding how to implement this
> > >> feature.
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > >
> > >> > > >>>>>>> > > I also don't find it entirely true. Half of the
> classes
> > >> are
> > >> > > >>>>>>> theoretically
> > >> > > >>>>>>> > > > optional and are utility classes from the point of
> view
> > >> how
> > >> > > the
> > >> > > >>>>>>> > > interfaces
> > >> > > >>>>>>> > > > are organized. Theoretically users do not need to
> use
> > >> any
> > >> > of
> > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to
> list
> > >> > their
> > >> > > >>>>>>> methods in
> > >> > > >>>>>>> > > the
> > >> > > >>>>>>> > > > Source interface.
> > >> > > >>>>>>> > >
> > >> > > >>>>>>> > > I think the ultimate goal of java docs is to guide
> users
> > >> to
> > >> > > >>>>>>> implement the
> > >> > > >>>>>>> > > Source. If SourceReaderBase is the preferred way to
> > >> > implement a
> > >> > > >>>>>>> > > SourceReader, it seems worth mentioning that. Even the
> > >> Java
> > >> > > >>>>>>> language
> > >> > > >>>>>>> > > documentation for interfaces lists the known
> implementations
> > >> [1]
> > >> > so
> > >> > > >>>>>>> people
> > >> > > >>>>>>> > can
> > >> > > >>>>>>> > > leverage them. But for this particular case, if we
> make
> > >> the
> > >> > > >>>>>>> feature
> > >> > > >>>>>>> > > non-optional, we don't even need the supportXXX()
> method
> > >> for
> > >> > > now.
> > >> > > >>>>>>> > >
> > >> > > >>>>>>> > > Thanks,
> > >> > > >>>>>>> > >
> > >> > > >>>>>>> > > Jiangjie (Becket) Qin
> > >> > > >>>>>>> > >
> > >> > > >>>>>>> > >
> > >> > > >>>>>>> > >
> > >> > > >>>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <
> > >> > > >>>>>>> dwysakowicz@apache.org>
> > >> > > >>>>>>> > > wrote:
> > >> > > >>>>>>> > >
> > >> > > >>>>>>> > > > Hey Piotr and Becket,
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > First of all, let me say I am happy with whichever
> > >> option
> > >> > is
> > >> > > >>>>>>> agreed in
> > >> > > >>>>>>> > > the
> > >> > > >>>>>>> > > > discussion.
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > I wanted to clarify a few points from the discussion
> > >> > though:
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > @Becket:
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > The main argument for adding the methods to the
> > >> > SourceReader
> > >> > > >>>>>>> is that
> > >> > > >>>>>>> > > these
> > >> > > >>>>>>> > > > methods are effectively NON-OPTIONAL to SourceReader
> > >> impl,
> > >> > > i.e.
> > >> > > >>>>>>> > starting
> > >> > > >>>>>>> > > > from this FLIP, all the SourceReaders impl are
> expected
> > >> to
> > >> > > >>>>>>> support this
> > >> > > >>>>>>> > > > method, although some old implementations may not
> have
> > >> > > >>>>>>> implemented this
> > >> > > >>>>>>> > > > feature. I think we should distinguish the new
> features
> > >> > from
> > >> > > >>>>>>> the
> > >> > > >>>>>>> > optional
> > >> > > >>>>>>> > > > features. While the public decorative interface is a
> > >> > solution
> > >> > > >>>>>>> to the
> > >> > > >>>>>>> > > > optional features. We should not use it for the
> features
> > >> > that
> > >> > > >>>>>>> are
> > >> > > >>>>>>> > > > non-optional.
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > I don't think that this feature is NON-OPTIONAL.
> Even
> > >> > though
> > >> > > >>>>>>> > preferred, I
> > >> > > >>>>>>> > > > still think it can be simply optional.
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > the Java doc of the SupportXXX() method in the
> Source
> > >> would
> > >> > > be
> > >> > > >>>>>>> the
> > >> > > >>>>>>> > single
> > >> > > >>>>>>> > > > source of truth regarding how to implement this
> feature.
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > I also don't find it entirely true. Half of the
> classes
> > >> are
> > >> > > >>>>>>> > theoretically
> > >> > > >>>>>>> > > > optional and are utility classes from the point of
> view
> > >> how
> > >> > > the
> > >> > > >>>>>>> > > interfaces
> > >> > > >>>>>>> > > > are organized. Theoretically users do not need to
> use
> > >> any
> > >> > of
> > >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to
> list
> > >> > their
> > >> > > >>>>>>> methods in
> > >> > > >>>>>>> > > the
> > >> > > >>>>>>> > > > Source interface.
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > @Piotr
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > If we have all of the methods with default
> > >> implementation
> > >> > in
> > >> > > >>>>>>> the base
> > >> > > >>>>>>> > > > interface, the API doesn't give any clue to the user
> > >> which
> > >> > > set
> > >> > > >>>>>>> of
> > >> > > >>>>>>> > methods
> > >> > > >>>>>>> > > > are required to be implemented at the same time.
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > I feel that no matter which option we choose this
> can
> > >> not
> > >> > be
> > >> > > >>>>>>> solved
> > >> > > >>>>>>> > > > entirely in either of the options, because of the
> point
> > >> > above
> > >> > > >>>>>>> and
> > >> > > >>>>>>> > because
> > >> > > >>>>>>> > > > the signature of SplitReader#pauseOrResumeSplits and
> > >> > > >>>>>>> > > > SourceReader#pauseOrResumeSplits are slightly
> different
> > >> > (one
> > >> > > >>>>>>> identifies
> > >> > > >>>>>>> > > > splits with splitId the other one passes the splits
> > >> > > directly).
> > >> > > >>>>>>> > > >
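[Illustrative sketch, not part of the original message.] The signature difference mentioned above (one method identifies splits by splitId, the other receives the splits directly) could look roughly like the following. Which component takes ids and which takes split objects is an assumption made for this sketch, and DemoSplit, DemoSplitReader and DemoSourceReader are hypothetical names, not Flink classes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical split type carrying only an id.
class DemoSplit {
    final String id;

    DemoSplit(String id) {
        this.id = id;
    }
}

// Split-level component: assumed here to receive the split objects directly.
class DemoSplitReader {
    final Set<String> pausedIds = new HashSet<>();

    void pauseOrResumeSplits(
            Collection<DemoSplit> splitsToPause, Collection<DemoSplit> splitsToResume) {
        for (DemoSplit split : splitsToPause) {
            pausedIds.add(split.id);
        }
        for (DemoSplit split : splitsToResume) {
            pausedIds.remove(split.id);
        }
    }
}

// Reader-level component: assumed here to receive split ids and resolve them.
class DemoSourceReader {
    private final Map<String, DemoSplit> assignedSplits = new HashMap<>();
    final DemoSplitReader splitReader = new DemoSplitReader();

    void addSplit(DemoSplit split) {
        assignedSplits.put(split.id, split);
    }

    void pauseOrResumeSplits(
            Collection<String> idsToPause, Collection<String> idsToResume) {
        // Translate ids into split objects before delegating.
        splitReader.pauseOrResumeSplits(resolve(idsToPause), resolve(idsToResume));
    }

    private List<DemoSplit> resolve(Collection<String> ids) {
        List<DemoSplit> resolved = new ArrayList<>();
        for (String id : ids) {
            DemoSplit split = assignedSplits.get(id);
            if (split == null) {
                throw new IllegalArgumentException("Unknown split id: " + id);
            }
            resolved.add(split);
        }
        return resolved;
    }
}
```

Because the two methods cannot share one signature, a single decorative interface could not cover both components, which is the limitation the paragraph above points at.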
> > >> > > >>>>>>> > > > Best,
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > Dawid
> > >> > > >>>>>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > Hi Piotr,
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > Thanks for the comment.
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > Just to clarify, I am not against the decorative
> > >> > interfaces,
> > >> > > >>>>>>> but I do
> > >> > > >>>>>>> > > > think we should use them with caution. The main
> argument
> > >> > for
> > >> > > >>>>>>> adding the
> > >> > > >>>>>>> > > > methods to the SourceReader is that these methods
> are
> > >> > > >>>>>>> > > > effectively NON-OPTIONAL to SourceReader impl, i.e.
> > >> > starting
> > >> > > >>>>>>> from this
> > >> > > >>>>>>> > > > FLIP, all the SourceReaders impl are expected to
> support
> > >> > this
> > >> > > >>>>>>> > > > method, although some old implementations may not
> have
> > >> > > >>>>>>> implemented this
> > >> > > >>>>>>> > > > feature. I think we should distinguish the new
> features
> > >> > from
> > >> > > >>>>>>> the
> > >> > > >>>>>>> > optional
> > >> > > >>>>>>> > > > features. While the public decorative interface is a
> > >> > solution
> > >> > > >>>>>>> to the
> > >> > > >>>>>>> > > > optional features. We should not use it for the
> features
> > >> > that
> > >> > > >>>>>>> are
> > >> > > >>>>>>> > > > non-optional.
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > That said, this feature is optional for
> SplitReaders.
> > >> > > Arguably
> > >> > > >>>>>>> we can
> > >> > > >>>>>>> > > have
> > >> > > >>>>>>> > > > a decorative interface for that, but for simplicity
> and
> > >> > > >>>>>>> symmetry of the
> > >> > > >>>>>>> > > > interface, personally I prefer just adding a new
> method.
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > Regarding the advantages you mentioned about the
> > >> decorative
> > >> > > >>>>>>> interfaces,
> > >> > > >>>>>>> > > > they would make sense if:
> > >> > > >>>>>>> > > > 1. The feature is optional.
> > >> > > >>>>>>> > > > 2. There is only one decorative interface involved
> for a
> > >> > > >>>>>>> feature.
> > >> > > >>>>>>> > > > Otherwise the argument that all the methods are
> grouped
> > >> > > >>>>>>> together will
> > >> > > >>>>>>> > not
> > >> > > >>>>>>> > > > stand.
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > Compared with that, I think the current solution
> works
> > >> fine
> > >> > > in
> > >> > > >>>>>>> all
> > >> > > >>>>>>> > cases,
> > >> > > >>>>>>> > > > i.e. "having supportXXX() method in Source, and
> default
> > >> > > >>>>>>> methods /
> > >> > > >>>>>>> > > > decorative interfaces in base interfaces.".
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > The advantages are:
> > >> > > >>>>>>> > > >> - clean and easy to implement base interface
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > In the current approach, the Java doc of the
> > >> SupportXXX()
> > >> > > >>>>>>> method in the
> > >> > > >>>>>>> > > > Source would be the single source of truth regarding
> > >> how to
> > >> > > >>>>>>> implement
> > >> > > >>>>>>> > > this
> > >> > > >>>>>>> > > > feature. It lists the method that has to be
> implemented
> > >> to
> > >> > > >>>>>>> support this
> > >> > > >>>>>>> > > > feature, regardless of how many classes /
> interfaces are
> > >> > > >>>>>>> involved.
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > When implementing the base interface, users do not
> need
> > >> to
> > >> > > >>>>>>> implement a
> > >> > > >>>>>>> > > > method with default implementation. If they are
> curious
> > >> > what
> > >> > > >>>>>>> the method
> > >> > > >>>>>>> > > is
> > >> > > >>>>>>> > > > for, the java doc of that method simply points
> users to
> > >> the
> > >> > > >>>>>>> > SupportXXX()
> > >> > > >>>>>>> > > > method in the Source. It seems not adding work to
> the
> > >> users
> > >> > > >>>>>>> compared
> > >> > > >>>>>>> > with
> > >> > > >>>>>>> > > > decorative interfaces, but gives much better
> > >> > discoverability.
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > - all of the methods from a single feature are
> grouped
> > >> in a
> > >> > > >>>>>>> single
> > >> > > >>>>>>> > > >> decorator interface, together with their dedicated
> java
> > >> > doc.
> > >> > > >>>>>>> It's also
> > >> > > >>>>>>> > > >> easier to google search for help using the
> decorator
> > >> name
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > - if an optional feature requires two methods to be
> > >> > > >>>>>>> implemented at
> > >> > > >>>>>>> > once,
> > >> > > >>>>>>> > > >> decorator can guarantee that
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > These two points are not true when multiple
> components
> > >> and
> > >> > > >>>>>>> classes are
> > >> > > >>>>>>> > > > involved collaboratively to provide a feature. In
> our
> > >> case,
> > >> > > we
> > >> > > >>>>>>> have
> > >> > > >>>>>>> > both
> > >> > > >>>>>>> > > > SourceReader and SplitReader involved. And there
> might
> > >> be
> > >> > > other
> > >> > > >>>>>>> > > interfaces
> > >> > > >>>>>>> > > > on the JM side involved for some future features.
> So the
> > >> > > >>>>>>> relevant
> > >> > > >>>>>>> > methods
> > >> > > >>>>>>> > > > can actually be scattered over the places. That
> said, we
> > >> > may
> > >> > > >>>>>>> still use
> > >> > > >>>>>>> > > > decorative interfaces for each component, if the
> > >> feature is
> > >> > > >>>>>>> optional,
> > >> > > >>>>>>> > > given
> > >> > > >>>>>>> > > > there is a single source of truth for the feature.
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > Here I would strongly lean towards making life
> easier
> > >> for
> > >> > new
> > >> > > >>>>>>> users,
> > >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo) slight
> expense
> > >> > for
> > >> > > >>>>>>> the power
> > >> > > >>>>>>> > > >> users.
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > I actually think the current approach is simpler,
> more
> > >> > > >>>>>>> extensible and
> > >> > > >>>>>>> > > more
> > >> > > >>>>>>> > > > general for all the users. Can you articulate a bit
> > >> more on
> > >> > > >>>>>>> which part
> > >> > > >>>>>>> > > you
> > >> > > >>>>>>> > > > think makes it harder for users to understand?
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > There is another benefit of the decorative
> interfaces
> > >> which
> > >> > > is
> > >> > > >>>>>>> not
> > >> > > >>>>>>> > > > mentioned, but might be worth considering here.
> Usually
> > >> the
> > >> > > >>>>>>> decorative
> > >> > > >>>>>>> > > > interfaces give slightly better backwards
> compatibility
> > >> > than
> > >> > > >>>>>>> the new
> > >> > > >>>>>>> > > > default method in the interfaces. That is when
> users are
> > >> > > using
> > >> > > >>>>>>> a jar
> > >> > > >>>>>>> > that
> > >> > > >>>>>>> > > > was compiled with an older version of Flink which
> does
> > >> not
> > >> > > >>>>>>> have the
> > >> > > >>>>>>> > > default
> > >> > > >>>>>>> > > > method in the interfaces in question. A decorative
> > >> > interface
> > >> > > >>>>>>> may still
> > >> > > >>>>>>> > > > provide backwards compatibility in that case, while
> > >> default
> > >> > > >>>>>>> method impl
> > >> > > >>>>>>> > > > cannot.
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > I think in Flink we in general do not guarantee
> custom
> > >> > > >>>>>>> components
> > >> > > >>>>>>> > > compiled
> > >> > > >>>>>>> > > > with an older version can run with a newer version
> of
> > >> > Flink.
> > >> > > A
> > >> > > >>>>>>> > recompile
> > >> > > >>>>>>> > > > with a newer version would be required. That said,
> if
> > >> we do
> > >> > > >>>>>>> care about
> > >> > > >>>>>>> > > > this, we can just change the "supportXXX()" method
> in
> > >> the
> > >> > > >>>>>>> Source
> > >> > > >>>>>>> > > interface
> > >> > > >>>>>>> > > > to use decorative interfaces, and leave the other
> parts
> > >> > > >>>>>>> unchanged.
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > Thanks,
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > Jiangjie (Becket) Qin
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <
> > >> > > >>>>>>> pnowojski@apache.org>
> > >> > > >>>>>>> > > > wrote:
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > > >> Hi,
> > >> > > >>>>>>> > > >>
> > >> > > >>>>>>> > > >> Sorry for chipping in so late, but I was OoO for
> the
> > >> last
> > >> > > two
> > >> > > >>>>>>> weeks.
> > >> > > >>>>>>> > > >> Regarding the interfaces, I would be actually
> against
> > >> > adding
> > >> > > >>>>>>> those
> > >> > > >>>>>>> > > methods
> > >> > > >>>>>>> > > >> to the base interfaces for the reasons mentioned
> above.
> > >> > > >>>>>>> Clogging the
> > >> > > >>>>>>> > > base
> > >> > > >>>>>>> > > >> interface for new users with tons of methods that
> they
> > >> do
> > >> > > not
> > >> > > >>>>>>> need, do
> > >> > > >>>>>>> > > not
> > >> > > >>>>>>> > > >> understand and do not know what to do with them.
> > >> Moreover,
> > >> > > >>>>>>> such
> > >> > > >>>>>>> > > decorative
> > >> > > >>>>>>> > > >> interfaces are solving a problem if a feature
> requires
> > >> two
> > >> > > or
> > >> > > >>>>>>> more
> > >> > > >>>>>>> > > methods
> > >> > > >>>>>>> > > >> to be implemented at the same time. If we have all
> of
> > >> the
> > >> > > >>>>>>> methods with
> > >> > > >>>>>>> > > >> default implementation in the base interface, the
> API
> > >> > > doesn't
> > >> > > >>>>>>> give any
> > >> > > >>>>>>> > > >> clue
> > >> > > >>>>>>> > > >> to the user which set of methods are required to be
> > >> > > >>>>>>> implemented at the
> > >> > > >>>>>>> > > >> same
> > >> > > >>>>>>> > > >> time.
> > >> > > >>>>>>> > > >>
> > >> > > >>>>>>> > > >> > a) I feel the biggest drawback of decorative
> > >> interfaces
> > >> > is
> > >> > > >>>>>>> which
> > >> > > >>>>>>> > > >> interface
> > >> > > >>>>>>> > > >> > they can decorate and which combinations of
> multiple
> > >> > > >>>>>>> decorative
> > >> > > >>>>>>> > > >> interfaces
> > >> > > >>>>>>> > > >> > are valid. (...)
> > >> > > >>>>>>> > > >> > In the future, if there is a new feature added
> > >> > > >>>>>>> > > >> > (e.g. sorted or pre-partitioned data aware), are
> we
> > >> > going
> > >> > > >>>>>>> to create
> > >> > > >>>>>>> > > >> another
> > >> > > >>>>>>> > > >> > interface of SplitReader such as
> SortedSplitReader or
> > >> > > >>>>>>> > > >> PrePartitionedAware?
> > >> > > >>>>>>> > > >> > Can they be combined? So I think the additional
> > >> > decorative
> > >> > > >>>>>>> interface
> > >> > > >>>>>>> > > >> like
> > >> > > >>>>>>> > > >> > withSplitsAlignment actually increases the
> > >> understanding
> > >> > > >>>>>>> cost of
> > >> > > >>>>>>> > users
> > >> > > >>>>>>> > > >> > because they have to know what decorative
> interfaces
> > >> are
> > >> > > >>>>>>> there,
> > >> > > >>>>>>> > which
> > >> > > >>>>>>> > > >> > interface they can decorate and which
> combinations of
> > >> > the
> > >> > > >>>>>>> decorative
> > >> > > >>>>>>> > > >> > interfaces are valid and which are not. Ideally
> we
> > >> want
> > >> > to
> > >> > > >>>>>>> avoid
> > >> > > >>>>>>> > that.
> > >> > > >>>>>>> > > >>
> > >> > > >>>>>>> > > >> I'm not sure if I understand how embedding default
> > >> methods
> > >> > > in
> > >> > > >>>>>>> the base
> > >> > > >>>>>>> > > >> interface is solving the problem: what can be
> combined
> > >> or
> > >> > > >>>>>>> not? If
> > >> > > >>>>>>> > there
> > >> > > >>>>>>> > > >> are
> > >> > > >>>>>>> > > >> two conflicting features, having decorative
> interfaces
> > >> > that
> > >> > > >>>>>>> can not be
> > >> > > >>>>>>> > > >> mixed together actually makes much more sense to me
> > >> rather
> > >> > > >>>>>>> than having
> > >> > > >>>>>>> > > >> them
> > >> > > >>>>>>> > > >> all in one base class. How would you allow users to
> > >> > > implement
> > >> > > >>>>>>> only one
> > >> > > >>>>>>> > > of
> > >> > > >>>>>>> > > >> those two features?
> > >> > > >>>>>>> > > >>
> > >> > > >>>>>>> > > >> To reiterate on the issue. Yes, there are
> drawbacks:
> > >> > > >>>>>>> > > >> - how can a user discover what decorators are
> there?
> > >> > > >>>>>>> > > >> - how can a user know where the decorator can be
> > >> applied
> > >> > to?
> > >> > > >>>>>>> > > >>
> > >> > > >>>>>>> > > >> However those are drawbacks for more power users,
> that
> > >> can
> > >> > > be
> > >> > > >>>>>>> > mitigated
> > >> > > >>>>>>> > > by
> > >> > > >>>>>>> > > >> the documentation. For example listing all of the
> > >> > decorators
> > >> > > >>>>>>> with
> > >> > > >>>>>>> > > >> detailed explanation both in the docs and in the
> java
> > >> > docs.
> > >> > > >>>>>>> More
> > >> > > >>>>>>> > > >> experienced users will be able to deal with those
> > >> issues
> > >> > > >>>>>>> easier, as
> > >> > > >>>>>>> > they
> > >> > > >>>>>>> > > >> will already have some basic understanding of
> Flink.
> > >> Also
> > >> > if
> > >> > > >>>>>>> a user has
> > >> > > >>>>>>> > a
> > >> > > >>>>>>> > > >> problem that he wants to solve, he will google
> search a
> > >> > > >>>>>>> potential
> > >> > > >>>>>>> > > solution
> > >> > > >>>>>>> > > >> to his problem anyway, and while doing that he is
> very
> > >> > > likely
> > >> > > >>>>>>> to
> > >> > > >>>>>>> > > discover
> > >> > > >>>>>>> > > >> the decorator that he needs anyway in the docs.
> > >> > > >>>>>>> > > >>
> > >> > > >>>>>>> > > >> The advantages are:
> > >> > > >>>>>>> > > >> - clean and easy to implement base interface
> > >> > > >>>>>>> > > >> - all of the methods from a single feature are
> grouped
> > >> in
> > >> > a
> > >> > > >>>>>>> single
> > >> > > >>>>>>> > > >> decorator interface, together with their dedicated
> java
> > >> > doc.
> > >> > > >>>>>>> It's also
> > >> > > >>>>>>> > > >> easier to google search for help using the
> decorator
> > >> name
> > >> > > >>>>>>> > > >> - if an optional feature requires two methods to be
> > >> > > >>>>>>> implemented at
> > >> > > >>>>>>> > once,
> > >> > > >>>>>>> > > >> decorator can guarantee that
> > >> > > >>>>>>> > > >>
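[Illustrative sketch, not part of the original message.] The last advantage listed above, that a decorator can guarantee two methods are implemented together, can be sketched as follows. BaseSketchReader, PausableSketch, PlainReader, AlignedReader and DecoratorSketch are hypothetical names chosen for this example; the pause/resume method pair is likewise an assumption and not the FLIP's proposed API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical minimal base interface: nothing optional in it.
interface BaseSketchReader {
    String pollNext();
}

// Hypothetical decorative interface: a class implementing it must provide
// BOTH methods, otherwise it does not compile.
interface PausableSketch {
    void pauseSplit(String splitId);

    void resumeSplit(String splitId);
}

// A reader that ignores the optional feature entirely.
class PlainReader implements BaseSketchReader {
    @Override
    public String pollNext() {
        return "record";
    }
}

// A reader that opts in by also implementing the decorative interface.
class AlignedReader implements BaseSketchReader, PausableSketch {
    final Set<String> paused = new HashSet<>();

    @Override
    public String pollNext() {
        return "record";
    }

    @Override
    public void pauseSplit(String splitId) {
        paused.add(splitId);
    }

    @Override
    public void resumeSplit(String splitId) {
        paused.remove(splitId);
    }
}

class DecoratorSketch {
    // The framework can detect support from the type alone, without calling anything.
    static boolean supportsPausing(BaseSketchReader reader) {
        return reader instanceof PausableSketch;
    }

    // Fails up front if alignment is requested for a reader that cannot pause.
    static void pause(BaseSketchReader reader, String splitId) {
        if (!(reader instanceof PausableSketch)) {
            throw new UnsupportedOperationException(
                    "Reader does not implement the pausing decorator");
        }
        ((PausableSketch) reader).pauseSplit(splitId);
    }
}
```

The trade-off discussed in the thread still applies: the type check works per component, so a feature spanning several components needs one such decorator on each of them.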
> > >> > > >>>>>>> > > >> Here I would strongly lean towards making life
> easier
> > >> for
> > >> > > new
> > >> > > >>>>>>> users,
> > >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo) slight
> expense
> > >> > for
> > >> > > >>>>>>> the power
> > >> > > >>>>>>> > > >> users.
> > >> > > >>>>>>> > > >>
> > >> > > >>>>>>> > > >> Best,
> > >> > > >>>>>>> > > >> Piotrek
> > >> > > >>>>>>> > > >>
> > >> > > >>>>>>> > > >>
> > >> > > >>>>>>> > > >> On Tue, Apr 26, 2022 at 3:32 PM Becket Qin <becket.qin@gmail.com> wrote:
> > >> > > >>>>>>> > > >>
> > >> > > >>>>>>> > > >> > Thanks for the reply Sebastian and Dawid,
> > >> > > >>>>>>> > > >> >
> > >> > > >>>>>>> > > >> > I think Sebastian has a good summary. This is a
> > >> really
> > >> > > >>>>>>> helpful
> > >> > > >>>>>>> > > >> discussion.
> > >> > > >>>>>>> > > >> >
> > >> > > >>>>>>> > > >> > Thinking a bit more, I feel that it might still
> be
> > >> > better
> > >> > > >>>>>>> to add the
> > >> > > >>>>>>> > > >> > supportsXXX() method in the Source rather than
> > >> > > SourceReader.
> > >> > > >>>>>>> > > >> >
> > >> > > >>>>>>> > > >> > Generally speaking, what we are trying to do
> here is
> > >> to
> > >> > > let
> > >> > > >>>>>>> the
> > >> > > >>>>>>> > Flink
> > >> > > >>>>>>> > > >> > framework know what the Source is capable of. In
> this
> > >> > > FLIP,
> > >> > > >>>>>>> it
> > >> > > >>>>>>> > happens
> > >> > > >>>>>>> > > >> to
> > >> > > >>>>>>> > > >> > be the capability that only involves
> SourceReader.
> > >> But
> > >> > in
> > >> > > >>>>>>> the
> > >> > > >>>>>>> > future,
> > >> > > >>>>>>> > > >> it is
> > >> > > >>>>>>> > > >> > possible that another functionality involves
> both the
> > >> > > >>>>>>> > SplitEnumerator
> > >> > > >>>>>>> > > >> and
> > >> > > >>>>>>> > > >> > SourceReader. In that case, following the current
> > >> > > approach,
> > >> > > >>>>>>> we
> > >> > > >>>>>>> > should
> > >> > > >>>>>>> > > >> put
> > >> > > >>>>>>> > > >> > the "supportsXXX()" method in both
> SplitEnumerator
> > >> and
> > >> > > >>>>>>> SourceReader.
> > >> > > >>>>>>> > > >> > Because if we only put this in the SourceReader,
> then
> > >> > the
> > >> > > >>>>>>> JM would
> > >> > > >>>>>>> > > have
> > >> > > >>>>>>> > > >> to
> > >> > > >>>>>>> > > >> > create a SourceReader in order to know whether
> this
> > >> > > feature
> > >> > > >>>>>>> is
> > >> > > >>>>>>> > > >> supported,
> > >> > > >>>>>>> > > >> > which is a little ugly. But if we put the
> > >> > "supportsXXX()"
> > >> > > >>>>>>> method in
> > >> > > >>>>>>> > > the
> > >> > > >>>>>>> > > >> > Source, we will break the "symmetric" design
> because
> > >> > this
> > >> > > >>>>>>> FLIP
> > >> > > >>>>>>> > chose a
> > >> > > >>>>>>> > > >> > different way.
> > >> > > >>>>>>> > > >> >
> > >> > > >>>>>>> > > >> > This is also why I think supportsXXX() method
> seems a
> > >> > good
> > >> > > >>>>>>> thing to
> > >> > > >>>>>>> > > >> have,
> > >> > > >>>>>>> > > >> > because when there are a few interfaces / methods
> > >> that
> > >> > are
> > >> > > >>>>>>> expected
> > >> > > >>>>>>> > to
> > >> > > >>>>>>> > > >> be
> > >> > > >>>>>>> > > >> > implemented at the same time in order to deliver
> a
> > >> > > feature,
> > >> > > >>>>>>> it is
> > >> > > >>>>>>> > > always
> > >> > > >>>>>>> > > >> > good to have a single source of truth to tell the
> > >> > > framework
> > >> > > >>>>>>> what to
> > >> > > >>>>>>> > > do,
> > >> > > >>>>>>> > > >> so
> > >> > > >>>>>>> > > >> > the framework can do consistent things in
> different
> > >> > parts.
> > >> > > >>>>>>> > > >> >
> > >> > > >>>>>>> > > >> > @Sebastian Mattheis <se...@ververica.com>
> > >> > > >>>>>>> > > >> >
> > >> > > >>>>>>> > > >> > Regarding interface flavor b), i.e.
> > >> AlignedSourceReader
> > >> > +
> > >> > > >>>>>>> > > >> > AlignedSplitReader, what I feel awkward about is
> > >> that we
> > >> > > are
> > >> > > >>>>>>> > > essentially
> > >> > > >>>>>>> > > >> > expecting almost all the SourceReader
> > >> implementations to
> > >> > > >>>>>>> extend
> > >> > > >>>>>>> > > >> > SourceReaderBase, which effectively makes the
> > >> > SourceReader
> > >> > > >>>>>>> interface
> > >> > > >>>>>>> > > >> > without the pausing support useless. So this
> > >> indicates
> > >> > > that
> > >> > > >>>>>>> public
> > >> > > >>>>>>> > > >> > decorative interfaces (or sub-interfaces for the
> same
> > >> > > >>>>>>> purpose) only
> > >> > > >>>>>>> > > >> > make sense if the original interface is also
> > >> expected to
> > >> > > be
> > >> > > >>>>>>> used.
> > >> > > >>>>>>> > > >> > Otherwise, it seems to make more sense to add the
> > >> method
> > >> > to
> > >> > > >>>>>>> the
> > >> > > >>>>>>> > original
> > >> > > >>>>>>> > > >> > interface itself.
> > >> > > >>>>>>> > > >> >
> > >> > > >>>>>>> > > >> > Cheers,
> > >> > > >>>>>>> > > >> >
> > >> > > >>>>>>> > > >> > Jiangjie (Becket) Qin
> > >> > > >>>>>>> > > >> >
> > >> > > >>>>>>> > > >> >
> > >> > > >>>>>>> > > >> >
> > >> > > >>>>>>> > > >> >
> > >> > > >>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz
> <
> > >> > > >>>>>>> > > >> dwysakowicz@apache.org>
> > >> > > >>>>>>> > > >> > wrote:
> > >> > > >>>>>>> > > >> >
> > >> > > >>>>>>> > > >> > > Thanks @Sebastian for the nice summary.
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > I think most of your points aligned with the
> > >> > suggestions
> > >> > > >>>>>>> I made to
> > >> > > >>>>>>> > > the
> > >> > > >>>>>>> > > >> > > FLIP, while you were writing your reply (I
> believe
> > >> we
> > >> > > hit
> > >> > > >>>>>>> enter
> > >> > > >>>>>>> > > >> nearly at
> > >> > > >>>>>>> > > >> > > the same time ;) )
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > Two points after we synced offline
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > 1. I changed also the
> > >> supportsWatermarksSplitAlignment
> > >> > > to
> > >> > > >>>>>>> > > >> > > supportsPausingSplits to express the general
> > >> > capability
> > >> > > of
> > >> > > >>>>>>> > pausing.
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > 2. As for whether we should add
> > >> > > >>>>>>> > > >> > > PausingSourceReader/PausingSplitReader (option b) or
> > >> > > >>>>>>> > > >> > > just add the methods (option c), I suggest simply
> > >> > > >>>>>>> > > >> > > adding the two methods, as I felt this is the approach
> > >> > > >>>>>>> > > >> > > Becket much prefers and the others do not object to.
> > >> > > >>>>>>> > > >> > > Unless there is opposition, let's go with option c.
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > Best,
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > Dawid
> > >> > > >>>>>>> > > >> > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > Hi folks,
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > Sorry for being a bit silent. Many thanks for
> all
> > >> the
> > >> > > >>>>>>> input and
> > >> > > >>>>>>> > > >> > > suggestions. As I'm a bit new, I needed some
> time
> > >> to
> > >> > > >>>>>>> catch up and
> > >> > > >>>>>>> > > >> > structure
> > >> > > >>>>>>> > > >> > > (for myself) the discussion and I wanted to
> find a
> > >> way
> > >> > > to
> > >> > > >>>>>>> > structure
> > >> > > >>>>>>> > > >> the
> > >> > > >>>>>>> > > >> > > conclusions. (Also because I had the feeling
> that
> > >> some
> > >> > > >>>>>>> concerns
> > >> > > >>>>>>> > got
> > >> > > >>>>>>> > > >> lost
> > >> > > >>>>>>> > > >> > in
> > >> > > >>>>>>> > > >> > > the discussion.) This is my attempt and please
> > >> correct
> > >> > > me
> > >> > > >>>>>>> if
> > >> > > >>>>>>> > > >> something is
> > >> > > >>>>>>> > > >> > > wrong or misunderstood. I tried to collect and
> > >> > assemble
> > >> > > >>>>>>> the
> > >> > > >>>>>>> > > opinions,
> > >> > > >>>>>>> > > >> > > suggestions, and conclusions (to the best of my
> > >> > > >>>>>>> knowledge):
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > # Top A: Should split alignment (pause/resume
> > >> > behavior)
> > >> > > >>>>>>> be a
> > >> > > >>>>>>> > general
> > >> > > >>>>>>> > > >> > > capability?
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > I personally don't see any reason not to have
> it a
> > >> > > general
> > >> > > >>>>>>> > capability
> > >> > > >>>>>>> > > >> > > because for the alignSplit method it is
> actually
> > >> > > >>>>>>> independent of
> > >> > > >>>>>>> > the
> > >> > > >>>>>>> > > >> > > watermarks. If we agree here to have it a
> general
> > >> > > >>>>>>> capability, we
> > >> > > >>>>>>> > > >> should
> > >> > > >>>>>>> > > >> > > also agree on the right wording. Does
> > >> > > >>>>>>> "alignSplits(splitsToResume,
> > >> > > >>>>>>> > > >> > > splitsToPause)" refer to what is then actually
> > >> meant?
> > >> > (I
> > >> > > >>>>>>> see it as
> > >> > > >>>>>>> > > >> okay.
> > >> > > >>>>>>> > > >> > I
> > >> > > >>>>>>> > > >> > > don't have any better idea whilst Arvid
> suggested
> > >> > > >>>>>>> > > >> "pauseOrResumeSplits".)
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > # Top B: Should it be possible to
> enable/disable
> > >> split
> > >> > > >>>>>>> alignment?
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > I would personally not disable the split
> alignment
> > >> on
> > >> > > the
> > >> > > >>>>>>> source
> > >> > > >>>>>>> > > >> reader
> > >> > > >>>>>>> > > >> > > side because if split alignment is used for
> some
> > >> other
> > >> > > >>>>>>> use case
> > >> > > >>>>>>> > (see
> > >> > > >>>>>>> > > >> A)
> > >> > > >>>>>>> > > >> > it
> > >> > > >>>>>>> > > >> > > could have nasty side effects on other/future
> use
> > >> > cases.
> > >> > > >>>>>>> Instead,
> > >> > > >>>>>>> > I
> > >> > > >>>>>>> > > >> would
> > >> > > >>>>>>> > > >> > > disable "watermark split alignment" where I
> think
> > >> it
> > >> > > >>>>>>> should
> > >> > > >>>>>>> > disable
> > >> > > >>>>>>> > > >> the
> > >> > > >>>>>>> > > >> > > watermark-dependent trigger for split
> alignment.
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > # Top C: Should we add a supportsX method?
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > I find it difficult to define the scope of a
> > >> supportsX
> > >> > > >>>>>>> method
> > >> > > >>>>>>> > w.r.t.
> > >> > > >>>>>>> > > >> to
> > >> > > >>>>>>> > > >> > > the following questions: a) Where is it used?
> and
> > >> b)
> > >> > > What
> > >> > > >>>>>>> is the
> > >> > > >>>>>>> > > >> expected
> > >> > > >>>>>>> > > >> > > output? To b), it's not straight-forward to
> > >> provide a
> > >> > > >>>>>>> meaningful
> > >> > > >>>>>>> > > >> output,
> > >> > > >>>>>>> > > >> > > e.g., if SourceReader supports split alignment
> but
> > >> > > >>>>>>> SplitReader
> > >> > > >>>>>>> > not.
> > >> > > >>>>>>> > > >> This
> > >> > > >>>>>>> > > >> > is
> > >> > > >>>>>>> > > >> > > because with the current implementation, we can
> > >> > > determine
> > >> > > >>>>>>> whether
> > >> > > >>>>>>> > > >> split
> > >> > > >>>>>>> > > >> > > alignment is fully supported only during
> runtime
> > >> and
> > >> > > >>>>>>> specifically
> > >> > > >>>>>>> > > >> > actually
> > >> > > >>>>>>> > > >> > > only when calling alignSplits down the call
> > >> hierarchy
> > >> > up
> > >> > > >>>>>>> to the
> > >> > > >>>>>>> > > actual
> > >> > > >>>>>>> > > >> > > SplitReaders.
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > Therefore, I would suggest to either raise an
> > >> error or
> > >> > > >>>>>>> warning if
> > >> > > >>>>>>> > > the
> > >> > > >>>>>>> > > >> > > alignment is called but not supported at some
> > >> point. I
> > >> > > >>>>>>> know we
> > >> > > >>>>>>> > > should
> > >> > > >>>>>>> > > >> > > carefully think about when this could be the
> case
> > >> > > because
> > >> > > >>>>>>> we don't
> > >> > > >>>>>>> > > >> want
> > >> > > >>>>>>> > > >> > to
> > >> > > >>>>>>> > > >> > > flood anybody with such warnings. However,
> warnings
> > >> > > could
> > >> > > >>>>>>> be an
> > >> > > >>>>>>> > > >> indicator
> > >> > > >>>>>>> > > >> > > for the user that for watermark split
> alignment use
> > >> > case
> > >> > > >>>>>>> split
> > >> > > >>>>>>> > > >> reading is
> > >> > > >>>>>>> > > >> > > imbalanced with the conclusion to either
> disable
> > >> the
> > >> > > >>>>>>> trigger for
> > >> > > >>>>>>> > > >> > watermark
> > >> > > >>>>>>> > > >> > > split alignment (see Top B) or to
> use/implement a
> > >> > source
> > >> > > >>>>>>> and
> > >> > > >>>>>>> > reader
> > >> > > >>>>>>> > > >> that
> > >> > > >>>>>>> > > >> > > fully supports split alignment.
> > >> > > >>>>>>> > > >> > >
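[Editorial sketch, not part of the original mail] The "warn instead of supportsX" idea from Top C could look roughly like the Java below. It is a sketch under stated assumptions: MaybeAlignableReader, NonAligningReader and AlignmentForwarder are hypothetical names, the missing capability is signalled with UnsupportedOperationException, and the warning is printed once so that users are not flooded.

```java
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical reader abstraction; not a real Flink interface.
interface MaybeAlignableReader {
    void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

// Hypothetical reader that cannot pause individual splits.
class NonAligningReader implements MaybeAlignableReader {
    @Override
    public void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException("split alignment not supported");
    }
}

// Forwards alignment requests and reports a missing capability only once.
class AlignmentForwarder {
    private final MaybeAlignableReader reader;
    private final AtomicBoolean warned = new AtomicBoolean(false);
    private int warningCount = 0;

    AlignmentForwarder(MaybeAlignableReader reader) {
        this.reader = reader;
    }

    void align(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        try {
            reader.alignSplits(splitsToPause, splitsToResume);
        } catch (UnsupportedOperationException e) {
            // Support is only discovered at runtime, so warn on the first failure only.
            if (warned.compareAndSet(false, true)) {
                warningCount++;
                System.err.println("WARN: split alignment requested but not supported: "
                        + e.getMessage());
            }
        }
    }

    int warningCount() {
        return warningCount;
    }
}

class WarnOnceDemo {
    public static void main(String[] args) {
        AlignmentForwarder forwarder = new AlignmentForwarder(new NonAligningReader());
        forwarder.align(List.of("split-1"), List.of());
        forwarder.align(List.of("split-2"), List.of());
        System.out.println("warnings emitted: " + forwarder.warningCount());
    }
}
```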
> > >> > > >>>>>>> > > >> > > # Top D: How to design interfaces?
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > Thanks for structuring the discussion with
> the
> > >> > > various
> > >> > > >>>>>>> > > >> possibilities
> > >> > > >>>>>>> > > >> > > (a-d). From the discussion and emails, I would
> > >> like to
> > >> > > >>>>>>> summarize
> > >> > > >>>>>>> > the
> > >> > > >>>>>>> > > >> > > following requirements:
> > >> > > >>>>>>> > > >> > > - Interfaces should be consistent
> ("symmetric"),
> > >> i.e.,
> > >> > > >>>>>>> similar
> > >> > > >>>>>>> > > >> semantics
> > >> > > >>>>>>> > > >> > > should have similar interfaces with similar
> usage.
> > >> > > >>>>>>> > > >> > > - Make explicit which implementations implement
> > >> > > >>>>>>> interfaces/support
> > >> > > >>>>>>> > > >> > > behavior.
> > >> > > >>>>>>> > > >> > > - Make clear what are default implementations
> and
> > >> how
> > >> > to
> > >> > > >>>>>>> implement
> > >> > > >>>>>>> > > >> > > interfaces with desired behavior.
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > This is a simplified view of the relations
> between
> > >> > > >>>>>>> relevant
> > >> > > >>>>>>> > classes
> > >> > > >>>>>>> > > of
> > >> > > >>>>>>> > > >> > the
> > >> > > >>>>>>> > > >> > > PoC implementation:
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > SourceReader (Public) <|-- SourceReaderBase
> > >> (Internal)
> > >> > > >>>>>>> <|-- ..
> > >> > > >>>>>>> > <|--
> > >> > > >>>>>>> > > >> > > MySourceReader
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > MySourceReader <>-- SplitFetcherManager
> (Internal)
> > >> > <>--
> > >> > > >>>>>>> > SplitFetcher
> > >> > > >>>>>>> > > >> > > (Internal) <>-- SplitReader (Public) <|--
> > >> > MySplitReader
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > (A <|-- B: B inherits from A; A <>-- B: A "has
> a"
> > >> B)
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > Note that SourceReaderBase and
> SplitFetcherManager
> > >> > > >>>>>>> implement most
> > >> > > >>>>>>> > of
> > >> > > >>>>>>> > > >> the
> > >> > > >>>>>>> > > >> > > "logic" for split alignment just because we
> wanted
> > >> to
> > >> > > >>>>>>> implement
> > >> > > >>>>>>> > > split
> > >> > > >>>>>>> > > >> > > alignment and wanted it to be available as
> kind of
> > >> a
> > >> > > >>>>>>> default. As a
> > >> > > >>>>>>> > > >> > > consequence, we have a "default
> implementation" for
> > >> > > >>>>>>> SourceReader
> > >> > > >>>>>>> > > that
> > >> > > >>>>>>> > > >> > > implements the actual logic for split
> alignment.
> > >> For
> > >> > > that
> > >> > > >>>>>>> reason,
> > >> > > >>>>>>> > I
> > >> > > >>>>>>> > > >> find
> > >> > > >>>>>>> > > >> > it
> > >> > > >>>>>>> > > >> > > very confusing to have a NOOP default
> > >> implementation
> > >> > in
> > >> > > >>>>>>> the
> > >> > > >>>>>>> > > interface
> > >> > > >>>>>>> > > >> for
> > >> > > >>>>>>> > > >> > > the SourceReader. As a consequence, interface
> > >> strategy
> > >> > > c)
> > >> > > >>>>>>> is
> > >> > > >>>>>>> > > difficult
> > >> > > >>>>>>> > > >> > > because this would require NOOP default
> > >> > implementations
> > >> > > >>>>>>> in the
> > >> > > >>>>>>> > > public
> > >> > > >>>>>>> > > >> > > interfaces of SourceReader and SplitReader.
> This is
> > >> > the
> > >> > > >>>>>>> same for
> > >> > > >>>>>>> > > >> strategy
> > >> > > >>>>>>> > > >> > > d) because it would require NOOP default
> > >> > implementation
> > >> > > >>>>>>> in the
> > >> > > >>>>>>> > > >> > > SourceReader. Further, as Dawid described
> method
> > >> > > >>>>>>> signatures of
> > >> > > >>>>>>> > > >> alignSplit
> > >> > > >>>>>>> > > >> > > for SourceReader and SplitReader differ and it
> > >> would
> > >> > be
> > >> > > >>>>>>> extremely
> > >> > > >>>>>>> > > >> > difficult
> > >> > > >>>>>>> > > >> > > to make the signatures the same (with even
> > >> potential
> > >> > > >>>>>>> performance
> > >> > > >>>>>>> > > >> impact
> > >> > > >>>>>>> > > >> > > because of additional look-ups of split ids).
> > >> > Therefore,
> > >> > > >>>>>>> having a
> > >> > > >>>>>>> > > >> > symmetric
> > >> > > >>>>>>> > > >> > > decorative interface as of strategy a) is
> actually
> > >> not
> > >> > > >>>>>>> possible
> > >> > > >>>>>>> > and
> > >> > > >>>>>>> > > >> > having
> > >> > > >>>>>>> > > >> > > two decorative interfaces with different method
> > >> > > >>>>>>> signatures is
> > >> > > >>>>>>> > > >> confusing.
> > >> > > >>>>>>> > > >> > My
> > >> > > >>>>>>> > > >> > > conclusion is that we are best with strategy b)
> > >> which
> > >> > > >>>>>>> means to
> > >> > > >>>>>>> > have
> > >> > > >>>>>>> > > >> > > specializing sub-interfaces that inherit from
> the
> > >> > parent
> > >> > > >>>>>>> > interface:
> > >> > > >>>>>>> > > >> > > SourceReader <|-- AlignedSourceReader,
> SplitReader
> > >> > <|--
> > >> > > >>>>>>> > > >> > AlignedSplitReader
> > >> > > >>>>>>> > > >> > > With this option, I'm not 100% sure what the
> > >> > > implications
> > >> > > >>>>>>> are and
> > >> > > >>>>>>> > if
> > >> > > >>>>>>> > > >> this
> > >> > > >>>>>>> > > >> > > could get nasty. I would suggest that Dawid
> and I
> > >> just
> > >> > > >>>>>>> try to
> > >> > > >>>>>>> > > >> implement
> > >> > > >>>>>>> > > >> > and
> > >> > > >>>>>>> > > >> > > see if we like it. :)
> > >> > > >>>>>>> > > >> > >
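[Editorial sketch, not part of the original mail] Strategy b) above, specializing sub-interfaces, can be sketched in Java as follows. The types are deliberately simplified and hypothetical (BaseSourceReader, AlignedSourceReaderSketch, SketchOperator are not Flink classes), and the use of split ids as plain strings is an assumption made only to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for the public base interface; the real one has more methods.
interface BaseSourceReader {
    void pollNext();
}

// Strategy b): a specializing sub-interface that adds the alignment method.
interface AlignedSourceReaderSketch extends BaseSourceReader {
    void alignSplits(Collection<String> splitIdsToPause, Collection<String> splitIdsToResume);
}

// A reader that only implements the base interface.
class SimpleReader implements BaseSourceReader {
    @Override
    public void pollNext() {}
}

// A reader that declares alignment support through its type.
class AligningReader implements AlignedSourceReaderSketch {
    final List<String> paused = new ArrayList<>();

    @Override
    public void pollNext() {}

    @Override
    public void alignSplits(
            Collection<String> splitIdsToPause, Collection<String> splitIdsToResume) {
        paused.addAll(splitIdsToPause);
        paused.removeAll(splitIdsToResume);
    }
}

// Hypothetical caller: support is visible from the type, no supportsX method needed.
class SketchOperator {
    static boolean requestAlignment(
            BaseSourceReader reader, Collection<String> toPause, Collection<String> toResume) {
        if (reader instanceof AlignedSourceReaderSketch) {
            ((AlignedSourceReaderSketch) reader).alignSplits(toPause, toResume);
            return true;
        }
        return false; // reader cannot align; the caller decides whether to warn
    }
}

class OptionBDemo {
    public static void main(String[] args) {
        System.out.println(SketchOperator.requestAlignment(
                new SimpleReader(), List.of("split-1"), List.of()));
        System.out.println(SketchOperator.requestAlignment(
                new AligningReader(), List.of("split-1"), List.of()));
    }
}
```

With this shape the base interface needs no NOOP default, which is the concern raised above about strategies c) and d); the cost is an extra type per capability.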
> > >> > > >>>>>>> > > >> > > # Summary
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > In conclusion, please let me know your
> > >> perspectives.
> > >> > > >>>>>>> Please
> > >> > > >>>>>>> > correct
> > >> > > >>>>>>> > > >> me,
> > >> > > >>>>>>> > > >> > if
> > >> > > >>>>>>> > > >> > > something is wrong or if I misunderstood
> > >> something. My
> > >> > > >>>>>>> perspective
> > >> > > >>>>>>> > > >> would
> > >> > > >>>>>>> > > >> > be:
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > Top A: Yes
> > >> > > >>>>>>> > > >> > > Top B: Yes (but disable watermark trigger for
> split
> > >> > > >>>>>>> alignment)
> > >> > > >>>>>>> > > >> > > Top C: No
> > >> > > >>>>>>> > > >> > > Top D: b)
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > Best,
> > >> > > >>>>>>> > > >> > > Sebastian
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid
> Wysakowicz <
> > >> > > >>>>>>> > > >> dwysakowicz@apache.org
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > > wrote:
> > >> > > >>>>>>> > > >> > >
> > >> > > >>>>>>> > > >> > >> @Arvid:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> While I also like Becket's capability
> approach, I
> > >> > fear
> > >> > > >>>>>>> that it
> > >> > > >>>>>>> > > >> doesn't
> > >> > > >>>>>>> > > >> > work
> > >> > > >>>>>>> > > >> > >> for this particular use case: Sources can
> always
> > >> be
> > >> > > >>>>>>> aligned
> > >> > > >>>>>>> > > >> cross-task
> > >> > > >>>>>>> > > >> > and
> > >> > > >>>>>>> > > >> > >> this is just about intra-task alignment. So
> it's
> > >> > > >>>>>>> plausible to put
> > >> > > >>>>>>> > > >> > sources
> > >> > > >>>>>>> > > >> > >> into an alignment group even though they do
> not
> > >> use
> > >> > any
> > >> > > >>>>>>> of the
> > >> > > >>>>>>> > > >> presented
> > >> > > >>>>>>> > > >> > >> API of FLIP-217. They should just issue a
> > >> warning, if
> > >> > > >>>>>>> they handle
> > >> > > >>>>>>> > > >> > multiple
> > >> > > >>>>>>> > > >> > >> splits (see motivation section).
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Yes, but the "supportXXX" method would be for
> > >> telling
> > >> > > if
> > >> > > >>>>>>> it
> > >> > > >>>>>>> > > supports
> > >> > > >>>>>>> > > >> > that
> > >> > > >>>>>>> > > >> > >> intra-task alignment. Cross-task alignment
> would
> > >> > always
> > >> > > >>>>>>> be
> > >> > > >>>>>>> > > supported.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I updated interfaces to what I believe to be
> > >> closest
> > >> > > to a
> > >> > > >>>>>>> > consensus
> > >> > > >>>>>>> > > >> > >> between all participants. Do you mind taking a
> > >> look?
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> @Sebastian Do you mind addressing the nits?
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Best,
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Dawid
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Thanks for pushing this effort.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I'd actually be in favor of 1b). I fully agree
> > >> that
> > >> > > >>>>>>> decorator
> > >> > > >>>>>>> > > >> interfaces
> > >> > > >>>>>>> > > >> > >> should be avoided but I'm also not a big fan
> of
> > >> > > >>>>>>> overloading the
> > >> > > >>>>>>> > > base
> > >> > > >>>>>>> > > >> > >> interfaces (they are hard to implement as
> is). The
> > >> > > usual
> > >> > > >>>>>>> feedback
> > >> > > >>>>>>> > > to
> > >> > > >>>>>>> > > >> > >> Source-related interfaces are always that
> they are
> > >> > > >>>>>>> overwhelming
> > >> > > >>>>>>> > and
> > >> > > >>>>>>> > > >> too
> > >> > > >>>>>>> > > >> > >> hard to implement. However, I'd also not
> oppose
> > >> 1c)
> > >> > as
> > >> > > >>>>>>> scattered
> > >> > > >>>>>>> > > >> > interfaces
> > >> > > >>>>>>> > > >> > >> also have drawbacks. I'd just dislike 1a) and
> 1d).
> > >> > > >>>>>>> > > >> > >> While I also like Becket's capability
> approach, I
> > >> > fear
> > >> > > >>>>>>> that it
> > >> > > >>>>>>> > > >> doesn't
> > >> > > >>>>>>> > > >> > work
> > >> > > >>>>>>> > > >> > >> for this particular use case: Sources can
> always
> > >> be
> > >> > > >>>>>>> aligned
> > >> > > >>>>>>> > > >> cross-task
> > >> > > >>>>>>> > > >> > and
> > >> > > >>>>>>> > > >> > >> this is just about intra-task alignment. So
> it's
> > >> > > >>>>>>> plausible to put
> > >> > > >>>>>>> > > >> > sources
> > >> > > >>>>>>> > > >> > >> into an alignment group even though they do
> not
> > >> use
> > >> > any
> > >> > > >>>>>>> of the
> > >> > > >>>>>>> > > >> presented
> > >> > > >>>>>>> > > >> > >> API of FLIP-217. They should just issue a
> > >> warning, if
> > >> > > >>>>>>> they handle
> > >> > > >>>>>>> > > >> > multiple
> > >> > > >>>>>>> > > >> > >> splits (see motivation section).
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I think renaming alignSplits to facilitate
> future
> > >> use
> > >> > > >>>>>>> cases makes
> > >> > > >>>>>>> > > >> sense
> > >> > > >>>>>>> > > >> > but
> > >> > > >>>>>>> > > >> > >> then all interfaces (if 1c) is chosen) should
> be
> > >> > > adjusted
> > >> > > >>>>>>> > > >> accordingly.
> > >> > > >>>>>>> > > >> > >> AlignedSourceReader could be
> PausingSourceReader
> > >> and
> > >> > > I'd
> > >> > > >>>>>>> go for
> > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits (Becket's proposal
> afaik). We
> > >> > could
> > >> > > >>>>>>> also
> > >> > > >>>>>>> > split
> > >> > > >>>>>>> > > it
> > >> > > >>>>>>> > > >> > into
> > >> > > >>>>>>> > > >> > >> pauseSplit and resumeSplit. While
> > >> pauseOrResumeSplits
> > >> > > >>>>>>> may allow
> > >> > > >>>>>>> > > >> Sources
> > >> > > >>>>>>> > > >> > to
> > >> > > >>>>>>> > > >> > >> just use 1 instead of 2 library calls (as
> written
> > >> in
> > >> > > the
> > >> > > >>>>>>> > Javadoc),
> > >> > > >>>>>>> > > >> both
> > >> > > >>>>>>> > > >> > >> Kafka and Pulsar can't use it and I'm not
> sure if
> > >> > there
> > >> > > >>>>>>> is a
> > >> > > >>>>>>> > system
> > >> > > >>>>>>> > > >> that
> > >> > > >>>>>>> > > >> > >> can.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Some nits for the FLIP:
> > >> > > >>>>>>> > > >> > >> - Please replace "stop" with "pause".
> > >> > > >>>>>>> > > >> > >> - Not sure if it's worth it in the capability
> > >> > section:
> > >> > > >>>>>>> Sources
> > >> > > >>>>>>> > that
> > >> > > >>>>>>> > > >> > adopt
> > >> > > >>>>>>> > > >> > >> this interface cannot be used in earlier
> > >> versions. So
> > >> > > it
> > >> > > >>>>>>> feels
> > >> > > >>>>>>> > like
> > >> > > >>>>>>> > > >> we
> > >> > > >>>>>>> > > >> > are
> > >> > > >>>>>>> > > >> > >> only forward compatible (old sources can be
> used
> > >> > after
> > >> > > >>>>>>> the
> > >> > > >>>>>>> > change);
> > >> > > >>>>>>> > > >> but
> > >> > > >>>>>>> > > >> > I
> > >> > > >>>>>>> > > >> > >> guess this holds for any API addition.
> > >> > > >>>>>>> > > >> > >> - You might want to add what happens when all
> > >> splits
> > >> > > are
> > >> > > >>>>>>> paused.
> > >> > > >>>>>>> > > >> > >> - You may want to describe how the 3 flavors
> of
> > >> > > >>>>>>> SourceReaderBase
> > >> > > >>>>>>> > > >> > interact
> > >> > > >>>>>>> > > >> > >> with the interface.
> > >> > > >>>>>>> > > >> > >> - I'm not sure if it makes sense to include
> Kafka
> > >> and
> > >> > > >>>>>>> Pulsar in
> > >> > > >>>>>>> > the
> > >> > > >>>>>>> > > >> > FLIP.
> > >> > > >>>>>>> > > >> > >> For me, this is rather immediate follow-up
> work.
> > >> > (could
> > >> > > >>>>>>> be in the
> > >> > > >>>>>>> > > >> same
> > >> > > >>>>>>> > > >> > >> umbrella ticket)
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Best,
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Arvid
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz <dwysakowicz@apache.org>
> > >> > > >>>>>>> > > >> > >> wrote:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
> > >> > > >>>>>>> WithSplitsAlignment",
> > >> > > >>>>>>> > > >> along
> > >> > > >>>>>>> > > >> > >> with "MySplitReader implements SplitReader,
> > >> > > >>>>>>> WithSplitsAlignment",
> > >> > > >>>>>>> > > or
> > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements
> AlignedSourceReader"
> > >> > and
> > >> > > >>>>>>> > > "MySplitReader
> > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
> > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader"
> and
> > >> > > >>>>>>> "MySplitReader
> > >> > > >>>>>>> > > >> > implements
> > >> > > >>>>>>> > > >> > >> SplitReader".
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I think the latest proposal according to Dawid
> > >> would
> > >> > > be:
> > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader"
> and
> > >> > > >>>>>>> "MySplitReader
> > >> > > >>>>>>> > > >> > implements
> > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
> > >> > > >>>>>>> > > >> > >> I am fine with this API, although personally
> > >> > speaking I
> > >> > > >>>>>>> think it
> > >> > > >>>>>>> > is
> > >> > > >>>>>>> > > >> > simpler
> > >> > > >>>>>>> > > >> > >> to just add a new method to the split reader
> with
> > >> > > >>>>>>> default impl.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I think that is a good idea to have it
> aligned as
> > >> > much
> > >> > > as
> > >> > > >>>>>>> > possible.
> > >> > > >>>>>>> > > >> I'd
> > >> > > >>>>>>> > > >> > be
> > >> > > >>>>>>> > > >> > >> +1 for your option c). We can merge
> > >> > AlignedSplitReader
> > >> > > >>>>>>> with
> > >> > > >>>>>>> > > >> > SplitReader. We
> > >> > > >>>>>>> > > >> > >> will update the FLIP shortly.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Best,
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Dawid
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Thanks for the comment, Jark.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Can the interface be used to align other
> things in
> > >> > the
> > >> > > >>>>>>> future?
> > >> > > >>>>>>> > For
> > >> > > >>>>>>> > > >> > example,
> > >> > > >>>>>>> > > >> > >> align read speed, I have
> > >> > > >>>>>>> > > >> > >> seen users requesting global rate limits. This
> > >> > feature
> > >> > > >>>>>>> may also
> > >> > > >>>>>>> > > need
> > >> > > >>>>>>> > > >> an
> > >> > > >>>>>>> > > >> > >> interface like this.
> > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to
> > >> support
> > >> > > >>>>>>> align other
> > >> > > >>>>>>> > > >> > things, I
> > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
> > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
> > >> > > >>>>>>> `alignWatermarksForSplits`
> > >> > > >>>>>>> > > >> instead
> > >> > > >>>>>>> > > >> > of
> > >> > > >>>>>>> > > >> > >> `alignSplits`.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> This is a good point. Naming wise, it would
> > >> usually
> > >> > be
> > >> > > >>>>>>> more
> > >> > > >>>>>>> > > >> extensible
> > >> > > >>>>>>> > > >> > to
> > >> > > >>>>>>> > > >> > >> just describe what the method actually does,
> > >> instead
> > >> > of
> > >> > > >>>>>>> assuming
> > >> > > >>>>>>> > > the
> > >> > > >>>>>>> > > >> > >> purpose of doing this. For example, in this
> case,
> > >> > > >>>>>>> > > >> pauseOrResumeSplits()
> > >> > > >>>>>>> > > >> > >> would be more extensible because this can be
> used
> > >> for
> > >> > > >>>>>>> any kind of
> > >> > > >>>>>>> > > >> flow
> > >> > > >>>>>>> > > >> > >> control, be it watermark alignment or simple
> rate
> > >> > > >>>>>>> limiting.
> > >> > > >>>>>>> > > >> > >>
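[Editorial sketch, not part of the original mail] The naming argument above is that a method describing the action (pause or resume) can serve several flow-control purposes. A minimal Java sketch of that idea follows: two different policies produce the same kind of decision, which a caller could then hand to a single pauseOrResumeSplits-style method. PauseDecision and FlowControlPolicies are hypothetical names, and the thresholding rules are assumptions chosen for illustration, not the algorithm from the FLIP.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical result type: which splits to pause and which to resume.
class PauseDecision {
    final Set<String> toPause = new TreeSet<>();
    final Set<String> toResume = new TreeSet<>();
}

class FlowControlPolicies {
    // Watermark alignment: pause splits whose watermark ran ahead of the allowed maximum.
    static PauseDecision byWatermark(Map<String, Long> splitWatermarks, long maxDesiredWatermark) {
        PauseDecision decision = new PauseDecision();
        for (Map.Entry<String, Long> entry : splitWatermarks.entrySet()) {
            if (entry.getValue() > maxDesiredWatermark) {
                decision.toPause.add(entry.getKey());
            } else {
                decision.toResume.add(entry.getKey());
            }
        }
        return decision;
    }

    // Simple rate limiting: pause splits that used up their record budget for this interval.
    static PauseDecision byRate(Map<String, Long> recordsThisInterval, long budgetPerSplit) {
        PauseDecision decision = new PauseDecision();
        for (Map.Entry<String, Long> entry : recordsThisInterval.entrySet()) {
            if (entry.getValue() >= budgetPerSplit) {
                decision.toPause.add(entry.getKey());
            } else {
                decision.toResume.add(entry.getKey());
            }
        }
        return decision;
    }
}

class FlowControlDemo {
    public static void main(String[] args) {
        PauseDecision w = FlowControlPolicies.byWatermark(Map.of("a", 10L, "b", 30L), 20L);
        System.out.println("watermark policy pauses " + w.toPause + ", resumes " + w.toResume);
        PauseDecision r = FlowControlPolicies.byRate(Map.of("a", 500L, "b", 50L), 100L);
        System.out.println("rate policy pauses " + r.toPause + ", resumes " + r.toResume);
    }
}
```

Neither policy needs to know how a reader pauses a split, which is why a purpose-neutral method name fits both.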
> > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think
> > >> they
> > >> > > have
> > >> > > >>>>>>> their
> > >> > > >>>>>>> > own
> > >> > > >>>>>>> > > >> > >> advantages.
> > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for
> > >> extending
> > >> > > >>>>>>> abilities
> > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
> > >> > > >>>>>>> > > >> > >> and I prefer Interfaces rather than methods in
> > >> this
> > >> > > >>>>>>> case. When
> > >> > > >>>>>>> > you
> > >> > > >>>>>>> > > >> have
> > >> > > >>>>>>> > > >> > a
> > >> > > >>>>>>> > > >> > >> bunch of abilities and each ability
> > >> > > >>>>>>> > > >> > >> has more than one method, Interfaces can help
> to
> > >> > > >>>>>>> organize them
> > >> > > >>>>>>> > and
> > >> > > >>>>>>> > > >> make
> > >> > > >>>>>>> > > >> > >> users clear which methods
> > >> > > >>>>>>> > > >> > >> need to implement when you want to have an
> > >> ability.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I am OK with decorative interfaces if this is
> a
> > >> > general
> > >> > > >>>>>>> design
> > >> > > >>>>>>> > > >> pattern
> > >> > > >>>>>>> > > >> > in
> > >> > > >>>>>>> > > >> > >> the other components in Flink. But it looks
> like
> > >> the
> > >> > > >>>>>>> current API
> > >> > > >>>>>>> > > >> > proposal
> > >> > > >>>>>>> > > >> > >> is not symmetric.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> The current proposal is essentially
> > >> "MySourceReader
> > >> > > >>>>>>> implements
> > >> > > >>>>>>> > > >> > >> SourceReader, WithSplitsAlignment", along with
> > >> > > >>>>>>> "MySplitReader
> > >> > > >>>>>>> > > >> implements
> > >> > > >>>>>>> > > >> > >> AlignedSplitsReader".
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Should we make the API symmetric? I'd
> consider any
> > >> > one
> > >> > > >>>>>>> of the
> > >> > > >>>>>>> > > >> following
> > >> > > >>>>>>> > > >> > as
> > >> > > >>>>>>> > > >> > >> symmetric.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
> > >> > > >>>>>>> WithSplitsAlignment",
> > >> > > >>>>>>> > > >> along
> > >> > > >>>>>>> > > >> > >> with "MySplitReader implements SplitReader,
> > >> > > >>>>>>> WithSplitsAlignment",
> > >> > > >>>>>>> > > or
> > >> > > >>>>>>> > > >> > >> b) "MySourceReader implements
> AlignedSourceReader"
> > >> > and
> > >> > > >>>>>>> > > "MySplitReader
> > >> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
> > >> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader"
> and
> > >> > > >>>>>>> "MySplitReader
> > >> > > >>>>>>> > > >> > implements
> > >> > > >>>>>>> > > >> > >> SplitReader".
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I think the latest proposal according to Dawid
> > >> would
> > >> > > be:
> > >> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader"
> and
> > >> > > >>>>>>> "MySplitReader
> > >> > > >>>>>>> > > >> > implements
> > >> > > >>>>>>> > > >> > >> AlignedSplitReader".
> > >> > > >>>>>>> > > >> > >> I am fine with this API, although personally
> > >> > speaking I
> > >> > > >>>>>>> think it
> > >> > > >>>>>>> > is
> > >> > > >>>>>>> > > >> > simpler
> > >> > > >>>>>>> > > >> > >> to just add a new method to the split reader
> with
> > >> > > >>>>>>> default impl.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> @Dawid Wysakowicz <dwysakowicz@apache.org>, thanks for the reply.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a preference
> > >> and I
> > >> > > >>>>>>> agree most
> > >> > > >>>>>>> > of
> > >> > > >>>>>>> > > >> the
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> sources will support the alignment I am fine
> > >> > following
> > >> > > >>>>>>> your
> > >> > > >>>>>>> > > >> suggestion
> > >> > > >>>>>>> > > >> > to
> > >> > > >>>>>>> > > >> > >> have the SourceReader extending from
> > >> > > >>>>>>> > WithWatermarksSplitsAlignment,
> > >> > > >>>>>>> > > >> but
> > >> > > >>>>>>> > > >> > >> would put the "supportsXXX" there, not in the
> > >> Source
> > >> > to
> > >> > > >>>>>>> keep the
> > >> > > >>>>>>> > > two
> > >> > > >>>>>>> > > >> > >> methods together.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> One benefit of having the "supportsXXX" in
> Source
> > >> is
> > >> > > >>>>>>> that this
> > >> > > >>>>>>> > > allows
> > >> > > >>>>>>> > > >> > some
> > >> > > >>>>>>> > > >> > >> compile time check. For example, if a user
> enabled
> > >> > > >>>>>>> watermark
> > >> > > >>>>>>> > > >> alignment
> > >> > > >>>>>>> > > >> > >> while it is not supported by the Source, an
> > >> exception
> > >> > > >>>>>>> can be
> > >> > > >>>>>>> > thrown
> > >> > > >>>>>>> > > >> at
> > >> > > >>>>>>> > > >> > >> compile time. It seems in general useful. That
> > >> said,
> > >> > I
> > >> > > >>>>>>> agree that
> > >> > > >>>>>>> > > API
> > >> > > >>>>>>> > > >> > >> cleanliness wise it is better to put the two
> > >> methods
> > >> > > >>>>>>> together.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Thanks,
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <imjark@gmail.com> wrote:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Thanks, Dawid, for the reminder on FLIP-182.
> Sorry I
> > >> did
> > >> > > >>>>>>> miss it.
> > >> > > >>>>>>> > > >> > >> I don't have other concerns then.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Best,
> > >> > > >>>>>>> > > >> > >> Jark
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz <dwysakowicz@apache.org>
> > >> > > >>>>>>> > > >> > >> wrote:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> @Jark:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with
> watermarks
> > >> > when
> > >> > > >>>>>>> the
> > >> > > >>>>>>> > source
> > >> > > >>>>>>> > > >> > >> implements the interface?
> > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark
> > >> alignment
> > >> > > even
> > >> > > >>>>>>> if Kafka
> > >> > > >>>>>>> > > >> > >> implements the interface,
> > >> > > >>>>>>> > > >> > >> and this will affect the throughput somehow. I
> > >> agree
> > >> > > >>>>>>> with Becket
> > >> > > >>>>>>> > > >> > >> we may need a
> > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to
> > >> > > configure
> > >> > > >>>>>>> the
> > >> > > >>>>>>> > source
> > >> > > >>>>>>> > > >> to
> > >> > > >>>>>>> > > >> > >> enable/disable the alignment.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate
> > >> > > maxDesiredWatermark?
> > >> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark
> will
> > >> > > >>>>>>> greatly affect
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> throughput
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> if the reader is constantly
> > >> > > >>>>>>> > > >> > >>  switching between pause and resume. Can users
> > >> > > configure
> > >> > > >>>>>>> the
> > >> > > >>>>>>> > > >> alignment
> > >> > > >>>>>>> > > >> > >> offset?
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> This is covered in the previous FLIP[1] which
> has
> > >> > been
> > >> > > >>>>>>> already
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> implemented
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> in 1.15. In short, it must be enabled with the
> > >> > > watermark
> > >> > > >>>>>>> strategy
> > >> > > >>>>>>> > > >> which
> > >> > > >>>>>>> > > >> > >> also configures drift and update interval.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to
> > >> support
> > >> > > >>>>>>> align other
> > >> > > >>>>>>> > > >> things,
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
> > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
> > >> > > >>>>>>> `alignWatermarksForSplits`
> > >> > > >>>>>>> > > >> instead
> > >> > > >>>>>>> > > >> > of
> > >> > > >>>>>>> > > >> > >> `alignSplits`.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Sure let's rename it.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> @Becket:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I understand your point. On the other hand
> putting
> > >> > all
> > >> > > >>>>>>> methods,
> > >> > > >>>>>>> > > even
> > >> > > >>>>>>> > > >> > with
> > >> > > >>>>>>> > > >> > >> "supportsXXX" methods for enabling certain
> > >> features,
> > >> > > >>>>>>> makes the
> > >> > > >>>>>>> > > entry
> > >> > > >>>>>>> > > >> > >> threshold for writing a new source higher.
> > >> Instead of
> > >> > > >>>>>>> focusing on
> > >> > > >>>>>>> > > the
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> basic
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> and required properties of the Source, the
> person
> > >> > > >>>>>>> implementing a
> > >> > > >>>>>>> > > >> source
> > >> > > >>>>>>> > > >> > >> must bother with and need to figure out what
> all
> > >> of
> > >> > the
> > >> > > >>>>>>> extra
> > >> > > >>>>>>> > > >> features
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> are
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> about and how to deal with them. It makes it
> also
> > >> > > harder
> > >> > > >>>>>>> to
> > >> > > >>>>>>> > > organize
> > >> > > >>>>>>> > > >> > >> methods in coupled groups as Jark said.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Having said that, as I don't have a preference
> > >> and I
> > >> > > >>>>>>> agree most
> > >> > > >>>>>>> > of
> > >> > > >>>>>>> > > >> the
> > >> > > >>>>>>> > > >> > >> sources will support the alignment I am fine
> > >> > following
> > >> > > >>>>>>> your
> > >> > > >>>>>>> > > >> suggestion
> > >> > > >>>>>>> > > >> > to
> > >> > > >>>>>>> > > >> > >> have the SourceReader extending from
> > >> > > >>>>>>> > WithWatermarksSplitsAlignment,
> > >> > > >>>>>>> > > >> but
> > >> > > >>>>>>> > > >> > >> would put the "supportsXXX" there, not in the
> > >> Source
> > >> > to
> > >> > > >>>>>>> keep the
> > >> > > >>>>>>> > > two
> > >> > > >>>>>>> > > >> > >> methods together.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Lastly, I agree it is really unfortunate the
> > >> > > >>>>>>> "alignSplits"
> > >> > > >>>>>>> > methods
> > >> > > >>>>>>> > > >> > differ
> > >> > > >>>>>>> > > >> > >> slightly for SourceReader and SplitReader. The
> > >> reason
> > >> > > for
> > >> > > >>>>>>> that is
> > >> > > >>>>>>> > > >> > >> SourceReaderBase deals only with SplitIds,
> whereas
> > >> > > >>>>>>> SplitReader
> > >> > > >>>>>>> > > needs
> > >> > > >>>>>>> > > >> the
> > >> > > >>>>>>> > > >> > >> actual splits to pause them. I found the
> > >> discrepancy
> > >> > > >>>>>>> acceptable
> > >> > > >>>>>>> > for
> > >> > > >>>>>>> > > >> the
> > >> > > >>>>>>> > > >> > >> sake of simplifying changes significantly,
> > >> especially
> > >> > > as
> > >> > > >>>>>>> they
> > >> > > >>>>>>> > would
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> highly
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> likely impact performance as we would have to
> > >> perform
> > >> > > >>>>>>> additional
> > >> > > >>>>>>> > > >> > lookups.
> > >> > > >>>>>>> > > >> > >> Moreover the SplitReader is a secondary
> interface.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Best,
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Dawid
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> [1]
> https://cwiki.apache.org/confluence/x/hQYBCw
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Thanks for the effort, Dawid and Sebastian!
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I just have some minor questions (maybe I
> missed
> > >> > > >>>>>>> something).
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> 1. Will the framework always align with
> watermarks
> > >> > when
> > >> > > >>>>>>> the
> > >> > > >>>>>>> > source
> > >> > > >>>>>>> > > >> > >> implements the interface?
> > >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark
> > >> alignment
> > >> > > even
> > >> > > >>>>>>> if Kafka
> > >> > > >>>>>>> > > >> > >> implements the interface,
> > >> > > >>>>>>> > > >> > >> and this will affect the throughput somehow. I
> > >> agree
> > >> > > >>>>>>> with Becket
> > >> > > >>>>>>> > > >> > >> we may need a
> > >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to
> > >> > > configure
> > >> > > >>>>>>> the
> > >> > > >>>>>>> > source
> > >> > > >>>>>>> > > >> to
> > >> > > >>>>>>> > > >> > >> enable/disable the alignment.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> 2. How does the framework calculate
> > >> > > maxDesiredWatermark?
> > >> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark
> will
> > >> > > >>>>>>> greatly affect
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> throughput
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> if the reader is constantly
> > >> > > >>>>>>> > > >> > >>  switching between pause and resume. Can users
> > >> > > configure
> > >> > > >>>>>>> the
> > >> > > >>>>>>> > > >> alignment
> > >> > > >>>>>>> > > >> > >> offset?
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > >> > > >>>>>>> > > >> > >> Can the interface be used to align other
> things in
> > >> > the
> > >> > > >>>>>>> future?
> > >> > > >>>>>>> > For
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> example,
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> align read speed, I have
> > >> > > >>>>>>> > > >> > >> seen users requesting global rate limits. This
> > >> > feature
> > >> > > >>>>>>> may also
> > >> > > >>>>>>> > > need
> > >> > > >>>>>>> > > >> an
> > >> > > >>>>>>> > > >> > >> interface like this.
> > >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to
> > >> support
> > >> > > >>>>>>> aligning other
> > >> > > >>>>>>> > > >> things,
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> suggest explicitly declaring
> > >> > > >>>>>>> > > >> > >> the purpose of the methods, such as
> > >> > > >>>>>>> `alignWatermarksForSplits`
> > >> > > >>>>>>> > > >> instead
> > >> > > >>>>>>> > > >> > of
> > >> > > >>>>>>> > > >> > >> `alignSplits`.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think
> > >> they
> > >> > > have
> > >> > > >>>>>>> their
> > >> > > >>>>>>> > own
> > >> > > >>>>>>> > > >> > >> advantages.
> > >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for
> > >> extending
> > >> > > >>>>>>> abilities
> > >> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
> > >> > > >>>>>>> > > >> > >> and I prefer Interfaces rather than methods in
> > >> this
> > >> > > >>>>>>> case. When
> > >> > > >>>>>>> > you
> > >> > > >>>>>>> > > >> have
> > >> > > >>>>>>> > > >> > a
> > >> > > >>>>>>> > > >> > >> bunch of abilities and each ability
> > >> > > >>>>>>> > > >> > >> has more than one method, Interfaces can help
> to
> > >> > > >>>>>>> organize them
> > >> > > >>>>>>> > and
> > >> > > >>>>>>> > > >> make
> > >> > > >>>>>>> > > >> > >> it clear to users which methods they
> > >> > > >>>>>>> > > >> > >> need to implement when they want to have an
> > >> ability.
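Editor's sketch of the "ability interface" style Jark describes: the methods that belong to one ability live in one opt-in interface, and the framework checks for it at runtime. All names here are hypothetical and chosen only for illustration; they are not the interfaces proposed in the FLIP.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical names throughout; this is not Flink's API.
interface SketchReader {
    String name();
}

// One "ability": both methods must be implemented together.
interface SupportsSplitPausing {
    void pauseSplits(Collection<String> splitIds);
    void resumeSplits(Collection<String> splitIds);
}

// A reader that does not opt in has nothing extra to implement.
final class SimpleReader implements SketchReader {
    public String name() { return "simple"; }
}

// A reader that opts in is forced by the compiler to provide the whole group.
final class PausableReader implements SketchReader, SupportsSplitPausing {
    final Set<String> paused = new LinkedHashSet<>();
    public String name() { return "pausable"; }
    public void pauseSplits(Collection<String> splitIds) { paused.addAll(splitIds); }
    public void resumeSplits(Collection<String> splitIds) { paused.removeAll(splitIds); }
}

final class FrameworkSketch {
    // Returns true if the reader opted in and the pause request was applied.
    static boolean tryPause(SketchReader reader, Collection<String> splitIds) {
        if (reader instanceof SupportsSplitPausing) {
            ((SupportsSplitPausing) reader).pauseSplits(splitIds);
            return true;
        }
        return false;
    }
}
```

The trade-off discussed in this thread is visible here: the grouping is explicit, but the framework needs an `instanceof` branch for every ability.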
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Best,
> > >> > > >>>>>>> > > >> > >> Jark
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin
> > >> > > >>>>>>> > > >> > >> <becket.qin@gmail.com> wrote:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Hi Dawid,
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Thanks for the explanation. Apologies that I
> > >> somehow
> > >> > > >>>>>>> misread a
> > >> > > >>>>>>> > > bunch
> > >> > > >>>>>>> > > >> of
> > >> > > >>>>>>> > > >> > >> "align" and thought they were "assign".
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Regarding 1, by default implementation, I was
> > >> > thinking
> > >> > > >>>>>>> of the
> > >> > > >>>>>>> > > default
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> no-op
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> implementation. I am a little worried about
> the
> > >> > > >>>>>>> proliferation of
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> decorative
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> interfaces. I think the most important thing
> about
> > >> > > >>>>>>> interfaces is
> > >> > > >>>>>>> > > that
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> they
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> are easy to understand. In this case, I prefer
> > >> adding
> > >> > > >>>>>>> a new method
> > >> > > >>>>>>> > to
> > >> > > >>>>>>> > > >> the
> > >> > > >>>>>>> > > >> > >> existing interface for the following reasons:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> a) I feel the biggest drawback of decorative
> > >> > interfaces
> > >> > > >>>>>>> is which
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> interface
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> they can decorate and which combinations of
> > >> multiple
> > >> > > >>>>>>> decorative
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> interfaces
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> are valid. In the current FLIP, the
> > >> > withSplitsAlignment
> > >> > > >>>>>>> interface
> > >> > > >>>>>>> > > is
> > >> > > >>>>>>> > > >> > only
> > >> > > >>>>>>> > > >> > >> applicable to the SourceReader which means it
> > >> can't
> > >> > > >>>>>>> decorate any
> > >> > > >>>>>>> > > >> other
> > >> > > >>>>>>> > > >> > >> interface. From an interface design
> perspective, a
> > >> > > >>>>>>> natural
> > >> > > >>>>>>> > question
> > >> > > >>>>>>> > > >> is
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> why
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> not let "AlignedSplitReader" extend
> > >> > > >>>>>>> "withSplitsAlignment"? And it
> > >> > > >>>>>>> > > is
> > >> > > >>>>>>> > > >> > also
> > >> > > >>>>>>> > > >> > >> natural to assume that a split reader
> implementing
> > >> > both
> > >> > > >>>>>>> > SplitReader
> > >> > > >>>>>>> > > >> and
> > >> > > >>>>>>> > > >> > >> WithSplitAlignment would work, because a
> source
> > >> > reader
> > >> > > >>>>>>> > implementing
> > >> > > >>>>>>> > > >> > >> SourceReader and withSplitsAlignment works.
> So why
> > >> > > isn't
> > >> > > >>>>>>> there an
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> interface
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> of AlignedSourceReader? In the future, if
> there
> > >> is a
> > >> > > new
> > >> > > >>>>>>> feature
> > >> > > >>>>>>> > > >> added
> > >> > > >>>>>>> > > >> > >> (e.g. sorted or pre-partitioned data aware),
> are
> > >> we
> > >> > > >>>>>>> going to
> > >> > > >>>>>>> > create
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> another
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> interface of SplitReader such as
> > >> SortedSplitReader or
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> PrePartitionedAware?
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Can they be combined? So I think the
> additional
> > >> > > >>>>>>> decorative
> > >> > > >>>>>>> > > interface
> > >> > > >>>>>>> > > >> > like
> > >> > > >>>>>>> > > >> > >> withSplitsAlignment actually increases the
> > >> > > understanding
> > >> > > >>>>>>> cost of
> > >> > > >>>>>>> > > >> users
> > >> > > >>>>>>> > > >> > >> because they have to know what decorative
> > >> interfaces
> > >> > > are
> > >> > > >>>>>>> there,
> > >> > > >>>>>>> > > which
> > >> > > >>>>>>> > > >> > >> interface they can decorate and which
> > >> combinations of
> > >> > > the
> > >> > > >>>>>>> > > decorative
> > >> > > >>>>>>> > > >> > >> interfaces are valid and which are not.
> Ideally we
> > >> > want
> > >> > > >>>>>>> to avoid
> > >> > > >>>>>>> > > >> that.
> > >> > > >>>>>>> > > >> > To
> > >> > > >>>>>>> > > >> > >> be clear, I am not opposing having an
> interface of
> > >> > > >>>>>>> > > >> withSplitsAlignment,
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> it
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> is completely OK to have it as an internal
> > >> interface
> > >> > > and
> > >> > > >>>>>>> let
> > >> > > >>>>>>> > > >> > SourceReader
> > >> > > >>>>>>> > > >> > >> and SplitReader both extend it.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> b) Adding a new method to the SourceReader
> with a
> > >> > > default
> > >> > > >>>>>>> > > >> implementation
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> of
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> no-op would help avoid logic branching in the
> > >> source
> > >> > > >>>>>>> logic,
> > >> > > >>>>>>> > > >> especially
> > >> > > >>>>>>> > > >> > >> given that we agree that the vast majority of
> the
> > >> > > >>>>>>> SourceReader
> > >> > > >>>>>>> > > >> > >> implementations, if not all, would just extend
> > >> from
> > >> > the
> > >> > > >>>>>>> > > >> > SourceReaderBase.
> > >> > > >>>>>>> > > >> > >> That means adding a new method to the
> interface
> > >> would
> > >> > > >>>>>>> effectively
> > >> > > >>>>>>> > > >> give
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> the
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> same user experience, but simpler.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> c) A related design principle that may be
> worth
> > >> > > >>>>>>> discussing is how
> > >> > > >>>>>>> > > do
> > >> > > >>>>>>> > > >> we
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> let
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> the Source implementations tell Flink what
> > >> capability
> > >> > > is
> > >> > > >>>>>>> > supported
> > >> > > >>>>>>> > > >> and
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> what
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> is not. Personally speaking I feel the most
> > >> intuitive
> > >> > > >>>>>>> place to me
> > >> > > >>>>>>> > > is
> > >> > > >>>>>>> > > >> in
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> the
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Source itself, because that is the entrance
> of the
> > >> > > >>>>>>> entire Source
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> connector
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> logic.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Based on the above thoughts, I am wondering
> if the
> > >> > > >>>>>>> following
> > >> > > >>>>>>> > > >> interface
> > >> > > >>>>>>> > > >> > >> would be easier to understand by the users.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> - Change "withSplitsAlignment" to internal
> > >> interface,
> > >> > > >>>>>>> let both
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> SourceReader
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> and SplitReader extend from it, with a default
> > >> no-op
> > >> > > >>>>>>> > > implementation.
> > >> > > >>>>>>> > > >> > >> - Add a new method "boolean
> > >> supportSplitsAlignment()"
> > >> > > to
> > >> > > >>>>>>> the
> > >> > > >>>>>>> > Source
> > >> > > >>>>>>> > > >> > >> interface, with a default implementation
> returning
> > >> > > >>>>>>> false. Sources
> > >> > > >>>>>>> > > >> that
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> have
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> implemented the alignment logic can change
> this to
> > >> > > >>>>>>> return true,
> > >> > > >>>>>>> > and
> > >> > > >>>>>>> > > >> > >> override the alignSplits() methods in the
> > >> > SourceReader
> > >> > > /
> > >> > > >>>>>>> > > SplitReader
> > >> > > >>>>>>> > > >> if
> > >> > > >>>>>>> > > >> > >> needed.
> > >> > > >>>>>>> > > >> > >> - In the future, if a new optional feature is
> > >> going
> > >> > to
> > >> > > >>>>>>> be added
> > >> > > >>>>>>> > to
> > >> > > >>>>>>> > > >> the
> > >> > > >>>>>>> > > >> > >> Source, and that feature requires the
> awareness
> > >> from
> > >> > > >>>>>>> Flink, we
> > >> > > >>>>>>> > can
> > >> > > >>>>>>> > > >> add
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> more
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> such methods to the Source.
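Editor's sketch of the shape Becket proposes in the three bullets above: a capability flag on the Source with a default of false, plus a default no-op alignment method on the reader. The interfaces below are simplified, hypothetical stand-ins, and the `alignSplits` parameter list is an assumption made for illustration.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, simplified stand-ins for Flink's Source / SourceReader; not the real interfaces.
interface SketchSourceReader {
    // Default no-op: readers that do not care about alignment need no extra code.
    default void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {}
}

interface SketchSource {
    // Default: alignment is not supported unless the source says otherwise.
    default boolean supportSplitsAlignment() { return false; }

    SketchSourceReader createReader();
}

// A source that ignores the feature entirely.
final class PlainSource implements SketchSource {
    public SketchSourceReader createReader() { return new SketchSourceReader() {}; }
}

// A reader that actually tracks which splits are paused.
final class AligningReader implements SketchSourceReader {
    final Set<String> paused = new LinkedHashSet<>();

    @Override
    public void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.removeAll(splitsToResume);
        paused.addAll(splitsToPause);
    }
}

// A source that opts in by overriding the flag and returning an aligning reader.
final class AligningSource implements SketchSource {
    @Override public boolean supportSplitsAlignment() { return true; }
    @Override public SketchSourceReader createReader() { return new AligningReader(); }
}
```

Note that nothing in the types ties the flag to the overridden method: a source could return true while its reader keeps the no-op, which is the coupling concern raised elsewhere in this thread.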
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> What do you think?
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Thanks,
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz
> > >> > > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> @Konstantin:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader`
> > >> > > interface
> > >> > > >>>>>>> (aka the
> > >> > > >>>>>>> > > >> stop &
> > >> > > >>>>>>> > > >> > >> resume behavior) will be implemented for
> Kafka and
> > >> > > >>>>>>> Pulsar only,
> > >> > > >>>>>>> > > >> correct?
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Correct, as far as I know though, those are
> the
> > >> only
> > >> > > >>>>>>> sources
> > >> > > >>>>>>> > which
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> consume
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> concurrently from multiple splits and thus
> > >> alignment
> > >> > > >>>>>>> applies.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> @Thomas:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in
> > >> > > >>>>>>> SourceReaderBase and
> > >> > > >>>>>>> > > then
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> doing
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> nothing if the split reader does not implement
> > >> > > >>>>>>> AlignedSplitReader
> > >> > > >>>>>>> > > >> could
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> be
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can
> > >> instead
> > >> > be
> > >> > > >>>>>>> added to
> > >> > > >>>>>>> > the
> > >> > > >>>>>>> > > >> > >> specific source reader (i.e.
> KafkaSourceReader) to
> > >> > make
> > >> > > >>>>>>> it
> > >> > > >>>>>>> > explicit
> > >> > > >>>>>>> > > >> that
> > >> > > >>>>>>> > > >> > >> the source actually supports it.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I understand your concern. Hmm, I think we
> could
> > >> > > >>>>>>> actually do
> > >> > > >>>>>>> > that.
> > >> > > >>>>>>> > > >> Given
> > >> > > >>>>>>> > > >> > >> the actual implementation of the
> > >> > > >>>>>>> SourceReaderBase#alignSplits is
> > >> > > >>>>>>> > > >> rather
> > >> > > >>>>>>> > > >> > >> short (just a forward to the corresponding
> method
> > >> of
> > >> > > >>>>>>> > SplitFetcher),
> > >> > > >>>>>>> > > >> we
> > >> > > >>>>>>> > > >> > >> could reimplement it in the actual source
> > >> > > >>>>>>> implementations. This
> > >> > > >>>>>>> > > >> solution
> > >> > > >>>>>>> > > >> > >> has a downside though. Authors of new
> sources
> > >> would
> > >> > > >>>>>>> have to do
> > >> > > >>>>>>> > > two
> > >> > > >>>>>>> > > >> > >> things: extend from AlignedSplitReader and
> > >> implement
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> WithSplitsAlignment,
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> instead of just extending AlignedSplitReader.
> I
> > >> would
> > >> > > be
> > >> > > >>>>>>> fine
> > >> > > >>>>>>> > with
> > >> > > >>>>>>> > > >> such
> > >> > > >>>>>>> > > >> > a
> > >> > > >>>>>>> > > >> > >> tradeoff though. What do others think?
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> @Steven:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> For this part from the motivation section, is
> it
> > >> > > >>>>>>> accurate? Let's
> > >> > > >>>>>>> > > >> assume
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> one
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> source task consumes from 3 partitions and
> one of
> > >> the
> > >> > > >>>>>>> partitions
> > >> > > >>>>>>> > is
> > >> > > >>>>>>> > > >> > >> significantly slower. In this situation,
> watermark
> > >> > for
> > >> > > >>>>>>> this
> > >> > > >>>>>>> > source
> > >> > > >>>>>>> > > >> task
> > >> > > >>>>>>> > > >> > >> won't hold back as it is reading recent data
> from
> > >> > other
> > >> > > >>>>>>> two Kafka
> > >> > > >>>>>>> > > >> > >> partitions. As a result, it won't hold back
> the
> > >> > overall
> > >> > > >>>>>>> > watermark.
> > >> > > >>>>>>> > > I
> > >> > > >>>>>>> > > >> > >> thought the problem is that we may have late
> data
> > >> for
> > >> > > >>>>>>> this slow
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> partition.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> It will hold back the watermark. Watermark of
> an
> > >> > > >>>>>>> operator is the
> > >> > > >>>>>>> > > >> minimum
> > >> > > >>>>>>> > > >> > >> of watermarks of all splits[1]
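Editor's sketch of the rule Dawid states above, applied to Steven's three-partition example. The class name and the concrete timestamps are hypothetical; only the "minimum over all splits" rule comes from the thread.

```java
import java.util.Map;

// Illustrative helper, not a Flink class.
final class OperatorWatermarkSketch {

    // The operator's watermark is the minimum of the per-split watermarks,
    // so the slowest split determines how far event time may advance.
    static long operatorWatermark(Map<String, Long> splitWatermarks) {
        long min = Long.MAX_VALUE;
        for (long wm : splitWatermarks.values()) {
            min = Math.min(min, wm);
        }
        // With no splits there is nothing to report yet.
        return splitWatermarks.isEmpty() ? Long.MIN_VALUE : min;
    }
}
```

With two partitions at recent timestamps and one lagging far behind, the result equals the lagging partition's watermark, which is why the slow partition does hold the operator back.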
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I have another question about the restart. Say
> > >> split
> > >> > > >>>>>>> alignment is
> > >> > > >>>>>>> > > >> > >> triggered. Checkpoint is completed. Job
> failed and
> > >> > > >>>>>>> restored from
> > >> > > >>>>>>> > > the
> > >> > > >>>>>>> > > >> > last
> > >> > > >>>>>>> > > >> > >> checkpoint. Because alignment decision is not
> > >> > > >>>>>>> checkpointed,
> > >> > > >>>>>>> > > initially
> > >> > > >>>>>>> > > >> > >> alignment won't be enforced until we get a
> cycle
> > >> of
> > >> > > >>>>>>> watermark
> > >> > > >>>>>>> > > >> > aggregation
> > >> > > >>>>>>> > > >> > >> and propagation, right? Not saying this
> corner is
> > >> a
> > >> > > >>>>>>> problem. Just
> > >> > > >>>>>>> > > >> want
> > >> > > >>>>>>> > > >> > to
> > >> > > >>>>>>> > > >> > >> understand it more.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Your understanding is correct.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> @Becket:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a
> > >> general
> > >> > use
> > >> > > >>>>>>> case, so
> > >> > > >>>>>>> > > >> should
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> we
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> just add the related methods to SourceReader
> > >> directly
> > >> > > >>>>>>> instead of
> > >> > > >>>>>>> > > >> > >> introducing the new interface of
> > >> WithSplitAssignment?
> > >> > > We
> > >> > > >>>>>>> can
> > >> > > >>>>>>> > > provide
> > >> > > >>>>>>> > > >> > >> default implementations, so backwards
> > >> compatibility
> > >> > > >>>>>>> won't be an
> > >> > > >>>>>>> > > >> issue.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I don't think we can provide a default
> > >> > implementation.
> > >> > > >>>>>>> How would
> > >> > > >>>>>>> > we
> > >> > > >>>>>>> > > >> do
> > >> > > >>>>>>> > > >> > >> that? Would it be just a no-op? Is it better
> than
> > >> > > having
> > >> > > >>>>>>> an
> > >> > > >>>>>>> > opt-in
> > >> > > >>>>>>> > > >> > >> interface? The default implementation would
> have
> > >> to
> > >> > be
> > >> > > >>>>>>> added
> > >> > > >>>>>>> > > >> exclusively
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> in
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> a *Public* SourceReader interface. By the way
> > >> notice
> > >> > > >>>>>>> > > SourceReaderBase
> > >> > > >>>>>>> > > >> > >> does extend from WithSplitsAlignment, so
> > >> effectively
> > >> > > all
> > >> > > >>>>>>> > > >> implementations
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> do
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> handle the alignment case. To be honest I
> think
> > >> it is
> > >> > > >>>>>>> impossible
> > >> > > >>>>>>> > to
> > >> > > >>>>>>> > > >> > >> implement the SourceReader interface directly
> by
> > >> end
> > >> > > >>>>>>> users.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface
> > >> > probably
> > >> > > >>>>>>> also
> > >> > > >>>>>>> > needs
> > >> > > >>>>>>> > > >> some
> > >> > > >>>>>>> > > >> > >> change to support throttling at the split
> > >> > granularity.
> > >> > > >>>>>>> Can you
> > >> > > >>>>>>> > add
> > >> > > >>>>>>> > > >> that
> > >> > > >>>>>>> > > >> > >> interface change into the public interface
> > >> section as
> > >> > > >>>>>>> well?
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> It has been added from the beginning. See
> > >> > > >>>>>>> *AlignedSplitReader.*
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name
> > >> > assignSplits
> > >> > > >>>>>>> here,
> > >> > > >>>>>>> > given
> > >> > > >>>>>>> > > >> that
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> it
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> is not actually changing the split
> assignments? It
> > >> > > seems
> > >> > > >>>>>>> > something
> > >> > > >>>>>>> > > >> like
> > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or
> > >> adjustSplitsThrottling() is
> > >> > > >>>>>>> more
> > >> > > >>>>>>> > > accurate.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> The method's called *alignSplits*, not
> assign. Do
> > >> you
> > >> > > >>>>>>> still
> > >> > > >>>>>>> > prefer
> > >> > > >>>>>>> > > a
> > >> > > >>>>>>> > > >> > >> different name for that? Personally, I am
> open for
> > >> > > >>>>>>> suggestions
> > >> > > >>>>>>> > > here.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Best,
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Dawid
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> [1]
> > >> > > >>>>>>> > > >> > >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Thanks for driving the effort, Sebastian. I
> think
> > >> the
> > >> > > >>>>>>> motivation
> > >> > > >>>>>>> > > >> makes a
> > >> > > >>>>>>> > > >> > >> lot of sense. Just a few suggestions /
> questions.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a
> > >> general
> > >> > use
> > >> > > >>>>>>> case, so
> > >> > > >>>>>>> > > >> should
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> we
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> just add the related methods to SourceReader
> > >> directly
> > >> > > >>>>>>> instead of
> > >> > > >>>>>>> > > >> > >> introducing the new interface of
> > >> WithSplitAssignment?
> > >> > > We
> > >> > > >>>>>>> can
> > >> > > >>>>>>> > > provide
> > >> > > >>>>>>> > > >> > >> default implementations, so backwards
> > >> compatibility
> > >> > > >>>>>>> won't be an
> > >> > > >>>>>>> > > >> issue.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface
> > >> > probably
> > >> > > >>>>>>> also
> > >> > > >>>>>>> > needs
> > >> > > >>>>>>> > > >> some
> > >> > > >>>>>>> > > >> > >> change to support throttling at the split
> > >> > granularity.
> > >> > > >>>>>>> Can you
> > >> > > >>>>>>> > add
> > >> > > >>>>>>> > > >> that
> > >> > > >>>>>>> > > >> > >> interface change into the public interface
> > >> section as
> > >> > > >>>>>>> well?
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name
> > >> > assignSplits
> > >> > > >>>>>>> here,
> > >> > > >>>>>>> > given
> > >> > > >>>>>>> > > >> that
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> it
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> is not actually changing the split
> assignments? It
> > >> > > seems
> > >> > > >>>>>>> > something
> > >> > > >>>>>>> > > >> like
> > >> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or
> > >> adjustSplitsThrottling() is
> > >> > > >>>>>>> more
> > >> > > >>>>>>> > > accurate.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Thanks,
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu
> > >> > > >>>>>>> > > >> > >> <stevenz3wu@gmail.com> wrote:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> However, a single source operator may read
> data
> > >> from
> > >> > > >>>>>>> multiple
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> splits/partitions, e.g., multiple Kafka
> > >> partitions,
> > >> > > such
> > >> > > >>>>>>> that
> > >> > > >>>>>>> > even
> > >> > > >>>>>>> > > >> with
> > >> > > >>>>>>> > > >> > >> watermark alignment the source operator may
> need
> > >> to
> > >> > > >>>>>>> buffer
> > >> > > >>>>>>> > > excessive
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> amount
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> of data if one split emits data faster than
> > >> another.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> For this part from the motivation section, is it accurate? Let's assume
> > >> > > >>>>>>> > > >> > >> one source task consumes from 3 partitions and one of the partitions is
> > >> > > >>>>>>> > > >> > >> significantly slower. In this situation, the watermark for this source
> > >> > > >>>>>>> > > >> > >> task won't be held back, as it is reading recent data from the other two
> > >> > > >>>>>>> > > >> > >> Kafka partitions. As a result, it won't hold back the overall watermark.
> > >> > > >>>>>>> > > >> > >> I thought the problem is that we may have late data for this slow
> > >> > > >>>>>>> > > >> > >> partition.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I have another question about the restart. Say split alignment is
> > >> > > >>>>>>> > > >> > >> triggered, a checkpoint is completed, and the job fails and is restored
> > >> > > >>>>>>> > > >> > >> from the last checkpoint. Because the alignment decision is not
> > >> > > >>>>>>> > > >> > >> checkpointed, alignment initially won't be enforced until we get a cycle
> > >> > > >>>>>>> > > >> > >> of watermark aggregation and propagation, right? I'm not saying this
> > >> > > >>>>>>> > > >> > >> corner case is a problem; I just want to understand it better.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <thw@apache.org> wrote:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Thanks for working on this!
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase and then
> > >> > > >>>>>>> > > >> > >> doing nothing if the split reader does not implement AlignedSplitReader
> > >> > > >>>>>>> > > >> > >> could be misleading? Perhaps WithSplitsAlignment can instead be added to
> > >> > > >>>>>>> > > >> > >> the specific source reader (e.g. KafkaSourceReader) to make it explicit
> > >> > > >>>>>>> > > >> > >> that the source actually supports it.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Thanks,
> > >> > > >>>>>>> > > >> > >> Thomas
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <knaufk@apache.org> wrote:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> > >> > > >>>>>>> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> +1 in general. I believe it is valuable to complete the watermark alignment
> > >> > > >>>>>>> > > >> > >> story with this FLIP.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Cheers,
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Konstantin
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> To be explicit, having worked on it, I support it ;) I think we can start a
> > >> > > >>>>>>> > > >> > >> vote thread soonish, as there are no concerns so far.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Best,
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Dawid
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Dear Flink developers,
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> I would like to open a discussion on FLIP 217 [1] for an extension of
> > >> > > >>>>>>> > > >> > >> Watermark Alignment to perform alignment also in SplitReaders. To do so,
> > >> > > >>>>>>> > > >> > >> SplitReaders must be able to suspend and resume reading from split sources
> > >> > > >>>>>>> > > >> > >> where the SourceOperator coordinates and controls suspend and resume. To
> > >> > > >>>>>>> > > >> > >> gather information about current watermarks of the SplitReaders, we extend
> > >> > > >>>>>>> > > >> > >> the internal WatermarkOutputMultiplexer and report watermarks to the
> > >> > > >>>>>>> > > >> > >> SourceOperator.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised and
> > >> > > >>>>>>> > > >> > >> reworked by Dawid Wysakowicz (he did most of the work) and me. The changes
> > >> > > >>>>>>> > > >> > >> are backwards compatible in a way that if affected components do not
> > >> > > >>>>>>> > > >> > >> support split alignment the behavior is as before.
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Best,
> > >> > > >>>>>>> > > >> > >> Sebastian
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> --
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >> Konstantin Knauf
> > >> > > >>>>>>> > > >> > >> https://twitter.com/snntrable
> > >> > > >>>>>>> > > >> > >> https://github.com/knaufk
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> > >>
> > >> > > >>>>>>> > > >> >
> > >> > > >>>>>>> > > >>
> > >> > > >>>>>>> > > >
> > >> > > >>>>>>> > >
> > >> > > >>>>>>> >
> > >> > > >>>>>>>
> > >> > > >>>>>>
> > >> > >
> > >> >
> > >>
> > >
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Thomas Weise <th...@apache.org>.
Hi everyone,

Thank you for all the effort that went into this discussion. The split
level watermark alignment will be an important feature for Flink that
will address operational problems for various use cases. From reading
through this thread it appears that not too much remains to bring this
FLIP to acceptance and allow development to move forward. I would like
to contribute if possible.

Regarding option 1 vs. option 2: I don't have a strong preference,
perhaps slightly leaning towards option 1.
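
For reference, the difference between the two options as summarized further down this thread can be sketched roughly as follows. This is only an illustration: all type names below are hypothetical stand-ins (splits are plain strings), not the actual Flink interfaces or the signatures proposed in the FLIP.

```java
import java.util.Collection;
import java.util.Collections;

// Option 1 (hypothetical names): the method is added to the base interface
// with a default implementation that throws, so old readers still compile.
interface Option1Reader {
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException("pausing splits is not supported");
    }
}

// Option 2 (hypothetical names): the base interface stays untouched and
// support is declared by implementing a separate decorative interface.
interface Option2Reader {}

interface PausableSplits {
    void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume);
}

// Readers written before the feature existed.
final class LegacyOption1Reader implements Option1Reader {}

final class LegacyOption2Reader implements Option2Reader {}

// Readers that implement the feature.
final class PausableOption1Reader implements Option1Reader {
    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        // a real reader would stop fetching from the paused splits here
    }
}

final class PausableOption2Reader implements Option2Reader, PausableSplits {
    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        // a real reader would stop fetching from the paused splits here
    }
}

final class AlignmentSupport {
    // Option 1: missing support only shows up when the method is called.
    static boolean option1Supports(Option1Reader reader) {
        try {
            reader.pauseOrResumeSplits(Collections.emptyList(), Collections.emptyList());
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    // Option 2: support can be checked up front via the type.
    static boolean option2Supports(Option2Reader reader) {
        return reader instanceof PausableSplits;
    }

    public static void main(String[] args) {
        System.out.println(option1Supports(new LegacyOption1Reader()));
        System.out.println(option2Supports(new LegacyOption2Reader()));
    }
}
```

In both cases a legacy reader keeps compiling; what differs is whether the framework learns about the missing support from the type or only from the exception at call time.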

However, from a user perspective, should the split level alignment be
an opt-in feature, at least for a few releases? If yes, then we would
require a knob similar to supportsPausingSplits(), which I understand
won't be part of the revised FLIP. Such control may be beneficial:

* Compare runtime behavior with split level alignment on/off
* Allow use of sources that don't implement pausing splits yet

The second point would, from the user's perspective, be necessary for
backward compatibility? While the interface aspect and source
compatibility have been discussed in great detail, I don't think it
would be desirable if an application that already uses alignment fails
after upgrading to the new Flink version, forcing users to modify their
sources in lockstep for the new non-optional split level alignment.
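
To illustrate the kind of opt-in behavior I have in mind, here is a minimal sketch. The class, the enum and the flag are hypothetical and not part of the FLIP or of any existing Flink API; the sketch only shows the decision logic under the assumption that such a user-facing switch exists.

```java
// Hypothetical decision logic for an opt-in split level alignment switch.
final class SplitAlignmentGuard {
    enum Mode { TASK_LEVEL_ONLY, SPLIT_LEVEL }

    static Mode resolve(boolean splitAlignmentEnabled, boolean readerCanPauseSplits) {
        if (!splitAlignmentEnabled) {
            // Switch is off: behave as before the upgrade, so existing jobs
            // with sources that cannot pause splits keep running.
            return Mode.TASK_LEVEL_ONLY;
        }
        if (!readerCanPauseSplits) {
            // Switch is on but the source cannot pause splits: fail loudly.
            throw new IllegalStateException(
                    "split level alignment was enabled, but the source cannot pause splits");
        }
        return Mode.SPLIT_LEVEL;
    }

    public static void main(String[] args) {
        System.out.println(resolve(false, false));
        System.out.println(resolve(true, true));
    }
}
```

With the switch off, an upgraded job behaves as it did before; the failure only appears once a user explicitly asks for split level alignment on a source that cannot provide it.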

So I think clarification of the compatibility aspect on the FLIP page
would be necessary.

Thanks,
Thomas

On Mon, May 30, 2022 at 3:29 AM Piotr Nowojski <pi...@gmail.com> wrote:
>
> Hi Becket,
>
> Thanks for summing this up. Just one correction:
>
> > Piotr prefers option 2, his opinions are:
> >   e) It is OK that the code itself in option 2 indicates to the developers
> that a feature is optional. We will rely on the documentation to correct
> that and clarify that the feature is actually obligatory.
>
> I would say, based on a) and b), that the feature would still be optional. So
> both the implementation and the documentation would be saying that. We
> could add a mention to the docs and release notes that this feature will
> be obligatory in the next major release and plan such a release accordingly.
>
> Re option 1, as you mentioned:
> > As for option 1: For developers, the feature is still optional due to the
> default implementation in the interface, regardless of what the default
> implementation does, because the code compiles without overriding these
> methods
>
> Also importantly, the code will work in most cases.
>
> > Obligatory: Jobs may fail if these methods are not implemented properly.
> e.g. SourceReader#pauseOrResumeSplits(). This is a common pattern in Java,
> e.g. Iterator.remove() by default throws "UnsupportedOperationException",
> informing the implementation that things may go wrong if this method is not
> implemented.
>
> For me `Iterator#remove()` is an optional feature. Personally, I don't
> remember if I have ever implemented it.
>
> Best,
> Piotrek
>
> On Fri, May 27, 2022 at 10:14 Becket Qin <be...@gmail.com> wrote:
>
> > I had an offline discussion with Piotr and here is the summary. Please
> > correct me if I miss something, Piotr.
> >
> > There are two things on which we would like to seek more opinions from the
> > community, so we can make progress on this FLIP.
> >
> > 1. The General pattern to add obligatory features to existing interfaces.
> >
> > ***********************************************************************************
> > For interfaces exposed to the developers for implementation, they are
> > either intended to be *optional* or *obligatory*. While it is quite clear
> > how to convey that intention when creating the interfaces, there is less
> > agreement on how to do so when we are adding new features to an existing
> > interface. In general, Flink uses decorative interfaces when adding
> > optional features to existing interfaces. Both Piotr and I agree that looks
> > good.
> >
> > Different opinions are mainly about how to add obligatory features to the
> > existing interfaces, probably due to different understandings of
> > "obligatory".
> >
> > We have discussed four options:
> >
> > *Option 1:*
> >
> >    - Just add a new method to the existing interface.
> >    - For backwards compatibility, the method would have a default
> >    implementation throwing "UnsupportedOperationException".
> >    - In the next major version, remove the default implementation.
> >    - For the developers, any method with a default implementation
> >    throwing an "UnsupportedOperationException" should be taken as obligatory.
> >
> > *Option 2:*
> >
> >    - Always make the features optional by adding a decorative interface,
> >    just like ordinary optional features.
> >    - Inform the developers via documentation that this feature is
> >    obligatory, although it looks optional in the code.
> >    - In case the developers did not implement the decorative interface,
> >    throw an exception.
> >    - In the next major version, move the methods in the decorative
> >    interface to the base interface, and deprecate the decorative interface.
> >
> > *Option 3:*
> >
> >    - Always bump the major version when a new obligatory feature is
> >    added, even if we may have to do it frequently.
> >
> > *Option 4:*
> >
> >    - Add a V2, V3... of the interface affected by the new obligatory
> >    feature.
> >    - In the next major versions, deprecate old versions of the interfaces.
> >
> > Both Piotr and I agreed that option 3 and option 4 have big side effects
> > and should be avoided. We have different preferences between option 1 and
> > option 2.
> >
> > Personally I prefer option 1, the reasons are:
> >   a) simple and intuitive. Java 8 introduced the default impl in
> > interfaces exactly for interface evolving, and this is a common pattern in
> > many projects.
> >   b) prominent to the developers that the feature is expected to be
> > implemented, because it explicitly throws an exception in the default impl.
> >   c) low maintenance overhead - the Flink framework can always assume the
> > method exists, so no special handling logic is needed.
> >   d) communicate a clear semantic boundary between optional and obligatory
> > features in the Flink to the developers.
> >       - Optional: Jobs still run without exception if these methods are
> > not implemented. e.g. all the SupportsXXXPushDown interfaces.
> >       - Obligatory: Jobs may fail if these methods are not implemented
> > properly. e.g. SourceReader#pauseOrResumeSplits(). This is a common pattern
> > in Java, e.g. Iterator.remove() by default throws
> > "UnsupportedOperationException", informing the implementation that things
> > may go wrong if this method is not implemented.
> >
> > As for option 2, although the API itself sounds clean, it misleads people
> > into thinking an obligatory feature is optional - from the code the
> > feature is optional, but the documents say it is obligatory. We probably
> > should avoid such code-doc inconsistency, as people will be confused. And I
> > would actually be bewildered that sometimes not implementing an "optional"
> > feature is fine, but sometimes it causes the jobs to fail.
> >
> > In response to the argument that the method with a default implementation
> > is always optional, if that is true, it actually means all the interfaces
> > should be immutable once they are created. If we want to add a method to an
> > existing interface, for backwards compatibility, we will have to provide a
> > default implementation. And the fact it has a default implementation
> > indicates the method is optional. If that method is optional, it should
> > reside in a separate decorative interface, otherwise it clogs that existing
> > interface. Therefore, people should never add a method to an existing
> > interface. I find this conclusion a bit extreme.
> >
> > Piotr prefers option 2, his opinions are:
> >     a) Obligatory methods are the methods that fail the code compilation
> > if not implemented.
> >     b) All obligatory methods should reside in the base interface, without
> > a default implementation. And all the optional methods should be in
> > decorative interfaces. This is a clean API.
> >     c) due to b), there isn't a viable solution to add an obligatory
> > method to an existing interface in a backwards compatible way. Unless we
> > are OK with breaking backwards compatibility, all the interfaces should be
> > treated as immutable. As a compromise, we might as well just treat all the
> > features added later as optional features. This way we keep the API clean.
> >     d) based on b) and c), option 2 has a clean API, while option 1 does
> > not.
> >     e) It is OK that the code itself in option 2 indicates to the developers
> > that a feature is optional. We will rely on the documentation to correct
> > that and clarify that the feature is actually obligatory.
> >     f) Regarding the effectiveness of making people aware that the feature
> > is obligatory, Option 1 and Option 2 are similar. People who do not read
> > the release notes / documentation will mistake the feature for optional
> > anyway.
> >
> > As for option 1: For developers, the feature is still optional due to the
> > default implementation in the interface, regardless of what the default
> > implementation does, because the code compiles without overriding these
> > methods. Also, another problem with this option is that users who do not
> > know about the history of the interface may be confused by the
> > default implementation throwing an exception.
> >
> >
> > 2. For this particular FLIP, should it be optional or not?
> >
> > ***********************************************************************************
> > As mentioned in the previous email, I feel this FLIP should be obligatory,
> > for the following reasons:
> > 1. The Flink framework exposes the watermark alignment API to the end
> > users. From the end users' perspective, the feature should be available
> > regardless of the implementation details in the pluggables. This is true
> > for any other methods exposed as the Flink API.
> > 2. If a Source is not pausable, the end user should receive an exception
> > when enabling watermark alignment (both Piotr and I agree on this). In
> > that case, it meets my criteria of an obligatory feature because not
> > implementing the feature causes a framework API to throw an exception and
> > fail the job.
> >
> > On the other hand, Piotr does not have a strong opinion regarding whether
> > this feature should be optional or not.
> >
> >
> > Thanks for reading through this long email. So basically, in order to make
> > progress on this FLIP, we would like to hear how people feel about the above
> > two topics.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> > On Thu, May 26, 2022 at 3:06 PM Piotr Nowojski <pn...@apache.org>
> > wrote:
> >
> >> Hi Becket,
> >>
> >> I still stand by what I wrote before:
> >> > I think I would still vote soft -1 on this option, but I wouldn't block
> >> it in case I am out-voted.
> >>
> >> > I think it might be helpful to agree on the definition of optional in
> >> our
> >> case.
> >>
> >> For me it doesn't matter whether we call a default method throwing an
> >> exception optional or non-optional. As long as we keep it this way, the effect
> >> is the same. It's effectively a method that a user doesn't have to
> >> implement. If the interface/system allows some methods to be left unimplemented,
> >> some users will do just that, regardless of whether we call and document it as
> >> non-optional. And at the same time it's clogging the base interface.
> >>
> >> By the way, just the need for a java-doc/documentation explaining the
> >> existence of some construct is a bad smell (code should be self-documenting,
> >> and a default method throwing an UnsupportedOperationException is not).
> >>
> >> > Please note that so far we do not assume whether the feature is in
> >> > the original API or it is added later. A newly added feature can also be
> >> > non-optional, although it might take some time for all the pluggable
> >> > developers to catch up, and they should still work if the new feature is
> >> > not used until they catch up. In contrast, we may never expect an
> >> optional
> >> > feature to catch up, because leaving it unimplemented is also blessed.
> >> >
> >> > Let's take the checkpointing as an example. Imagine Flink did not
> >> support
> >> > checkpointing before release 1.16. And now we are trying to add
> >> > checkpointing to Flink. So we exposed the checkpoint configuration to
> >> the
> >> > end users. In the meantime, will we tell the pluggable (e.g. operators,
> >> > connectors) developers that methods like "snapshotState()" are optional?
> >> If
> >> > we do that, the availability of checkpointing in Flink would be severely
> >> > weakened. But apparently we should still allow the existing
> >> implementations
> >> > to work without checkpointing. It looks to me that adding the method to
> >> the
> >> > pluggable interfaces with a default implementation throwing
> >> > "UnsupportedOperationException" would be the solution here. Please note
> >> > that in this case, having the default implementation does not mean this
> >> is
> >> > optional. It is just the technique to support backwards compatibility in
> >> > the feature evolution. The fact that this method is in the base
> >> interface
> >> > suggests it is not optional, so the developers SHOULD implement it.
> >>
> >> I would soft vote -1 for having the default method throwing
> >> UnsupportedOperationException as a one-off for the special circumstances of
> >> this FLIP (FLIP-217).
> >>
> >> At the moment, without thinking this over too much, I would vote harder -1
> >> for having this as a general rule when adding new features. If we ever end
> >> up with an API that is littered with default methods throwing
> >> UnsupportedOperationException that are documented as "non optional" it
> >> would be IMO a big design failure. I would be ok-ish with that, only if
> >> that was a temporary thing and we had an aggressive plan to release more
> >> often new major Flink versions (2.x.y, 3.x.y, ...) breaking API
> >> compatibility, that would get rid of those default methods. Adding
> >> checkpointing and methods like "snapshotState()" would IMO easily justify
> >> a
> >> new major Flink release. In that case we could add those methods with a
> >> default implementation for some transition period, one or two minor
> >> releases, followed by a clean up in a major release. However I would still
> >> argue that it would be cleaner/better to add a decorative interface like
> >> `CheckpointedOperator` interface instead of adding those default methods
> >> to
> >> the base `Operator` interface.
> >>
> >> I think I can sum up our disagreement here: I would like to keep the
> >> interfaces simpler, with only obligatory methods/features on one side and
> >> clearly optional features on the other, while you would like to add an
> >> extra third state in between those two?
> >>
> >> Best,
> >> Piotrek
> >>
> >>
> >>
> >> On Thu, May 12, 2022 at 04:25 Becket Qin <be...@gmail.com> wrote:
> >>
> >> > Thanks for the clarification, Piotr and Sebastian.
> >> >
> >> > It looks like the key problem is still whether the implementation of
> >> > pausable splits in the Sources should be optional or not.
> >> >
> >> > I think it might be helpful to agree on the definition of optional in
> >> our
> >> > case. To me:
> >> > Optional = "You CAN leave the method unimplemented, and that is fine."
> >> > Non-Optional = "You CAN leave the method unimplemented, but you SHOULD
> >> NOT,
> >> > because people assume this works."
> >> >
> >> > I think one sufficient condition of a Non-Optional feature is that if
> >> the
> >> > feature is exposed through the framework API, Flink should expect the
> >> > pluggables to support this feature by default. Otherwise the
> >> availability
> >> > of that feature becomes undefined.
> >> >
> >> > Please note that so far we do not assume whether the feature is in
> >> > the original API or it is added later. A newly added feature can also be
> >> > non-optional, although it might take some time for all the pluggable
> >> > developers to catch up, and they should still work if the new feature is
> >> > not used until they catch up. In contrast, we may never expect an
> >> optional
> >> > feature to catch up, because leaving it unimplemented is also blessed.
> >> >
> >> > Let's take the checkpointing as an example. Imagine Flink did not
> >> support
> >> > checkpointing before release 1.16. And now we are trying to add
> >> > checkpointing to Flink. So we exposed the checkpoint configuration to
> >> the
> >> > end users. In the meantime, will we tell the pluggable (e.g. operators,
> >> > connectors) developers that methods like "snapshotState()" are optional?
> >> If
> >> > we do that, the availability of checkpointing in Flink would be severely
> >> > weakened. But apparently we should still allow the existing
> >> implementations
> >> > to work without checkpointing. It looks to me that adding the method to
> >> the
> >> > pluggable interfaces with a default implementation throwing
> >> > "UnsupportedOperationException" would be the solution here. Please note
> >> > that in this case, having the default implementation does not mean this
> >> is
> >> > optional. It is just the technique to support backwards compatibility in
> >> > the feature evolution. The fact that this method is in the base
> >> interface
> >> > suggests it is not optional, so the developers SHOULD implement it.
> >> >
> >> > When it comes to this FLIP, I think it meets the criteria of
> >> non-optional
> >> > features, so we should just use the evolution path of non-optional
> >> > features.
> >> >
> >> > Thanks,
> >> >
> >> > Jiangjie (Becket) Qin
> >> >
> >> >
> >> >
> >> > On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski <pn...@apache.org>
> >> > wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > Actually previously I thought about having a decorative interface and
> >> > > whenever watermark alignment is enabled, checking that the source
> >> > > implements the decorative interface. If not, throwing an exception.
> >> > >
> >> > > The option with default methods in the source interfaces throwing
> >> > > `UnsupportedOperationException` I think still suffers from the same
> >> > > problems I mentioned before. It's still an optional implementation
> >> and at
> >> > > the same time it's clogging the base interface. I think I would still
> >> > vote
> >> > > soft -1 on this option, but I wouldn't block it in case I am
> >> out-voted.
> >> > >
> >> > > Best,
> >> > > Piotrek
> >> > >
> >> > > On Wed, May 11, 2022 at 14:22 Sebastian Mattheis <se...@ververica.com> wrote:
> >> > >
> >> > > > Hi Becket,
> >> > > >
> >> > > > Thanks a lot for your fast and detailed response. For me, the discussion
> >> > > > is converging, and dropping the supportsX method sounds very reasonable.
> >> > > > (Side note: I think we misunderstood each other regarding "pausable splits"
> >> > > > being enabled as "default". As you describe it now, I understand "default"
> >> > > > to mean the new recommended way of implementation, and I think that is
> >> > > > fully valid. Before, I understood "default" here as the default
> >> > > > implementation, i.e., throwing UnsupportedOperationException, which is the
> >> > > > exact opposite. :) )
> >> > > >
> >> > > > Nevertheless: As mentioned, an open question for me is if watermark
> >> > > > alignment should enforce pausable splits. For clarification, the
> >> > current
> >> > > > documentation [1] says:
> >> > > >
> >> > > > *Note:* As of 1.15, Flink supports aligning across tasks of the same
> >> > > >> source and/or different sources. It does not support aligning
> >> > > >> splits/partitions/shards in the same task.
> >> > > >>
> >> > > >> In a case where there are e.g. two Kafka partitions that produce
> >> > > >> watermarks at different pace, that get assigned to the same task
> >> > > watermark
> >> > > >> might not behave as expected. Fortunately, worst case it should not
> >> > > perform
> >> > > >> worse than without alignment.
> >> > > >>
> >> > > >> Given the limitation above, we suggest applying watermark
> >> alignment in
> >> > > >> two situations:
> >> > > >>
> >> > > >>    1. You have two different sources (e.g. Kafka and File) that
> >> > produce
> >> > > >>    watermarks at different speeds
> >> > > >>    2. You run your source with parallelism equal to the number of
> >> > > >>    splits/shards/partitions, which results in every subtask being
> >> > > assigned a
> >> > > >>    single unit of work.
> >> > > >>
> >> > > > I personally see no issue with, and no reason against, implementing this
> >> > > > dependency of watermark alignment on pausable splits. (I think this would
> >> > > > even be a good path towards shaping watermark alignment in 1.16.) However,
> >> > > > "I don't see" means that I would be happy to hear Dawid's and Piotrek's
> >> > > > opinions, as they implemented watermark alignment based on FLIP-182 [2]
> >> > > > and I don't want to miss relevant rationale/background info from their
> >> > > > side.
> >> > > >
> >> > > > *@Piotrek* *@Dawid *What do you think?
> >> > > >
> >> > > > Regards,
> >> > > > Sebastian
> >> > > >
> >> > > > [1] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
> >> > > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
> >> > > >
> >> > > > On Wed, May 11, 2022 at 1:30 PM Becket Qin <be...@gmail.com>
> >> > wrote:
> >> > > >
> >> > > >> +dev
> >> > > >>
> >> > > >> Hi Sebastian,
> >> > > >>
> >> > > >> Thank you for the summary. Please see the detailed replies inline. As a
> >> > > >> recap of my suggestions:
> >> > > >>
> >> > > >> 1. Pausable splits API.
> >> > > >>   a) Add default implementations to methods "pauseOrResumeSplits" in both
> >> > > >> SourceReader and SplitReader where both default implementations throw
> >> > > >> UnsupportedOperationException.
> >> > > >>
> >> > > >> 2. User story.
> >> > > >>     a) We tell users to enable the watermark alignment as they
> >> like.
> >> > > This
> >> > > >> is exactly what the current Flink API is.
> >> > > >>     b) We tell the source developers, please implement pausable
> >> > splits,
> >> > > >> otherwise bad things may happen. Think of it like you are expected
> >> to
> >> > > >> implement SourceReader#snapshotState() properly, otherwise
> >> exceptions
> >> > > will
> >> > > >> be thrown when users enable checkpointing.
> >> > > >>
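The recap above (default methods that throw, so that old sources still compile but fail loudly once alignment needs them) can be sketched roughly as follows. This is only an illustration: SketchSourceReader, SketchSplitReader, LegacyReader, PausableReader and alignmentWorks are hypothetical names invented for this sketch and are not the real Flink interfaces; only the method name pauseOrResumeSplits and the split-ID versus split-object difference are taken from the thread.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-ins for the interfaces discussed in this thread.
// These are NOT the real Flink SourceReader / SplitReader.
class PausableSplitsDefaultSketch {

    /** Reader-level interface: splits are identified by their split IDs. */
    interface SketchSourceReader {
        default void pauseOrResumeSplits(
                Collection<String> splitIdsToPause, Collection<String> splitIdsToResume) {
            throw new UnsupportedOperationException(
                    "This source reader does not support pausing or resuming splits.");
        }
    }

    /** Split-level interface: the split objects themselves are passed. */
    interface SketchSplitReader<SplitT> {
        default void pauseOrResumeSplits(
                Collection<SplitT> splitsToPause, Collection<SplitT> splitsToResume) {
            throw new UnsupportedOperationException(
                    "This split reader does not support pausing or resuming splits.");
        }
    }

    /** An "old" reader written before the method existed: it compiles unchanged. */
    static class LegacyReader implements SketchSourceReader {}

    /** A reader that overrides the default and tracks which splits are paused. */
    static class PausableReader implements SketchSourceReader {
        final Set<String> pausedSplitIds = new HashSet<>();

        @Override
        public void pauseOrResumeSplits(
                Collection<String> splitIdsToPause, Collection<String> splitIdsToResume) {
            pausedSplitIds.addAll(splitIdsToPause);
            pausedSplitIds.removeAll(splitIdsToResume);
        }
    }

    /** Mimics the framework asking a reader to pause one split for alignment. */
    static boolean alignmentWorks(SketchSourceReader reader) {
        try {
            reader.pauseOrResumeSplits(
                    Collections.singletonList("split-0"), Collections.<String>emptyList());
            return true;
        } catch (UnsupportedOperationException e) {
            // Error-handling path: the user has to take this source out of the group.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(alignmentWorks(new LegacyReader()));
        System.out.println(alignmentWorks(new PausableReader()));
    }
}
```

In this sketch the legacy reader needs no code change to compile, which is the backwards-compatibility argument, while the exception makes the missing support visible as soon as pausing is actually requested.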
> >> > > >> Thanks,
> >> > > >>
> >> > > >> Jiangjie (Becket) Qin
> >> > > >>
> >> > > >> On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis <
> >> > > >> sebastian@ververica.com> wrote:
> >> > > >>
> >> > > >>> Hi Becket, Hi everybody,
> >> > > >>>
> >> > > >>> I'm sorry if I misread the messages but I could not derive an
> >> > agreement
> >> > > >>> from the mailing list. Nevertheless, if I understand you right the
> >> > > >>> suggestion is:
> >> > > >>>
> >> > > >>> * Add default implementations to methods "pauseOrResumeSplits" in
> >> > both
> >> > > >>> SourceReader and SplitReader where both default implementations
> >> throw
> >> > > >>> UnsupportedOperationException.
> >> > > >>>
> >> > > >> Yes.
> >> > > >>
> >> > > >> * Add "supportsPauseOrResumeSplits" to the Source interface. (In
> >> the
> >> > > >>> following, I refer to supporting this as "pausable splits".)
> >> > > >>>
> >> > > >> We may no longer need this if pausable splits are expected to be
> >> > > >> implemented by the source developers, i.e. non-optional. Having
> >> this
> >> > > method
> >> > > >> would then be somewhat misleading as it looks like the sources
> >> that do
> >> > > not
> >> > > >> support pausable splits are also acceptable in the long term. So
> >> API
> >> > > wise,
> >> > > >> I'd say maybe we should remove this for this FLIP, although I
> >> believe
> >> > > this
> >> > > >> supportXXX pattern itself is still attractive for optional
> >> features.
> >> > > >>
> >> > > >>
> >> > > >>>
> >> > > >>> To make the conclusions explicit:
> >> > > >>>
> >> > > >>> 1. The implementation of pauseOrResumeSplits in both interfaces
> >> > > >>> SourceReader and SplitReader is optional, where the default is
> >> that
> >> > it
> >> > > >>> doesn't support it. (--> This means that the implementation is
> >> still
> >> > > >>> optional for the source developer.)
> >> > > >>>
> >> > > >> It is optional for backwards compatibility with existing sources,
> >> as
> >> > > they
> >> > > >> may still compile without code change. But starting from this FLIP,
> >> > > Flink
> >> > > >> will always optimistically assume that all the sources support
> >> > pausable
> >> > > >> splits. If a source does not support pausable splits, it goes to an
> >> > > error
> >> > > >> handling path when watermark alignment is enabled on it. This is
> >> > > different
> >> > > >> from a usual optional feature, where no error is expected.
> >> > > >>
> >> > > >>
> >> > > >>> 2. If watermark alignment is enabled in the application code by
> >> > adding
> >> > > >>> withWatermarkAlignment to the WatermarkStrategy while
> >> SourceReader or
> >> > > >>> SplitReader do not support pausableSplits, we throw an
> >> > > >>> UnsupportedOperationException.
> >> > > >>>
> >> > > >> Yes.
> >> > > >>
> >> > > >>
> >> > > >>> 3. With regard to your statement:
> >> > > >>>
> >> > > >>>> [...] basically means watermark alignment is a non-optional
> >> feature
> >> > > to
> >> > > >>>> the end users.
> >> > > >>>
> >> > > >>> You actually mean that "pausable splits" are non-optional for the
> >> app
> >> > > >>> developer if watermark alignment is enabled. However, watermark
> >> > > alignment
> >> > > >>> is optional and can be enabled/disabled.
> >> > > >>>
> >> > > >> Yes, watermark alignment can be enabled/disabled in individual
> >> sources
> >> > > in
> >> > > >> Flink jobs, which basically means the code supporting watermark
> >> > > alignment
> >> > > >> has to already be there. That again means the Source developers are
> >> > also
> >> > > >> expected to support pausable splits by default. So this way we
> >> > > essentially
> >> > > >> tell the end users that you may enable / disable this feature as
> >> you
> >> > > wish,
> >> > > >> and tell the source developers that you SHOULD implement this
> >> because
> >> > > the
> >> > > >> end users may turn it on/off at will. And if the source does not
> >> > support
> >> > > >> pausable splits, that goes to an error handling path when watermark
> >> > > >> alignment is enabled on it. So users know they have to explicitly
> >> > > exclude
> >> > > >> this source.
> >> > > >>
> >> > > >>
> >> > > >>>
> >> > > >>> So far it's totally clear to me and I hope this is what you mean.
> >> I
> >> > > also
> >> > > >>> agree with both statements:
> >> > > >>>
> >> > > >>> So making that expectation aligned with the source developers
> >> seems
> >> > > >>>> reasonable.
> >> > > >>>>
> >> > > >>>
> >> > > >>> I think this is a simple and clean solution from both the end user
> >> > and
> >> > > >>>> source developers' standpoint.
> >> > > >>>>
> >> > > >>>
> >> > > >>> However, a last conclusion derives from 3. and is an open question
> >> > for
> >> > > >>> me:
> >> > > >>>
> >> > > >>> 4. The feature of "pausable splits" is now tightly bound to
> >> watermark
> >> > > >>> alignment, i.e., if sources do not support "pausable splits" one
> >> can
> >> > > not
> >> > > >>> enable watermark alignment for these sources. This dependency is
> >> not
> >> > > the
> >> > > >>> current status of watermark alignment implementation because it
> >> > is/was
> >> > > >>> implemented without pausable splits. Do we want to introduce this
> >> > > >>> dependency? (This is an open question. I cannot judge that.)
> >> > > >>>
> >> > > >> The watermark alignment basically relies on the pausable splits,
> >> > right?
> >> > > >> So personally I found it quite reasonable that if the source does
> >> not
> >> > > >> support pausable splits, end users cannot enable watermark
> >> alignment
> >> > on
> >> > > it.
> >> > > >>
> >> > > >>
> >> > > >>> If something is wrong, please correct me.
> >> > > >>>
> >> > > >>> Regards,
> >> > > >>> Sebastian
> >> > > >>>
> >> > > >>> On Wed, May 11, 2022 at 9:05 AM Becket Qin <be...@gmail.com>
> >> > > wrote:
> >> > > >>>
> >> > > >>>> Hi Sebastian,
> >> > > >>>>
> >> > > >>>> Thanks for the reply and patient discussion. I agree this is a
> >> > tricky
> >> > > >>>> decision.
> >> > > >>>>
> >> > > >>>>
> >> > > >>>>> Nevertheless, Piotr has valid concerns about Option c) which I
> >> see
> >> > as
> >> > > >>>>> follows:
> >> > > >>>>> (1) An interface with default NOOP implementation makes the
> >> > > >>>>> implementation optional. And in my opinion, a default
> >> > implementation
> >> > > is and
> >> > > >>>>> will remain a way of making implementation optional because
> >> even in
> >> > > future
> >> > > >>>>> a developer can decide to implement the "old flavor" without
> >> > support
> >> > > for
> >> > > >>>>> pausable splits.
> >> > > >>>>> (2) It may not be too critical but I also find it suboptimal
> >> that
> >> > > with
> >> > > >>>>> a NOOP default implementation there is no way to check at
> >> runtime
> >> > if
> >> > > >>>>> SourceReader or SplitReader actually support pausing. (To do so,
> >> > one
> >> > > would
> >> > > >>>>> need a supportsX method which makes it again more complicated.)
> >> > > >>>>
> >> > > >>>>
> >> > > >>>> Based on the last few messages in the mailing list, Piotr and I
> >> > > agreed
> >> > > >>>> that the default implementation should just throw an
> >> > > >>>> UnsupportedOperationException if the source is unpausable. So
> >> this
> >> > > >>>> basically tells the Source developers that this feature is
> >> expected
> >> > > to be
> >> > > >>>> supported. Because we cannot prevent end users from putting an
> >> > > unpausable
> >> > > >>>> source into the watermark alignment group, that basically means
> >> > > watermark
> >> > > >>>> alignment is a non-optional feature to the end users. So making
> >> > that
> >> > > >>>> expectation aligned with the source developers seems reasonable.
> >> > And
> >> > > if a
> >> > > >>>> source does not support this feature, the end users should
> >> > explicitly
> >> > > >>>> remove that source from the watermark alignment group.
> >> > > >>>>
> >> > > >>>> Personally speaking I think this is a simple and clean solution
> >> from
> >> > > >>>> both the end user and source developers' standpoint.
> >> > > >>>>
> >> > > >>>> Does this address your concerns?
> >> > > >>>>
> >> > > >>>> Thanks,
> >> > > >>>>
> >> > > >>>> Jiangjie (Becket) Qin
> >> > > >>>>
> >> > > >>>> On Wed, May 11, 2022 at 2:52 PM Sebastian Mattheis <
> >> > > >>>> sebastian@ververica.com> wrote:
> >> > > >>>>
> >> > > >>>>> Hi Piotr, Hi Becket, Hi everybody,
> >> > > >>>>>
> >> > > >>>>> we, Dawid and I, discussed the various suggestions/options and
> >> we
> >> > > >>>>> would be okay either way because we find neither solution is
> >> > perfect
> >> > > just
> >> > > >>>>> because of the already present complexity.
> >> > > >>>>>
> >> > > >>>>> Option c) Adding methods to the interfaces of SourceReader and
> >> > > >>>>> SplitReader
> >> > > >>>>> Option a) Adding decorative interfaces to be used by
> >> SourceReader
> >> > and
> >> > > >>>>> SplitReader
> >> > > >>>>>
> >> > > >>>>> As of the current status (v. 12) of the FLIP [1], it is based on
> >> > > >>>>> Option c) which we find acceptable because the complexity added
> >> is
> >> > > only a
> >> > > >>>>> single method.
> >> > > >>>>>
> >> > > >>>>> Nevertheless, Piotr has valid concerns about Option c) which I
> >> see
> >> > as
> >> > > >>>>> follows:
> >> > > >>>>> (1) An interface with default NOOP implementation makes the
> >> > > >>>>> implementation optional. And in my opinion, a default
> >> > implementation
> >> > > is and
> >> > > >>>>> will remain a way of making implementation optional because
> >> even in
> >> > > future
> >> > > >>>>> a developer can decide to implement the "old flavor" without
> >> > support
> >> > > for
> >> > > >>>>> pausable splits.
> >> > > >>>>> (2) It may not be too critical but I also find it suboptimal
> >> that
> >> > > with
> >> > > >>>>> a NOOP default implementation there is no way to check at
> >> runtime
> >> > if
> >> > > >>>>> SourceReader or SplitReader actually support pausing. (To do so,
> >> > one
> >> > > would
> >> > > >>>>> need a supportsX method which makes it again more complicated.)
> >> > > >>>>>
> >> > > >>>>> However, we haven't changed it because Option a) is also not
> >> > optimal
> >> > > >>>>> or straight-forward:
> >> > > >>>>> (1) We need to add two distinct yet similar decorative
> >> interfaces
> >> > > >>>>> since, as mentioned, the signatures of the methods are
> >> different.
> >> > For
> >> > > >>>>> example, we would need decorative interfaces like
> >> > > >>>>> `SplitReaderWithPausableSplits` and
> >> > `SourceReaderWithPausableSplits`.
> >> > > >>>>> (2) As a consequence, we would need to somehow document
> >> how/where
> >> > to
> >> > > >>>>> implement both interfaces and how this relates to each other.
> >> This
> >> > > we could
> >> > > >>>>> solve by adding a note in the interface of SourceReader and
> >> > > SplitReader and
> >> > > >>>>> reference to the decorative interfaces but it still increases
> >> > > complexity
> >> > > >>>>> too.
> >> > > >>>>>
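For comparison, Option a) could look roughly like the sketch below, which also shows the runtime check that a NOOP default method cannot offer. All names here (SketchSourceReader, SketchSourceReaderWithPausableSplits, PlainReader, PausableReader, supportsPausableSplits, checkAlignment) are hypothetical and chosen only for illustration; the FLIP does not define them, and a matching decorative interface for SplitReader would be needed as well.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the decorative-interface variant (Option a).
// None of these types are real Flink classes.
class DecorativeInterfaceSketch {

    /** Base interface stays free of any pause/resume methods. */
    interface SketchSourceReader {}

    /** Opt-in decorative interface that carries the optional capability. */
    interface SketchSourceReaderWithPausableSplits {
        void pauseOrResumeSplits(
                Collection<String> splitIdsToPause, Collection<String> splitIdsToResume);
    }

    /** A reader that implements only the base interface. */
    static class PlainReader implements SketchSourceReader {}

    /** A reader that additionally opts in to pausable splits. */
    static class PausableReader
            implements SketchSourceReader, SketchSourceReaderWithPausableSplits {
        final Set<String> pausedSplitIds = new HashSet<>();

        @Override
        public void pauseOrResumeSplits(
                Collection<String> splitIdsToPause, Collection<String> splitIdsToResume) {
            pausedSplitIds.addAll(splitIdsToPause);
            pausedSplitIds.removeAll(splitIdsToResume);
        }
    }

    /** The capability is visible at runtime through a plain instanceof check. */
    static boolean supportsPausableSplits(SketchSourceReader reader) {
        return reader instanceof SketchSourceReaderWithPausableSplits;
    }

    /** Rejects alignment up front when the reader has not opted in. */
    static void checkAlignment(SketchSourceReader reader, boolean alignmentEnabled) {
        if (alignmentEnabled && !supportsPausableSplits(reader)) {
            throw new UnsupportedOperationException(
                    "Watermark alignment is enabled, but this reader cannot pause splits.");
        }
    }

    public static void main(String[] args) {
        System.out.println(supportsPausableSplits(new PlainReader()));
        System.out.println(supportsPausableSplits(new PausableReader()));
        checkAlignment(new PlainReader(), false); // alignment disabled: nothing to check
    }
}
```

The trade-off discussed in the thread shows up directly: the base interface stays small and the check is explicit, but there is one more type per component to document and keep in sync.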
> >> > > >>>>> In summary, we see both as acceptable and preferred over other
> >> > > >>>>> options. The question is if we can find a solution or compromise
> >> > > that is
> >> > > >>>>> acceptable for everybody to reach consensus.
> >> > > >>>>>
> >> > > >>>>> Please let us know what you think because we would be happy if
> >> we
> >> > can
> >> > > >>>>> conclude the discussion to avoid dropping the initiative on this
> >> > > FLIP.
> >> > > >>>>>
> >> > > >>>>> Regards,
> >> > > >>>>> Sebastian
> >> > > >>>>>
> >> > > >>>>> [1]
> >> > > >>>>>
> >> > >
> >> >
> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
> >> > > >>>>> (v. 12)
> >> > > >>>>>
> >> > > >>>>> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski <
> >> > pnowojski@apache.org
> >> > > >
> >> > > >>>>> wrote:
> >> > > >>>>>
> >> > > >>>>>> Hi Guowei,
> >> > > >>>>>>
> >> > > >>>>>> as Dawid wrote a couple of messages back:
> >> > > >>>>>>
> >> > > >>>>>> > This is covered in the previous FLIP[1] which has been
> >> already
> >> > > >>>>>> implemented in 1.15. In short, it must be enabled with the
> >> > watermark
> >> > > >>>>>> strategy which also configures drift and update interval
> >> > > >>>>>>
> >> > > >>>>>> So by default watermark alignment is disabled, regardless of whether a
> >> > > source
> >> > > >>>>>> supports it or not.
> >> > > >>>>>>
> >> > > >>>>>> Best,
> >> > > >>>>>> Piotrek
> >> > > >>>>>>
> >> > > >>>>>> czw., 5 maj 2022 o 09:56 Guowei Ma <gu...@gmail.com>
> >> > > napisał(a):
> >> > > >>>>>>
> >> > > >>>>>>> Hi,
> >> > > >>>>>>>
> >> > > >>>>>>> We know that in the case of Bounded input Flink supports the
> >> > Batch
> >> > > >>>>>>> execution mode. Currently in Batch execution mode, Flink is
> >> > > executed
> >> > > >>>>>>> on a
> >> > > >>>>>>> stage-by-stage basis. In this way, perhaps watermark alignment
> >> > > might
> >> > > >>>>>>> not
> >> > > >>>>>>> gain much.
> >> > > >>>>>>>
> >> > > >>>>>>> So my question is: Is watermark alignment the default
> >> > behavior (for
> >> > > >>>>>>> implemented source only)? If so, have you considered
> >> evaluating
> >> > the
> >> > > >>>>>>> impact
> >> > > >>>>>>> of this behavior on the Batch execution mode? Or do you think it is
> >> not
> >> > > >>>>>>> necessary?
> >> > > >>>>>>>
> >> > > >>>>>>> Correct me if I miss something.
> >> > > >>>>>>>
> >> > > >>>>>>> Best,
> >> > > >>>>>>> Guowei
> >> > > >>>>>>>
> >> > > >>>>>>>
> >> > > >>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <
> >> > > >>>>>>> piotr.nowojski@gmail.com>
> >> > > >>>>>>> wrote:
> >> > > >>>>>>>
> >> > > >>>>>>> > Hi Becket and Dawid,
> >> > > >>>>>>> >
> >> > > >>>>>>> > > I feel that no matter which option we choose this can not
> >> be
> >> > > >>>>>>> solved
> >> > > >>>>>>> > entirely in either of the options, because of the point
> >> above
> >> > and
> >> > > >>>>>>> because
> >> > > >>>>>>> > the signature of SplitReader#pauseOrResumeSplits and
> >> > > >>>>>>> > SourceReader#pauseOrResumeSplits are slightly different (one
> >> > > >>>>>>> identifies
> >> > > >>>>>>> > splits with splitId the other one passes the splits
> >> directly).
> >> > > >>>>>>> >
> >> > > >>>>>>> > Yes, that's a good point in this case and for features that
> >> > need
> >> > > >>>>>>> to be
> >> > > >>>>>>> > implemented in more than one place.
> >> > > >>>>>>> >
> >> > > >>>>>>> > > Is there any reason for pausing reading from a split being an
> >> > > optional
> >> > > >>>>>>> feature,
> >> > > >>>>>>> > > other than that this was not included in the original
> >> > > interface?
> >> > > >>>>>>> >
> >> > > >>>>>>> > An additional argument in favor of making it optional is to
> >> > > >>>>>>> simplify source
> >> > > >>>>>>> > implementation. But on its own I'm not sure if that would be
> >> > > >>>>>>> enough to
> >> > > >>>>>>> > justify making this feature optional. Maybe.
> >> > > >>>>>>> >
> >> > > >>>>>>> > > I think it would be way simpler and clearer to just let
> >> end
> >> > > >>>>>>> users and
> >> > > >>>>>>> > Flink
> >> > > >>>>>>> > > assume all the connectors will implement this feature.
> >> > > >>>>>>> >
> >> > > >>>>>>> > As I wrote above that would be an interesting choice to make
> >> > > (ease
> >> > > >>>>>>> of
> >> > > >>>>>>> > implementation for new users, vs system consistency).
> >> > Regardless
> >> > > >>>>>>> of that,
> >> > > >>>>>>> > yes, for me the main argument is the API backward
> >> > compatibility.
> >> > > >>>>>>> But let's
> >> > > >>>>>>> > clear a couple of points:
> >> > > >>>>>>> > - The current proposal adding methods to the base interface
> >> > with
> >> > > >>>>>>> default
> >> > > >>>>>>> > implementations is an OPTIONAL feature. Same as the
> >> decorative
> >> > > >>>>>>> version
> >> > > >>>>>>> > would be.
> >> > > >>>>>>> > - Decorative version could implement "throw
> >> > > >>>>>>> UnsupportedOperationException"
> >> > > >>>>>>> > if user enabled watermark alignment just as well and I agree
> >> > > >>>>>>> that's a
> >> > > >>>>>>> > better option compared to logging a warning.
> >> > > >>>>>>> >
> >> > > >>>>>>> > Best,
> >> > > >>>>>>> > Piotrek
> >> > > >>>>>>> >
> >> > > >>>>>>> >
> >> > > >>>>>>> > śr., 4 maj 2022 o 15:40 Becket Qin <be...@gmail.com>
> >> > > >>>>>>> napisał(a):
> >> > > >>>>>>> >
> >> > > >>>>>>> > > Thanks for the reply and patient discussion, Piotr and
> >> Dawid.
> >> > > >>>>>>> > >
> >> > > >>>>>>> > > Is there any reason for pausing reading from a split being an
> >> > > optional
> >> > > >>>>>>> feature,
> >> > > >>>>>>> > > other than that this was not included in the original
> >> > > interface?
> >> > > >>>>>>> > >
> >> > > >>>>>>> > > To be honest I am really worried about the complexity of
> >> the
> >> > > >>>>>>> user story
> >> > > >>>>>>> > > here. Optional features like this have a high overhead.
> >> > Imagine
> >> > > >>>>>>> this
> >> > > >>>>>>> > > feature is optional, now a user enabled watermark
> >> alignment
> >> > and
> >> > > >>>>>>> defined a
> >> > > >>>>>>> > > few watermark groups. Would it work? Hmm, that depends on
> >> > > >>>>>>> whether the
> >> > > >>>>>>> > > involved Source has implemented this feature. If the
> >> Sources
> >> > > >>>>>>> are well
> >> > > >>>>>>> > > documented, good luck. Otherwise end users may have to
> >> look
> >> > > into
> >> > > >>>>>>> the code
> >> > > >>>>>>> > > of the Source to see whether the feature is supported.
> >> Which
> >> > is
> >> > > >>>>>>> something
> >> > > >>>>>>> > > they shouldn't have to do.
> >> > > >>>>>>> > >
> >> > > >>>>>>> > > I think it would be way simpler and clearer to just let
> >> end
> >> > > >>>>>>> users and
> >> > > >>>>>>> > Flink
> >> > > >>>>>>> > > assume all the connectors will implement this feature.
> >> After
> >> > > all
> >> > > >>>>>>> the
> >> > > >>>>>>> > > watermark group is not optional to the end users. If in
> >> some
> >> > > >>>>>>> rare cases,
> >> > > >>>>>>> > > the feature cannot be supported, a clear
> >> > > >>>>>>> UnsupportedOperationException
> >> > > >>>>>>> > will
> >> > > >>>>>>> > > be thrown to tell users to explicitly remove this Source
> >> from
> >> > > the
> >> > > >>>>>>> > watermark
> >> > > >>>>>>> > > group. I don't think we should have a warning message
> >> here,
> >> > as
> >> > > >>>>>>> they tend
> >> > > >>>>>>> > to
> >> > > >>>>>>> > > be ignored in many cases. If we do this, we don't even
> >> need
> >> > the
> >> > > >>>>>>> > supportXXX
> >> > > >>>>>>> > > method in the Source for this feature. In fact this is
> >> > exactly
> >> > > >>>>>>> how many
> >> > > >>>>>>> > > interfaces work today. For example,
> >> > > >>>>>>> SplitEnumerator#addSplitsBack() is
> >> > > >>>>>>> > not
> >> > > >>>>>>> > > supported by Pravega source because it does not support
> >> > partial
> >> > > >>>>>>> failover.
> >> > > >>>>>>> > > In that case, it simply throws an exception to trigger a
> >> > global
> >> > > >>>>>>> recovery.
> >> > > >>>>>>> > >
> >> > > >>>>>>> > > The reason we add a default implementation in this case
> >> would be
> >> > > >>>>>>> just for
> >> > > >>>>>>> > the
> >> > > >>>>>>> > > sake of backwards compatibility so the old source can
> >> still
> >> > > >>>>>>> compile.
> >> > > >>>>>>> > Sure,
> >> > > >>>>>>> > > in short term, this feature might not be supported by many
> >> > > >>>>>>> existing
> >> > > >>>>>>> > > sources. That is OK, and it is quite visible to the source
> >> > > >>>>>>> developers
> >> > > >>>>>>> > that
> >> > > >>>>>>> > > they did not override the default impl which throws an
> >> > > >>>>>>> > > UnsupportedOperationException.
> >> > > >>>>>>> > >
> >> > > >>>>>>> > > @Dawid,
> >> > > >>>>>>> > >
> >> > > >>>>>>> > > the Java doc of the SupportXXX() method in the Source
> >> would
> >> > be
> >> > > >>>>>>> the single
> >> > > >>>>>>> > > >> source of truth regarding how to implement this
> >> feature.
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > >
> >> > > >>>>>>> > > I also don't find it entirely true. Half of the classes
> >> are
> >> > > >>>>>>> theoretically
> >> > > >>>>>>> > > > optional and are utility classes from the point of view
> >> of how
> >> > > the
> >> > > >>>>>>> > > interfaces
> >> > > >>>>>>> > > > are organized. Theoretically users do not need to use
> >> any
> >> > of
> >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to list
> >> > their
> >> > > >>>>>>> methods in
> >> > > >>>>>>> > > the
> >> > > >>>>>>> > > > Source interface.
> >> > > >>>>>>> > >
> >> > > >>>>>>> > > I think the ultimate goal of java docs is to guide users
> >> to
> >> > > >>>>>>> implement the
> >> > > >>>>>>> > > Source. If SourceReaderBase is the preferred way to
> >> > implement a
> >> > > >>>>>>> > > SourceReader, it seems worth mentioning that. Even the
> >> Java
> >> > > >>>>>>> language
> >> > > >>>>>>> > > documentation for interfaces lists the known implementations
> >> [1]
> >> > so
> >> > > >>>>>>> people
> >> > > >>>>>>> > can
> >> > > >>>>>>> > > leverage them. But for this particular case, if we make
> >> the
> >> > > >>>>>>> feature
> >> > > >>>>>>> > > non-optional, we don't even need the supportXXX() method
> >> for
> >> > > now.
> >> > > >>>>>>> > >
> >> > > >>>>>>> > > Thanks,
> >> > > >>>>>>> > >
> >> > > >>>>>>> > > Jiangjie (Becket) Qin
> >> > > >>>>>>> > >
> >> > > >>>>>>> > >
> >> > > >>>>>>> > >
> >> > > >>>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <
> >> > > >>>>>>> dwysakowicz@apache.org>
> >> > > >>>>>>> > > wrote:
> >> > > >>>>>>> > >
> >> > > >>>>>>> > > > Hey Piotr and Becket,
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > First of all, let me say I am happy with whichever
> >> option
> >> > is
> >> > > >>>>>>> agreed in
> >> > > >>>>>>> > > the
> >> > > >>>>>>> > > > discussion.
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > I wanted to clarify a few points from the discussion
> >> > though:
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > @Becket:
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > The main argument for adding the methods to the
> >> > SourceReader
> >> > > >>>>>>> is that
> >> > > >>>>>>> > > these
> >> > > >>>>>>> > > > methods are effectively NON-OPTIONAL to SourceReader
> >> impl,
> >> > > i.e.
> >> > > >>>>>>> > starting
> >> > > >>>>>>> > > > from this FLIP, all the SourceReaders impl are expected
> >> to
> >> > > >>>>>>> support this
> >> > > >>>>>>> > > > method, although some old implementations may not have
> >> > > >>>>>>> implemented this
> >> > > >>>>>>> > > > feature. I think we should distinguish the new features
> >> > from
> >> > > >>>>>>> the
> >> > > >>>>>>> > optional
> >> > > >>>>>>> > > > features. The public decorative interface is a
> >> > solution
> >> > > >>>>>>> to the
> >> > > >>>>>>> > > > optional features. We should not use it for the features
> >> > that
> >> > > >>>>>>> are
> >> > > >>>>>>> > > > non-optional.
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > I don't think that this feature is NON-OPTIONAL. Even
> >> > though
> >> > > >>>>>>> > preferred, I
> >> > > >>>>>>> > > > still think it can be simply optional.
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > the Java doc of the SupportXXX() method in the Source
> >> would
> >> > > be
> >> > > >>>>>>> the
> >> > > >>>>>>> > single
> >> > > >>>>>>> > > > source of truth regarding how to implement this feature.
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > I also don't find it entirely true. Half of the classes
> >> are
> >> > > >>>>>>> > theoretically
> >> > > >>>>>>> > > > optional and are utility classes from the point of view
> >> of how
> >> > > the
> >> > > >>>>>>> > > interfaces
> >> > > >>>>>>> > > > are organized. Theoretically users do not need to use
> >> any
> >> > of
> >> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to list
> >> > their
> >> > > >>>>>>> methods in
> >> > > >>>>>>> > > the
> >> > > >>>>>>> > > > Source interface.
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > @Piotr
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > If we have all of the methods with default
> >> implementation
> >> > in
> >> > > >>>>>>> the base
> >> > > >>>>>>> > > > interface, the API doesn't give any clue to the user
> >> which
> >> > > set
> >> > > >>>>>>> of
> >> > > >>>>>>> > methods
> >> > > >>>>>>> > > > are required to be implemented at the same time.
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > I feel that no matter which option we choose this can
> >> not
> >> > be
> >> > > >>>>>>> solved
> >> > > >>>>>>> > > > entirely in either of the options, because of the point
> >> > above
> >> > > >>>>>>> and
> >> > > >>>>>>> > because
> >> > > >>>>>>> > > > the signature of SplitReader#pauseOrResumeSplits and
> >> > > >>>>>>> > > > SourceReader#pauseOrResumeSplits are slightly different
> >> > (one
> >> > > >>>>>>> identifies
> >> > > >>>>>>> > > > splits with splitId the other one passes the splits
> >> > > directly).
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > Best,
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > Dawid
> >> > > >>>>>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > Hi Piotr,
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > Thanks for the comment.
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > Just to clarify, I am not against the decorative
> >> > interfaces,
> >> > > >>>>>>> but I do
> >> > > >>>>>>> > > > think we should use them with caution. The main argument
> >> > for
> >> > > >>>>>>> adding the
> >> > > >>>>>>> > > > methods to the SourceReader is that these methods are
> >> > > >>>>>>> > > > effectively NON-OPTIONAL to SourceReader impl, i.e.
> >> > starting
> >> > > >>>>>>> from this
> >> > > >>>>>>> > > > FLIP, all the SourceReaders impl are expected to support
> >> > this
> >> > > >>>>>>> > > > method, although some old implementations may not have
> >> > > >>>>>>> implemented this
> >> > > >>>>>>> > > > feature. I think we should distinguish the new features
> >> > from
> >> > > >>>>>>> the
> >> > > >>>>>>> > optional
> >> > > >>>>>>> > > > features. The public decorative interface is a
> >> > solution
> >> > > >>>>>>> to the
> >> > > >>>>>>> > > > optional features. We should not use it for the features
> >> > that
> >> > > >>>>>>> are
> >> > > >>>>>>> > > > non-optional.
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > That said, this feature is optional for SplitReaders.
> >> > > Arguably
> >> > > >>>>>>> we can
> >> > > >>>>>>> > > have
> >> > > >>>>>>> > > > a decorative interface for that, but for simplicity and
> >> > > >>>>>>> symmetry of the
> >> > > >>>>>>> > > > interface, personally I prefer just adding a new method.
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > Regarding the advantages you mentioned about the
> >> decorative
> >> > > >>>>>>> interfaces,
> >> > > >>>>>>> > > > they would make sense if:
> >> > > >>>>>>> > > > 1. The feature is optional.
> >> > > >>>>>>> > > > 2. There is only one decorative interface involved for a
> >> > > >>>>>>> feature.
> >> > > >>>>>>> > > > Otherwise the argument that all the methods are grouped
> >> > > >>>>>>> together will
> >> > > >>>>>>> > not
> >> > > >>>>>>> > > > stand.
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > Compared with that, I think the current solution works
> >> fine
> >> > > in
> >> > > >>>>>>> all
> >> > > >>>>>>> > cases,
> >> > > >>>>>>> > > > i.e. "having supportXXX() method in Source, and default
> >> > > >>>>>>> methods /
> >> > > >>>>>>> > > > decorative interfaces in base interfaces.".
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > The advantages are:
> >> > > >>>>>>> > > >> - clean and easy to implement base interface
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > In the current approach, the Java doc of the
> >> SupportXXX()
> >> > > >>>>>>> method in the
> >> > > >>>>>>> > > > Source would be the single source of truth regarding
> >> how to
> >> > > >>>>>>> implement
> >> > > >>>>>>> > > this
> >> > > >>>>>>> > > > feature. It lists the method that has to be implemented
> >> to
> >> > > >>>>>>> support this
> >> > > >>>>>>> > > > feature, regardless of how many classes / interfaces are
> >> > > >>>>>>> involved.
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > When implementing the base interface, users do not need
> >> to
> >> > > >>>>>>> implement a
> >> > > >>>>>>> > > > method with default implementation. If they are curious
> >> > what
> >> > > >>>>>>> the method
> >> > > >>>>>>> > > is
> >> > > >>>>>>> > > > for, the java doc of that method simply points users to
> >> the
> >> > > >>>>>>> > SupportXXX()
> >> > > >>>>>>> > > > method in the Source. It does not seem to add work for the
> >> users
> >> > > >>>>>>> compared
> >> > > >>>>>>> > with
> >> > > >>>>>>> > > > decorative interfaces, but gives much better
> >> > discoverability.
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > - all of the methods from a single feature are grouped
> >> in a
> >> > > >>>>>>> single
> >> > > >>>>>>> > > >> decorator interface, together with their dedicated java
> >> > doc.
> >> > > >>>>>>> It's also
> >> > > >>>>>>> > > >> easier to google search for help using the decorator
> >> name
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > - if an optional feature requires two methods to be
> >> > > >>>>>>> implemented at
> >> > > >>>>>>> > once,
> >> > > >>>>>>> > > >> decorator can guarantee that
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > These two points are not true when multiple components
> >> and
> >> > > >>>>>>> classes are
> >> > > >>>>>>> > > > involved collaboratively to provide a feature. In our
> >> case,
> >> > > we
> >> > > >>>>>>> have
> >> > > >>>>>>> > both
> >> > > >>>>>>> > > > SourceReader and SplitReader involved. And there might
> >> be
> >> > > other
> >> > > >>>>>>> > > interfaces
> >> > > >>>>>>> > > > on the JM side involved for some future features. So the
> >> > > >>>>>>> relevant
> >> > > >>>>>>> > methods
> >> > > >>>>>>> > > > can actually be scattered all over the place. That said, we
> >> > may
> >> > > >>>>>>> still use
> >> > > >>>>>>> > > > decorative interfaces for each component, if the
> >> feature is
> >> > > >>>>>>> optional,
> >> > > >>>>>>> > > given
> >> > > >>>>>>> > > > there is a single source of truth for the feature.
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > Here I would strongly lean towards making life easier
> >> for
> >> > new
> >> > > >>>>>>> users,
> >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo) slight expense
> >> > for
> >> > > >>>>>>> the power
> >> > > >>>>>>> > > >> users.
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > I actually think the current approach is simpler, more
> >> > > >>>>>>> extensible and
> >> > > >>>>>>> > > more
> >> > > >>>>>>> > > > general for all the users. Can you articulate a bit
> >> more on
> >> > > >>>>>>> which part
> >> > > >>>>>>> > > you
> >> > > >>>>>>> > > > think is harder for users to understand?
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > There is another benefit of the decorative interfaces
> >> which
> >> > > is
> >> > > >>>>>>> not
> >> > > >>>>>>> > > > mentioned, but might be worth considering here. Usually
> >> the
> >> > > >>>>>>> decorative
> >> > > >>>>>>> > > > interfaces give slightly better backwards compatibility
> >> > than
> >> > > >>>>>>> the new
> >> > > >>>>>>> > > > default method in the interfaces. That is when users are
> >> > > using
> >> > > >>>>>>> a jar
> >> > > >>>>>>> > that
> >> > > >>>>>>> > > > was compiled with an older version of Flink which does
> >> not
> >> > > >>>>>>> have the
> >> > > >>>>>>> > > default
> >> > > >>>>>>> > > > method in the interfaces in question. A decorative
> >> > interface
> >> > > >>>>>>> may still
> >> > > >>>>>>> > > > provide backwards compatibility in that case, while
> >> default
> >> > > >>>>>>> method impl
> >> > > >>>>>>> > > > cannot.
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > I think in Flink we in general do not guarantee custom
> >> > > >>>>>>> components
> >> > > >>>>>>> > > compiled
> >> > > >>>>>>> > > > with an older version can run with a newer version of
> >> > Flink.
> >> > > A
> >> > > >>>>>>> > recompile
> >> > > >>>>>>> > > > with a newer version would be required. That said, if
> >> we do
> >> > > >>>>>>> care about
> >> > > >>>>>>> > > > this, we can just change the "supportXXX()" method in
> >> the
> >> > > >>>>>>> Source
> >> > > >>>>>>> > > interface
> >> > > >>>>>>> > > > to use decorative interfaces, and leave the other parts
> >> > > >>>>>>> unchanged.
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > Thanks,
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > Jiangjie (Becket) Qin
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <
> >> > > >>>>>>> pnowojski@apache.org>
> >> > > >>>>>>> > > > wrote:
> >> > > >>>>>>> > > >
> >> > > >>>>>>> > > >> Hi,
> >> > > >>>>>>> > > >>
> >> > > >>>>>>> > > >> Sorry for chipping in so late, but I was OoO for the
> >> last
> >> > > two
> >> > > >>>>>>> weeks.
> >> > > >>>>>>> > > >> Regarding the interfaces, I would be actually against
> >> > adding
> >> > > >>>>>>> those
> >> > > >>>>>>> > > methods
> >> > > >>>>>>> > > >> to the base interfaces for the reasons mentioned above.
> >> > > >>>>>>> Clogging the
> >> > > >>>>>>> > > base
> >> > > >>>>>>> > > >> interface for new users with tons of methods that they
> >> do
> >> > > not
> >> > > >>>>>>> need, do
> >> > > >>>>>>> > > not
> >> > > >>>>>>> > > >> understand and do not know what to do with them.
> >> Moreover,
> >> > > >>>>>>> such
> >> > > >>>>>>> > > decorative
> >> > > >>>>>>> > > >> interfaces are solving a problem if a feature requires
> >> two
> >> > > or
> >> > > >>>>>>> more
> >> > > >>>>>>> > > methods
> >> > > >>>>>>> > > >> to be implemented at the same time. If we have all of
> >> the
> >> > > >>>>>>> methods with
> >> > > >>>>>>> > > >> default implementation in the base interface, the API
> >> > > doesn't
> >> > > >>>>>>> give any
> >> > > >>>>>>> > > >> clue
> >> > > >>>>>>> > > >> to the user which set of methods are required to be
> >> > > >>>>>>> implemented at the
> >> > > >>>>>>> > > >> same
> >> > > >>>>>>> > > >> time.
> >> > > >>>>>>> > > >>
> >> > > >>>>>>> > > >> > a) I feel the biggest drawback of decorative
> >> interfaces
> >> > is
> >> > > >>>>>>> which
> >> > > >>>>>>> > > >> interface
> >> > > >>>>>>> > > >> > they can decorate and which combinations of multiple
> >> > > >>>>>>> decorative
> >> > > >>>>>>> > > >> interfaces
> >> > > >>>>>>> > > >> > are valid. (...)
> >> > > >>>>>>> > > >> > In the future, if there is a new feature added
> >> > > >>>>>>> > > >> > (e.g. sorted or pre-partitioned data aware), are we
> >> > going
> >> > > >>>>>>> to create
> >> > > >>>>>>> > > >> another
> >> > > >>>>>>> > > >> > interface of SplitReader such as SortedSplitReader or
> >> > > >>>>>>> > > >> PrePartitionedAware?
> >> > > >>>>>>> > > >> > Can they be combined? So I think the additional
> >> > decorative
> >> > > >>>>>>> interface
> >> > > >>>>>>> > > >> like
> >> > > >>>>>>> > > >> > withSplitsAlignment actually increases the
> >> understanding
> >> > > >>>>>>> cost of
> >> > > >>>>>>> > users
> >> > > >>>>>>> > > >> > because they have to know what decorative interfaces
> >> are
> >> > > >>>>>>> there,
> >> > > >>>>>>> > which
> >> > > >>>>>>> > > >> > interface they can decorate and which combinations of
> >> > the
> >> > > >>>>>>> decorative
> >> > > >>>>>>> > > >> > interfaces are valid and which are not. Ideally we
> >> want
> >> > to
> >> > > >>>>>>> avoid
> >> > > >>>>>>> > that.
> >> > > >>>>>>> > > >>
> >> > > >>>>>>> > > >> I'm not sure if I understand how embedding default
> >> methods
> >> > > in
> >> > > >>>>>>> the base
> >> > > >>>>>>> > > >> interface is solving the problem: what can be combined
> >> or
> >> > > >>>>>>> not? If
> >> > > >>>>>>> > there
> >> > > >>>>>>> > > >> are
> >> > > >>>>>>> > > >> two conflicting features, having decorative interfaces
> >> > that
> >> > > >>>>>>> can not be
> >> > > >>>>>>> > > >> mixed together actually makes much more sense to me
> >> rather
> >> > > >>>>>>> than having
> >> > > >>>>>>> > > >> them
> >> > > >>>>>>> > > >> all in one base class. How would you allow users to
> >> > > implement
> >> > > >>>>>>> only one
> >> > > >>>>>>> > > of
> >> > > >>>>>>> > > >> those two features?
> >> > > >>>>>>> > > >>
> >> > > >>>>>>> > > >> To reiterate on the issue. Yes, there are drawbacks:
> >> > > >>>>>>> > > >> - how can a user discover what decorators are there?
> >> > > >>>>>>> > > >> - how can a user know where the decorator can be
> >> applied
> >> > to?
> >> > > >>>>>>> > > >>
> >> > > >>>>>>> > > >> However those are drawbacks for more power users, that
> >> can
> >> > > be
> >> > > >>>>>>> > mitigated
> >> > > >>>>>>> > > by
> >> > > >>>>>>> > > >> the documentation. For example listing all of the
> >> > decorators
> >> > > >>>>>>> with
> >> > > >>>>>>> > > >> detailed explanation both in the docs and in the java
> >> > docs.
> >> > > >>>>>>> More
> >> > > >>>>>>> > > >> experienced users will be able to deal with those
> >> issues
> >> > > >>>>>>> more easily, as
> >> > > >>>>>>> > they
> >> > > >>>>>>> > > >> will already have some basic understanding of Flink.
> >> Also
> >> > if
> >> > > >>>>>>> a user has
> >> > > >>>>>>> > a
> >> > > >>>>>>> > > >> problem that he wants to solve, he will google search a
> >> > > >>>>>>> potential
> >> > > >>>>>>> > > solution
> >> > > >>>>>>> > > >> to his problem anyway, and while doing that he is very
> >> > > likely
> >> > > >>>>>>> to
> >> > > >>>>>>> > > discover
> >> > > >>>>>>> > > >> the decorator that he needs anyway in the docs.
> >> > > >>>>>>> > > >>
> >> > > >>>>>>> > > >> The advantages are:
> >> > > >>>>>>> > > >> - clean and easy to implement base interface
> >> > > >>>>>>> > > >> - all of the methods from a single feature are grouped
> >> in
> >> > a
> >> > > >>>>>>> single
> >> > > >>>>>>> > > >> decorator interface, together with their dedicated java
> >> > doc.
> >> > > >>>>>>> It's also
> >> > > >>>>>>> > > >> easier to google search for help using the decorator
> >> name
> >> > > >>>>>>> > > >> - if an optional feature requires two methods to be
> >> > > >>>>>>> implemented at
> >> > > >>>>>>> > once,
> >> > > >>>>>>> > > >> decorator can guarantee that
> >> > > >>>>>>> > > >>
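A minimal Java sketch of the decorator argument above. All names here (`BaseReader`, `SupportsPausing`, `SimpleReader`, `PausableReader`, `DecoratorSketch`) are hypothetical and chosen only for illustration; they are not Flink APIs. The sketch assumes an optional feature that needs two methods, and shows how grouping them in one decorative interface lets the compiler enforce that both are implemented while the base interface stays small.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical minimal base interface; not Flink's actual SourceReader.
interface BaseReader {
    String poll();
}

// Hypothetical decorator: both methods of the optional feature live together,
// so an implementer cannot provide one without the other.
interface SupportsPausing {
    void pause(String splitId);

    void resume(String splitId);
}

// A simple reader only has to deal with the base interface.
final class SimpleReader implements BaseReader {
    @Override
    public String poll() {
        return "record";
    }
}

// A reader that opts into the feature must implement pause AND resume.
final class PausableReader implements BaseReader, SupportsPausing {
    final Set<String> paused = new HashSet<>();

    @Override
    public String poll() {
        return "record";
    }

    @Override
    public void pause(String splitId) {
        paused.add(splitId);
    }

    @Override
    public void resume(String splitId) {
        paused.remove(splitId);
    }
}

final class DecoratorSketch {
    // The framework side discovers the capability with an instanceof check.
    static boolean tryPause(BaseReader reader, String splitId) {
        if (reader instanceof SupportsPausing) {
            ((SupportsPausing) reader).pause(splitId);
            return true;
        }
        return false;
    }
}
```

With default methods in the base interface instead, `SimpleReader` would silently inherit both methods, and nothing would stop an implementer from overriding only one of them.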
> >> > > >>>>>>> > > >> Here I would strongly lean towards making life easier
> >> for
> >> > > new
> >> > > >>>>>>> users,
> >> > > >>>>>>> > > >> lowering the entry barrier, at the (imo) slight expense
> >> > for
> >> > > >>>>>>> the power
> >> > > >>>>>>> > > >> users.
> >> > > >>>>>>> > > >>
> >> > > >>>>>>> > > >> Best,
> >> > > >>>>>>> > > >> Piotrek
> >> > > >>>>>>> > > >>
> >> > > >>>>>>> > > >>
> >> > > >>>>>>> > > >> On Tue, 26 Apr 2022 at 15:32, Becket Qin <
> >> becket.qin@gmail.com
> >> > >
> >> > > >>>>>>> > wrote:
> >> > > >>>>>>> > > >>
> >> > > >>>>>>> > > >> > Thanks for the reply Sebastian and Dawid,
> >> > > >>>>>>> > > >> >
> >> > > >>>>>>> > > >> > I think Sebastian has a good summary. This is a
> >> really
> >> > > >>>>>>> helpful
> >> > > >>>>>>> > > >> discussion.
> >> > > >>>>>>> > > >> >
> >> > > >>>>>>> > > >> > Thinking a bit more, I feel that it might still be
> >> > better
> >> > > >>>>>>> to add the
> >> > > >>>>>>> > > >> > supportsXXX() method in the Source rather than
> >> > > SourceReader.
> >> > > >>>>>>> > > >> >
> >> > > >>>>>>> > > >> > Generally speaking, what we are trying to do here is
> >> to
> >> > > let
> >> > > >>>>>>> the
> >> > > >>>>>>> > Flink
> >> > > >>>>>>> > > >> > framework know what the Source is capable of. In this
> >> > > FLIP,
> >> > > >>>>>>> it
> >> > > >>>>>>> > happens
> >> > > >>>>>>> > > >> to
> >> > > >>>>>>> > > >> > be the capability that only involves SourceReader.
> >> But
> >> > in
> >> > > >>>>>>> the
> >> > > >>>>>>> > future,
> >> > > >>>>>>> > > >> it is
> >> > > >>>>>>> > > >> > possible that another functionality involves both the
> >> > > >>>>>>> > SplitEnumerator
> >> > > >>>>>>> > > >> and
> >> > > >>>>>>> > > >> > SourceReader. In that case, following the current
> >> > > approach,
> >> > > >>>>>>> we
> >> > > >>>>>>> > should
> >> > > >>>>>>> > > >> put
> >> > > >>>>>>> > > >> > the "supportsXXX()" method in both SplitEnumerator
> >> and
> >> > > >>>>>>> SourceReader.
> >> > > >>>>>>> > > >> > Because if we only put this in the SourceReader, then
> >> > the
> >> > > >>>>>>> JM would
> >> > > >>>>>>> > > have
> >> > > >>>>>>> > > >> to
> >> > > >>>>>>> > > >> > create a SourceReader in order to know whether this
> >> > > feature
> >> > > >>>>>>> is
> >> > > >>>>>>> > > >> supported,
> >> > > >>>>>>> > > >> > which is a little ugly. But if we put the
> >> > "supportsXXX()"
> >> > > >>>>>>> method in
> >> > > >>>>>>> > > the
> >> > > >>>>>>> > > >> > Source, we will break the "symmetric" design because
> >> > this
> >> > > >>>>>>> FLIP
> >> > > >>>>>>> > chose a
> >> > > >>>>>>> > > >> > different way.
> >> > > >>>>>>> > > >> >
> >> > > >>>>>>> > > >> > This is also why I think supportsXXX() method seems a
> >> > good
> >> > > >>>>>>> thing to
> >> > > >>>>>>> > > >> have,
> >> > > >>>>>>> > > >> > because when there are a few interfaces / methods
> >> that
> >> > are
> >> > > >>>>>>> expected
> >> > > >>>>>>> > to
> >> > > >>>>>>> > > >> be
> >> > > >>>>>>> > > >> > implemented at the same time in order to deliver a
> >> > > feature,
> >> > > >>>>>>> it is
> >> > > >>>>>>> > > always
> >> > > >>>>>>> > > >> > good to have a single source of truth to tell the
> >> > > framework
> >> > > >>>>>>> what to
> >> > > >>>>>>> > > do,
> >> > > >>>>>>> > > >> so
> >> > > >>>>>>> > > >> > the framework can do consistent things in different
> >> > parts.
> >> > > >>>>>>> > > >> >
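A small Java sketch of the "single source of truth" idea described above. `SketchSource`, `SketchReader`, `CapableSource`, `LegacySource`, `CapabilityCheck` and the method name `supportsFeature()` are hypothetical stand-ins for illustration, not Flink's actual `Source` API. The point it illustrates is only that a capability declared on the source object can be queried without instantiating a reader.

```java
// Hypothetical stand-in for a reader; intentionally empty.
interface SketchReader { }

// Hypothetical stand-in for a source; not Flink's actual Source interface.
interface SketchSource {
    // Single place where the source declares the capability. Defaults to false.
    default boolean supportsFeature() {
        return false;
    }

    SketchReader createReader();
}

final class CapableSource implements SketchSource {
    int readersCreated = 0;

    @Override
    public boolean supportsFeature() {
        return true;
    }

    @Override
    public SketchReader createReader() {
        readersCreated++;
        return new SketchReader() { };
    }
}

final class LegacySource implements SketchSource {
    @Override
    public SketchReader createReader() {
        return new SketchReader() { };
    }
}

final class CapabilityCheck {
    // Coordinator-side decision: only the source object is consulted,
    // so no reader has to be created just to answer the question.
    static String plan(SketchSource source) {
        return source.supportsFeature() ? "feature-enabled" : "feature-disabled";
    }
}
```

Every part of the framework that needs the answer can call the same method, which is what keeps the behavior consistent across components.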
> >> > > >>>>>>> > > >> > @Sebastian Mattheis <se...@ververica.com>
> >> > > >>>>>>> > > >> >
> >> > > >>>>>>> > > >> > Regarding interface flavor b), i.e.
> >> AlignedSourceReader
> >> > +
> >> > > >>>>>>> > > >> > AlignedSplitReader, what I feel awkward about is
> >> that we
> >> > > are
> >> > > >>>>>>> > > essentially
> >> > > >>>>>>> > > >> > expecting almost all the SourceReader
> >> implementations to
> >> > > >>>>>>> extend
> >> > > >>>>>>> > > >> > SourceReaderBase, which effectively makes the
> >> > SourceReader
> >> > > >>>>>>> interface
> >> > > >>>>>>> > > >> > without the pausing support useless. So this
> >> indicates
> >> > > that
> >> > > >>>>>>> public
> >> > > >>>>>>> > > >> > decorative interfaces (or sub-interfaces for the same
> >> > > >>>>>>> purpose) only
> >> > > >>>>>>> > > >> > make sense if the original interface is also
> >> expected to
> >> > > be
> >> > > >>>>>>> used.
> >> > > >>>>>>> > > >> > Otherwise, it seems to make more sense to add the
> >> method
> >> > to
> >> > > >>>>>>> the
> >> > > >>>>>>> > original
> >> > > >>>>>>> > > >> > interface itself.
> >> > > >>>>>>> > > >> >
> >> > > >>>>>>> > > >> > Cheers,
> >> > > >>>>>>> > > >> >
> >> > > >>>>>>> > > >> > Jiangjie (Becket) Qin
> >> > > >>>>>>> > > >> >
> >> > > >>>>>>> > > >> >
> >> > > >>>>>>> > > >> >
> >> > > >>>>>>> > > >> >
> >> > > >>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <
> >> > > >>>>>>> > > >> dwysakowicz@apache.org>
> >> > > >>>>>>> > > >> > wrote:
> >> > > >>>>>>> > > >> >
> >> > > >>>>>>> > > >> > > Thanks @Sebastian for the nice summary.
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > I think most of your points aligned with the
> >> > suggestions
> >> > > >>>>>>> I made to
> >> > > >>>>>>> > > the
> >> > > >>>>>>> > > >> > > FLIP, while you were writing your reply (I believe
> >> we
> >> > > hit
> >> > > >>>>>>> enter
> >> > > >>>>>>> > > >> nearly at
> >> > > >>>>>>> > > >> > > the same time ;) )
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > Two points after we synced offline
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > 1. I also changed the
> >> supportsWatermarksSplitAlignment
> >> > > to
> >> > > >>>>>>> > > >> > > supportsPausingSplits to express the general
> >> > capability
> >> > > of
> >> > > >>>>>>> > pausing.
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > 2. As for whether we should add
> >> > > >>>>>>> PausingSourceReader/PausingSplitReader
> >> > > >>>>>>> > > (option
> >> > > >>>>>>> > > >> b)
> >> > > >>>>>>> > > >> > > or if we should just add the methods (option c), I
> >> > > >>>>>>> suggest to
> >> > > >>>>>>> > simply
> >> > > >>>>>>> > > >> add
> >> > > >>>>>>> > > >> > > the two methods, as I felt this is the approach much preferred by Becket,
> >> > > >>>>>>> > > >> > > and one to which others do not object. Unless there is opposition,
> >> > > let's
> >> > > >>>>>>> go with
> >> > > >>>>>>> > > this
> >> > > >>>>>>> > > >> > > option c.
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > Best,
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > Dawid
> >> > > >>>>>>> > > >> > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > Hi folks,
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > Sorry for being a bit silent. Many thanks for all
> >> the
> >> > > >>>>>>> input and
> >> > > >>>>>>> > > >> > > suggestions. As I'm a bit new, I needed some time
> >> to
> >> > > >>>>>>> catch up and
> >> > > >>>>>>> > > >> > structure
> >> > > >>>>>>> > > >> > > (for myself) the discussion and I wanted to find a
> >> way
> >> > > to
> >> > > >>>>>>> > structure
> >> > > >>>>>>> > > >> the
> >> > > >>>>>>> > > >> > > conclusions. (Also because I had the feeling that
> >> some
> >> > > >>>>>>> concerns
> >> > > >>>>>>> > got
> >> > > >>>>>>> > > >> lost
> >> > > >>>>>>> > > >> > in
> >> > > >>>>>>> > > >> > > the discussion.) This is my attempt and please
> >> correct
> >> > > me
> >> > > >>>>>>> if
> >> > > >>>>>>> > > >> something is
> >> > > >>>>>>> > > >> > > wrong or misunderstood. I tried to collect and
> >> > assemble
> >> > > >>>>>>> the
> >> > > >>>>>>> > > opinions,
> >> > > >>>>>>> > > >> > > suggestions, and conclusions (to the best of my
> >> > > >>>>>>> knowledge):
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > # Top A: Should split alignment (pause/resume
> >> > behavior)
> >> > > >>>>>>> be a
> >> > > >>>>>>> > general
> >> > > >>>>>>> > > >> > > capability?
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > I personally don't see any reason not to make it a
> >> > > general
> >> > > >>>>>>> > capability
> >> > > >>>>>>> > > >> > > because for the alignSplit method it is actually
> >> > > >>>>>>> independent of
> >> > > >>>>>>> > the
> >> > > >>>>>>> > > >> > > watermarks. If we agree here to have it a general
> >> > > >>>>>>> capability, we
> >> > > >>>>>>> > > >> should
> >> > > >>>>>>> > > >> > > also agree on the right wording. Does
> >> > > >>>>>>> "alignSplits(splitsToResume,
> >> > > >>>>>>> > > >> > > splitsToPause)" refer to what is then actually
> >> meant?
> >> > (I
> >> > > >>>>>>> see it as
> >> > > >>>>>>> > > >> okay.
> >> > > >>>>>>> > > >> > I
> >> > > >>>>>>> > > >> > > don't have any better idea whilst Arvid suggested
> >> > > >>>>>>> > > >> "pauseOrResumeSplits".)
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > # Top B: Should it be possible to enable/disable
> >> split
> >> > > >>>>>>> alignment?
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > I would personally not disable the split alignment
> >> on
> >> > > the
> >> > > >>>>>>> source
> >> > > >>>>>>> > > >> reader
> >> > > >>>>>>> > > >> > > side because if split alignment is used for some
> >> other
> >> > > >>>>>>> use case
> >> > > >>>>>>> > (see
> >> > > >>>>>>> > > >> A)
> >> > > >>>>>>> > > >> > it
> >> > > >>>>>>> > > >> > > could have nasty side effects on other/future use
> >> > cases.
> >> > > >>>>>>> Instead,
> >> > > >>>>>>> > I
> >> > > >>>>>>> > > >> would
> >> > > >>>>>>> > > >> > > disable "watermark split alignment" where I think
> >> it
> >> > > >>>>>>> should
> >> > > >>>>>>> > disable
> >> > > >>>>>>> > > >> the
> >> > > >>>>>>> > > >> > > watermark-dependent trigger for split alignment.
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > # Top C: Should we add a supportsX method?
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > I find it difficult to define the scope of a
> >> supportsX
> >> > > >>>>>>> method
> >> > > >>>>>>> > w.r.t.
> >> > > >>>>>>> > > >> > > the following questions: a) Where is it used? and
> >> b)
> >> > > What
> >> > > >>>>>>> is the
> >> > > >>>>>>> > > >> expected
> >> > > >>>>>>> > > >> > > output? To b), it's not straight-forward to
> >> provide a
> >> > > >>>>>>> meaningful
> >> > > >>>>>>> > > >> output,
> >> > > >>>>>>> > > >> > > e.g., if SourceReader supports split alignment but
> >> > > >>>>>>> SplitReader
> >> > > >>>>>>> > does not.
> >> > > >>>>>>> > > >> This
> >> > > >>>>>>> > > >> > is
> >> > > >>>>>>> > > >> > > because with the current implementation, we can
> >> > > determine
> >> > > >>>>>>> whether
> >> > > >>>>>>> > > >> split
> >> > > >>>>>>> > > >> > > alignment is fully supported only during runtime
> >> and
> >> > > >>>>>>> specifically
> >> > > >>>>>>> > > >> > actually
> >> > > >>>>>>> > > >> > > only when calling alignSplits down the call
> >> hierarchy
> >> > up
> >> > > >>>>>>> to the
> >> > > >>>>>>> > > actual
> >> > > >>>>>>> > > >> > > SplitReaders.
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > Therefore, I would suggest to either raise an
> >> error or
> >> > > >>>>>>> warning if
> >> > > >>>>>>> > > the
> >> > > >>>>>>> > > >> > > alignment is called but not supported at some
> >> point. I
> >> > > >>>>>>> know we
> >> > > >>>>>>> > > should
> >> > > >>>>>>> > > >> > > carefully think about when this could be the case
> >> > > because
> >> > > >>>>>>> we don't
> >> > > >>>>>>> > > >> want
> >> > > >>>>>>> > > >> > to
> >> > > >>>>>>> > > >> > > flood anybody with such warnings. However, warnings
> >> > > could
> >> > > >>>>>>> be an
> >> > > >>>>>>> > > >> indicator
> >> > > >>>>>>> > > >> > > for the user that for watermark split alignment use
> >> > case
> >> > > >>>>>>> split
> >> > > >>>>>>> > > >> reading is
> >> > > >>>>>>> > > >> > > imbalanced with the conclusion to either disable
> >> the
> >> > > >>>>>>> trigger for
> >> > > >>>>>>> > > >> > watermark
> >> > > >>>>>>> > > >> > > split alignment (see Top B) or to use/implement a
> >> > source
> >> > > >>>>>>> and
> >> > > >>>>>>> > reader
> >> > > >>>>>>> > > >> that
> >> > > >>>>>>> > > >> > > fully supports split alignment.
> >> > > >>>>>>> > > >> > >
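The warning behavior suggested under Top C could look roughly like the following Java sketch. `AlignmentGuard` is a hypothetical helper invented for this illustration and is not part of Flink; the parameter order follows the "alignSplits(splitsToResume, splitsToPause)" wording used in the thread, and the warn-only-once policy is an assumption made to avoid flooding the user.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper, not part of Flink: records a warning the first time
// alignment is requested on a reader chain that cannot pause splits.
final class AlignmentGuard {
    private final boolean pausingSupported;
    private boolean alreadyWarned = false;

    // Collected warnings; a real implementation would use a logger instead.
    final List<String> warnings = new ArrayList<>();

    AlignmentGuard(boolean pausingSupported) {
        this.pausingSupported = pausingSupported;
    }

    // Returns true if the request can be forwarded to the split readers.
    boolean alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause) {
        if (pausingSupported) {
            return true;
        }
        if (!alreadyWarned) {
            // Warn only once so that repeated alignment calls do not flood the output.
            warnings.add("Split alignment requested for " + splitsToPause
                    + " but the reader does not support pausing splits.");
            alreadyWarned = true;
        }
        return false;
    }
}
```

Raising an error instead of a warning would be a one-line change (throwing `UnsupportedOperationException` in the unsupported branch); which of the two is preferable is exactly the open question in the message above.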
> >> > > >>>>>>> > > >> > > # Top D: How to design interfaces?
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > Thanks for structuring the discussion with the
> >> > > various
> >> > > >>>>>>> > > >> possibilities
> >> > > >>>>>>> > > >> > > (a-d). From the discussion and emails, I would
> >> like to
> >> > > >>>>>>> summarize
> >> > > >>>>>>> > the
> >> > > >>>>>>> > > >> > > following requirements:
> >> > > >>>>>>> > > >> > > - Interfaces should be consistent ("symmetric"),
> >> i.e.,
> >> > > >>>>>>> similar
> >> > > >>>>>>> > > >> semantics
> >> > > >>>>>>> > > >> > > should have similar interfaces with similar usage.
> >> > > >>>>>>> > > >> > > - Make explicit which implementations implement
> >> > > >>>>>>> interfaces/support
> >> > > >>>>>>> > > >> > > behavior.
> >> > > >>>>>>> > > >> > > - Make clear what are default implementations and
> >> how
> >> > to
> >> > > >>>>>>> implement
> >> > > >>>>>>> > > >> > > interfaces with desired behavior.
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > This is a simplified view of the relations between
> >> > > >>>>>>> relevant
> >> > > >>>>>>> > classes
> >> > > >>>>>>> > > of
> >> > > >>>>>>> > > >> > the
> >> > > >>>>>>> > > >> > > PoC implementation:
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > SourceReader (Public) <|-- SourceReaderBase
> >> (Internal)
> >> > > >>>>>>> <|-- ..
> >> > > >>>>>>> > <|--
> >> > > >>>>>>> > > >> > > MySourceReader
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > MySourceReader <>-- SplitFetcherManager (Internal)
> >> > <>--
> >> > > >>>>>>> > SplitFetcher
> >> > > >>>>>>> > > >> > > (Internal) <>-- SplitReader (Public) <|--
> >> > MySplitReader
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > (A <|-- B: B inherits from A; A <>-- B: A "has a"
> >> B)
> >> > > >>>>>>> > > >> > >
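The diagram above can be read as the following Java skeleton. These are empty, simplified stand-ins that only mirror the inheritance ("<|--") and composition ("<>--") arrows; the real Flink types carry generics and methods that are deliberately left out here, and the constructor shapes are assumptions.

```java
// Simplified stand-ins that only mirror the diagram; not the real Flink types.
interface SourceReader { }   // public API
interface SplitReader { }    // public API

// SplitReader <|-- MySplitReader
final class MySplitReader implements SplitReader { }

// SplitFetcher (internal) <>-- SplitReader
final class SplitFetcher {
    final SplitReader splitReader;

    SplitFetcher(SplitReader splitReader) {
        this.splitReader = splitReader;
    }
}

// SplitFetcherManager (internal) <>-- SplitFetcher
final class SplitFetcherManager {
    final SplitFetcher fetcher;

    SplitFetcherManager(SplitFetcher fetcher) {
        this.fetcher = fetcher;
    }
}

// SourceReader <|-- SourceReaderBase (internal)
abstract class SourceReaderBase implements SourceReader { }

// SourceReaderBase <|-- .. <|-- MySourceReader, and MySourceReader <>-- SplitFetcherManager
final class MySourceReader extends SourceReaderBase {
    final SplitFetcherManager fetcherManager;

    MySourceReader(SplitFetcherManager fetcherManager) {
        this.fetcherManager = fetcherManager;
    }
}
```

An alignment call that enters at `MySourceReader` therefore has to travel through two internal classes before it reaches the user's `MySplitReader`.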
> >> > > >>>>>>> > > >> > > Note that SourceReaderBase and SplitFetcherManager
> >> > > >>>>>>> implement most
> >> > > >>>>>>> > of
> >> > > >>>>>>> > > >> the
> >> > > >>>>>>> > > >> > > "logic" for split alignment just because we wanted
> >> to
> >> > > >>>>>>> implement
> >> > > >>>>>>> > > split
> >> > > >>>>>>> > > >> > > alignment and wanted it to be available as kind of
> >> a
> >> > > >>>>>>> default. As a
> >> > > >>>>>>> > > >> > > consequence, we have a "default implementation" for
> >> > > >>>>>>> SourceReader
> >> > > >>>>>>> > > that
> >> > > >>>>>>> > > >> > > implements the actual logic for split alignment.
> >> For
> >> > > that
> >> > > >>>>>>> reason,
> >> > > >>>>>>> > I
> >> > > >>>>>>> > > >> find
> >> > > >>>>>>> > > >> > it
> >> > > >>>>>>> > > >> > > very confusing to have a NOOP default
> >> implementation
> >> > in
> >> > > >>>>>>> the
> >> > > >>>>>>> > > interface
> >> > > >>>>>>> > > >> for
> >> > > >>>>>>> > > >> > > the SourceReader. As a consequence, interface
> >> strategy
> >> > > c)
> >> > > >>>>>>> is
> >> > > >>>>>>> > > difficult
> >> > > >>>>>>> > > >> > > because this would require NOOP default
> >> > implementations
> >> > > >>>>>>> in the
> >> > > >>>>>>> > > public
> >> > > >>>>>>> > > >> > > interfaces of SourceReader and SplitReader. This is
> >> > the
> >> > > >>>>>>> same for
> >> > > >>>>>>> > > >> strategy
> >> > > >>>>>>> > > >> > > d) because it would require NOOP default
> >> > implementation
> >> > > >>>>>>> in the
> >> > > >>>>>>> > > >> > > SourceReader. Further, as Dawid described, the method
> >> > > >>>>>>> signatures of
> >> > > >>>>>>> > > >> alignSplit
> >> > > >>>>>>> > > >> > > for SourceReader and SplitReader differ and it
> >> would
> >> > be
> >> > > >>>>>>> extremely
> >> > > >>>>>>> > > >> > difficult
> >> > > >>>>>>> > > >> > > to make the signatures the same (with even
> >> potential
> >> > > >>>>>>> performance
> >> > > >>>>>>> > > >> impact
> >> > > >>>>>>> > > >> > > because of additional look-ups of split ids).
> >> > Therefore,
> >> > > >>>>>>> having a
> >> > > >>>>>>> > > >> > symmetric
> >> > > >>>>>>> > > >> > > decorative interface as of strategy a) is actually
> >> not
> >> > > >>>>>>> possible
> >> > > >>>>>>> > and
> >> > > >>>>>>> > > >> > having
> >> > > >>>>>>> > > >> > > two decorative interfaces with different method
> >> > > >>>>>>> signatures is
> >> > > >>>>>>> > > >> confusing.
> >> > > >>>>>>> > > >> > My
> >> > > >>>>>>> > > >> > > conclusion is that we are best off with strategy b)
> >> which
> >> > > >>>>>>> means to
> >> > > >>>>>>> > have
> >> > > >>>>>>> > > >> > > specializing sub-interfaces that inherit from the
> >> > parent
> >> > > >>>>>>> > interface:
> >> > > >>>>>>> > > >> > > SourceReader <|-- AlignedSourceReader, SplitReader
> >> > <|--
> >> > > >>>>>>> > > >> > AlignedSplitReader
> >> > > >>>>>>> > > >> > > With this option, I'm not 100% sure what the
> >> > > implications
> >> > > >>>>>>> are and
> >> > > >>>>>>> > if
> >> > > >>>>>>> > > >> this
> >> > > >>>>>>> > > >> > > could get nasty. I would suggest that Dawid and I
> >> just
> >> > > >>>>>>> try to
> >> > > >>>>>>> > > >> implement
> >> > > >>>>>>> > > >> > and
> >> > > >>>>>>> > > >> > > see if we like it. :)
> >> > > >>>>>>> > > >> > >
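Strategy b) could be sketched in Java as below. The base interfaces are empty stand-ins rather than the real Flink types, and the concrete parameter types are an assumption made for illustration: the source-reader variant is taken to work on split ids and the split-reader variant on split objects, which is one way the signatures could differ as described. `RecordingSplitReader` is a hypothetical implementation used only to exercise the interface.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Empty stand-ins for the parent interfaces; not the real Flink types.
interface SourceReader { }
interface SplitReader<SplitT> { }

// SourceReader <|-- AlignedSourceReader (assumed to address splits by id)
interface AlignedSourceReader extends SourceReader {
    void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause);
}

// SplitReader <|-- AlignedSplitReader (assumed to receive the split objects)
interface AlignedSplitReader<SplitT> extends SplitReader<SplitT> {
    void alignSplits(Collection<SplitT> splitsToResume, Collection<SplitT> splitsToPause);
}

// Hypothetical implementation that just tracks which splits are paused.
final class RecordingSplitReader implements AlignedSplitReader<String> {
    final Set<String> paused = new HashSet<>();

    @Override
    public void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause) {
        paused.removeAll(splitsToResume);
        paused.addAll(splitsToPause);
    }
}
```

Because each sub-interface extends its parent, an implementation that does not care about alignment keeps implementing the plain parent interface and never sees the extra method.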
> >> > > >>>>>>> > > >> > > # Summary
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > In conclusion, please let me know your
> >> perspectives.
> >> > > >>>>>>> Please
> >> > > >>>>>>> > correct
> >> > > >>>>>>> > > >> me,
> >> > > >>>>>>> > > >> > if
> >> > > >>>>>>> > > >> > > something is wrong or if I misunderstood
> >> something. My
> >> > > >>>>>>> perspective
> >> > > >>>>>>> > > >> would
> >> > > >>>>>>> > > >> > be:
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > Top A: Yes
> >> > > >>>>>>> > > >> > > Top B: Yes (but disable watermark trigger for split
> >> > > >>>>>>> alignment)
> >> > > >>>>>>> > > >> > > Top C: No
> >> > > >>>>>>> > > >> > > Top D: b)
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > Best,
> >> > > >>>>>>> > > >> > > Sebastian
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <
> >> > > >>>>>>> > > >> dwysakowicz@apache.org
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > > wrote:
> >> > > >>>>>>> > > >> > >
> >> > > >>>>>>> > > >> > >> @Arvid:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> While I also like Becket's capability approach, I
> >> > fear
> >> > > >>>>>>> that it
> >> > > >>>>>>> > > >> doesn't
> >> > > >>>>>>> > > >> > work
> >> > > >>>>>>> > > >> > >> for this particular use case: Sources can always
> >> be
> >> > > >>>>>>> aligned
> >> > > >>>>>>> > > >> cross-task
> >> > > >>>>>>> > > >> > and
> >> > > >>>>>>> > > >> > >> this is just about intra-task alignment. So it's
> >> > > >>>>>>> plausible to put
> >> > > >>>>>>> > > >> > sources
> >> > > >>>>>>> > > >> > >> into an alignment group even though they do not
> >> use
> >> > any
> >> > > >>>>>>> of the
> >> > > >>>>>>> > > >> presented
> >> > > >>>>>>> > > >> > >> API of FLIP-217. They should just issue a
> >> warning, if
> >> > > >>>>>>> they handle
> >> > > >>>>>>> > > >> > multiple
> >> > > >>>>>>> > > >> > >> splits (see motivation section).
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Yes, but the "supportXXX" method would be for
> >> telling
> >> > > if
> >> > > >>>>>>> it
> >> > > >>>>>>> > > supports
> >> > > >>>>>>> > > >> > that
> >> > > >>>>>>> > > >> > >> intra-task alignment. Cross-task alignment would
> >> > always
> >> > > >>>>>>> be
> >> > > >>>>>>> > > supported.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> I updated interfaces to what I believe to be
> >> closest
> >> > > to a
> >> > > >>>>>>> > consensus
> >> > > >>>>>>> > > >> > >> between all participants. Do you mind taking a
> >> look?
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> @Sebastian Do you mind addressing the nits?
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Best,
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Dawid
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Thanks for pushing this effort.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> I'd actually be in favor of 1b). I fully agree
> >> that
> >> > > >>>>>>> decorator
> >> > > >>>>>>> > > >> interfaces
> >> > > >>>>>>> > > >> > >> should be avoided but I'm also not a big fan of
> >> > > >>>>>>> overloading the
> >> > > >>>>>>> > > base
> >> > > >>>>>>> > > >> > >> interfaces (they are hard to implement as is). The
> >> > > usual
> >> > > >>>>>>> feedback
> >> > > >>>>>>> > > to
> >> > > >>>>>>> > > >> > >> Source-related interfaces is always that they are
> >> > > >>>>>>> overwhelming
> >> > > >>>>>>> > and
> >> > > >>>>>>> > > >> too
> >> > > >>>>>>> > > >> > >> hard to implement. However, I'd also not oppose
> >> 1c)
> >> > as
> >> > > >>>>>>> scattered
> >> > > >>>>>>> > > >> > interfaces
> >> > > >>>>>>> > > >> > >> also have drawbacks. I'd just dislike 1a) and 1d).
> >> > > >>>>>>> > > >> > >> While I also like Becket's capability approach, I
> >> > fear
> >> > > >>>>>>> that it
> >> > > >>>>>>> > > >> doesn't
> >> > > >>>>>>> > > >> > work
> >> > > >>>>>>> > > >> > >> for this particular use case: Sources can always
> >> be
> >> > > >>>>>>> aligned
> >> > > >>>>>>> > > >> cross-task
> >> > > >>>>>>> > > >> > and
> >> > > >>>>>>> > > >> > >> this is just about intra-task alignment. So it's
> >> > > >>>>>>> plausible to put
> >> > > >>>>>>> > > >> > sources
> >> > > >>>>>>> > > >> > >> into an alignment group even though they do not
> >> use
> >> > any
> >> > > >>>>>>> of the
> >> > > >>>>>>> > > >> presented
> >> > > >>>>>>> > > >> > >> API of FLIP-217. They should just issue a
> >> warning, if
> >> > > >>>>>>> they handle
> >> > > >>>>>>> > > >> > multiple
> >> > > >>>>>>> > > >> > >> splits (see motivation section).
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> I think renaming alignSplits to facilitate future
> >> use
> >> > > >>>>>>> cases makes
> >> > > >>>>>>> > > >> sense
> >> > > >>>>>>> > > >> > but
> >> > > >>>>>>> > > >> > >> then all interfaces (if 1c) is chosen) should be
> >> > > adjusted
> >> > > >>>>>>> > > >> accordingly.
> >> > > >>>>>>> > > >> > >> AlignedSourceReader could be PausingSourceReader
> >> and
> >> > > I'd
> >> > > >>>>>>> go for
> >> > > >>>>>>> > > >> > >> pauseOrResumeSplits (Becket's proposal afaik). We
> >> > could
> >> > > >>>>>>> also
> >> > > >>>>>>> > split
> >> > > >>>>>>> > > it
> >> > > >>>>>>> > > >> > into
> >> > > >>>>>>> > > >> > >> pauseSplit and resumeSplit. While
> >> pauseOrResumeSplits
> >> > > >>>>>>> may allow
> >> > > >>>>>>> > > >> Sources
> >> > > >>>>>>> > > >> > to
> >> > > >>>>>>> > > >> > >> just use 1 instead of 2 library calls (as written
> >> in
> >> > > the
> >> > > >>>>>>> > Javadoc),
> >> > > >>>>>>> > > >> both
> >> > > >>>>>>> > > >> > >> Kafka and Pulsar can't use it and I'm not sure if
> >> > there
> >> > > >>>>>>> is a
> >> > > >>>>>>> > system
> >> > > >>>>>>> > > >> that
> >> > > >>>>>>> > > >> > >> can.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Some nits for the FLIP:
> >> > > >>>>>>> > > >> > >> - Please replace "stop" with "pause".
> >> > > >>>>>>> > > >> > >> - Not sure if it's worth it in the capability
> >> > section:
> >> > > >>>>>>> Sources
> >> > > >>>>>>> > that
> >> > > >>>>>>> > > >> > adopt
> >> > > >>>>>>> > > >> > >> this interface cannot be used in earlier
> >> versions. So
> >> > > it
> >> > > >>>>>>> feels
> >> > > >>>>>>> > like
> >> > > >>>>>>> > > >> we
> >> > > >>>>>>> > > >> > are
> >> > > >>>>>>> > > >> > >> only forward compatible (old sources can be used
> >> > after
> >> > > >>>>>>> the
> >> > > >>>>>>> > change);
> >> > > >>>>>>> > > >> but
> >> > > >>>>>>> > > >> > I
> >> > > >>>>>>> > > >> > >> guess this holds for any API addition.
> >> > > >>>>>>> > > >> > >> - You might want to add what happens when all
> >> splits
> >> > > are
> >> > > >>>>>>> paused.
> >> > > >>>>>>> > > >> > >> - You may want to describe how the 3 flavors of
> >> > > >>>>>>> SourceReaderBase
> >> > > >>>>>>> > > >> > interact
> >> > > >>>>>>> > > >> > >> with the interface.
> >> > > >>>>>>> > > >> > >> - I'm not sure if it makes sense to include Kafka
> >> and
> >> > > >>>>>>> Pulsar in
> >> > > >>>>>>> > the
> >> > > >>>>>>> > > >> > FLIP.
> >> > > >>>>>>> > > >> > >> For me, this is rather immediate follow-up work.
> >> > (could
> >> > > >>>>>>> be in the
> >> > > >>>>>>> > > >> same
> >> > > >>>>>>> > > >> > >> umbrella ticket)
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Best,
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Arvid
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz
> >> <
> >> > > >>>>>>> > > >> > dwysakowicz@apache.org>
> >> > > >>>>>>> > > >> > >> wrote:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
> >> > > >>>>>>> > > >> > >> WithSplitsAlignment", along with "MySplitReader
> >> > > >>>>>>> > > >> > >> implements SplitReader, WithSplitsAlignment", or
> >> > > >>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader"
> >> > > >>>>>>> > > >> > >> and "MySplitReader implements AlignedSplitReader",
> >> > > >>>>>>> > > >> > >> or
> >> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
> >> > > >>>>>>> > > >> > >> "MySplitReader implements SplitReader".
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> I think the latest proposal according to Dawid
> >> > > >>>>>>> > > >> > >> would be:
> >> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
> >> > > >>>>>>> > > >> > >> "MySplitReader implements AlignedSplitReader".
> >> > > >>>>>>> > > >> > >> I am fine with this API, although personally
> >> > > >>>>>>> > > >> > >> speaking I think it is simpler to just add a new
> >> > > >>>>>>> > > >> > >> method to the split reader with default impl.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> I think that is a good idea to have it aligned as
> >> > > >>>>>>> > > >> > >> much as possible. I'd be +1 for your option c). We
> >> > > >>>>>>> > > >> > >> can merge AlignedSplitReader with SplitReader. We
> >> > > >>>>>>> > > >> > >> will update the FLIP shortly.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Best,
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Dawid
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Thanks for the comment, Jark.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Can the interface be used to align other things in
> >> > > >>>>>>> > > >> > >> the future? For example, to align read speed: I
> >> > > >>>>>>> > > >> > >> have seen users requesting global rate limits.
> >> > > >>>>>>> > > >> > >> This feature may also need an interface like this.
> >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to
> >> > > >>>>>>> > > >> > >> support aligning other things, I suggest
> >> > > >>>>>>> > > >> > >> explicitly declaring the purpose of the methods,
> >> > > >>>>>>> > > >> > >> such as `alignWatermarksForSplits` instead of
> >> > > >>>>>>> > > >> > >> `alignSplits`.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> This is a good point. Naming wise, it would
> >> > > >>>>>>> > > >> > >> usually be more extensible to just describe what
> >> > > >>>>>>> > > >> > >> the method actually does, instead of assuming the
> >> > > >>>>>>> > > >> > >> purpose of doing this. For example, in this case,
> >> > > >>>>>>> > > >> > >> pauseOrResumeSplits() would be more extensible
> >> > > >>>>>>> > > >> > >> because this can be used for any kind of flow
> >> > > >>>>>>> > > >> > >> control, be it watermark alignment or simple rate
> >> > > >>>>>>> > > >> > >> limiting.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think
> >> > > >>>>>>> > > >> > >> they have their own advantages.
> >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for
> >> > > >>>>>>> > > >> > >> extending abilities (SupportsXxxx) for
> >> > > >>>>>>> > > >> > >> TableSource/TableSink, and I prefer Interfaces
> >> > > >>>>>>> > > >> > >> rather than methods in this case. When you have a
> >> > > >>>>>>> > > >> > >> bunch of abilities and each ability has more than
> >> > > >>>>>>> > > >> > >> one method, Interfaces can help to organize them
> >> > > >>>>>>> > > >> > >> and make it clear to users which methods they need
> >> > > >>>>>>> > > >> > >> to implement when they want to have an ability.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> I am OK with decorative interfaces if this is a
> >> > > >>>>>>> > > >> > >> general design pattern in the other components in
> >> > > >>>>>>> > > >> > >> Flink. But it looks like the current API proposal
> >> > > >>>>>>> > > >> > >> is not symmetric.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> The current proposal is essentially
> >> > > >>>>>>> > > >> > >> "MySourceReader implements SourceReader,
> >> > > >>>>>>> > > >> > >> WithSplitsAlignment", along with "MySplitReader
> >> > > >>>>>>> > > >> > >> implements AlignedSplitReader".
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Should we make the API symmetric? I'd consider any
> >> > > >>>>>>> > > >> > >> one of the following as symmetric.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
> >> > > >>>>>>> > > >> > >> WithSplitsAlignment", along with "MySplitReader
> >> > > >>>>>>> > > >> > >> implements SplitReader, WithSplitsAlignment", or
> >> > > >>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader"
> >> > > >>>>>>> > > >> > >> and "MySplitReader implements AlignedSplitReader",
> >> > > >>>>>>> > > >> > >> or
> >> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
> >> > > >>>>>>> > > >> > >> "MySplitReader implements SplitReader".
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> I think the latest proposal according to Dawid
> >> > > >>>>>>> > > >> > >> would be:
> >> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
> >> > > >>>>>>> > > >> > >> "MySplitReader implements AlignedSplitReader".
> >> > > >>>>>>> > > >> > >> I am fine with this API, although personally
> >> > > >>>>>>> > > >> > >> speaking I think it is simpler to just add a new
> >> > > >>>>>>> > > >> > >> method to the split reader with default impl.
> >> > > >>>>>>> > > >> > >>
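To make option d) concrete, here is a minimal Java sketch of a decorative interface on the split reader side. Every type below is a simplified, hypothetical stand-in and not Flink's real API: splits are plain strings, and the alignSplits signature is an assumption made for illustration only.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the base split reader interface.
interface SketchSplitReader {
    String fetch();
}

// Hypothetical decorative interface: only readers that can pause splits implement it.
interface SketchAlignedSplitReader extends SketchSplitReader {
    void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

// A reader without alignment support implements only the base interface.
final class PlainSketchReader implements SketchSplitReader {
    @Override
    public String fetch() {
        return "record";
    }
}

// A reader with alignment support tracks which splits are currently paused.
final class PausableSketchReader implements SketchAlignedSplitReader {
    final Set<String> paused = new HashSet<>();

    @Override
    public String fetch() {
        return "record";
    }

    @Override
    public void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.removeAll(splitsToResume);
        paused.addAll(splitsToPause);
    }
}

final class DecoratorFramework {
    // The framework has to branch on the reader's type.
    // Returns true if the request was forwarded, false if the reader lacks the capability.
    static boolean requestAlignment(
            SketchSplitReader reader,
            Collection<String> splitsToPause,
            Collection<String> splitsToResume) {
        if (reader instanceof SketchAlignedSplitReader) {
            ((SketchAlignedSplitReader) reader).alignSplits(splitsToPause, splitsToResume);
            return true;
        }
        return false;
    }
}
```

The instanceof check in requestAlignment is the kind of branching that a default no-op method on the base interface would avoid, at the cost of the base interface growing by one method.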
> >> > > >>>>>>> > > >> > >> @Dawid Wysakowicz <dwysakowicz@apache.org>, thanks
> >> > > >>>>>>> > > >> > >> for the reply.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Having said that, as I don't have a preference and
> >> > > >>>>>>> > > >> > >> I agree most of the sources will support the
> >> > > >>>>>>> > > >> > >> alignment I am fine following your suggestion to
> >> > > >>>>>>> > > >> > >> have the SourceReader extending from
> >> > > >>>>>>> > > >> > >> WithWatermarksSplitsAlignment, but would put the
> >> > > >>>>>>> > > >> > >> "supportsXXX" there, not in the Source to keep the
> >> > > >>>>>>> > > >> > >> two methods together.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> One benefit of having the "supportsXXX" in Source
> >> > > >>>>>>> > > >> > >> is that this allows some compile time check. For
> >> > > >>>>>>> > > >> > >> example, if a user enabled watermark alignment
> >> > > >>>>>>> > > >> > >> while it is not supported by the Source, an
> >> > > >>>>>>> > > >> > >> exception can be thrown at compile time. It seems
> >> > > >>>>>>> > > >> > >> in general useful. That said, I agree that API
> >> > > >>>>>>> > > >> > >> cleanliness wise it is better to put the two
> >> > > >>>>>>> > > >> > >> methods together.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Thanks,
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu
> >> > > >>>>>>> > > >> > >> <imjark@gmail.com> wrote:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Thanks, Dawid, for the reminder on FLIP-182.
> >> > > >>>>>>> > > >> > >> Sorry, I did miss it.
> >> > > >>>>>>> > > >> > >> I don't have other concerns then.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Best,
> >> > > >>>>>>> > > >> > >> Jark
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz
> >> > > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> @Jark:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> 1. Will the framework always align with watermarks
> >> > > >>>>>>> > > >> > >> when the source implements the interface?
> >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark
> >> > > >>>>>>> > > >> > >> alignment even if Kafka implements the interface,
> >> > > >>>>>>> > > >> > >> and this will affect the throughput somehow. I
> >> > > >>>>>>> > > >> > >> agree with Becket we may need a
> >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to
> >> > > >>>>>>> > > >> > >> configure the source to enable/disable the
> >> > > >>>>>>> > > >> > >> alignment.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> 2. How does the framework calculate
> >> > > >>>>>>> > > >> > >> maxDesiredWatermark?
> >> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will
> >> > > >>>>>>> > > >> > >> greatly affect throughput if the reader is
> >> > > >>>>>>> > > >> > >> constantly switching between pause and resume. Can
> >> > > >>>>>>> > > >> > >> users configure the alignment offset?
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> This is covered in the previous FLIP[1], which has
> >> > > >>>>>>> > > >> > >> already been implemented in 1.15. In short, it
> >> > > >>>>>>> > > >> > >> must be enabled with the watermark strategy, which
> >> > > >>>>>>> > > >> > >> also configures drift and update interval.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to
> >> > > >>>>>>> > > >> > >> support aligning other things, I suggest
> >> > > >>>>>>> > > >> > >> explicitly declaring the purpose of the methods,
> >> > > >>>>>>> > > >> > >> such as `alignWatermarksForSplits` instead of
> >> > > >>>>>>> > > >> > >> `alignSplits`.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Sure let's rename it.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> @Becket:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> I understand your point. On the other hand,
> >> > > >>>>>>> > > >> > >> putting all methods in one interface, even with
> >> > > >>>>>>> > > >> > >> "supportsXXX" methods for enabling certain
> >> > > >>>>>>> > > >> > >> features, makes the entry threshold for writing a
> >> > > >>>>>>> > > >> > >> new source higher. Instead of focusing on the
> >> > > >>>>>>> > > >> > >> basic and required properties of the Source, the
> >> > > >>>>>>> > > >> > >> person implementing a source must figure out what
> >> > > >>>>>>> > > >> > >> all of the extra features are about and how to
> >> > > >>>>>>> > > >> > >> deal with them. It also makes it harder to
> >> > > >>>>>>> > > >> > >> organize methods in coupled groups, as Jark said.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Having said that, as I don't have a preference and
> >> > > >>>>>>> > > >> > >> I agree most of the sources will support the
> >> > > >>>>>>> > > >> > >> alignment I am fine following your suggestion to
> >> > > >>>>>>> > > >> > >> have the SourceReader extending from
> >> > > >>>>>>> > > >> > >> WithWatermarksSplitsAlignment, but would put the
> >> > > >>>>>>> > > >> > >> "supportsXXX" there, not in the Source to keep the
> >> > > >>>>>>> > > >> > >> two methods together.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Lastly, I agree it is really unfortunate the
> >> > > >>>>>>> > > >> > >> "alignSplits" methods differ slightly for
> >> > > >>>>>>> > > >> > >> SourceReader and SplitReader. The reason for that
> >> > > >>>>>>> > > >> > >> is SourceReaderBase deals only with SplitIds,
> >> > > >>>>>>> > > >> > >> whereas SplitReader needs the actual splits to
> >> > > >>>>>>> > > >> > >> pause them. I found the discrepancy acceptable for
> >> > > >>>>>>> > > >> > >> the sake of simplifying changes significantly,
> >> > > >>>>>>> > > >> > >> especially as unifying them would very likely
> >> > > >>>>>>> > > >> > >> impact performance, since we would have to perform
> >> > > >>>>>>> > > >> > >> additional lookups. Moreover, the SplitReader is a
> >> > > >>>>>>> > > >> > >> secondary interface.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Best,
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Dawid
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Thanks for the effort, Dawid and Sebastian!
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> I just have some minor questions (maybe I missed
> >> > > >>>>>>> something).
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> 1. Will the framework always align with watermarks
> >> > > >>>>>>> > > >> > >> when the source implements the interface?
> >> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark
> >> > > >>>>>>> > > >> > >> alignment even if Kafka implements the interface,
> >> > > >>>>>>> > > >> > >> and this will affect the throughput somehow. I
> >> > > >>>>>>> > > >> > >> agree with Becket we may need a
> >> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to
> >> > > >>>>>>> > > >> > >> configure the source to enable/disable the
> >> > > >>>>>>> > > >> > >> alignment.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> 2. How does the framework calculate
> >> > > >>>>>>> > > >> > >> maxDesiredWatermark?
> >> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will
> >> > > >>>>>>> > > >> > >> greatly affect throughput if the reader is
> >> > > >>>>>>> > > >> > >> constantly switching between pause and resume. Can
> >> > > >>>>>>> > > >> > >> users configure the alignment offset?
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> >> > > >>>>>>> > > >> > >> Can the interface be used to align other things in
> >> > > >>>>>>> > > >> > >> the future? For example, to align read speed: I
> >> > > >>>>>>> > > >> > >> have seen users requesting global rate limits.
> >> > > >>>>>>> > > >> > >> This feature may also need an interface like this.
> >> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to
> >> > > >>>>>>> > > >> > >> support aligning other things, I suggest
> >> > > >>>>>>> > > >> > >> explicitly declaring the purpose of the methods,
> >> > > >>>>>>> > > >> > >> such as `alignWatermarksForSplits` instead of
> >> > > >>>>>>> > > >> > >> `alignSplits`.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> 4. Interface or Method.
> >> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think
> >> > > >>>>>>> > > >> > >> they have their own advantages.
> >> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for
> >> > > >>>>>>> > > >> > >> extending abilities (SupportsXxxx) for
> >> > > >>>>>>> > > >> > >> TableSource/TableSink, and I prefer Interfaces
> >> > > >>>>>>> > > >> > >> rather than methods in this case. When you have a
> >> > > >>>>>>> > > >> > >> bunch of abilities and each ability has more than
> >> > > >>>>>>> > > >> > >> one method, Interfaces can help to organize them
> >> > > >>>>>>> > > >> > >> and make it clear to users which methods they need
> >> > > >>>>>>> > > >> > >> to implement when they want to have an ability.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Best,
> >> > > >>>>>>> > > >> > >> Jark
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin
> >> > > >>>>>>> > > >> > >> <becket.qin@gmail.com> wrote:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Hi Dawid,
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Thanks for the explanation. Apologies that I
> >> > > >>>>>>> > > >> > >> somehow misread a bunch of "align" and thought
> >> > > >>>>>>> > > >> > >> they were "assign".
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Regarding 1, by default implementation, I was
> >> > > >>>>>>> > > >> > >> thinking of the default no-op implementation. I am
> >> > > >>>>>>> > > >> > >> a little worried about the proliferation of
> >> > > >>>>>>> > > >> > >> decorative interfaces. I think the most important
> >> > > >>>>>>> > > >> > >> thing about interfaces is that they are easy to
> >> > > >>>>>>> > > >> > >> understand. In this case, I prefer adding a new
> >> > > >>>>>>> > > >> > >> method to the existing interface for the following
> >> > > >>>>>>> > > >> > >> reasons:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> a) I feel the biggest drawback of decorative
> >> > > >>>>>>> > > >> > >> interfaces is which interface they can decorate
> >> > > >>>>>>> > > >> > >> and which combinations of multiple decorative
> >> > > >>>>>>> > > >> > >> interfaces are valid. In the current FLIP, the
> >> > > >>>>>>> > > >> > >> WithSplitsAlignment interface is only applicable
> >> > > >>>>>>> > > >> > >> to the SourceReader, which means it can't decorate
> >> > > >>>>>>> > > >> > >> any other interface. From an interface design
> >> > > >>>>>>> > > >> > >> perspective, a natural question is why not let
> >> > > >>>>>>> > > >> > >> "AlignedSplitReader" extend "WithSplitsAlignment"?
> >> > > >>>>>>> > > >> > >> And it is also natural to assume that a split
> >> > > >>>>>>> > > >> > >> reader implementing both SplitReader and
> >> > > >>>>>>> > > >> > >> WithSplitsAlignment would work, because a source
> >> > > >>>>>>> > > >> > >> reader implementing SourceReader and
> >> > > >>>>>>> > > >> > >> WithSplitsAlignment works. So why isn't there an
> >> > > >>>>>>> > > >> > >> interface of AlignedSourceReader? In the future,
> >> > > >>>>>>> > > >> > >> if there is a new feature added (e.g. sorted or
> >> > > >>>>>>> > > >> > >> pre-partitioned data aware), are we going to
> >> > > >>>>>>> > > >> > >> create another interface of SplitReader such as
> >> > > >>>>>>> > > >> > >> SortedSplitReader or PrePartitionedAware? Can they
> >> > > >>>>>>> > > >> > >> be combined? So I think the additional decorative
> >> > > >>>>>>> > > >> > >> interface like WithSplitsAlignment actually
> >> > > >>>>>>> > > >> > >> increases the understanding cost of users because
> >> > > >>>>>>> > > >> > >> they have to know what decorative interfaces are
> >> > > >>>>>>> > > >> > >> there, which interface they can decorate and which
> >> > > >>>>>>> > > >> > >> combinations of the decorative interfaces are
> >> > > >>>>>>> > > >> > >> valid and which are not. Ideally we want to avoid
> >> > > >>>>>>> > > >> > >> that. To be clear, I am not opposing having an
> >> > > >>>>>>> > > >> > >> interface of WithSplitsAlignment; it is completely
> >> > > >>>>>>> > > >> > >> OK to have it as an internal interface and let
> >> > > >>>>>>> > > >> > >> SourceReader and SplitReader both extend it.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> b) Adding a new method to the SourceReader with a
> >> > > >>>>>>> > > >> > >> default implementation of no-op would help avoid
> >> > > >>>>>>> > > >> > >> logic branching in the source logic, especially
> >> > > >>>>>>> > > >> > >> given that we agree that the vast majority of the
> >> > > >>>>>>> > > >> > >> SourceReader implementations, if not all, would
> >> > > >>>>>>> > > >> > >> just extend from the SourceReaderBase. That means
> >> > > >>>>>>> > > >> > >> adding a new method to the interface would
> >> > > >>>>>>> > > >> > >> effectively give the same user experience, but
> >> > > >>>>>>> > > >> > >> simpler.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> c) A related design principle that may be worth
> >> > > >>>>>>> > > >> > >> discussing is how we let the Source
> >> > > >>>>>>> > > >> > >> implementations tell Flink what capability is
> >> > > >>>>>>> > > >> > >> supported and what is not. Personally speaking, I
> >> > > >>>>>>> > > >> > >> feel the most intuitive place is in the Source
> >> > > >>>>>>> > > >> > >> itself, because that is the entrance of the entire
> >> > > >>>>>>> > > >> > >> Source connector logic.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Based on the above thoughts, I am wondering if the
> >> > > >>>>>>> > > >> > >> following interface would be easier to understand
> >> > > >>>>>>> > > >> > >> by the users.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> - Change "withSplitsAlignment" to internal
> >> > > >>>>>>> > > >> > >> interface, let both SourceReader and SplitReader
> >> > > >>>>>>> > > >> > >> extend from it, with a default no-op
> >> > > >>>>>>> > > >> > >> implementation.
> >> > > >>>>>>> > > >> > >> - Add a new method "boolean
> >> > > >>>>>>> > > >> > >> supportSplitsAlignment()" to the Source interface,
> >> > > >>>>>>> > > >> > >> with a default implementation returning false.
> >> > > >>>>>>> > > >> > >> Sources that have implemented the alignment logic
> >> > > >>>>>>> > > >> > >> can change this to return true, and override the
> >> > > >>>>>>> > > >> > >> alignSplits() methods in the SourceReader /
> >> > > >>>>>>> > > >> > >> SplitReader if needed.
> >> > > >>>>>>> > > >> > >> - In the future, if a new optional feature is
> >> > > >>>>>>> > > >> > >> going to be added to the Source, and that feature
> >> > > >>>>>>> > > >> > >> requires the awareness from Flink, we can add more
> >> > > >>>>>>> > > >> > >> such methods to the Source.
> >> > > >>>>>>> > > >> > >>
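The proposal in the list above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the Cap* types are hypothetical stand-ins rather than Flink's real Source and SourceReader, splits are plain strings, and the validation helper is invented to show where a capability check could happen when the job is assembled.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for SourceReader: alignment is a default no-op.
interface CapSourceReader {
    default void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        // Readers that cannot pause splits simply ignore the request.
    }
}

// Hypothetical stand-in for Source: the capability flag defaults to false.
interface CapSource {
    CapSourceReader createReader();

    default boolean supportSplitsAlignment() {
        return false;
    }
}

// A source that does not know about alignment needs no extra code.
final class BasicCapSource implements CapSource {
    @Override
    public CapSourceReader createReader() {
        return new CapSourceReader() {};
    }
}

// A reader that implements the alignment logic overrides the no-op.
final class AligningCapReader implements CapSourceReader {
    final Set<String> paused = new HashSet<>();

    @Override
    public void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.removeAll(splitsToResume);
        paused.addAll(splitsToPause);
    }
}

// A source with alignment support flips the flag and returns an aligning reader.
final class AligningCapSource implements CapSource {
    @Override
    public CapSourceReader createReader() {
        return new AligningCapReader();
    }

    @Override
    public boolean supportSplitsAlignment() {
        return true;
    }
}

final class AlignmentValidator {
    // Hypothetical early check: reject alignment on a source that does not support it.
    static void validate(CapSource source, boolean alignmentEnabled) {
        if (alignmentEnabled && !source.supportSplitsAlignment()) {
            throw new UnsupportedOperationException(
                    "Watermark alignment was enabled, but the source does not support it");
        }
    }
}
```

With this shape the framework can call alignSplits on any reader without type checks, and the flag on the source gives it a single place to fail fast before the job runs.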
> >> > > >>>>>>> > > >> > >> What do you think?
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Thanks,
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz
> >> > > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> @Konstantin:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader`
> >> > > >>>>>>> > > >> > >> interface (aka the stop & resume behavior) will be
> >> > > >>>>>>> > > >> > >> implemented for Kafka and Pulsar only, correct?
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Correct, as far as I know though, those are the
> >> > > >>>>>>> > > >> > >> only sources which consume concurrently from
> >> > > >>>>>>> > > >> > >> multiple splits and thus alignment applies.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> @Thomas:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in
> >> > > >>>>>>> > > >> > >> SourceReaderBase and then doing nothing if the
> >> > > >>>>>>> > > >> > >> split reader does not implement AlignedSplitReader
> >> > > >>>>>>> > > >> > >> could be misleading? Perhaps WithSplitsAlignment
> >> > > >>>>>>> > > >> > >> can instead be added to the specific source reader
> >> > > >>>>>>> > > >> > >> (i.e. KafkaSourceReader) to make it explicit that
> >> > > >>>>>>> > > >> > >> the source actually supports it.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> I understand your concern. Hmm, I think we could
> >> > > >>>>>>> > > >> > >> actually do that. Given the actual implementation
> >> > > >>>>>>> > > >> > >> of the SourceReaderBase#alignSplits is rather
> >> > > >>>>>>> > > >> > >> short (just a forward to the corresponding method
> >> > > >>>>>>> > > >> > >> of SplitFetcher), we could reimplement it in the
> >> > > >>>>>>> > > >> > >> actual source implementations. This solution has a
> >> > > >>>>>>> > > >> > >> downside though. Authors of new sources would have
> >> > > >>>>>>> > > >> > >> to do two things: extend from AlignedSplitReader
> >> > > >>>>>>> > > >> > >> and implement WithSplitsAlignment, instead of just
> >> > > >>>>>>> > > >> > >> extending AlignedSplitReader. I would be fine with
> >> > > >>>>>>> > > >> > >> such a tradeoff though. What do others think?
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> @Steven:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> For this part from the motivation section, is it accurate? Let's assume
> >> > > >>>>>>> > > >> > >> one source task consumes from 3 partitions and one of the partitions is
> >> > > >>>>>>> > > >> > >> significantly slower. In this situation, watermark for this source task
> >> > > >>>>>>> > > >> > >> won't hold back as it is reading recent data from other two Kafka
> >> > > >>>>>>> > > >> > >> partitions. As a result, it won't hold back the overall watermark. I
> >> > > >>>>>>> > > >> > >> thought the problem is that we may have late data for this slow
> >> > > >>>>>>> > > >> > >> partition.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> It will hold back the watermark. The watermark of an operator is the
> >> > > >>>>>>> > > >> > >> minimum of the watermarks of all splits [1].
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> I have another question about the restart. Say split alignment is
> >> > > >>>>>>> > > >> > >> triggered, a checkpoint is completed, and the job fails and is restored
> >> > > >>>>>>> > > >> > >> from the last checkpoint. Because the alignment decision is not
> >> > > >>>>>>> > > >> > >> checkpointed, initially alignment won't be enforced until we get a cycle
> >> > > >>>>>>> > > >> > >> of watermark aggregation and propagation, right? Not saying this corner
> >> > > >>>>>>> > > >> > >> case is a problem. Just want to understand it more.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Your understanding is correct.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> @Becket:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use case, so should
> >> > > >>>>>>> > > >> > >> we just add the related methods to SourceReader directly instead of
> >> > > >>>>>>> > > >> > >> introducing the new interface of WithSplitAssignment? We can provide
> >> > > >>>>>>> > > >> > >> default implementations, so backwards compatibility won't be an issue.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> I don't think we can provide a default implementation. How would we do
> >> > > >>>>>>> > > >> > >> that? Would it be just a no-op? Is it better than having an opt-in
> >> > > >>>>>>> > > >> > >> interface? The default implementation would have to be added exclusively
> >> > > >>>>>>> > > >> > >> in a *Public* SourceReader interface. By the way, notice that
> >> > > >>>>>>> > > >> > >> SourceReaderBase does extend from WithSplitsAlignment, so effectively all
> >> > > >>>>>>> > > >> > >> implementations do handle the alignment case. To be honest, I think it is
> >> > > >>>>>>> > > >> > >> impossible for end users to implement the SourceReader interface
> >> > > >>>>>>> > > >> > >> directly.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably also needs some
> >> > > >>>>>>> > > >> > >> change to support throttling at the split granularity. Can you add that
> >> > > >>>>>>> > > >> > >> interface change into the public interface section as well?
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> It has been added from the beginning. See *AlignedSplitReader*.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits here, given that
> >> > > >>>>>>> > > >> > >> it is not actually changing the split assignments? It seems something
> >> > > >>>>>>> > > >> > >> like pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> The method's called *alignSplits*, not assign. Do you still prefer a
> >> > > >>>>>>> > > >> > >> different name for that? Personally, I am open to suggestions here.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Best,
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Dawid
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Thanks for driving the effort, Sebastian. I think the motivation makes a
> >> > > >>>>>>> > > >> > >> lot of sense. Just a few suggestions / questions.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use case, so should
> >> > > >>>>>>> > > >> > >> we just add the related methods to SourceReader directly instead of
> >> > > >>>>>>> > > >> > >> introducing the new interface of WithSplitAssignment? We can provide
> >> > > >>>>>>> > > >> > >> default implementations, so backwards compatibility won't be an issue.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably also needs some
> >> > > >>>>>>> > > >> > >> change to support throttling at the split granularity. Can you add that
> >> > > >>>>>>> > > >> > >> interface change into the public interface section as well?
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits here, given that
> >> > > >>>>>>> > > >> > >> it is not actually changing the split assignments? It seems something
> >> > > >>>>>>> > > >> > >> like pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Thanks,
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <stevenz3wu@gmail.com> wrote:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> However, a single source operator may read data from multiple
> >> > > >>>>>>> > > >> > >> splits/partitions, e.g., multiple Kafka partitions, such that even with
> >> > > >>>>>>> > > >> > >> watermark alignment the source operator may need to buffer an excessive
> >> > > >>>>>>> > > >> > >> amount of data if one split emits data faster than another.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> For this part from the motivation section, is it accurate? Let's assume
> >> > > >>>>>>> > > >> > >> one source task consumes from 3 partitions and one of the partitions is
> >> > > >>>>>>> > > >> > >> significantly slower. In this situation, watermark for this source task
> >> > > >>>>>>> > > >> > >> won't hold back as it is reading recent data from other two Kafka
> >> > > >>>>>>> > > >> > >> partitions. As a result, it won't hold back the overall watermark. I
> >> > > >>>>>>> > > >> > >> thought the problem is that we may have late data for this slow
> >> > > >>>>>>> > > >> > >> partition.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> I have another question about the restart. Say split alignment is
> >> > > >>>>>>> > > >> > >> triggered, a checkpoint is completed, and the job fails and is restored
> >> > > >>>>>>> > > >> > >> from the last checkpoint. Because the alignment decision is not
> >> > > >>>>>>> > > >> > >> checkpointed, initially alignment won't be enforced until we get a cycle
> >> > > >>>>>>> > > >> > >> of watermark aggregation and propagation, right? Not saying this corner
> >> > > >>>>>>> > > >> > >> case is a problem. Just want to understand it more.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <thw@apache.org> wrote:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Thanks for working on this!
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase and then
> >> > > >>>>>>> > > >> > >> doing nothing if the split reader does not implement AlignedSplitReader
> >> > > >>>>>>> > > >> > >> could be misleading? Perhaps WithSplitsAlignment can instead be added to
> >> > > >>>>>>> > > >> > >> the specific source reader (i.e. KafkaSourceReader) to make it explicit
> >> > > >>>>>>> > > >> > >> that the source actually supports it.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Thanks,
> >> > > >>>>>>> > > >> > >> Thomas
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <knaufk@apache.org> wrote:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> >> > > >>>>>>> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> +1 in general. I believe it is valuable to complete the watermark
> >> > > >>>>>>> > > >> > >> alignment story with this FLIP.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Cheers,
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Konstantin
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> To be explicit, having worked on it, I support it ;) I think we can
> >> > > >>>>>>> > > >> > >> start a vote thread soonish, as there are no concerns so far.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Best,
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Dawid
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Dear Flink developers,
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> I would like to open a discussion on FLIP 217 [1] for an extension of
> >> > > >>>>>>> > > >> > >> Watermark Alignment to perform alignment also in SplitReaders. To do so,
> >> > > >>>>>>> > > >> > >> SplitReaders must be able to suspend and resume reading from split
> >> > > >>>>>>> > > >> > >> sources where the SourceOperator coordinates and controls suspend and
> >> > > >>>>>>> > > >> > >> resume. To gather information about current watermarks of the
> >> > > >>>>>>> > > >> > >> SplitReaders, we extend the internal WatermarkOutputMultiplexer and
> >> > > >>>>>>> > > >> > >> report watermarks to the SourceOperator.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised
> >> > > >>>>>>> > > >> > >> and reworked by Dawid Wysakowicz (he did most of the work) and me. The
> >> > > >>>>>>> > > >> > >> changes are backwards compatible in the sense that if affected components
> >> > > >>>>>>> > > >> > >> do not support split alignment, the behavior is as before.
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Best,
> >> > > >>>>>>> > > >> > >> Sebastian
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> --
> >> > > >>>>>>> > > >> > >>
> >> > > >>>>>>> > > >> > >> Konstantin Knauf
> >> > > >>>>>>> > > >> > >> https://twitter.com/snntrable
> >> > > >>>>>>> > > >> > >> https://github.com/knaufk

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Piotr Nowojski <pi...@gmail.com>.
Hi Becket,

Thanks for summing this up. Just one correction:

> Piotr prefers option 2, his opinions are:
>   e) It is OK that the code itself in option 2 indicates to the developers
> that a feature is optional. We will rely on the documentation to correct
> that and clarify that the feature is actually obligatory.

I would say that, based on a) and b), the feature would still be optional.
So both the implementation and the documentation would say that. We could
add a note to the docs and release notes that this feature will become
obligatory in the next major release, and plan such a release accordingly.
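
To make "optional in the code" concrete, here is a minimal sketch of the
option 2 pattern. It is only an illustration under stated assumptions: the
types BaseReader, PausableSplits and AlignmentValidator are hypothetical and
are not Flink classes, and the signature of pauseOrResumeSplits is assumed.
The framework checks for the decorative interface at runtime and throws only
when alignment was actually enabled.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for the base reader interface (not Flink's real SourceReader).
interface BaseReader {
    String poll();
}

// Hypothetical decorative interface: implementing it is how a reader opts in.
interface PausableSplits {
    void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume);
}

// A reader that never opted in; it compiles and runs as before.
class PlainReader implements BaseReader {
    @Override
    public String poll() {
        return "record";
    }
}

// A reader that opted in by implementing the decorative interface.
class PausableReader implements BaseReader, PausableSplits {
    final List<String> paused = new ArrayList<>();

    @Override
    public String poll() {
        return "record";
    }

    @Override
    public void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        paused.removeAll(toResume);
        paused.addAll(toPause);
    }
}

class AlignmentValidator {
    // The framework discovers the capability at runtime with instanceof.
    static boolean supportsPausing(BaseReader reader) {
        return reader instanceof PausableSplits;
    }

    // Throws only if alignment was requested and the reader cannot honor it.
    static void validate(BaseReader reader, boolean alignmentEnabled) {
        if (alignmentEnabled && !supportsPausing(reader)) {
            throw new UnsupportedOperationException(
                    reader.getClass().getSimpleName() + " does not support pausing splits");
        }
    }

    public static void main(String[] args) {
        validate(new PlainReader(), false);   // fine: alignment is not used
        validate(new PausableReader(), true); // fine: the reader opted in
        try {
            validate(new PlainReader(), true);
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this sketch a reader that ignores the feature is only rejected when a job
enables alignment, which is the sense in which the feature stays optional.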

Regarding option 1, as you mentioned:
> As for option 1: For developers, the feature is still optional due to the
> default implementation in the interface, regardless of what the default
> implementation does, because the code compiles without overriding these
> methods

Also importantly, the code will work in most cases.
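
A minimal sketch of that point, assuming a hypothetical EvolvedReader
interface (not Flink's real SourceReader) and an assumed signature for
pauseOrResumeSplits: an implementation written before the method was added
still compiles, and it runs fine until something actually calls the new
method.

```java
import java.util.Collection;
import java.util.Collections;

// Hypothetical base interface evolved as in option 1 (not Flink's real SourceReader).
interface EvolvedReader {
    String poll();

    // Added later; the default body keeps existing implementations compiling.
    default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        throw new UnsupportedOperationException("pauseOrResumeSplits is not implemented");
    }
}

// Written before the new method existed; it still compiles unchanged.
class OldReader implements EvolvedReader {
    @Override
    public String poll() {
        return "record";
    }
}

class Option1Demo {
    // Returns true if the reader accepted the call, false if the default threw.
    static boolean tryPause(EvolvedReader reader) {
        try {
            reader.pauseOrResumeSplits(
                    Collections.singletonList("split-0"), Collections.<String>emptyList());
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        EvolvedReader reader = new OldReader();
        System.out.println(reader.poll());    // reading works without alignment
        System.out.println(tryPause(reader)); // the inherited default throws
    }
}
```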

> Obligatory: Jobs may fail if these methods are not implemented properly.
> e.g. SourceReader#pauseOrResumeSplits(). This is a common pattern in Java,
> e.g. Iterator.remove() by default throws "UnsupportedOperationException",
> informing the implementation that things may go wrong if this method is not
> implemented.

For me `Iterator#remove()` is an optional feature. Personally, I don't
remember if I have ever implemented it.
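
As an illustration, a small iterator that leaves remove() alone is fully
usable for iteration. CountingIterator and IteratorRemoveDemo are made-up
names for this sketch; the only library behavior relied on is that, since
Java 8, the default Iterator.remove() throws UnsupportedOperationException.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Counts from 0 (inclusive) to limit (exclusive); remove() is deliberately not overridden.
class CountingIterator implements Iterator<Integer> {
    private final int limit;
    private int next = 0;

    CountingIterator(int limit) {
        this.limit = limit;
    }

    @Override
    public boolean hasNext() {
        return next < limit;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return next++;
    }
}

class IteratorRemoveDemo {
    // Plain iteration never touches remove(), so the iterator works as-is.
    static int sum(Iterator<Integer> it) {
        int total = 0;
        while (it.hasNext()) {
            total += it.next();
        }
        return total;
    }

    // Advances once, then reports whether remove() is supported.
    static boolean removeIsSupported(Iterator<Integer> it) {
        it.next();
        try {
            it.remove();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(new CountingIterator(4)));
        System.out.println(removeIsSupported(new CountingIterator(2)));
    }
}
```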

Best,
Piotrek

On Fri, 27 May 2022 at 10:14, Becket Qin <be...@gmail.com> wrote:

> I had an offline discussion with Piotr and here is the summary. Please
> correct me if I missed something, Piotr.
>
> There are two things on which we would like to seek more opinions from
> the community, so we can make progress on this FLIP.
>
> 1. The general pattern to add obligatory features to existing interfaces.
>
> ***********************************************************************************
> Interfaces exposed to developers for implementation are intended to be
> either *optional* or *obligatory*. While it is quite clear how to convey
> that intention when creating the interfaces, there is less agreement on
> how to do so when we are adding new features to an existing
> interface. In general, Flink uses decorative interfaces when adding
> optional features to existing interfaces. Both Piotr and I agree that looks
> good.
>
> Different opinions are mainly about how to add obligatory features to the
> existing interfaces, probably due to different understandings of
> "obligatory".
>
> We have discussed four options:
>
> *Option 1:*
>
>    - Just add a new method to the existing interface.
>    - For backwards compatibility, the method would have a default
>    implementation throwing "UnsupportedOperationException".
>    - In the next major version, remove the default implementation.
>    - For the developers, any method with a default implementation
>    throwing an "UnsupportedOperationException" should be taken as obligatory.
>
> *Option 2:*
>
>    - Always make the features optional by adding a decorative interface,
>    just like ordinary optional features.
>    - Inform the developers via documentation that this feature is
>    obligatory, although it looks optional from the code.
>    - In case the developers did not implement the decorative interface,
>    throw an exception.
>    - In the next major version, move the methods in the decorative
>    interface to the base interface, and deprecate the decorative interface.
>
> *Option 3:*
>
>    - Always bump the major version when a new obligatory feature is
>    added, even if we may have to do it frequently.
>
> *Option 4:*
>
>    - Add a V2, V3... of the interface affected by the new obligatory
>    feature.
>    - In the next major versions, deprecate old versions of the interfaces.
>
> Both Piotr and I agreed that option 3 and option 4 have a big side effect
> and should be avoided. We have different preferences between option 1 and
> option 2.
>
> Personally I prefer option 1, the reasons are:
>   a) simple and intuitive. Java 8 introduced the default impl in
> interfaces exactly for interface evolving, and this is a common pattern in
> many projects.
>   b) prominent to the developers that the feature is expected to be
> implemented, because it explicitly throws an exception in the default impl.
>   c) low maintenance overhead - the Flink framework can always assume the
> method exists, so no special handling logic is needed.
>   d) communicate a clear semantic boundary between optional and obligatory
> features in Flink to the developers.
>       - Optional: Jobs still run without exception if these methods are
> not implemented. e.g. all the SupportsXXXPushDown interfaces.
>       - Obligatory: Jobs may fail if these methods are not implemented
> properly. e.g. SourceReader#pauseOrResumeSplits(). This is a common pattern
> in Java, e.g. Iterator.remove() by default throws
> "UnsupportedOperationException", informing the implementation that things
> may go wrong if this method is not implemented.
>
> As for option 2, although the API itself sounds clean, it misleads people
> into thinking an obligatory feature is optional - from the code the
> feature is optional, but the documents say it is obligatory. We probably
> should avoid such code-doc inconsistency, as people will be confused. And I
> would actually be bewildered that sometimes not implementing an "optional"
> feature is fine, but sometimes it causes the jobs to fail.
>
> In response to the argument that the method with a default implementation
> is always optional, if that is true, it actually means all the interfaces
> should be immutable once they are created. If we want to add a method to an
> existing interface, for backwards compatibility, we will have to provide a
> default implementation. And the fact it has a default implementation
> indicates the method is optional. If that method is optional, it should
> reside in a separate decorative interface, otherwise it clogs that existing
> interface. Therefore, people should never add a method to an existing
> interface. I find this conclusion a bit extreme.
>
> Piotr prefers option 2, his opinions are:
>     a) Obligatory methods are the methods that fail the code compilation
> if not implemented.
>     b) All obligatory methods should reside in the base interface, without
> a default implementation. And all the optional methods should be in
> decorative interfaces. This is a clean API.
>     c) due to b), there isn't a viable solution to add an obligatory
> method to an existing interface in a backwards compatible way. Unless we
> are OK with breaking backwards compatibility, all the interfaces should be
> treated as immutable. As a compromise, we might as well just treat all the
> features added later as optional features. This way we keep the API clean.
>     d) based on b) and c), option 2 has a clean API, while option 1 does
> not.
>     e) It is OK that the code itself in option 2 indicates to the developers
> that a feature is optional. We will rely on the documentation to correct
> that and clarify that the feature is actually obligatory.
>     f) Regarding the effectiveness of making people aware that the feature
> is obligatory, Option 1 and Option 2 are similar. People who do not read
> the release notes / documentation will mistake the feature for an
> optional one anyway.
>
> As for option 1: For developers, the feature is still optional due to the
> default implementation in the interface, regardless of what the default
> implementation does, because the code compiles without overriding these
> methods. Also, another problem with this option is that users who do not
> know the history of the interface may be confused by the default
> implementation throwing an exception.
>
>
> 2. For this particular FLIP, should it be optional or not?
>
> ***********************************************************************************
> As mentioned in the previous email, I feel this FLIP should be obligatory,
> for the following reasons:
> 1. The Flink framework exposes the watermark alignment API to the end
> users. From the end users' perspective, the feature should be available
> regardless of the implementation details in the pluggables. This is true
> for any other methods exposed as the Flink API.
> 2. If a Source is not pausable, the end user should receive an exception
> when enabling watermark alignment (both Piotr and I agree on this). In
> that case, it meets my criteria of an obligatory feature, because not
> implementing the feature causes a framework API to throw an exception and
> fail the job.
>
> On the other hand, Piotr does not have a strong opinion regarding whether
> this feature should be optional or not.
>
>
> Thanks for reading through this long email. So basically, in order to make
> progress on this FLIP, we want to see how people feel about the above
> two topics.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Thu, May 26, 2022 at 3:06 PM Piotr Nowojski <pn...@apache.org>
> wrote:
>
>> Hi Becket,
>>
>> I still stand by what I wrote before:
>> > I think I would still vote soft -1 on this option, but I wouldn't block
>> > it in case I am out-voted.
>>
>> > I think it might be helpful to agree on the definition of optional in
>> > our case.
>>
>> For me it doesn't matter whether we call a default method throwing an
>> exception optional or non-optional. As long as we keep it this way, the
>> effect is the same. It's effectively a method that a user doesn't have to
>> implement. If the interface/system allows some methods to be left
>> unimplemented, some users will do just that, regardless of whether we
>> call and document it as non-optional. And at the same time it's clogging
>> the base interface.
>>
>> By the way, just the need for Javadoc/documentation explaining the
>> existence of some construct is a bad smell (code should be
>> self-documenting, and a default method throwing an
>> UnsupportedOperationException is not).
>>
>> > Please note that so far we do not assume whether the feature is in
>> > the original API or it is added later. A newly added feature can also be
>> > non-optional, although it might take some time for all the pluggable
>> > developers to catch up, and they should still work if the new feature is
>> > not used until they catch up. In contrast, we may never expect an
>> > optional feature to catch up, because leaving it unimplemented is also
>> > blessed.
>> >
>> > Let's take the checkpointing as an example. Imagine Flink did not
>> > support checkpointing before release 1.16. And now we are trying to add
>> > checkpointing to Flink. So we exposed the checkpoint configuration to
>> > the end users. In the meantime, will we tell the pluggable (e.g.
>> > operators, connectors) developers that methods like "snapshotState()"
>> > are optional? If we do that, the availability of checkpointing in Flink
>> > would be severely weakened. But apparently we should still allow the
>> > existing implementations to work without checkpointing. It looks to me
>> > that adding the method to the pluggable interfaces with a default
>> > implementation throwing "UnsupportedOperationException" would be the
>> > solution here. Please note that in this case, having the default
>> > implementation does not mean this is optional. It is just the technique
>> > to support backwards compatibility in the feature evolution. The fact
>> > that this method is in the base interface suggests it is not optional,
>> > so the developers SHOULD implement it.
>>
>> I would soft vote -1 for having the default method throwing
>> UnsupportedOperationException as a one-off thing for these special
>> (FLIP-217) circumstances.
>>
>> At the moment, without thinking this over too much, I would vote a harder
>> -1 for having this as a general rule when adding new features. If we ever
>> end up with an API that is littered with default methods throwing
>> UnsupportedOperationException that are documented as "non optional", it
>> would be IMO a big design failure. I would be ok-ish with that only if
>> that was a temporary thing and we had an aggressive plan to release new
>> major Flink versions (2.x.y, 3.x.y, ...) more often, breaking API
>> compatibility, which would get rid of those default methods. Adding
>> checkpointing and methods like "snapshotState()" would IMO easily justify
>> a new major Flink release. In that case we could add those methods with
>> a default implementation for some transition period, one or two minor
>> releases, followed by a clean-up in a major release. However, I would
>> still argue that it would be cleaner/better to add a decorative interface
>> like `CheckpointedOperator` instead of adding those default methods to
>> the base `Operator` interface.
>>
>> I think I can sum up our disagreement as follows: I would like to keep the
>> interfaces simpler, with only obligatory methods/features on one side and
>> clearly optional features on the other, while you would like to add an
>> extra third state in between those two?
>>
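To make the decorative-interface alternative concrete, here is a minimal sketch. It is only an illustration: `Operator`, `CheckpointedOperator` and `CheckpointSupportValidator` are hypothetical names made up for this example, not the actual Flink interfaces. The base interface carries only obligatory methods, the decorative interface groups everything the feature needs, and the framework checks for it only when the feature is enabled.

```java
import java.util.List;

// Hypothetical base interface: only what every operator must provide.
interface Operator {
    void processElement(String element);
}

// Hypothetical decorative interface: everything checkpointing needs, in one place.
interface CheckpointedOperator extends Operator {
    byte[] snapshotState();

    void restoreState(byte[] state);
}

// Hypothetical framework-side check, run once when the job is set up.
final class CheckpointSupportValidator {
    private CheckpointSupportValidator() {}

    static void validate(List<? extends Operator> operators, boolean checkpointingEnabled) {
        if (!checkpointingEnabled) {
            return; // Operators without checkpoint support are acceptable here.
        }
        for (Operator operator : operators) {
            if (!(operator instanceof CheckpointedOperator)) {
                throw new UnsupportedOperationException(
                        operator.getClass().getName()
                                + " does not implement CheckpointedOperator");
            }
        }
    }
}
```

With this shape, an implementer who opts in is forced by the compiler to provide both `snapshotState()` and `restoreState()`, which is the "methods that belong together" argument made earlier in the thread.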
>> Best,
>> Piotrek
>>
>>
>>
>> On Thu, May 12, 2022 at 04:25 Becket Qin <be...@gmail.com> wrote:
>>
>> > Thanks for the clarification, Piotr and Sebastian.
>> >
>> > It looks like the key problem is still whether the implementation of
>> > pausable splits in the Sources should be optional or not.
>> >
>> > I think it might be helpful to agree on the definition of optional in
>> our
>> > case. To me:
>> > Optional = "You CAN leave the method unimplemented, and that is fine."
>> > Non-Optional = "You CAN leave the method unimplemented, but you SHOULD
>> NOT,
>> > because people assume this works."
>> >
>> > I think one sufficient condition of a Non-Optional feature is that if
>> the
>> > feature is exposed through the framework API, Flink should expect the
>> > pluggables to support this feature by default. Otherwise the
>> availability
>> > of that feature becomes undefined.
>> >
>> > Please note that so far we do not assume whether the feature is in
>> > the original API or it is added later. A newly added feature can also be
>> > non-optional, although it might take some time for all the pluggable
>> > developers to catch up, and they should still work if the new feature is
>> > not used until they catch up. In contrast, we may never expect an
>> optional
>> > feature to catch up, because leaving it unimplemented is also blessed.
>> >
>> > Let's take the checkpointing as an example. Imagine Flink did not
>> support
>> > checkpointing before release 1.16. And now we are trying to add
>> > checkpointing to Flink. So we exposed the checkpoint configuration to
>> the
>> > end users. In the meantime, will we tell the pluggable (e.g. operators,
>> > connectors) developers that methods like "snapshotState()" are optional?
>> If
>> > we do that, the availability of checkpointing in Flink would be severely
>> > weakened. But apparently we should still allow the existing
>> implementations
>> > to work without checkpointing. It looks to me that adding the method to
>> the
>> > pluggable interfaces with a default implementation throwing
>> > "UnsupportedOperationException" would be the solution here. Please note
>> > that in this case, having the default implementation does not mean this
>> is
>> > optional. It is just the technique to support backwards compatibility in
>> > the feature evolution. The fact that this method is in the base
>> interface
>> > suggests it is not optional, so the developers SHOULD implement it.
>> >
>> > When it comes to this FLIP, I think it meets the criteria of
>> non-optional
>> > features, so we should just use the evolution path of non-optional
>> > features.
>> >
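The "evolution path of non-optional features" described above could look roughly like the following sketch. All types here (`SketchSourceReader`, `LegacyReader`, `PausableReader`) are hypothetical simplifications for illustration and not the real Flink `SourceReader`; the parameter types are an assumption as well.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified reader interface; not the real Flink SourceReader.
interface SketchSourceReader {
    String pollNext();

    // Added later with a default body so that existing readers still compile.
    // Under the proposal this is expected to be overridden, not optional.
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException(
                getClass().getName() + " does not support pausing or resuming splits");
    }
}

// Written before the method existed: compiles unchanged, fails only when called.
final class LegacyReader implements SketchSourceReader {
    @Override
    public String pollNext() {
        return "record";
    }
}

// Updated reader that tracks which splits are currently paused.
final class PausableReader implements SketchSourceReader {
    private final Set<String> pausedSplits = new HashSet<>();

    @Override
    public String pollNext() {
        return "record";
    }

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        pausedSplits.addAll(splitsToPause);
        pausedSplits.removeAll(splitsToResume);
    }

    Set<String> pausedSplits() {
        return pausedSplits;
    }
}
```

A job that never triggers pausing keeps working with `LegacyReader`; the exception surfaces only once the feature is actually exercised on it.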
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> >
>> >
>> > On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski <pn...@apache.org>
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > Actually previously I thought about having a decorative interface and
>> > > whenever watermark alignment is enabled, checking that the source
>> > > implements the decorative interface. If not, throwing an exception.
>> > >
>> > > The option with default methods in the source interfaces throwing
>> > > `UnsupportedOperationException` I think still suffers from the same
>> > > problems I mentioned before. It's still an optional implementation
>> and at
>> > > the same time it's clogging the base interface. I think I would still
>> > vote
>> > > soft -1 on this option, but I wouldn't block it in case I am
>> out-voted.
>> > >
>> > > Best,
>> > > Piotrek
>> > >
>> > > On Wed, May 11, 2022 at 14:22 Sebastian Mattheis <se...@ververica.com>
>> > > wrote:
>> > >
>> > > > Hi Becket,
>> > > >
>> > > > Thanks a lot for your fast and detailed response. To me, the
>> > > > discussion is converging, and dropping the supportsX method sounds
>> > > > very reasonable. (Side
>> > > > note: Regarding "pausable splits" being enabled by "default", I think
>> > > > we misunderstood each other. As you describe it now, I understand
>> > > > "default" to mean that it should be the new recommended way of
>> > > > implementation, and I think that is fully valid. Before, I understood
>> > > > "default" here as the default implementation, i.e., throwing
>> > > > UnsupportedOperationException, which is the exact opposite. :) )
>> > > >
>> > > > Nevertheless: As mentioned, an open question for me is if watermark
>> > > > alignment should enforce pausable splits. For clarification, the
>> > current
>> > > > documentation [1] says:
>> > > >
>> > > >> *Note:* As of 1.15, Flink supports aligning across tasks of the same
>> > > >> source and/or different sources. It does not support aligning
>> > > >> splits/partitions/shards in the same task.
>> > > >>
>> > > >> In a case where there are e.g. two Kafka partitions that produce
>> > > >> watermarks at different pace, that get assigned to the same task
>> > > watermark
>> > > >> might not behave as expected. Fortunately, worst case it should not
>> > > perform
>> > > >> worse than without alignment.
>> > > >>
>> > > >> Given the limitation above, we suggest applying watermark
>> alignment in
>> > > >> two situations:
>> > > >>
>> > > >>    1. You have two different sources (e.g. Kafka and File) that
>> > produce
>> > > >>    watermarks at different speeds
>> > > >>    2. You run your source with parallelism equal to the number of
>> > > >>    splits/shards/partitions, which results in every subtask being
>> > > assigned a
>> > > >>    single unit of work.
>> > > >>
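As an illustration of the two-Kafka-partitions case in the quoted note, the following sketch shows what a per-split alignment decision inside one task might compute. It rests on simplifying assumptions: the class and method names are hypothetical, and the limit is derived only from the slowest split in the same task, whereas real watermark alignment is coordinated across the whole alignment group.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Flink.
final class SplitAlignmentSketch {
    private SplitAlignmentSketch() {}

    // Returns the IDs of splits whose watermark is ahead of the slowest split
    // by more than the allowed drift (all values in milliseconds).
    static List<String> splitsToPause(Map<String, Long> splitWatermarks, long maxAllowedDriftMs) {
        List<String> toPause = new ArrayList<>();
        if (splitWatermarks.isEmpty()) {
            return toPause;
        }
        long slowest = Collections.min(splitWatermarks.values());
        long maxAllowedWatermark = slowest + maxAllowedDriftMs;
        for (Map.Entry<String, Long> entry : splitWatermarks.entrySet()) {
            if (entry.getValue() > maxAllowedWatermark) {
                toPause.add(entry.getKey());
            }
        }
        return toPause;
    }
}
```

For example, with partition-0 at watermark 1000, partition-1 at 5000 and an allowed drift of 2000, only partition-1 exceeds the limit of 3000 and would be paused. Without pausable splits the task can only pause or resume as a whole, which is the limitation the note describes.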
>> > > > I personally see no issue with, and no reason against, implementing
>> > > > this dependency of watermark alignment on pausable splits.
>> > > (I
>> > > > think this would even be a good path towards shaping watermark
>> > alignment
>> > > in
>> > > > 1.16.) However, "I don't see" means that I would be happy to hear
>> > Dawid's
>> > > > and Piotrek's opinions as they implemented watermark alignment
>> based on
>> > > > FLIP-182 [2] and I don't want to miss relevant rationale/background
>> > info
>> > > > from their side.
>> > > >
>> > > > *@Piotrek* *@Dawid *What do you think?
>> > > >
>> > > > Regards,
>> > > > Sebastian
>> > > >
>> > > > [1]
>> > > >
>> > >
>> >
>> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
>> > > > [2]
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
>> > > >
>> > > > On Wed, May 11, 2022 at 1:30 PM Becket Qin <be...@gmail.com>
>> > wrote:
>> > > >
>> > > >> +dev
>> > > >>
>> > > >> Hi Sebastian,
>> > > >>
>> > > >> Thank you for the summary. Please see the detailed replies inline.
>> As
>> > a
>> > > >> recap of my suggestions.
>> > > >>
>> > > >> 1. Pausable splits API.
>> > > >>   a) Add default implementations to methods "pauseOrResumeSplits"
>> in
>> > > both
>> > > >> SourceReader and SplitReader where both default implementations
>> throw
>> > > >>  UnsupportedOperationException.
>> > > >>
>> > > >> 2. User story.
>> > > >>     a) We tell users to enable the watermark alignment as they
>> like.
>> > > This
>> > > >> is exactly what the current Flink API is.
>> > > >>     b) We tell the source developers, please implement pausable
>> > splits,
>> > > >> otherwise bad things may happen. Think of it like you are expected
>> to
>> > > >> implement SourceReader#snapshotState() properly, otherwise
>> exceptions
>> > > will
>> > > >> be thrown when users enable checkpointing.
>> > > >>
>> > > >> Thanks,
>> > > >>
>> > > >> Jiangjie (Becket) Qin
>> > > >>
>> > > >> On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis <
>> > > >> sebastian@ververica.com> wrote:
>> > > >>
>> > > >>> Hi Becket, Hi everybody,
>> > > >>>
>> > > >>> I'm sorry if I misread the messages but I could not derive an
>> > agreement
>> > > >>> from the mailing list. Nevertheless, if I understand you right the
>> > > >>> suggestion is:
>> > > >>>
>> > > >>> * Add default implementations to methods "pauseOrResumeSplits" in
>> > both
>> > > >>> SourceReader and SplitReader where both default implementations
>> throw
>> > > >>> UnsupportedOperationException.
>> > > >>>
>> > > >> Yes.
>> > > >>
>> > > >> * Add "supportsPauseOrResumeSplits" to the Source interface. (In
>> the
>> > > >>> following, I refer to supporting this as "pausable splits".)
>> > > >>>
>> > > >> We may no longer need this if pausable splits are expected to be
>> > > >> implemented by the source developers, i.e. non-optional. Having
>> this
>> > > method
>> > > >> would then be somewhat misleading as it looks like the sources
>> that do
>> > > not
>> > > >> support pausable splits are also acceptable in the long term. So
>> API
>> > > wise,
>> > > >> I'd say maybe we should remove this for this FLIP, although I
>> believe
>> > > this
>> > > >> supportXXX pattern itself is still attractive for optional
>> features.
>> > > >>
>> > > >>
>> > > >>>
>> > > >>> To make the conclusions explicit:
>> > > >>>
>> > > >>> 1. The implementation of pauseOrResumeSplits in both interfaces
>> > > >>> SourceReader and SplitReader is optional, where the default is
>> that
>> > it
>> > > >>> doesn't support it. (--> This means that the implementation is
>> still
>> > > >>> optional for the source developer.)
>> > > >>>
>> > > >> It is optional for backwards compatibility with existing sources,
>> as
>> > > they
>> > > >> may still compile without code change. But starting from this FLIP,
>> > > Flink
>> > > >> will always optimistically assume that all the sources support
>> > pausable
>> > > >> splits. If a source does not support pausable splits, it goes to an
>> > > error
>> > > >> handling path when watermark alignment is enabled on it. This is
>> > > different
>> > > >> from a usual optional feature, where no error is expected.
>> > > >>
>> > > >>
>> > > >>> 2. If watermark alignment is enabled in the application code by
>> > adding
>> > > >>> withWatermarkAlignment to the WatermarkStrategy while
>> SourceReader or
>> > > >>> SplitReader do not support pausableSplits, we throw an
>> > > >>> UnsupportedOperationException.
>> > > >>>
>> > > >> Yes.
>> > > >>
>> > > >>
>> > > >>> 3. With regard to your statement:
>> > > >>>
>> > > >>>> [...] basically means watermark alignment is a non-optional
>> feature
>> > > to
>> > > >>>> the end users.
>> > > >>>
>> > > >>> You actually mean that "pausable splits" are non-optional for the
>> app
>> > > >>> developer if watermark alignment is enabled. However, watermark
>> > > alignment
>> > > >>> is optional and can be enabled/disabled.
>> > > >>>
>> > > >> Yes, watermark alignment can be enabled/disabled in individual
>> sources
>> > > in
>> > > >> Flink jobs, which basically means the code supporting watermark
>> > > alignment
>> > > >> has to already be there. That again means the Source developers are
>> > also
>> > > >> expected to support pausable splits by default. So this way we
>> > > essentially
>> > > >> tell the end users that you may enable / disable this feature as
>> you
>> > > wish,
>> > > >> and tell the source developers that you SHOULD implement this
>> because
>> > > the
>> > > >> end users may turn it on/off at will. And if the source does not
>> > support
>> > > >> pausable splits, that goes to an error handling path when watermark
>> > > >> alignment is enabled on it. So users know they have to explicitly
>> > > exclude
>> > > >> this source.
>> > > >>
>> > > >>
>> > > >>>
>> > > >>> So far it's totally clear to me and I hope this is what you mean.
>> I
>> > > also
>> > > >>> agree with both statements:
>> > > >>>
>> > > >>> So making that expectation aligned with the source developers
>> seems
>> > > >>>> reasonable.
>> > > >>>>
>> > > >>>
>> > > >>> I think this is a simple and clean solution from both the end user
>> > and
>> > > >>>> source developers' standpoint.
>> > > >>>>
>> > > >>>
>> > > >>> However, a last conclusion derives from 3. and is an open question
>> > for
>> > > >>> me:
>> > > >>>
>> > > >>> 4. The feature of "pausable splits" is now tightly bound to
>> watermark
>> > > >>> alignment, i.e., if sources do not support "pausable splits" one
>> can
>> > > not
>> > > >>> enable watermark alignment for these sources. This dependency is
>> not
>> > > the
>> > > >>> current status of watermark alignment implementation because it
>> > is/was
>> > > >>> implemented without pausable splits. Do we want to introduce this
>> > > >>> dependency? (This is an open question. I cannot judge that.)
>> > > >>>
>> > > >> The watermark alignment basically relies on the pausable splits,
>> > right?
>> > > >> So personally I found it quite reasonable that if the source does
>> not
>> > > >> support pausable splits, end users cannot enable watermark
>> alignment
>> > on
>> > > it.
>> > > >>
>> > > >>
>> > > >>> If something is wrong, please correct me.
>> > > >>>
>> > > >>> Regards,
>> > > >>> Sebastian
>> > > >>>
>> > > >>> On Wed, May 11, 2022 at 9:05 AM Becket Qin <be...@gmail.com>
>> > > wrote:
>> > > >>>
>> > > >>>> Hi Sebastian,
>> > > >>>>
>> > > >>>> Thanks for the reply and patient discussion. I agree this is a
>> > tricky
>> > > >>>> decision.
>> > > >>>>
>> > > >>>>
>> > > >>>>> Nevertheless, Piotr has valid concerns about Option c) which I
>> see
>> > as
>> > > >>>>> follows:
>> > > >>>>> (1) An interface with default NOOP implementation makes the
>> > > >>>>> implementation optional. And in my opinion, a default
>> > implementation
>> > > is and
>> > > >>>>> will remain a way of making implementation optional because
>> even in
>> > > future
>> > > >>>>> a developer can decide to implement the "old flavor" without
>> > support
>> > > for
>> > > >>>>> pausable splits.
>> > > >>>>> (2) It may not be too critical but I also find it suboptimal
>> that
>> > > with
>> > > >>>>> a NOOP default implementation there is no way to check at
>> runtime
>> > if
>> > > >>>>> SourceReader or SplitReader actually support pausing. (To do so,
>> > one
>> > > would
>> > > >>>>> need a supportsX method which makes it again more complicated.)
>> > > >>>>
>> > > >>>>
>> > > >>>> Based on the last few messages in the mailing list, Piotr and I
>> > > agreed
>> > > >>>> that the default implementation should just throw an
>> > > >>>> UnsupportedOperationException if the source is unpausable. So
>> this
>> > > >>>> basically tells the Source developers that this feature is
>> expected
>> > > to be
>> > > >>>> supported. Because we cannot prevent end users from putting an
>> > > unpausable
>> > > >>>> source into the watermark alignment group, that basically means
>> > > watermark
>> > > >>>> alignment is a non-optional feature to the end users. So making
>> > that
>> > > >>>> expectation aligned with the source developers seems reasonable.
>> > And
>> > > if a
>> > > >>>> source does not support this feature, the end users should
>> > explicitly
>> > > >>>> remove that source from the watermark alignment group.
>> > > >>>>
>> > > >>>> Personally speaking I think this is a simple and clean solution
>> from
>> > > >>>> both the end user and source developers' standpoint.
>> > > >>>>
>> > > >>>> Does this address your concerns?
>> > > >>>>
>> > > >>>> Thanks,
>> > > >>>>
>> > > >>>> Jiangjie (Becket) Qin
>> > > >>>>
>> > > >>>> On Wed, May 11, 2022 at 2:52 PM Sebastian Mattheis <
>> > > >>>> sebastian@ververica.com> wrote:
>> > > >>>>
>> > > >>>>> Hi Piotr, Hi Becket, Hi everybody,
>> > > >>>>>
>> > > >>>>> we, Dawid and I, discussed the various suggestions/options and
>> we
>> > > >>>>> would be okay either way because we find neither solution is
>> > perfect
>> > > just
>> > > >>>>> because of the already present complexity.
>> > > >>>>>
>> > > >>>>> Option c) Adding methods to the interfaces of SourceReader and
>> > > >>>>> SplitReader
>> > > >>>>> Option a) Adding decorative interfaces to be used by
>> SourceReader
>> > and
>> > > >>>>> SplitReader
>> > > >>>>>
>> > > >>>>> As of the current status (v. 12) of the FLIP [1], it is based on
>> > > >>>>> Option c) which we find acceptable because the complexity added
>> is
>> > > only a
>> > > >>>>> single method.
>> > > >>>>>
>> > > >>>>> Nevertheless, Piotr has valid concerns about Option c) which I
>> see
>> > as
>> > > >>>>> follows:
>> > > >>>>> (1) An interface with default NOOP implementation makes the
>> > > >>>>> implementation optional. And in my opinion, a default
>> > implementation
>> > > is and
>> > > >>>>> will remain a way of making implementation optional because
>> even in
>> > > future
>> > > >>>>> a developer can decide to implement the "old flavor" without
>> > support
>> > > for
>> > > >>>>> pausable splits.
>> > > >>>>> (2) It may not be too critical but I also find it suboptimal
>> that
>> > > with
>> > > >>>>> a NOOP default implementation there is no way to check at
>> runtime
>> > if
>> > > >>>>> SourceReader or SplitReader actually support pausing. (To do so,
>> > one
>> > > would
>> > > >>>>> need a supportsX method which makes it again more complicated.)
>> > > >>>>>
>> > > >>>>> However, we haven't changed it because Option a) is also not
>> > optimal
>> > > >>>>> or straight-forward:
>> > > >>>>> (1) We need to add two distinct yet similar decorative
>> interfaces
>> > > >>>>> since, as mentioned, the signatures of the methods are
>> different.
>> > For
>> > > >>>>> example, we would need decorative interfaces like
>> > > >>>>> `SplitReaderWithPausableSplits` and
>> > `SourceReaderWithPausableSplits`.
>> > > >>>>> (2) As a consequence, we would need to somehow document
>> how/where
>> > to
>> > > >>>>> implement both interfaces and how this relates to each other.
>> This
>> > > we could
>> > > >>>>> solve by adding a note in the interface of SourceReader and
>> > > SplitReader and
>> > > >>>>> reference to the decorative interfaces but it still increases
>> > > complexity
>> > > >>>>> too.
>> > > >>>>>
>> > > >>>>> In summary, we see both as acceptable and preferred over other
>> > > >>>>> options. The question is if we can find a solution or compromise
>> > > that is
>> > > >>>>> acceptable for everybody to reach consensus.
>> > > >>>>>
>> > > >>>>> Please let us know what you think because we would be happy if
>> we
>> > can
>> > > >>>>> conclude the discussion to avoid dropping the initiative on this
>> > > FLIP.
>> > > >>>>>
>> > > >>>>> Regards,
>> > > >>>>> Sebastian
>> > > >>>>>
>> > > >>>>> [1]
>> > > >>>>>
>> > >
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
>> > > >>>>> (v. 12)
>> > > >>>>>
>> > > >>>>> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski <
>> > pnowojski@apache.org
>> > > >
>> > > >>>>> wrote:
>> > > >>>>>
>> > > >>>>>> Hi Guowei,
>> > > >>>>>>
>> > > >>>>>> as Dawid wrote a couple of messages back:
>> > > >>>>>>
>> > > >>>>>> > This is covered in the previous FLIP[1] which has been
>> already
>> > > >>>>>> implemented in 1.15. In short, it must be enabled with the
>> > watermark
>> > > >>>>>> strategy which also configures drift and update interval
>> > > >>>>>>
>> > > >>>>>> So by default watermark alignment is disabled, regardless of whether a
>> > > >>>>>> source supports it or not.
>> > > >>>>>>
>> > > >>>>>> Best,
>> > > >>>>>> Piotrek
>> > > >>>>>>
>> > > >>>>>> On Thu, May 5, 2022 at 09:56 Guowei Ma <gu...@gmail.com>
>> > > >>>>>> wrote:
>> > > >>>>>>
>> > > >>>>>>> Hi,
>> > > >>>>>>>
>> > > >>>>>>> We know that in the case of Bounded input Flink supports the
>> > Batch
>> > > >>>>>>> execution mode. Currently in Batch execution mode, Flink is
>> > > executed
>> > > >>>>>>> on a
>> > > >>>>>>> stage-by-stage basis. In this way, perhaps watermark alignment
>> > > might
>> > > >>>>>>> not
>> > > >>>>>>> gain much.
>> > > >>>>>>>
>> > > >>>>>>> So my question is: Is watermark alignment the default
>> > behavior(for
>> > > >>>>>>> implemented source only)? If so, have you considered
>> evaluating
>> > the
>> > > >>>>>>> impact
>> > > >>>>>>> of this behavior on the Batch execution mode? Or do you think it is not
>> > > >>>>>>> necessary?
>> > > >>>>>>> Correct me if I miss something.
>> > > >>>>>>>
>> > > >>>>>>> Best,
>> > > >>>>>>> Guowei
>> > > >>>>>>>
>> > > >>>>>>>
>> > > >>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <
>> > > >>>>>>> piotr.nowojski@gmail.com>
>> > > >>>>>>> wrote:
>> > > >>>>>>>
>> > > >>>>>>> > Hi Becket and Dawid,
>> > > >>>>>>> >
>> > > >>>>>>> > > I feel that no matter which option we choose this can not
>> be
>> > > >>>>>>> solved
>> > > >>>>>>> > entirely in either of the options, because of the point
>> above
>> > and
>> > > >>>>>>> because
>> > > >>>>>>> > the signature of SplitReader#pauseOrResumeSplits and
>> > > >>>>>>> > SourceReader#pauseOrResumeSplits are slightly different (one
>> > > >>>>>>> identifies
>> > > >>>>>>> > splits with splitId the other one passes the splits
>> directly).
>> > > >>>>>>> >
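The signature difference mentioned above can be sketched as follows. This assumes that the higher-level reader receives split IDs while the lower-level split reader receives the split objects themselves; the thread does not spell out which side is which, and all type names here are hypothetical stand-ins rather than the Flink classes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical split type, identified by a string ID.
final class SketchSplit {
    final String id;

    SketchSplit(String id) {
        this.id = id;
    }
}

// Lower-level reader (assumption): receives the split objects directly.
interface SketchSplitReader {
    void pauseOrResumeSplits(
            Collection<SketchSplit> splitsToPause, Collection<SketchSplit> splitsToResume);
}

// Higher-level reader (assumption): receives split IDs and resolves them.
final class SketchReaderBase {
    private final Map<String, SketchSplit> assignedSplits = new HashMap<>();
    private final SketchSplitReader splitReader;

    SketchReaderBase(SketchSplitReader splitReader) {
        this.splitReader = splitReader;
    }

    void addSplit(SketchSplit split) {
        assignedSplits.put(split.id, split);
    }

    void pauseOrResumeSplits(Collection<String> idsToPause, Collection<String> idsToResume) {
        splitReader.pauseOrResumeSplits(resolve(idsToPause), resolve(idsToResume));
    }

    // Maps split IDs to the assigned split objects, rejecting unknown IDs.
    private List<SketchSplit> resolve(Collection<String> splitIds) {
        List<SketchSplit> splits = new ArrayList<>();
        for (String id : splitIds) {
            SketchSplit split = assignedSplits.get(id);
            if (split == null) {
                throw new IllegalArgumentException("Unknown split: " + id);
            }
            splits.add(split);
        }
        return splits;
    }
}
```

Because the two methods take different argument types, a single decorative interface cannot cover both levels, which is why two similar decorative interfaces would be needed under that option.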
>> > > >>>>>>> > Yes, that's a good point in this case and for features that
>> > need
>> > > >>>>>>> to be
>> > > >>>>>>> > implemented in more than one place.
>> > > >>>>>>> >
>> > > >>>>>>> > > Is there any reason for making pausing reading from a split an
>> > > optional
>> > > >>>>>>> feature,
>> > > >>>>>>> > > other than that this was not included in the original
>> > > interface?
>> > > >>>>>>> >
>> > > >>>>>>> > An additional argument in favor of making it optional is to
>> > > >>>>>>> simplify source
>> > > >>>>>>> > implementation. But on its own I'm not sure if that would be
>> > > >>>>>>> enough to
>> > > >>>>>>> > justify making this feature optional. Maybe.
>> > > >>>>>>> >
>> > > >>>>>>> > > I think it would be way simpler and clearer to just let
>> end
>> > > >>>>>>> users and
>> > > >>>>>>> > Flink
>> > > >>>>>>> > > assume all the connectors will implement this feature.
>> > > >>>>>>> >
>> > > >>>>>>> > As I wrote above that would be an interesting choice to make
>> > > (ease
>> > > >>>>>>> of
>> > > >>>>>>> > implementation for new users, vs system consistency).
>> > Regardless
>> > > >>>>>>> of that,
>> > > >>>>>>> > yes, for me the main argument is the API backward
>> > compatibility.
>> > > >>>>>>> But let's
>> > > >>>>>>> > clear a couple of points:
>> > > >>>>>>> > - The current proposal adding methods to the base interface
>> > with
>> > > >>>>>>> default
>> > > >>>>>>> > implementations is an OPTIONAL feature. Same as the
>> decorative
>> > > >>>>>>> version
>> > > >>>>>>> > would be.
>> > > >>>>>>> > - Decorative version could implement "throw
>> > > >>>>>>> UnsupportedOperationException"
>> > > >>>>>>> > if user enabled watermark alignment just as well and I agree
>> > > >>>>>>> that's a
>> > > >>>>>>> > better option compared to logging a warning.
>> > > >>>>>>> >
>> > > >>>>>>> > Best,
>> > > >>>>>>> > Piotrek
>> > > >>>>>>> >
>> > > >>>>>>> >
>> > > >>>>>>> > On Wed, May 4, 2022 at 15:40 Becket Qin <be...@gmail.com>
>> > > >>>>>>> > wrote:
>> > > >>>>>>> >
>> > > >>>>>>> > > Thanks for the reply and patient discussion, Piotr and
>> Dawid.
>> > > >>>>>>> > >
>> > > >>>>>>> > > Is there any reason for making pausing reading from a split an
>> > > optional
>> > > >>>>>>> feature,
>> > > >>>>>>> > > other than that this was not included in the original
>> > > interface?
>> > > >>>>>>> > >
>> > > >>>>>>> > > To be honest I am really worried about the complexity of
>> the
>> > > >>>>>>> user story
>> > > >>>>>>> > > here. Optional features like this have a high overhead.
>> > Imagine
>> > > >>>>>>> this
>> > > >>>>>>> > > feature is optional, now a user enabled watermark
>> alignment
>> > and
>> > > >>>>>>> defined a
>> > > >>>>>>> > > few watermark groups. Would it work? Hmm, that depends on
>> > > >>>>>>> whether the
>> > > >>>>>>> > > involved Source has implemented this feature. If the
>> Sources
>> > > >>>>>>> are well
>> > > >>>>>>> > > documented, good luck. Otherwise end users may have to
>> look
>> > > into
>> > > >>>>>>> the code
>> > > >>>>>>> > > of the Source to see whether the feature is supported.
>> Which
>> > is
>> > > >>>>>>> something
>> > > >>>>>>> > > they shouldn't have to do.
>> > > >>>>>>> > >
>> > > >>>>>>> > > I think it would be way simpler and clearer to just let
>> end
>> > > >>>>>>> users and
>> > > >>>>>>> > Flink
>> > > >>>>>>> > > assume all the connectors will implement this feature.
>> After
>> > > all
>> > > >>>>>>> the
>> > > >>>>>>> > > watermark group is not optional to the end users. If in
>> some
>> > > >>>>>>> rare cases,
>> > > >>>>>>> > > the feature cannot be supported, a clear
>> > > >>>>>>> UnsupportedOperationException
>> > > >>>>>>> > will
>> > > >>>>>>> > > be thrown to tell users to explicitly remove this Source
>> from
>> > > the
>> > > >>>>>>> > watermark
>> > > >>>>>>> > > group. I don't think we should have a warning message
>> here,
>> > as
>> > > >>>>>>> they tend
>> > > >>>>>>> > to
>> > > >>>>>>> > > be ignored in many cases. If we do this, we don't even
>> need
>> > the
>> > > >>>>>>> > supportXXX
>> > > >>>>>>> > > method in the Source for this feature. In fact this is
>> > exactly
>> > > >>>>>>> how many
>> > > >>>>>>> > > interfaces work today. For example,
>> > > >>>>>>> SplitEnumerator#addSplitsBack() is
>> > > >>>>>>> > not
>> > > >>>>>>> > > supported by Pravega source because it does not support
>> > partial
>> > > >>>>>>> failover.
>> > > >>>>>>> > > In that case, it simply throws an exception to trigger a
>> > global
>> > > >>>>>>> recovery.
>> > > >>>>>>> > >
>> > > >>>>>>> > > The reason we add a default implementation in this case
>> > > >>>>>>> > > would be just for the
>> > > >>>>>>> > > sake of backwards compatibility so the old source can
>> still
>> > > >>>>>>> compile.
>> > > >>>>>>> > Sure,
>> > > >>>>>>> > > in short term, this feature might not be supported by many
>> > > >>>>>>> existing
>> > > >>>>>>> > > sources. That is OK, and it is quite visible to the source
>> > > >>>>>>> developers
>> > > >>>>>>> > that
>> > > >>>>>>> > > they did not override the default impl which throws an
>> > > >>>>>>> > > UnsupportedOperationException.
>> > > >>>>>>> > >
>> > > >>>>>>> > > @Dawid,
>> > > >>>>>>> > >
>> > > >>>>>>> > > the Java doc of the SupportXXX() method in the Source
>> would
>> > be
>> > > >>>>>>> the single
>> > > >>>>>>> > > >> source of truth regarding how to implement this
>> feature.
>> > > >>>>>>> > > >
>> > > >>>>>>> > > >
>> > > >>>>>>> > >
>> > > >>>>>>> > > I also don't find it entirely true. Half of the classes
>> are
>> > > >>>>>>> theoretically
>> > > >>>>>>> > > > optional and are utility classes from the point of view
>> how
>> > > the
>> > > >>>>>>> > > interfaces
>> > > >>>>>>> > > > are organized. Theoretically users do not need to use
>> any
>> > of
>> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to list
>> > their
>> > > >>>>>>> methods in
>> > > >>>>>>> > > the
>> > > >>>>>>> > > > Source interface.
>> > > >>>>>>> > >
>> > > >>>>>>> > > I think the ultimate goal of java docs is to guide users
>> to
>> > > >>>>>>> implement the
>> > > >>>>>>> > > Source. If SourceReaderBase is the preferred way to
>> > implement a
>> > > >>>>>>> > > SourceReader, it seems worth mentioning that. Even the
>> Java
>> > > >>>>>>> language
>> > > >>>>>>> > > documentation of interfaces lists the known implementations
>> [1]
>> > so
>> > > >>>>>>> people
>> > > >>>>>>> > can
>> > > >>>>>>> > > leverage them. But for this particular case, if we make
>> the
>> > > >>>>>>> feature
>> > > >>>>>>> > > non-optional, we don't even need the supportXXX() method
>> for
>> > > now.
>> > > >>>>>>> > >
>> > > >>>>>>> > > Thanks,
>> > > >>>>>>> > >
>> > > >>>>>>> > > Jiangjie (Becket) Qin
>> > > >>>>>>> > >
>> > > >>>>>>> > >
>> > > >>>>>>> > >
>> > > >>>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <
>> > > >>>>>>> dwysakowicz@apache.org>
>> > > >>>>>>> > > wrote:
>> > > >>>>>>> > >
>> > > >>>>>>> > > > Hey Piotr and Becket,
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > First of all, let me say I am happy with whichever
>> option
>> > is
>> > > >>>>>>> agreed in
>> > > >>>>>>> > > the
>> > > >>>>>>> > > > discussion.
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > I wanted to clarify a few points from the discussion
>> > though:
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > @Becket:
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > The main argument for adding the methods to the
>> > SourceReader
>> > > >>>>>>> is that
>> > > >>>>>>> > > these
>> > > >>>>>>> > > > methods are effectively NON-OPTIONAL to SourceReader
>> impl,
>> > > i.e.
>> > > >>>>>>> > starting
>> > > >>>>>>> > > > from this FLIP, all the SourceReaders impl are expected
>> to
>> > > >>>>>>> support this
>> > > >>>>>>> > > > method, although some old implementations may not have
>> > > >>>>>>> implemented this
>> > > >>>>>>> > > > feature. I think we should distinguish the new features
>> > from
>> > > >>>>>>> the
>> > > >>>>>>> > optional
>> > > >>>>>>> > > > features. While the public decorative interface is a
>> > solution
>> > > >>>>>>> to the
>> > > >>>>>>> > > > optional features. We should not use it for the features
>> > that
>> > > >>>>>>> are
>> > > >>>>>>> > > > non-optional.
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > I don't think that this feature is NON-OPTIONAL. Even
>> > though
>> > > >>>>>>> > preferred, I
>> > > >>>>>>> > > > still think it can be simply optional.
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > the Java doc of the SupportXXX() method in the Source
>> would
>> > > be
>> > > >>>>>>> the
>> > > >>>>>>> > single
>> > > >>>>>>> > > > source of truth regarding how to implement this feature.
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > I also don't find it entirely true. Half of the classes
>> are
>> > > >>>>>>> > theoretically
>> > > >>>>>>> > > > optional and are utility classes from the point of view
>> how
>> > > the
>> > > >>>>>>> > > interfaces
>> > > >>>>>>> > > > are organized. Theoretically users do not need to use
>> any
>> > of
>> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to list
>> > their
>> > > >>>>>>> methods in
>> > > >>>>>>> > > the
>> > > >>>>>>> > > > Source interface.
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > @Piotr
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > If we have all of the methods with default
>> implementation
>> > in
>> > > >>>>>>> the base
>> > > >>>>>>> > > > interface, the API doesn't give any clue to the user
>> which
>> > > set
>> > > >>>>>>> of
>> > > >>>>>>> > methods
>> > > >>>>>>> > > > are required to be implemented at the same time.
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > I feel that no matter which option we choose this can
>> not
>> > be
>> > > >>>>>>> solved
>> > > >>>>>>> > > > entirely in either of the options, because of the point
>> > above
>> > > >>>>>>> and
>> > > >>>>>>> > because
>> > > >>>>>>> > > > the signature of SplitReader#pauseOrResumeSplits and
>> > > >>>>>>> > > > SourceReader#pauseOrResumeSplits are slightly different
>> > (one
>> > > >>>>>>> identifies
>> > > >>>>>>> > > > splits with splitId the other one passes the splits
>> > > directly).
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > Best,
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > Dawid
>> > > >>>>>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > Hi Piotr,
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > Thanks for the comment.
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > Just to clarify, I am not against the decorative
>> > interfaces,
>> > > >>>>>>> but I do
>> > > >>>>>>> > > > think we should use them with caution. The main argument
>> > for
>> > > >>>>>>> adding the
>> > > >>>>>>> > > > methods to the SourceReader is that these methods are
>> > > >>>>>>> > > > effectively NON-OPTIONAL to SourceReader impl, i.e.
>> > starting
>> > > >>>>>>> from this
>> > > >>>>>>> > > > FLIP, all the SourceReaders impl are expected to support
>> > this
>> > > >>>>>>> > > > method, although some old implementations may not have
>> > > >>>>>>> implemented this
>> > > >>>>>>> > > > feature. I think we should distinguish the new features
>> > from
>> > > >>>>>>> the
>> > > >>>>>>> > optional
>> > > >>>>>>> > > > features. The public decorative interface is a solution
>> > > >>>>>>> > > > for optional features; we should not use it for features
>> > > >>>>>>> > > > that are non-optional.
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > That said, this feature is optional for SplitReaders.
>> > > Arguably
>> > > >>>>>>> we can
>> > > >>>>>>> > > have
>> > > >>>>>>> > > > a decorative interface for that, but for simplicity and
>> > > >>>>>>> symmetry of the
>> > > >>>>>>> > > > interface, personally I prefer just adding a new method.
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > Regarding the advantages you mentioned about the
>> decorative
>> > > >>>>>>> interfaces,
>> > > >>>>>>> > > > they would make sense if:
>> > > >>>>>>> > > > 1. The feature is optional.
>> > > >>>>>>> > > > 2. There is only one decorative interface involved for a
>> > > >>>>>>> feature.
>> > > >>>>>>> > > > Otherwise the argument that all the methods are grouped
>> > > >>>>>>> together will
>> > > >>>>>>> > not
>> > > >>>>>>> > > > stand.
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > Compared with that, I think the current solution works
>> fine
>> > > in
>> > > >>>>>>> all
>> > > >>>>>>> > cases,
>> > > >>>>>>> > > > i.e. "having supportXXX() method in Source, and default
>> > > >>>>>>> methods /
>> > > >>>>>>> > > > decorative interfaces in base interfaces.".
>> > > >>>>>>> > > >
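As a rough illustration of the "supportXXX() in Source plus default methods in the base interfaces" shape, consider the Java sketch below. All type names are hypothetical stand-ins, not Flink's real classes; only the method names supportsPausingSplits and pauseOrResumeSplits are taken from this thread, and the default bodies shown are an assumption.

```java
import java.util.Collection;

// Hypothetical stand-ins for illustration only, NOT Flink's real interfaces.
interface CapabilityReader {
    // A default body lets existing implementations compile unchanged.
    default void pauseOrResumeSplits(Collection<String> splitIdsToPause, Collection<String> splitIdsToResume) {
        throw new UnsupportedOperationException("pausing splits is not implemented by this reader");
    }
}

interface CapabilitySource {
    CapabilityReader createReader();

    // The single place the framework asks about the capability; its Javadoc
    // would list every method that has to be implemented to support it.
    default boolean supportsPausingSplits() {
        return false;
    }
}

final class CapabilityFramework {
    // Returns true if the pause/resume request was forwarded, false if skipped.
    static boolean alignIfSupported(
            CapabilitySource source,
            CapabilityReader reader,
            Collection<String> splitIdsToPause,
            Collection<String> splitIdsToResume) {
        if (!source.supportsPausingSplits()) {
            return false;
        }
        reader.pauseOrResumeSplits(splitIdsToPause, splitIdsToResume);
        return true;
    }
}
```

In this shape the framework never needs to instantiate a reader just to learn whether the feature is available; it asks the source.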
>> > > >>>>>>> > > > The advantages are:
>> > > >>>>>>> > > >> - clean and easy to implement base interface
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > In the current approach, the Java doc of the
>> SupportXXX()
>> > > >>>>>>> method in the
>> > > >>>>>>> > > > Source would be the single source of truth regarding
>> how to
>> > > >>>>>>> implement
>> > > >>>>>>> > > this
>> > > >>>>>>> > > > feature. It lists the method that has to be implemented
>> to
>> > > >>>>>>> support this
>> > > >>>>>>> > > > feature, regardless of how many classes / interfaces are
>> > > >>>>>>> involved.
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > When implementing the base interface, users do not need
>> to
>> > > >>>>>>> implement a
>> > > >>>>>>> > > > method with default implementation. If they are curious
>> > what
>> > > >>>>>>> the method
>> > > >>>>>>> > > is
>> > > >>>>>>> > > > for, the java doc of that method simply points users to
>> the
>> > > >>>>>>> > SupportXXX()
>> > > >>>>>>> > > > method in the Source. This does not seem to add work for
>> > > >>>>>>> > > > users compared with decorative interfaces, but it gives
>> > > >>>>>>> > > > much better discoverability.
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > - all of the methods from a single feature are grouped
>> in a
>> > > >>>>>>> single
>> > > >>>>>>> > > >> decorator interface, together with their dedicated java
>> > doc.
>> > > >>>>>>> It's also
>> > > >>>>>>> > > >> easier to google search for help using the decorator
>> name
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > - if an optional feature requires two methods to be
>> > > >>>>>>> implemented at
>> > > >>>>>>> > once,
>> > > >>>>>>> > > >> decorator can guarantee that
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > These two points are not true when multiple components
>> and
>> > > >>>>>>> classes are
>> > > >>>>>>> > > > involved collaboratively to provide a feature. In our
>> case,
>> > > we
>> > > >>>>>>> have
>> > > >>>>>>> > both
>> > > >>>>>>> > > > SourceReader and SplitReader involved. And there might
>> be
>> > > other
>> > > >>>>>>> > > interfaces
>> > > >>>>>>> > > > on the JM side involved for some future features. So the
>> > > >>>>>>> relevant
>> > > >>>>>>> > methods
>> > > >>>>>>> > > > can actually be scattered over the places. That said, we
>> > may
>> > > >>>>>>> still use
>> > > >>>>>>> > > > decorative interfaces for each component, if the
>> feature is
>> > > >>>>>>> optional,
>> > > >>>>>>> > > given
>> > > >>>>>>> > > > there is a single source of truth for the feature.
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > Here I would strongly lean towards making life easier
>> for
>> > new
>> > > >>>>>>> users,
>> > > >>>>>>> > > >> lowering the entry barrier, at the (imo) slight expense
>> > for
>> > > >>>>>>> the power
>> > > >>>>>>> > > >> users.
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > I actually think the current approach is simpler, more
>> > > >>>>>>> extensible and
>> > > >>>>>>> > > more
>> > > >>>>>>> > > > general for all the users. Can you elaborate a bit more
>> > > >>>>>>> > > > on which part you think is harder for users to
>> > > >>>>>>> > > > understand?
>> > > >>>>>>> > > >
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > There is another benefit of the decorative interfaces
>> which
>> > > is
>> > > >>>>>>> not
>> > > >>>>>>> > > > mentioned, but might be worth considering here. Usually
>> the
>> > > >>>>>>> decorative
>> > > >>>>>>> > > > interfaces give slightly better backwards compatibility
>> > than
>> > > >>>>>>> the new
>> > > >>>>>>> > > > default method in the interfaces. That is when users are
>> > > using
>> > > >>>>>>> a jar
>> > > >>>>>>> > that
>> > > >>>>>>> > > > was compiled with an older version of Flink which does
>> not
>> > > >>>>>>> have the
>> > > >>>>>>> > > default
>> > > >>>>>>> > > > method in the interfaces in question. A decorative
>> > interface
>> > > >>>>>>> may still
>> > > >>>>>>> > > > provide backwards compatibility in that case, while
>> default
>> > > >>>>>>> method impl
>> > > >>>>>>> > > > cannot.
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > I think in Flink we in general do not guarantee custom
>> > > >>>>>>> components
>> > > >>>>>>> > > compiled
>> > > >>>>>>> > > > with an older version can run with a newer version of
>> > Flink.
>> > > A
>> > > >>>>>>> > recompile
>> > > >>>>>>> > > > with a newer version would be required. That said, if
>> we do
>> > > >>>>>>> care about
>> > > >>>>>>> > > > this, we can just change the "supportXXX()" method in
>> the
>> > > >>>>>>> Source
>> > > >>>>>>> > > interface
>> > > >>>>>>> > > > to use decorative interfaces, and leave the other parts
>> > > >>>>>>> unchanged.
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > Thanks,
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > Jiangjie (Becket) Qin
>> > > >>>>>>> > > >
>> > > >>>>>>> > > >
>> > > >>>>>>> > > >
>> > > >>>>>>> > > >
>> > > >>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <
>> > > >>>>>>> pnowojski@apache.org>
>> > > >>>>>>> > > > wrote:
>> > > >>>>>>> > > >
>> > > >>>>>>> > > >> Hi,
>> > > >>>>>>> > > >>
>> > > >>>>>>> > > >> Sorry for chipping in so late, but I was OoO for the
>> last
>> > > two
>> > > >>>>>>> weeks.
>> > > >>>>>>> > > >> Regarding the interfaces, I would be actually against
>> > adding
>> > > >>>>>>> those
>> > > >>>>>>> > > methods
>> > > >>>>>>> > > >> to the base interfaces for the reasons mentioned above.
>> > > >>>>>>> > > >> That would clog the base interface for new users with
>> > > >>>>>>> > > >> tons of methods that they do not need, do not
>> > > >>>>>>> > > >> understand, and do not know what to do with.
>> Moreover,
>> > > >>>>>>> such
>> > > >>>>>>> > > decorative
>> > > >>>>>>> > > >> interfaces are solving a problem if a feature requires
>> two
>> > > or
>> > > >>>>>>> more
>> > > >>>>>>> > > methods
>> > > >>>>>>> > > >> to be implemented at the same time. If we have all of
>> the
>> > > >>>>>>> methods with
>> > > >>>>>>> > > >> default implementation in the base interface, the API
>> > > doesn't
>> > > >>>>>>> give any
>> > > >>>>>>> > > >> clue
>> > > >>>>>>> > > >> to the user which set of methods are required to be
>> > > >>>>>>> implemented at the
>> > > >>>>>>> > > >> same
>> > > >>>>>>> > > >> time.
>> > > >>>>>>> > > >>
>> > > >>>>>>> > > >> > a) I feel the biggest drawback of decorative
>> interfaces
>> > is
>> > > >>>>>>> which
>> > > >>>>>>> > > >> interface
>> > > >>>>>>> > > >> > they can decorate and which combinations of multiple
>> > > >>>>>>> decorative
>> > > >>>>>>> > > >> interfaces
>> > > >>>>>>> > > >> > are valid. (...)
>> > > >>>>>>> > > >> > In the future, if there is a new feature added
>> > > >>>>>>> > > >> > (e.g. sorted or pre-partitioned data aware), are we
>> > going
>> > > >>>>>>> to create
>> > > >>>>>>> > > >> another
>> > > >>>>>>> > > >> > interface of SplitReader such as SortedSplitReader or
>> > > >>>>>>> > > >> PrePartitionedAware?
>> > > >>>>>>> > > >> > Can they be combined? So I think the additional
>> > decorative
>> > > >>>>>>> interface
>> > > >>>>>>> > > >> like
>> > > >>>>>>> > > >> > withSplitsAlignment actually increases the
>> understanding
>> > > >>>>>>> cost of
>> > > >>>>>>> > users
>> > > >>>>>>> > > >> > because they have to know what decorative interfaces
>> are
>> > > >>>>>>> there,
>> > > >>>>>>> > which
>> > > >>>>>>> > > >> > interface they can decorate and which combinations of
>> > the
>> > > >>>>>>> decorative
>> > > >>>>>>> > > >> > interfaces are valid and which are not. Ideally we
>> want
>> > to
>> > > >>>>>>> avoid
>> > > >>>>>>> > that.
>> > > >>>>>>> > > >>
>> > > >>>>>>> > > >> I'm not sure if I understand how embedding default
>> methods
>> > > in
>> > > >>>>>>> the base
>> > > >>>>>>> > > >> interface is solving the problem: what can be combined
>> or
>> > > >>>>>>> not? If
>> > > >>>>>>> > there
>> > > >>>>>>> > > >> are
>> > > >>>>>>> > > >> two conflicting features, having decorative interfaces
>> > that
>> > > >>>>>>> can not be
>> > > >>>>>>> > > >> mixed together actually makes much more sense to me
>> rather
>> > > >>>>>>> than having
>> > > >>>>>>> > > >> them
>> > > >>>>>>> > > >> all in one base class. How would you allow users to
>> > > implement
>> > > >>>>>>> only one
>> > > >>>>>>> > > of
>> > > >>>>>>> > > >> those two features?
>> > > >>>>>>> > > >>
>> > > >>>>>>> > > >> To reiterate on the issue. Yes, there are drawbacks:
>> > > >>>>>>> > > >> - how can a user discover what decorators are there?
>> > > >>>>>>> > > >> - how can a user know where the decorator can be
>> applied
>> > to?
>> > > >>>>>>> > > >>
>> > > >>>>>>> > > >> However those are drawbacks for more power users, that
>> can
>> > > be
>> > > >>>>>>> > mitigated
>> > > >>>>>>> > > by
>> > > >>>>>>> > > >> the documentation. For example listing all of the
>> > decorators
>> > > >>>>>>> with
>> > > >>>>>>> > > >> detailed explanation both in the docs and in the java
>> > docs.
>> > > >>>>>>> More
>> > > >>>>>>> > > >> experienced users will be able to deal with those
>> issues
>> > > >>>>>>> easier, as
>> > > >>>>>>> > they
>> > > >>>>>>> > > >> will already have some basic understanding of Flink.
>> Also
>> > if
>> > > >>>>>>> a user has
>> > > >>>>>>> > a
>> > > >>>>>>> > > >> problem that he wants to solve, he will google search a
>> > > >>>>>>> potential
>> > > >>>>>>> > > solution
>> > > >>>>>>> > > >> to his problem anyway, and while doing that he is very
>> > > likely
>> > > >>>>>>> to
>> > > >>>>>>> > > discover
>> > > >>>>>>> > > >> the decorator that he needs anyway in the docs.
>> > > >>>>>>> > > >>
>> > > >>>>>>> > > >> The advantages are:
>> > > >>>>>>> > > >> - clean and easy to implement base interface
>> > > >>>>>>> > > >> - all of the methods from a single feature are grouped
>> in
>> > a
>> > > >>>>>>> single
>> > > >>>>>>> > > >> decorator interface, together with their dedicated java
>> > doc.
>> > > >>>>>>> It's also
>> > > >>>>>>> > > >> easier to google search for help using the decorator
>> name
>> > > >>>>>>> > > >> - if an optional feature requires two methods to be
>> > > >>>>>>> implemented at
>> > > >>>>>>> > once,
>> > > >>>>>>> > > >> decorator can guarantee that
>> > > >>>>>>> > > >>
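The "decorator can guarantee that" point can be sketched in Java as follows. The names BaseReader, SupportsSplitPausing, PausableReader, and DecoratorAwareFramework are hypothetical and chosen only for illustration; this is an assumption-laden sketch, not the API proposed in the FLIP.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical names throughout; a sketch, not the FLIP's actual API.
interface BaseReader {
    String pollNext();
}

// Decorative interface: both methods are abstract, so the compiler rejects
// any class that implements only one of them.
interface SupportsSplitPausing {
    void pauseSplit(String splitId);

    void resumeSplit(String splitId);
}

// A reader that opts in to the optional feature.
final class PausableReader implements BaseReader, SupportsSplitPausing {
    private final Set<String> pausedSplits = new HashSet<>();

    @Override
    public String pollNext() {
        return "record";
    }

    @Override
    public void pauseSplit(String splitId) {
        pausedSplits.add(splitId);
    }

    @Override
    public void resumeSplit(String splitId) {
        pausedSplits.remove(splitId);
    }

    boolean isPaused(String splitId) {
        return pausedSplits.contains(splitId);
    }
}

final class DecoratorAwareFramework {
    // Returns true if the reader supports pausing and the split was paused.
    static boolean tryPause(BaseReader reader, String splitId) {
        if (reader instanceof SupportsSplitPausing) {
            ((SupportsSplitPausing) reader).pauseSplit(splitId);
            return true;
        }
        return false;
    }
}
```

The base interface stays at a single method, while a reader that wants the feature must supply the pause and resume methods together.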
>> > > >>>>>>> > > >> Here I would strongly lean towards making life easier
>> for
>> > > new
>> > > >>>>>>> users,
>> > > >>>>>>> > > >> lowering the entry barrier, at the (imo) slight expense
>> > for
>> > > >>>>>>> the power
>> > > >>>>>>> > > >> users.
>> > > >>>>>>> > > >>
>> > > >>>>>>> > > >> Best,
>> > > >>>>>>> > > >> Piotrek
>> > > >>>>>>> > > >>
>> > > >>>>>>> > > >>
>> > > >>>>>>> > > >> On Tue, Apr 26, 2022 at 15:32, Becket Qin
>> > > >>>>>>> > > >> <becket.qin@gmail.com> wrote:
>> > > >>>>>>> > > >>
>> > > >>>>>>> > > >> > Thanks for the reply Sebastian and Dawid,
>> > > >>>>>>> > > >> >
>> > > >>>>>>> > > >> > I think Sebastian has a good summary. This is a
>> really
>> > > >>>>>>> helpful
>> > > >>>>>>> > > >> discussion.
>> > > >>>>>>> > > >> >
>> > > >>>>>>> > > >> > Thinking a bit more, I feel that it might still be
>> > better
>> > > >>>>>>> to add the
>> > > >>>>>>> > > >> > supportsXXX() method in the Source rather than
>> > > SourceReader.
>> > > >>>>>>> > > >> >
>> > > >>>>>>> > > >> > Generally speaking, what we are trying to do here is
>> to
>> > > let
>> > > >>>>>>> the
>> > > >>>>>>> > Flink
>> > > >>>>>>> > > >> > framework know what the Source is capable of. In this
>> > > FLIP,
>> > > >>>>>>> it
>> > > >>>>>>> > happens
>> > > >>>>>>> > > >> to
>> > > >>>>>>> > > >> > be the capability that only involves SourceReader.
>> But
>> > in
>> > > >>>>>>> the
>> > > >>>>>>> > future,
>> > > >>>>>>> > > >> it is
>> > > >>>>>>> > > >> > possible that another functionality involves both the
>> > > >>>>>>> > SplitEnumerator
>> > > >>>>>>> > > >> and
>> > > >>>>>>> > > >> > SourceReader. In that case, following the current
>> > > approach,
>> > > >>>>>>> we
>> > > >>>>>>> > should
>> > > >>>>>>> > > >> put
>> > > >>>>>>> > > >> > the "supportsXXX()" method in both SplitEnumerator
>> and
>> > > >>>>>>> SourceReader.
>> > > >>>>>>> > > >> > Because if we only put this in the SourceReader, then
>> > the
>> > > >>>>>>> JM would
>> > > >>>>>>> > > have
>> > > >>>>>>> > > >> to
>> > > >>>>>>> > > >> > create a SourceReader in order to know whether this
>> > > feature
>> > > >>>>>>> is
>> > > >>>>>>> > > >> supported,
>> > > >>>>>>> > > >> > which is a little ugly. But if we put the
>> > "supportsXXX()"
>> > > >>>>>>> method in
>> > > >>>>>>> > > the
>> > > >>>>>>> > > >> > Source, we will break the "symmetric" design because
>> > this
>> > > >>>>>>> FLIP
>> > > >>>>>>> > chose a
>> > > >>>>>>> > > >> > different way.
>> > > >>>>>>> > > >> >
>> > > >>>>>>> > > >> > This is also why I think supportsXXX() method seems a
>> > good
>> > > >>>>>>> thing to
>> > > >>>>>>> > > >> have,
>> > > >>>>>>> > > >> > because when there are a few interfaces / methods
>> that
>> > are
>> > > >>>>>>> expected
>> > > >>>>>>> > to
>> > > >>>>>>> > > >> be
>> > > >>>>>>> > > >> > implemented at the same time in order to deliver a
>> > > feature,
>> > > >>>>>>> it is
>> > > >>>>>>> > > always
>> > > >>>>>>> > > >> > good to have a single source of truth to tell the
>> > > framework
>> > > >>>>>>> what to
>> > > >>>>>>> > > do,
>> > > >>>>>>> > > >> so
>> > > >>>>>>> > > >> > the framework can do consistent things in different
>> > parts.
>> > > >>>>>>> > > >> >
>> > > >>>>>>> > > >> > @Sebastian Mattheis <se...@ververica.com>
>> > > >>>>>>> > > >> >
>> > > >>>>>>> > > >> > Regarding interface flavor b), i.e.
>> AlignedSourceReader
>> > +
>> > > >>>>>>> > > >> > AlignedSplitReader, what I feel awkward about is
>> that we
>> > > are
>> > > >>>>>>> > > essentially
>> > > >>>>>>> > > >> > expecting almost all the SourceReader
>> implementations to
>> > > >>>>>>> extend
>> > > >>>>>>> > > >> > SourceReaderBase, which effectively makes the
>> > SourceReader
>> > > >>>>>>> interface
>> > > >>>>>>> > > >> > without the pausing support useless. So this
>> indicates
>> > > that
>> > > >>>>>>> public
>> > > >>>>>>> > > >> > decorative interfaces (or sub-interfaces for the same
>> > > >>>>>>> purpose) only
>> > > >>>>>>> > > >> > make sense if the original interface is also
>> expected to
>> > > be
>> > > >>>>>>> used.
>> > > >>>>>>> > > >> > Otherwise, it seems to make more sense to add the
>> method
>> > to
>> > > >>>>>>> the
>> > > >>>>>>> > original
>> > > >>>>>>> > > >> > interface itself.
>> > > >>>>>>> > > >> >
>> > > >>>>>>> > > >> > Cheers,
>> > > >>>>>>> > > >> >
>> > > >>>>>>> > > >> > Jiangjie (Becket) Qin
>> > > >>>>>>> > > >> >
>> > > >>>>>>> > > >> >
>> > > >>>>>>> > > >> >
>> > > >>>>>>> > > >> >
>> > > >>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <
>> > > >>>>>>> > > >> dwysakowicz@apache.org>
>> > > >>>>>>> > > >> > wrote:
>> > > >>>>>>> > > >> >
>> > > >>>>>>> > > >> > > Thanks @Sebastian for the nice summary.
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > I think most of your points aligned with the
>> > suggestions
>> > > >>>>>>> I made to
>> > > >>>>>>> > > the
>> > > >>>>>>> > > >> > > FLIP, while you were writing your reply (I believe
>> we
>> > > hit
>> > > >>>>>>> enter
>> > > >>>>>>> > > >> nearly at
>> > > >>>>>>> > > >> > > the same time ;) )
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > Two points after we synced offline
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > 1. I changed also the
>> supportsWatermarksSplitAlignment
>> > > to
>> > > >>>>>>> > > >> > > supportsPausingSplits to express the general
>> > capability
>> > > of
>> > > >>>>>>> > pausing.
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > 2. As for whether we should introduce
>> > > >>>>>>> > > >> > > PausingSourceReader/PausingSplitReader (option b) or
>> > > >>>>>>> > > >> > > just add the methods (option c), I suggest simply
>> > > >>>>>>> > > >> > > adding the two methods, as I felt this is the approach
>> > > >>>>>>> > > >> > > Becket much prefers and others do not object to. Unless
>> > > >>>>>>> > > >> > > there is opposition, let's go with option c.
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > Best,
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > Dawid
>> > > >>>>>>> > > >> > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > Hi folks,
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > Sorry for being a bit silent. Many thanks for all
>> the
>> > > >>>>>>> input and
>> > > >>>>>>> > > >> > > suggestions. As I'm a bit new, I needed some time
>> to
>> > > >>>>>>> catch up and
>> > > >>>>>>> > > >> > structure
>> > > >>>>>>> > > >> > > (for myself) the discussion and I wanted to find a
>> way
>> > > to
>> > > >>>>>>> > structure
>> > > >>>>>>> > > >> the
>> > > >>>>>>> > > >> > > conclusions. (Also because I had the feeling that
>> some
>> > > >>>>>>> concerns
>> > > >>>>>>> > got
>> > > >>>>>>> > > >> lost
>> > > >>>>>>> > > >> > in
>> > > >>>>>>> > > >> > > the discussion.) This is my attempt and please
>> correct
>> > > me
>> > > >>>>>>> if
>> > > >>>>>>> > > >> something is
>> > > >>>>>>> > > >> > > wrong or misunderstood. I tried to collect and
>> > assemble
>> > > >>>>>>> the
>> > > >>>>>>> > > opinions,
>> > > >>>>>>> > > >> > > suggestions, and conclusions (to the best of my
>> > > >>>>>>> knowledge):
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > # Top A: Should split alignment (pause/resume
>> > behavior)
>> > > >>>>>>> be a
>> > > >>>>>>> > general
>> > > >>>>>>> > > >> > > capability?
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > I personally don't see any reason not to have it as a
>> > > general
>> > > >>>>>>> > capability
>> > > >>>>>>> > > >> > > because for the alignSplit method it is actually
>> > > >>>>>>> independent of
>> > > >>>>>>> > the
>> > > >>>>>>> > > >> > > watermarks. If we agree here to have it as a general
>> > > >>>>>>> capability, we
>> > > >>>>>>> > > >> should
>> > > >>>>>>> > > >> > > also agree on the right wording. Does
>> > > >>>>>>> "alignSplits(splitsToResume,
>> > > >>>>>>> > > >> > > splitsToPause)" refer to what is then actually
>> meant?
>> > (I
>> > > >>>>>>> see it as
>> > > >>>>>>> > > >> okay.
>> > > >>>>>>> > > >> > I
>> > > >>>>>>> > > >> > > don't have any better idea whilst Arvid suggested
>> > > >>>>>>> > > >> "pauseOrResumeSplits".)
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > # Top B: Should it be possible to enable/disable
>> split
>> > > >>>>>>> alignment?
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > I would personally not disable the split alignment
>> on
>> > > the
>> > > >>>>>>> source
>> > > >>>>>>> > > >> reader
>> > > >>>>>>> > > >> > > side because if split alignment is used for some
>> other
>> > > >>>>>>> use case
>> > > >>>>>>> > (see
>> > > >>>>>>> > > >> A)
>> > > >>>>>>> > > >> > it
>> > > >>>>>>> > > >> > > could have nasty side effects on other/future use
>> > cases.
>> > > >>>>>>> Instead,
>> > > >>>>>>> > I
>> > > >>>>>>> > > >> would
>> > > >>>>>>> > > >> > > disable "watermark split alignment" where I think
>> it
>> > > >>>>>>> should
>> > > >>>>>>> > disable
>> > > >>>>>>> > > >> the
>> > > >>>>>>> > > >> > > watermark-dependent trigger for split alignment.
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > # Top C: Should we add a supportsX method?
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > I find it difficult to define the scope of a
>> supportsX
>> > > >>>>>>> method
>> > > >>>>>>> > w.r.t.
>> > > >>>>>>> > > >> > > the following questions: a) Where is it used? and
>> b)
>> > > What
>> > > >>>>>>> is the
>> > > >>>>>>> > > >> expected
>> > > >>>>>>> > > >> > > output? To b), it's not straight-forward to
>> provide a
>> > > >>>>>>> meaningful
>> > > >>>>>>> > > >> output,
>> > > >>>>>>> > > >> > > e.g., if SourceReader supports split alignment but
>> > > >>>>>>> SplitReader
>> > > >>>>>>> > not.
>> > > >>>>>>> > > >> This
>> > > >>>>>>> > > >> > is
>> > > >>>>>>> > > >> > > because with the current implementation, we can
>> > > determine
>> > > >>>>>>> whether
>> > > >>>>>>> > > >> split
>> > > >>>>>>> > > >> > > alignment is fully supported only during runtime
>> and
>> > > >>>>>>> specifically
>> > > >>>>>>> > > >> > actually
>> > > >>>>>>> > > >> > > only when calling alignSplits down the call
>> hierarchy
>> > up
>> > > >>>>>>> to the
>> > > >>>>>>> > > actual
>> > > >>>>>>> > > >> > > SplitReaders.
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > Therefore, I would suggest raising either an error or a
>> > > >>>>>>> > > >> > > warning if
>> > > >>>>>>> > > the
>> > > >>>>>>> > > >> > > alignment is called but not supported at some
>> point. I
>> > > >>>>>>> know we
>> > > >>>>>>> > > should
>> > > >>>>>>> > > >> > > carefully think about when this could be the case
>> > > because
>> > > >>>>>>> we don't
>> > > >>>>>>> > > >> want
>> > > >>>>>>> > > >> > to
>> > > >>>>>>> > > >> > > flood anybody with such warnings. However, warnings
>> > > could
>> > > >>>>>>> be an
>> > > >>>>>>> > > >> indicator
>> > > >>>>>>> > > >> > > for the user that for watermark split alignment use
>> > case
>> > > >>>>>>> split
>> > > >>>>>>> > > >> reading is
>> > > >>>>>>> > > >> > > imbalanced with the conclusion to either disable
>> the
>> > > >>>>>>> trigger for
>> > > >>>>>>> > > >> > watermark
>> > > >>>>>>> > > >> > > split alignment (see Top B) or to use/implement a
>> > source
>> > > >>>>>>> and
>> > > >>>>>>> > reader
>> > > >>>>>>> > > >> that
>> > > >>>>>>> > > >> > > fully supports split alignment.
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > # Top D: How to design interfaces?
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > Thanks for structuring the discussion with the
>> > > various
>> > > >>>>>>> > > >> possibilities
>> > > >>>>>>> > > >> > > (a-d). From the discussion and emails, I would
>> like to
>> > > >>>>>>> summarize
>> > > >>>>>>> > the
>> > > >>>>>>> > > >> > > following requirements:
>> > > >>>>>>> > > >> > > - Interfaces should be consistent ("symmetric"),
>> i.e.,
>> > > >>>>>>> similar
>> > > >>>>>>> > > >> semantics
>> > > >>>>>>> > > >> > > should have similar interfaces with similar usage.
>> > > >>>>>>> > > >> > > - Make explicit which implementations implement
>> > > >>>>>>> interfaces/support
>> > > >>>>>>> > > >> > > behavior.
>> > > >>>>>>> > > >> > > - Make clear what are default implementations and
>> how
>> > to
>> > > >>>>>>> implement
>> > > >>>>>>> > > >> > > interfaces with desired behavior.
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > This is a simplified view of the relations between
>> > > >>>>>>> relevant
>> > > >>>>>>> > classes
>> > > >>>>>>> > > of
>> > > >>>>>>> > > >> > the
>> > > >>>>>>> > > >> > > PoC implementation:
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > SourceReader (Public) <|-- SourceReaderBase
>> (Internal)
>> > > >>>>>>> <|-- ..
>> > > >>>>>>> > <|--
>> > > >>>>>>> > > >> > > MySourceReader
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > MySourceReader <>-- SplitFetcherManager (Internal)
>> > <>--
>> > > >>>>>>> > SplitFetcher
>> > > >>>>>>> > > >> > > (Internal) <>-- SplitReader (Public) <|--
>> > MySplitReader
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > (A <|-- B: B inherits from A; A <>-- B: A "has a"
>> B)
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > Note that SourceReaderBase and SplitFetcherManager
>> > > >>>>>>> implement most
>> > > >>>>>>> > of
>> > > >>>>>>> > > >> the
>> > > >>>>>>> > > >> > > "logic" for split alignment just because we wanted
>> to
>> > > >>>>>>> implement
>> > > >>>>>>> > > split
>> > > >>>>>>> > > >> > > alignment and wanted it to be available as kind of
>> a
>> > > >>>>>>> default. As a
>> > > >>>>>>> > > >> > > consequence, we have a "default implementation" for
>> > > >>>>>>> SourceReader
>> > > >>>>>>> > > that
>> > > >>>>>>> > > >> > > implements the actual logic for split alignment.
>> For
>> > > that
>> > > >>>>>>> reason,
>> > > >>>>>>> > I
>> > > >>>>>>> > > >> find
>> > > >>>>>>> > > >> > it
>> > > >>>>>>> > > >> > > very confusing to have a NOOP default
>> implementation
>> > in
>> > > >>>>>>> the
>> > > >>>>>>> > > interface
>> > > >>>>>>> > > >> for
>> > > >>>>>>> > > >> > > the SourceReader. As a consequence, interface
>> strategy
>> > > c)
>> > > >>>>>>> is
>> > > >>>>>>> > > difficult
>> > > >>>>>>> > > >> > > because this would require NOOP default
>> > implementations
>> > > >>>>>>> in the
>> > > >>>>>>> > > public
>> > > >>>>>>> > > >> > > interfaces of SourceReader and SplitReader. This is
>> > the
>> > > >>>>>>> same for
>> > > >>>>>>> > > >> strategy
>> > > >>>>>>> > > >> > > d) because it would require NOOP default
>> > implementation
>> > > >>>>>>> in the
>> > > >>>>>>> > > >> > > SourceReader. Further, as Dawid described method
>> > > >>>>>>> signatures of
>> > > >>>>>>> > > >> alignSplit
>> > > >>>>>>> > > >> > > for SourceReader and SplitReader differ and it
>> would
>> > be
>> > > >>>>>>> extremely
>> > > >>>>>>> > > >> > difficult
>> > > >>>>>>> > > >> > > to make the signatures the same (with even
>> potential
>> > > >>>>>>> performance
>> > > >>>>>>> > > >> impact
>> > > >>>>>>> > > >> > > because of additional look-ups of split ids).
>> > Therefore,
>> > > >>>>>>> having a
>> > > >>>>>>> > > >> > symmetric
>> > > >>>>>>> > > >> > > decorative interface as of strategy a) is actually
>> not
>> > > >>>>>>> possible
>> > > >>>>>>> > and
>> > > >>>>>>> > > >> > having
>> > > >>>>>>> > > >> > > two decorative interfaces with different method
>> > > >>>>>>> signatures is
>> > > >>>>>>> > > >> confusing.
>> > > >>>>>>> > > >> > My
>> > > >>>>>>> > > >> > > conclusion is that we are best with strategy b)
>> which
>> > > >>>>>>> means to
>> > > >>>>>>> > have
>> > > >>>>>>> > > >> > > specializing sub-interfaces that inherit from the
>> > parent
>> > > >>>>>>> > interface:
>> > > >>>>>>> > > >> > > SourceReader <|-- AlignedSourceReader, SplitReader
>> > <|--
>> > > >>>>>>> > > >> > AlignedSplitReader
>> > > >>>>>>> > > >> > > With this option, I'm not 100% sure what the
>> > > implications
>> > > >>>>>>> are and
>> > > >>>>>>> > if
>> > > >>>>>>> > > >> this
>> > > >>>>>>> > > >> > > could get nasty. I would suggest that Dawid and I
>> just
>> > > >>>>>>> try to
>> > > >>>>>>> > > >> implement
>> > > >>>>>>> > > >> > and
>> > > >>>>>>> > > >> > > see if we like it. :)
>> > > >>>>>>> > > >> > >
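A minimal Java sketch of what strategy b) could look like is shown below. ReaderSketch, AlignedReaderSketch, and AlignmentDispatcher are hypothetical names used only for illustration; the alignSplits parameter order follows the wording in this message, and everything else is an assumption rather than the PoC's actual code.

```java
import java.util.Collection;

// Hypothetical stand-ins for illustration only, NOT the PoC's real types.
interface ReaderSketch<T> {
    T pollNext();
}

// Specializing sub-interface: it inherits everything from the parent and
// adds the alignment method as a required (abstract) member.
interface AlignedReaderSketch<T> extends ReaderSketch<T> {
    void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause);
}

final class AlignmentDispatcher {
    // Forwards the call only to readers that implement the sub-interface;
    // returns false for plain readers so the caller can log a warning.
    static boolean alignIfPossible(
            ReaderSketch<?> reader,
            Collection<String> splitsToResume,
            Collection<String> splitsToPause) {
        if (reader instanceof AlignedReaderSketch) {
            ((AlignedReaderSketch<?>) reader).alignSplits(splitsToResume, splitsToPause);
            return true;
        }
        return false;
    }
}
```

With this layout the parent interface needs no NOOP default, at the cost of a runtime type check wherever alignment is triggered.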
>> > > >>>>>>> > > >> > > # Summary
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > In conclusion, please let me know your
>> perspectives.
>> > > >>>>>>> Please
>> > > >>>>>>> > correct
>> > > >>>>>>> > > >> me,
>> > > >>>>>>> > > >> > if
>> > > >>>>>>> > > >> > > something is wrong or if I misunderstood
>> something. My
>> > > >>>>>>> perspective
>> > > >>>>>>> > > >> would
>> > > >>>>>>> > > >> > be:
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > Top A: Yes
>> > > >>>>>>> > > >> > > Top B: Yes (but disable watermark trigger for split
>> > > >>>>>>> alignment)
>> > > >>>>>>> > > >> > > Top C: No
>> > > >>>>>>> > > >> > > Top D: b)
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > Best,
>> > > >>>>>>> > > >> > > Sebastian
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <
>> > > >>>>>>> > > >> dwysakowicz@apache.org
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > > wrote:
>> > > >>>>>>> > > >> > >
>> > > >>>>>>> > > >> > >> @Arvid:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> While I also like Becket's capability approach, I
>> > fear
>> > > >>>>>>> that it
>> > > >>>>>>> > > >> doesn't
>> > > >>>>>>> > > >> > work
>> > > >>>>>>> > > >> > >> for this particular use case: Sources can always
>> be
>> > > >>>>>>> aligned
>> > > >>>>>>> > > >> cross-task
>> > > >>>>>>> > > >> > and
>> > > >>>>>>> > > >> > >> this is just about intra-task alignment. So it's
>> > > >>>>>>> plausible to put
>> > > >>>>>>> > > >> > sources
>> > > >>>>>>> > > >> > >> into an alignment group even though they do not
>> use
>> > any
>> > > >>>>>>> of the
>> > > >>>>>>> > > >> presented
>> > > >>>>>>> > > >> > >> API of FLIP-217. They should just issue a
>> warning, if
>> > > >>>>>>> they handle
>> > > >>>>>>> > > >> > multiple
>> > > >>>>>>> > > >> > >> splits (see motivation section).
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Yes, but the "supportXXX" method would be for
>> telling
>> > > if
>> > > >>>>>>> it
>> > > >>>>>>> > > supports
>> > > >>>>>>> > > >> > that
>> > > >>>>>>> > > >> > >> intra-task alignment. Cross-task alignment would
>> > always
>> > > >>>>>>> be
>> > > >>>>>>> > > supported.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> I updated interfaces to what I believe to be
>> closest
>> > > to a
>> > > >>>>>>> > consensus
>> > > >>>>>>> > > >> > >> between all participants. Do you mind taking a
>> look?
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> @Sebastian Do you mind addressing the nits?
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Best,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Dawid
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Thanks for pushing this effort.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> I'd actually be in favor of 1b). I fully agree
>> that
>> > > >>>>>>> decorator
>> > > >>>>>>> > > >> interfaces
>> > > >>>>>>> > > >> > >> should be avoided but I'm also not a big fan of
>> > > >>>>>>> overloading the
>> > > >>>>>>> > > base
>> > > >>>>>>> > > >> > >> interfaces (they are hard to implement as is). The
>> > > usual
>> > > >>>>>>> feedback
>> > > >>>>>>> > > to
>> > > >>>>>>> > > >> > >> Source-related interfaces is always that they are
>> > > >>>>>>> overwhelming
>> > > >>>>>>> > and
>> > > >>>>>>> > > >> too
>> > > >>>>>>> > > >> > >> hard to implement. However, I'd also not oppose
>> 1c)
>> > as
>> > > >>>>>>> scattered
>> > > >>>>>>> > > >> > interfaces
>> > > >>>>>>> > > >> > >> also have drawbacks. I'd just dislike 1a) and 1d).
>> > > >>>>>>> > > >> > >> While I also like Becket's capability approach, I
>> > fear
>> > > >>>>>>> that it
>> > > >>>>>>> > > >> doesn't
>> > > >>>>>>> > > >> > work
>> > > >>>>>>> > > >> > >> for this particular use case: Sources can always
>> be
>> > > >>>>>>> aligned
>> > > >>>>>>> > > >> cross-task
>> > > >>>>>>> > > >> > and
>> > > >>>>>>> > > >> > >> this is just about intra-task alignment. So it's
>> > > >>>>>>> plausible to put
>> > > >>>>>>> > > >> > sources
>> > > >>>>>>> > > >> > >> into an alignment group even though they do not
>> use
>> > any
>> > > >>>>>>> of the
>> > > >>>>>>> > > >> presented
>> > > >>>>>>> > > >> > >> API of FLIP-217. They should just issue a
>> warning, if
>> > > >>>>>>> they handle
>> > > >>>>>>> > > >> > multiple
>> > > >>>>>>> > > >> > >> splits (see motivation section).
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> I think renaming alignSplits to facilitate future
>> use
>> > > >>>>>>> cases makes
>> > > >>>>>>> > > >> sense
>> > > >>>>>>> > > >> > but
>> > > >>>>>>> > > >> > >> then all interfaces (if 1c) is chosen) should be
>> > > adjusted
>> > > >>>>>>> > > >> accordingly.
>> > > >>>>>>> > > >> > >> AlignedSourceReader could be PausingSourceReader
>> and
>> > > I'd
>> > > >>>>>>> go for
>> > > >>>>>>> > > >> > >> pauseOrResumeSplits (Becket's proposal afaik). We
>> > could
>> > > >>>>>>> also
>> > > >>>>>>> > split
>> > > >>>>>>> > > it
>> > > >>>>>>> > > >> > into
>> > > >>>>>>> > > >> > >> pauseSplit and resumeSplit. While
>> pauseOrResumeSplits
>> > > >>>>>>> may allow
>> > > >>>>>>> > > >> Sources
>> > > >>>>>>> > > >> > to
>> > > >>>>>>> > > >> > >> just use 1 instead of 2 library calls (as written
>> in
>> > > the
>> > > >>>>>>> > Javadoc),
>> > > >>>>>>> > > >> both
>> > > >>>>>>> > > >> > >> Kafka and Pulsar can't use it and I'm not sure if
>> > there
>> > > >>>>>>> is a
>> > > >>>>>>> > system
>> > > >>>>>>> > > >> that
>> > > >>>>>>> > > >> > >> can.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Some nits for the FLIP:
>> > > >>>>>>> > > >> > >> - Please replace "stop" with "pause".
>> > > >>>>>>> > > >> > >> - Not sure if it's worth it in the capability
>> > section:
>> > > >>>>>>> Sources
>> > > >>>>>>> > that
>> > > >>>>>>> > > >> > adopt
>> > > >>>>>>> > > >> > >> this interface cannot be used in earlier
>> versions. So
>> > > it
>> > > >>>>>>> feels
>> > > >>>>>>> > like
>> > > >>>>>>> > > >> we
>> > > >>>>>>> > > >> > are
>> > > >>>>>>> > > >> > >> only forward compatible (old sources can be used
>> > after
>> > > >>>>>>> the
>> > > >>>>>>> > change);
>> > > >>>>>>> > > >> but
>> > > >>>>>>> > > >> > I
>> > > >>>>>>> > > >> > >> guess this holds for any API addition.
>> > > >>>>>>> > > >> > >> - You might want to add what happens when all
>> splits
>> > > are
>> > > >>>>>>> paused.
>> > > >>>>>>> > > >> > >> - You may want to describe how the 3 flavors of
>> > > >>>>>>> SourceReaderBase
>> > > >>>>>>> > > >> > interact
>> > > >>>>>>> > > >> > >> with the interface.
>> > > >>>>>>> > > >> > >> - I'm not sure if it makes sense to include Kafka
>> and
>> > > >>>>>>> Pulsar in
>> > > >>>>>>> > the
>> > > >>>>>>> > > >> > FLIP.
>> > > >>>>>>> > > >> > >> For me, this is rather immediate follow-up work.
>> > (could
>> > > >>>>>>> be in the
>> > > >>>>>>> > > >> same
>> > > >>>>>>> > > >> > >> umbrella ticket)
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Best,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Arvid
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz
>> <
>> > > >>>>>>> > > >> > dwysakowicz@apache.org> <dw...@apache.org>
>> > > >>>>>>> > > >> > >> wrote:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
>> > > >>>>>>> WithSplitsAlignment",
>> > > >>>>>>> > > >> along
>> > > >>>>>>> > > >> > >> with "MySplitReader implements SplitReader,
>> > > >>>>>>> WithSplitsAlignment",
>> > > >>>>>>> > > or
>> > > >>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader"
>> > and
>> > > >>>>>>> > > "MySplitReader
>> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
>> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
>> > > >>>>>>> "MySplitReader
>> > > >>>>>>> > > >> > implements
>> > > >>>>>>> > > >> > >> SplitReader".
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> I think the latest proposal according to Dawid
>> would
>> > > be:
>> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
>> > > >>>>>>> "MySplitReader
>> > > >>>>>>> > > >> > implements
>> > > >>>>>>> > > >> > >> AlignedSplitReader".
>> > > >>>>>>> > > >> > >> I am fine with this API, although personally
>> > speaking I
>> > > >>>>>>> think it
>> > > >>>>>>> > is
>> > > >>>>>>> > > >> > simpler
>> > > >>>>>>> > > >> > >> to just add a new method to the split reader with
>> > > >>>>>>> default impl.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> I think that is a good idea to have it aligned as
>> > much
>> > > as
>> > > >>>>>>> > possible.
>> > > >>>>>>> > > >> I'd
>> > > >>>>>>> > > >> > be
>> > > >>>>>>> > > >> > >> +1 for your option c). We can merge
>> > AlignedSplitReader
>> > > >>>>>>> with
>> > > >>>>>>> > > >> > SplitReader. We
>> > > >>>>>>> > > >> > >> will update the FLIP shortly.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Best,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Dawid
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Thanks for the comment, Jark.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Can the interface be used to align other things in
>> > the
>> > > >>>>>>> future?
>> > > >>>>>>> > For
>> > > >>>>>>> > > >> > example,
>> > > >>>>>>> > > >> > >> align read speed, I have
>> > > >>>>>>> > > >> > >> seen users requesting global rate limits. This
>> > feature
>> > > >>>>>>> may also
>> > > >>>>>>> > > need
>> > > >>>>>>> > > >> an
>> > > >>>>>>> > > >> > >> interface like this.
>> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to
>> support
>> > > >>>>>>> align other
>> > > >>>>>>> > > >> > things, I
>> > > >>>>>>> > > >> > >> suggest explicitly declaring
>> > > >>>>>>> > > >> > >> the purpose of the methods, such as
>> > > >>>>>>> `alignWatermarksForSplits`
>> > > >>>>>>> > > >> instead
>> > > >>>>>>> > > >> > of
>> > > >>>>>>> > > >> > >> `alignSplits`.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> This is a good point. Naming wise, it would
>> usually
>> > be
>> > > >>>>>>> more
>> > > >>>>>>> > > >> extensible
>> > > >>>>>>> > > >> > to
>> > > >>>>>>> > > >> > >> just describe what the method actually does,
>> instead
>> > of
>> > > >>>>>>> assuming
>> > > >>>>>>> > > the
>> > > >>>>>>> > > >> > >> purpose of doing this. For example, in this case,
>> > > >>>>>>> > > >> pauseOrResumeSplits()
>> > > >>>>>>> > > >> > >> would be more extensible because this can be used
>> for
>> > > >>>>>>> any kind of
>> > > >>>>>>> > > >> flow
>> > > >>>>>>> > > >> > >> control, be it watermark alignment or simple rate
>> > > >>>>>>> limiting.
>> > > >>>>>>> > > >> > >>
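To illustrate the naming argument above, here is a minimal sketch, assuming a hypothetical PausableReader interface (it is a stand-in, not the real Flink SourceReader). It shows how one method named after what it does, pauseOrResumeSplits, could serve both watermark alignment and a simple rate limit. TrackingReader, FlowControlSketch and the per-split metric maps are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a reader; NOT the real Flink SourceReader.
interface PausableReader {
    void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

// Toy reader that only records which split ids are currently paused.
class TrackingReader implements PausableReader {
    final Set<String> paused = new HashSet<>();

    @Override
    public void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.addAll(splitsToPause);
        paused.removeAll(splitsToResume);
    }
}

class FlowControlSketch {
    // Shared helper: pause every split whose metric exceeds the limit, resume the rest.
    private static void throttle(PausableReader reader, Map<String, Long> metricPerSplit, long limit) {
        List<String> toPause = new ArrayList<>();
        List<String> toResume = new ArrayList<>();
        for (Map.Entry<String, Long> entry : metricPerSplit.entrySet()) {
            if (entry.getValue() > limit) {
                toPause.add(entry.getKey());
            } else {
                toResume.add(entry.getKey());
            }
        }
        reader.pauseOrResumeSplits(toPause, toResume);
    }

    // Watermark alignment: pause splits that are ahead of the allowed watermark.
    static void alignWatermarks(PausableReader reader, Map<String, Long> watermarkPerSplit,
                                long maxDesiredWatermark) {
        throttle(reader, watermarkPerSplit, maxDesiredWatermark);
    }

    // Rate limiting: pause splits that have used up their record budget.
    static void limitRate(PausableReader reader, Map<String, Long> recordsPerSplit,
                          long recordBudget) {
        throttle(reader, recordsPerSplit, recordBudget);
    }
}
```

Both callers end up in the same reader method, which is the extensibility the more generic name is meant to allow.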
>> > > >>>>>>> > > >> > >> 4. Interface or Method.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think
>> they
>> > > have
>> > > >>>>>>> their
>> > > >>>>>>> > own
>> > > >>>>>>> > > >> > >> advantages.
>> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for
>> extending
>> > > >>>>>>> abilities
>> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
>> > > >>>>>>> > > >> > >> and I prefer Interfaces rather than methods in
>> this
>> > > >>>>>>> case. When
>> > > >>>>>>> > you
>> > > >>>>>>> > > >> have
>> > > >>>>>>> > > >> > a
>> > > >>>>>>> > > >> > >> bunch of abilities and each ability
>> > > >>>>>>> > > >> > >> has more than one method, Interfaces can help to
>> > > >>>>>>> organize them
>> > > >>>>>>> > and
>> > > >>>>>>> > > >> make
>> > > >>>>>>> > > >> > >> users clear which methods
>> > > >>>>>>> > > >> > >> need to implement when you want to have an
>> ability.
>> > > >>>>>>> > > >> > >>
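As a rough illustration of the ability-interface style described above, the following sketch groups two related methods behind one optional interface and lets the framework side check for it with instanceof. All names here (BasicReader, SupportsPausing, PlainReader, PausingReader, AbilitySketch) are hypothetical and are not taken from Flink or from the FLIP.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical base interface that every reader must implement.
interface BasicReader {
    String pollNext();
}

// Hypothetical ability interface: both methods belong together, so a reader
// that opts in is forced by the compiler to implement both of them.
interface SupportsPausing {
    void pauseSplit(String splitId);

    void resumeSplit(String splitId);
}

// A reader without the ability only deals with the base interface.
class PlainReader implements BasicReader {
    @Override
    public String pollNext() {
        return "record";
    }
}

// A reader with the ability implements the whole group of methods.
class PausingReader implements BasicReader, SupportsPausing {
    final Set<String> paused = new HashSet<>();

    @Override
    public String pollNext() {
        return "record";
    }

    @Override
    public void pauseSplit(String splitId) {
        paused.add(splitId);
    }

    @Override
    public void resumeSplit(String splitId) {
        paused.remove(splitId);
    }
}

class AbilitySketch {
    // Framework side: use the ability only if the reader declares it.
    static boolean tryPause(BasicReader reader, String splitId) {
        if (reader instanceof SupportsPausing) {
            ((SupportsPausing) reader).pauseSplit(splitId);
            return true;
        }
        return false;
    }
}
```

The trade-off raised elsewhere in the thread still applies: the caller has to branch on the reader type, and users have to know which ability interfaces exist.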
>> > > >>>>>>> > > >> > >> I am OK with decorative interfaces if this is a
>> > general
>> > > >>>>>>> design
>> > > >>>>>>> > > >> pattern
>> > > >>>>>>> > > >> > in
>> > > >>>>>>> > > >> > >> the other components in Flink. But it looks like
>> the
>> > > >>>>>>> current API
>> > > >>>>>>> > > >> > proposal
>> > > >>>>>>> > > >> > >> is not symmetric.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> The current proposal is essentially
>> "MySourceReader
>> > > >>>>>>> implements
>> > > >>>>>>> > > >> > >> SourceReader, WithSplitsAlignment", along with
>> > > >>>>>>> "MySplitReader
>> > > >>>>>>> > > >> implements
>> > > >>>>>>> > > >> > >> AlignedSplitsReader".
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Should we make the API symmetric? I'd consider any
>> > one
>> > > >>>>>>> of the
>> > > >>>>>>> > > >> following
>> > > >>>>>>> > > >> > as
>> > > >>>>>>> > > >> > >> symmetric.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
>> > > >>>>>>> WithSplitsAlignment",
>> > > >>>>>>> > > >> along
>> > > >>>>>>> > > >> > >> with "MySplitReader implements SplitReader,
>> > > >>>>>>> WithSplitsAlignment",
>> > > >>>>>>> > > or
>> > > >>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader"
>> > and
>> > > >>>>>>> > > "MySplitReader
>> > > >>>>>>> > > >> > >> implements AlignedSplitReader", or
>> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
>> > > >>>>>>> "MySplitReader
>> > > >>>>>>> > > >> > implements
>> > > >>>>>>> > > >> > >> SplitReader".
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> I think the latest proposal according to Dawid
>> would
>> > > be:
>> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
>> > > >>>>>>> "MySplitReader
>> > > >>>>>>> > > >> > implements
>> > > >>>>>>> > > >> > >> AlignedSplitReader".
>> > > >>>>>>> > > >> > >> I am fine with this API, although personally
>> > speaking I
>> > > >>>>>>> think it
>> > > >>>>>>> > is
>> > > >>>>>>> > > >> > simpler
>> > > >>>>>>> > > >> > >> to just add a new method to the split reader with
>> > > >>>>>>> default impl.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> @Dawid Wysakowicz <dw...@apache.org> <
>> > > >>>>>>> > dwysakowicz@apache.org
>> > > >>>>>>> > > >
>> > > >>>>>>> > > >> <
>> > > >>>>>>> > > >> > dwysakowicz@apache.org> <dw...@apache.org>,
>> > thanks
>> > > >>>>>>> for the
>> > > >>>>>>> > > reply.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Having said that, as I don't have a preference
>> and I
>> > > >>>>>>> agree most
>> > > >>>>>>> > of
>> > > >>>>>>> > > >> the
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> sources will support the alignment I am fine
>> > following
>> > > >>>>>>> your
>> > > >>>>>>> > > >> suggestion
>> > > >>>>>>> > > >> > to
>> > > >>>>>>> > > >> > >> have the SourceReader extending from
>> > > >>>>>>> > WithWatermarksSplitsAlignment,
>> > > >>>>>>> > > >> but
>> > > >>>>>>> > > >> > >> would put the "supportsXXX" there, not in the
>> Source
>> > to
>> > > >>>>>>> keep the
>> > > >>>>>>> > > two
>> > > >>>>>>> > > >> > >> methods together.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> One benefit of having the "supportsXXX" in Source
>> is
>> > > >>>>>>> that this
>> > > >>>>>>> > > allows
>> > > >>>>>>> > > >> > some
>> > > >>>>>>> > > >> > >> compile time check. For example, if a user enabled
>> > > >>>>>>> watermark
>> > > >>>>>>> > > >> alignment
>> > > >>>>>>> > > >> > >> while it is not supported by the Source, an
>> exception
>> > > >>>>>>> can be
>> > > >>>>>>> > thrown
>> > > >>>>>>> > > >> at
>> > > >>>>>>> > > >> > >> compile time. It seems in general useful. That
>> said,
>> > I
>> > > >>>>>>> agree that
>> > > >>>>>>> > > API
>> > > >>>>>>> > > >> > >> cleanliness wise it is better to put the two
>> methods
>> > > >>>>>>> together.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Thanks,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <
>> > > >>>>>>> imjark@gmail.com> <
>> > > >>>>>>> > > >> > imjark@gmail.com> <im...@gmail.com> <
>> imjark@gmail.com>
>> > > >>>>>>> wrote:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Thanks, Dawid, for the reminder on FLIP-182. Sorry I
>> did
>> > > >>>>>>> miss it.
>> > > >>>>>>> > > >> > >> I don't have other concerns then.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Best,
>> > > >>>>>>> > > >> > >> Jark
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz <
>> > > >>>>>>> > > >> dwysakowicz@apache.org>
>> > > >>>>>>> > > >> > <dw...@apache.org> <dw...@apache.org> <
>> > > >>>>>>> > > >> dwysakowicz@apache.org>
>> > > >>>>>>> > > >> > >> wrote:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> @Jark:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> 1. Will the framework always align with watermarks
>> > when
>> > > >>>>>>> the
>> > > >>>>>>> > source
>> > > >>>>>>> > > >> > >> implements the interface?
>> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark
>> alignment
>> > > even
>> > > >>>>>>> if Kafka
>> > > >>>>>>> > > >> > >> implements the interface,
>> > > >>>>>>> > > >> > >> and this will affect the throughput somehow. I
>> agree
>> > > >>>>>>> with Becket
>> > > >>>>>>> > > >> > >> we may need a
>> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to
>> > > configure
>> > > >>>>>>> the
>> > > >>>>>>> > source
>> > > >>>>>>> > > >> to
>> > > >>>>>>> > > >> > >> enable/disable the alignment.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
>> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly affect
>> > > >>>>>>> > > >> > >> throughput if the reader is constantly switching between pause and
>> > > >>>>>>> > > >> > >> resume. Can users configure the alignment offset?
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> This is covered in the previous FLIP [1], which has already been
>> > > >>>>>>> > > >> > >> implemented in 1.15. In short, it must be enabled with the watermark
>> > > >>>>>>> > > >> > >> strategy, which also configures drift and update interval.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to
>> support
>> > > >>>>>>> align other
>> > > >>>>>>> > > >> things,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> I
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> suggest explicitly declaring
>> > > >>>>>>> > > >> > >> the purpose of the methods, such as
>> > > >>>>>>> `alignWatermarksForSplits`
>> > > >>>>>>> > > >> instead
>> > > >>>>>>> > > >> > of
>> > > >>>>>>> > > >> > >> `alignSplits`.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Sure let's rename it.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> @Becket:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> I understand your point. On the other hand putting
>> > all
>> > > >>>>>>> methods,
>> > > >>>>>>> > > even
>> > > >>>>>>> > > >> > with
>> > > >>>>>>> > > >> > >> "supportsXXX" methods for enabling certain
>> features,
>> > > >>>>>>> makes the
>> > > >>>>>>> > > entry
>> > > >>>>>>> > > >> > >> threshold for writing a new source higher.
>> Instead of
>> > > >>>>>>> focusing on
>> > > >>>>>>> > > the
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> basic
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> and required properties of the Source, the person
>> > > >>>>>>> implementing a
>> > > >>>>>>> > > >> source
>> > > >>>>>>> > > >> > >> must bother with and need to figure out what all
>> of
>> > the
>> > > >>>>>>> extra
>> > > >>>>>>> > > >> features
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> are
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> about and how to deal with them. It makes it also
>> > > harder
>> > > >>>>>>> to
>> > > >>>>>>> > > organize
>> > > >>>>>>> > > >> > >> methods in coupled groups as Jark said.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Having said that, as I don't have a preference
>> and I
>> > > >>>>>>> agree most
>> > > >>>>>>> > of
>> > > >>>>>>> > > >> the
>> > > >>>>>>> > > >> > >> sources will support the alignment I am fine
>> > following
>> > > >>>>>>> your
>> > > >>>>>>> > > >> suggestion
>> > > >>>>>>> > > >> > to
>> > > >>>>>>> > > >> > >> have the SourceReader extending from
>> > > >>>>>>> > WithWatermarksSplitsAlignment,
>> > > >>>>>>> > > >> but
>> > > >>>>>>> > > >> > >> would put the "supportsXXX" there, not in the
>> Source
>> > to
>> > > >>>>>>> keep the
>> > > >>>>>>> > > two
>> > > >>>>>>> > > >> > >> methods together.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Lastly, I agree it is really unfortunate the
>> > > >>>>>>> "alignSplits"
>> > > >>>>>>> > methods
>> > > >>>>>>> > > >> > differ
>> > > >>>>>>> > > >> > >> slightly for SourceReader and SplitReader. The
>> reason
>> > > for
>> > > >>>>>>> that is
>> > > >>>>>>> > > >> > >> SourceReaderBase deals only with SplitIds, whereas
>> > > >>>>>>> SplitReader
>> > > >>>>>>> > > needs
>> > > >>>>>>> > > >> the
>> > > >>>>>>> > > >> > >> actual splits to pause them. I found the
>> discrepancy
>> > > >>>>>>> acceptable
>> > > >>>>>>> > for
>> > > >>>>>>> > > >> the
>> > > >>>>>>> > > >> > >> sake of simplifying changes significantly,
>> especially
>> > > as
>> > > >>>>>>> they
>> > > >>>>>>> > would
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> highly
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> likely impact performance as we would have to
>> perform
>> > > >>>>>>> additional
>> > > >>>>>>> > > >> > lookups.
>> > > >>>>>>> > > >> > >> Moreover the SplitReader is a secondary interface.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Best,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Dawid
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Thanks for the effort, Dawid and Sebastian!
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> I just have some minor questions (maybe I missed
>> > > >>>>>>> something).
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> 1. Will the framework always align with watermarks
>> > when
>> > > >>>>>>> the
>> > > >>>>>>> > source
>> > > >>>>>>> > > >> > >> implements the interface?
>> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark
>> alignment
>> > > even
>> > > >>>>>>> if Kafka
>> > > >>>>>>> > > >> > >> implements the interface,
>> > > >>>>>>> > > >> > >> and this will affect the throughput somehow. I
>> agree
>> > > >>>>>>> with Becket
>> > > >>>>>>> > > >> > >> we may need a
>> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to
>> > > configure
>> > > >>>>>>> the
>> > > >>>>>>> > source
>> > > >>>>>>> > > >> to
>> > > >>>>>>> > > >> > >> enable/disable the alignment.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
>> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly affect
>> > > >>>>>>> > > >> > >> throughput if the reader is constantly switching between pause and
>> > > >>>>>>> > > >> > >> resume. Can users configure the alignment offset?
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
>> > > >>>>>>> > > >> > >> Can the interface be used to align other things in
>> > the
>> > > >>>>>>> future?
>> > > >>>>>>> > For
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> example,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> align read speed, I have
>> > > >>>>>>> > > >> > >> seen users requesting global rate limits. This
>> > feature
>> > > >>>>>>> may also
>> > > >>>>>>> > > need
>> > > >>>>>>> > > >> an
>> > > >>>>>>> > > >> > >> interface like this.
>> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to
>> support
>> > > >>>>>>> align other
>> > > >>>>>>> > > >> things,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> I
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> suggest explicitly declaring
>> > > >>>>>>> > > >> > >> the purpose of the methods, such as
>> > > >>>>>>> `alignWatermarksForSplits`
>> > > >>>>>>> > > >> instead
>> > > >>>>>>> > > >> > of
>> > > >>>>>>> > > >> > >> `alignSplits`.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> 4. Interface or Method.
>> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think
>> they
>> > > have
>> > > >>>>>>> their
>> > > >>>>>>> > own
>> > > >>>>>>> > > >> > >> advantages.
>> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for
>> extending
>> > > >>>>>>> abilities
>> > > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
>> > > >>>>>>> > > >> > >> and I prefer Interfaces rather than methods in
>> this
>> > > >>>>>>> case. When
>> > > >>>>>>> > you
>> > > >>>>>>> > > >> have
>> > > >>>>>>> > > >> > a
>> > > >>>>>>> > > >> > >> bunch of abilities and each ability
>> > > >>>>>>> > > >> > >> has more than one method, Interfaces can help to
>> > > >>>>>>> organize them
>> > > >>>>>>> > and
>> > > >>>>>>> > > >> make
>> > > >>>>>>> > > >> > >> users clear which methods
>> > > >>>>>>> > > >> > >> need to implement when you want to have an
>> ability.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Best,
>> > > >>>>>>> > > >> > >> Jark
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <
>> > > >>>>>>> becket.qin@gmail.com>
>> > > >>>>>>> > <
>> > > >>>>>>> > > >> > becket.qin@gmail.com> <be...@gmail.com> <
>> > > >>>>>>> becket.qin@gmail.com>
>> > > >>>>>>> > <
>> > > >>>>>>> > > >> > >> becket.qin@gmail.com> wrote:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Hi Dawid,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Thanks for the explanation. Apologies that I
>> somehow
>> > > >>>>>>> misread a
>> > > >>>>>>> > > bunch
>> > > >>>>>>> > > >> of
>> > > >>>>>>> > > >> > >> "align" and thought they were "assign".
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Regarding 1, by default implementation, I was
>> > thinking
>> > > >>>>>>> of the
>> > > >>>>>>> > > default
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> no-op
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> implementation. I am a little worried about the
>> > > >>>>>>> proliferation of
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> decorative
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> interfaces. I think the most important thing about
>> > > >>>>>>> interfaces is
>> > > >>>>>>> > > that
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> they
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> are easy to understand. In this case, I prefer
>> adding
>> > > >>>>>>> a new method
>> > > >>>>>>> > to
>> > > >>>>>>> > > >> the
>> > > >>>>>>> > > >> > >> existing interface for the following reasons:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> a) I feel the biggest drawback of decorative
>> > interfaces
>> > > >>>>>>> is which
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> interface
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> they can decorate and which combinations of
>> multiple
>> > > >>>>>>> decorative
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> interfaces
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> are valid. In the current FLIP, the
>> > withSplitsAlignment
>> > > >>>>>>> interface
>> > > >>>>>>> > > is
>> > > >>>>>>> > > >> > only
>> > > >>>>>>> > > >> > >> applicable to the SourceReader which means it
>> can't
>> > > >>>>>>> decorate any
>> > > >>>>>>> > > >> other
>> > > >>>>>>> > > >> > >> interface. From an interface design perspective, a
>> > > >>>>>>> natural
>> > > >>>>>>> > question
>> > > >>>>>>> > > >> is
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> why
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> not let "AlignedSplitReader" extend
>> > > >>>>>>> "withSplitsAlignment"? And it
>> > > >>>>>>> > > is
>> > > >>>>>>> > > >> > also
>> > > >>>>>>> > > >> > >> natural to assume that a split reader implementing
>> > both
>> > > >>>>>>> > SplitReader
>> > > >>>>>>> > > >> and
>> > > >>>>>>> > > >> > >> WithSplitAlignment would work, because a source
>> > reader
>> > > >>>>>>> > implementing
>> > > >>>>>>> > > >> > >> SourceReader and withSplitsAlignment works. So why
>> > > isn't
>> > > >>>>>>> there an
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> interface
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> of AlignedSourceReader? In the future, if there
>> is a
>> > > new
>> > > >>>>>>> feature
>> > > >>>>>>> > > >> added
>> > > >>>>>>> > > >> > >> (e.g. sorted or pre-partitioned data aware), are
>> we
>> > > >>>>>>> going to
>> > > >>>>>>> > create
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> another
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> interface of SplitReader such as
>> SortedSplitReader or
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> PrePartitionedAware?
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Can they be combined? So I think the additional
>> > > >>>>>>> decorative
>> > > >>>>>>> > > interface
>> > > >>>>>>> > > >> > like
>> > > >>>>>>> > > >> > >> withSplitsAlignment actually increases the
>> > > understanding
>> > > >>>>>>> cost of
>> > > >>>>>>> > > >> users
>> > > >>>>>>> > > >> > >> because they have to know what decorative
>> interfaces
>> > > are
>> > > >>>>>>> there,
>> > > >>>>>>> > > which
>> > > >>>>>>> > > >> > >> interface they can decorate and which
>> combinations of
>> > > the
>> > > >>>>>>> > > decorative
>> > > >>>>>>> > > >> > >> interfaces are valid and which are not. Ideally we
>> > want
>> > > >>>>>>> to avoid
>> > > >>>>>>> > > >> that.
>> > > >>>>>>> > > >> > To
>> > > >>>>>>> > > >> > >> be clear, I am not opposing having an interface of
>> > > >>>>>>> > > >> withSplitsAlignment,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> it
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> is completely OK to have it as an internal
>> interface
>> > > and
>> > > >>>>>>> let
>> > > >>>>>>> > > >> > SourceReader
>> > > >>>>>>> > > >> > >> and SplitReader both extend it.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> b) Adding a new method to the SourceReader with a
>> > > default
>> > > >>>>>>> > > >> implementation
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> of
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> no-op would help avoid logic branching in the
>> source
>> > > >>>>>>> logic,
>> > > >>>>>>> > > >> especially
>> > > >>>>>>> > > >> > >> given that we agree that the vast majority of the
>> > > >>>>>>> SourceReader
>> > > >>>>>>> > > >> > >> implementations, if not all, would just extend
>> from
>> > the
>> > > >>>>>>> > > >> > SourceReaderBase.
>> > > >>>>>>> > > >> > >> That means adding a new method to the interface
>> would
>> > > >>>>>>> effectively
>> > > >>>>>>> > > >> give
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> the
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> same user experience, but simpler.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> c) A related design principle that may be worth
>> > > >>>>>>> discussing is how
>> > > >>>>>>> > > do
>> > > >>>>>>> > > >> we
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> let
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> the Source implementations tell Flink what
>> capability
>> > > is
>> > > >>>>>>> > supported
>> > > >>>>>>> > > >> and
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> what
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> is not. Personally speaking I feel the most
>> intuitive
>> > > >>>>>>> place to me
>> > > >>>>>>> > > is
>> > > >>>>>>> > > >> in
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> the
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Source itself, because that is the entrance of the
>> > > >>>>>>> entire Source
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> connector
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> logic.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Based on the above thoughts, I am wondering if the
>> > > >>>>>>> following
>> > > >>>>>>> > > >> interface
>> > > >>>>>>> > > >> > >> would be easier to understand by the users.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> - Change "withSplitsAlignment" to internal
>> interface,
>> > > >>>>>>> let both
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> SourceReader
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> and SplitReader extend from it, with a default
>> no-op
>> > > >>>>>>> > > implementation.
>> > > >>>>>>> > > >> > >> - Add a new method "boolean
>> supportSplitsAlignment()"
>> > > to
>> > > >>>>>>> the
>> > > >>>>>>> > Source
>> > > >>>>>>> > > >> > >> interface, with a default implementation returning
>> > > >>>>>>> false. Sources
>> > > >>>>>>> > > >> that
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> have
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> implemented the alignment logic can change this to
>> > > >>>>>>> return true,
>> > > >>>>>>> > and
>> > > >>>>>>> > > >> > >> override the alignSplits() methods in the
>> > SourceReader
>> > > /
>> > > >>>>>>> > > SplitReader
>> > > >>>>>>> > > >> if
>> > > >>>>>>> > > >> > >> needed.
>> > > >>>>>>> > > >> > >> - In the future, if a new optional feature is
>> going
>> > to
>> > > >>>>>>> be added
>> > > >>>>>>> > to
>> > > >>>>>>> > > >> the
>> > > >>>>>>> > > >> > >> Source, and that feature requires the awareness
>> from
>> > > >>>>>>> Flink, we
>> > > >>>>>>> > can
>> > > >>>>>>> > > >> add
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> more
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> such methods to the Source.
>> > > >>>>>>> > > >> > >>
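The proposal in the bullets above could look roughly like the following sketch. It assumes simplified, hypothetical SketchSource and SketchReader interfaces rather than the real Flink Source and SourceReader; only the method names supportSplitsAlignment() and alignSplits() come from the discussion, and the parameter lists and the validate() check are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical reader interface with a default no-op alignSplits().
interface SketchReader {
    default void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        // no-op by default: readers that do not care simply ignore the call
    }
}

// Hypothetical source interface; the capability flag defaults to false.
interface SketchSource {
    SketchReader createReader();

    default boolean supportSplitsAlignment() {
        return false;
    }
}

// A source written without alignment in mind needs no extra code.
class LegacySource implements SketchSource {
    @Override
    public SketchReader createReader() {
        return new SketchReader() {};
    }
}

// A source that implements alignment flips the flag and overrides alignSplits().
class AligningSource implements SketchSource {
    final List<String> paused = new ArrayList<>();

    @Override
    public boolean supportSplitsAlignment() {
        return true;
    }

    @Override
    public SketchReader createReader() {
        return new SketchReader() {
            @Override
            public void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
                paused.addAll(splitsToPause);
                paused.removeAll(splitsToResume);
            }
        };
    }
}

class CapabilitySketch {
    // Assumed framework-side check, run before the job starts.
    static void validate(SketchSource source, boolean alignmentEnabled) {
        if (alignmentEnabled && !source.supportSplitsAlignment()) {
            throw new IllegalStateException("Split alignment is enabled but not supported by the source");
        }
    }
}
```

With this shape, the framework can call alignSplits() unconditionally and use the flag only for an early validation error.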
>> > > >>>>>>> > > >> > >> What do you think?
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Thanks,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <
>> > > >>>>>>> > > >> > dwysakowicz@apache.org> <dw...@apache.org> <
>> > > >>>>>>> > > >> dwysakowicz@apache.org>
>> > > >>>>>>> > > >> > <dw...@apache.org>
>> > > >>>>>>> > > >> > >> <dw...@apache.org> <dw...@apache.org>
>> <
>> > > >>>>>>> > > >> > dwysakowicz@apache.org> <dw...@apache.org>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> wrote:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> @Konstantin:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader`
>> > > interface
>> > > >>>>>>> (aka the
>> > > >>>>>>> > > >> stop &
>> > > >>>>>>> > > >> > >> resume behavior) will be implemented for Kafka and
>> > > >>>>>>> Pulsar only,
>> > > >>>>>>> > > >> correct?
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Correct, as far as I know though, those are the
>> only
>> > > >>>>>>> sources
>> > > >>>>>>> > which
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> consume
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> concurrently from multiple splits and thus
>> alignment
>> > > >>>>>>> applies.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> @Thomas:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in
>> > > >>>>>>> SourceReaderBase and
>> > > >>>>>>> > > then
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> doing
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> nothing if the split reader does not implement
>> > > >>>>>>> AlignedSplitReader
>> > > >>>>>>> > > >> could
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> be
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can
>> instead
>> > be
>> > > >>>>>>> added to
>> > > >>>>>>> > the
>> > > >>>>>>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to
>> > make
>> > > >>>>>>> it
>> > > >>>>>>> > explicit
>> > > >>>>>>> > > >> that
>> > > >>>>>>> > > >> > >> the source actually supports it.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> I understand your concern. Hmm, I think we could
>> > > >>>>>>> actually do
>> > > >>>>>>> > that.
>> > > >>>>>>> > > >> Given
>> > > >>>>>>> > > >> > >> the actual implementation of the
>> > > >>>>>>> SourceReaderBase#alignSplits is
>> > > >>>>>>> > > >> rather
>> > > >>>>>>> > > >> > >> short (just a forward to the corresponding method
>> of
>> > > >>>>>>> > SplitFetcher),
>> > > >>>>>>> > > >> we
>> > > >>>>>>> > > >> > >> could reimplement it in the actual source
>> > > >>>>>>> implementations. This
>> > > >>>>>>> > > >> solution
>> > > >>>>>>> > > >> > >> has the downside though. Authors of new sources
>> would
>> > > >>>>>>> have to do
>> > > >>>>>>> > > two
>> > > >>>>>>> > > >> > >> things: extend from AlignedSplitReader and
>> implement
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> WithSplitsAssignment,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> instead of just extending AlignedSplitReader. I
>> would
>> > > be
>> > > >>>>>>> fine
>> > > >>>>>>> > with
>> > > >>>>>>> > > >> such
>> > > >>>>>>> > > >> > a
>> > > >>>>>>> > > >> > >> tradeoff though. What others think?
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> @Steven:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> For this part from the motivation section, is it
>> > > >>>>>>> accurate? Let's
>> > > >>>>>>> > > >> assume
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> one
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> source task consumes from 3 partitions and one of
>> the
>> > > >>>>>>> partition
>> > > >>>>>>> > is
>> > > >>>>>>> > > >> > >> significantly slower. In this situation, watermark
>> > for
>> > > >>>>>>> this
>> > > >>>>>>> > source
>> > > >>>>>>> > > >> task
>> > > >>>>>>> > > >> > >> won't hold back as it is reading recent data from
>> > other
>> > > >>>>>>> two Kafka
>> > > >>>>>>> > > >> > >> partitions. As a result, it won't hold back the
>> > overall
>> > > >>>>>>> > watermark.
>> > > >>>>>>> > > I
>> > > >>>>>>> > > >> > >> thought the problem is that we may have late data
>> for
>> > > >>>>>>> this slow
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> partition.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> It will hold back the watermark. Watermark of an
>> > > >>>>>>> operator is the
>> > > >>>>>>> > > >> minimum
>> > > >>>>>>> > > >> > >> of watermarks of all splits[1]
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> I have another question about the restart. Say
>> split
>> > > >>>>>>> alignment is
>> > > >>>>>>> > > >> > >> triggered. checkpoint is completed. job failed and
>> > > >>>>>>> restored from
>> > > >>>>>>> > > the
>> > > >>>>>>> > > >> > last
>> > > >>>>>>> > > >> > >> checkpoint. because alignment decision is not
>> > > >>>>>>> checkpointed,
>> > > >>>>>>> > > initially
>> > > >>>>>>> > > >> > >> alignment won't be enforced until we get a cycle
>> of
>> > > >>>>>>> watermark
>> > > >>>>>>> > > >> > aggregation
>> > > >>>>>>> > > >> > >> and propagation, right? Not saying this corner is
>> a
>> > > >>>>>>> problem. Just
>> > > >>>>>>> > > >> want
>> > > >>>>>>> > > >> > to
>> > > >>>>>>> > > >> > >> understand it more.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Your understanding is correct.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> @Becket:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a
>> general
>> > use
>> > > >>>>>>> case, so
>> > > >>>>>>> > > >> should
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> we
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> just add the related methods to SourceReader
>> directly
>> > > >>>>>>> instead of
>> > > >>>>>>> > > >> > >> introducing the new interface of
>> WithSplitAssignment?
>> > > We
>> > > >>>>>>> can
>> > > >>>>>>> > > provide
>> > > >>>>>>> > > >> > >> default implementations, so backwards
>> compatibility
>> > > >>>>>>> won't be an
>> > > >>>>>>> > > >> issue.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> I don't think we can provide a default
>> > implementation.
>> > > >>>>>>> How would
>> > > >>>>>>> > we
>> > > >>>>>>> > > >> do
>> > > >>>>>>> > > >> > >> that? Would it be just a no-op? Is it better than
>> > > having
>> > > >>>>>>> an
>> > > >>>>>>> > opt-in
>> > > >>>>>>> > > >> > >> interface? The default implementation would have
>> to
>> > be
>> > > >>>>>>> added
>> > > >>>>>>> > > >> exclusively
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> in
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> a *Public* SourceReader interface. By the way
>> notice
>> > > >>>>>>> > > SourceReaderBase
>> > > >>>>>>> > > >> > >> does extend from WithSplitsAlignment, so
>> effectively
>> > > all
>> > > >>>>>>> > > >> implementations
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> do
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> handle the alignment case. To be honest I think
>> it is
>> > > >>>>>>> impossible
>> > > >>>>>>> > to
>> > > >>>>>>> > > >> > >> implement the SourceReader interface directly by
>> end
>> > > >>>>>>> users.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface
>> > probably
>> > > >>>>>>> also
>> > > >>>>>>> > needs
>> > > >>>>>>> > > >> some
>> > > >>>>>>> > > >> > >> change to support throttling at the split
>> > granularity.
>> > > >>>>>>> Can you
>> > > >>>>>>> > add
>> > > >>>>>>> > > >> that
>> > > >>>>>>> > > >> > >> interface change into the public interface
>> section as
>> > > >>>>>>> well?
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> It has been added from the beginning. See
>> > > >>>>>>> *AlignedSplitReader.*
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name
>> > assignSplits
>> > > >>>>>>> here,
>> > > >>>>>>> > given
>> > > >>>>>>> > > >> that
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> it
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> is not actually changing the split assignments? It
>> > > seems
>> > > >>>>>>> > something
>> > > >>>>>>> > > >> like
>> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or
>> adjustSplitsThrottling() is
>> > > >>>>>>> more
>> > > >>>>>>> > > accurate.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> The method's called *alignSplits*, not assign. Do
>> you
>> > > >>>>>>> still
>> > > >>>>>>> > prefer
>> > > >>>>>>> > > a
>> > > >>>>>>> > > >> > >> different name for that? Personally, I am open for
>> > > >>>>>>> suggestions
>> > > >>>>>>> > > here.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Best,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Dawid
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> [1]
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> >
>> > > >>>>>>> > > >>
>> > > >>>>>>> > >
>> > > >>>>>>> >
>> > > >>>>>>>
>> > >
>> >
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Thanks for driving the effort, Sebastian. I think
>> the
>> > > >>>>>>> motivation
>> > > >>>>>>> > > >> makes a
>> > > >>>>>>> > > >> > >> lot of sense. Just a few suggestions / questions.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a
>> general
>> > use
>> > > >>>>>>> case, so
>> > > >>>>>>> > > >> should
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> we
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> just add the related methods to SourceReader
>> directly
>> > > >>>>>>> instead of
>> > > >>>>>>> > > >> > >> introducing the new interface of
>> WithSplitAssignment?
>> > > We
>> > > >>>>>>> can
>> > > >>>>>>> > > provide
>> > > >>>>>>> > > >> > >> default implementations, so backwards
>> compatibility
>> > > >>>>>>> won't be an
>> > > >>>>>>> > > >> issue.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface
>> > probably
>> > > >>>>>>> also
>> > > >>>>>>> > needs
>> > > >>>>>>> > > >> some
>> > > >>>>>>> > > >> > >> change to support throttling at the split
>> > granularity.
>> > > >>>>>>> Can you
>> > > >>>>>>> > add
>> > > >>>>>>> > > >> that
>> > > >>>>>>> > > >> > >> interface change into the public interface
>> section as
>> > > >>>>>>> well?
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name
>> > assignSplits
>> > > >>>>>>> here,
>> > > >>>>>>> > given
>> > > >>>>>>> > > >> that
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> it
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> is not actually changing the split assignments? It
>> > > seems
>> > > >>>>>>> > something
>> > > >>>>>>> > > >> like
>> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or
>> adjustSplitsThrottling() is
>> > > >>>>>>> more
>> > > >>>>>>> > > accurate.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Thanks,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <
>> > > >>>>>>> stevenz3wu@gmail.com
>> > > >>>>>>> > >
>> > > >>>>>>> > > <
>> > > >>>>>>> > > >> > stevenz3wu@gmail.com> <st...@gmail.com> <
>> > > >>>>>>> stevenz3wu@gmail.com>
>> > > >>>>>>> > <
>> > > >>>>>>> > > >> > >> stevenz3wu@gmail.com> <
>> > > >>>>>>> > > >> > >> stevenz3wu@gmail.com> wrote:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> However, a single source operator may read data
>> from
>> > > >>>>>>> multiple
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> splits/partitions, e.g., multiple Kafka
>> partitions,
>> > > such
>> > > >>>>>>> that
>> > > >>>>>>> > even
>> > > >>>>>>> > > >> with
>> > > >>>>>>> > > >> > >> watermark alignment the source operator may need
>> to
>> > > >>>>>>> buffer
>> > > >>>>>>> > > excessive
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> amount
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> of data if one split emits data faster than
>> another.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> For this part from the motivation section, is it
>> > > >>>>>>> accurate? Let's
>> > > >>>>>>> > > >> assume
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> one
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> source task consumes from 3 partitions and one of
>> the
>> > > >>>>>>> partition
>> > > >>>>>>> > is
>> > > >>>>>>> > > >> > >> significantly slower. In this situation, watermark
>> > for
>> > > >>>>>>> this
>> > > >>>>>>> > source
>> > > >>>>>>> > > >> task
>> > > >>>>>>> > > >> > >> won't hold back as it is reading recent data from
>> > other
>> > > >>>>>>> two Kafka
>> > > >>>>>>> > > >> > >> partitions. As a result, it won't hold back the
>> > overall
>> > > >>>>>>> > watermark.
>> > > >>>>>>> > > I
>> > > >>>>>>> > > >> > >> thought the problem is that we may have late data
>> for
>> > > >>>>>>> this slow
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> partition.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> I have another question about the restart. Say
>> split
>> > > >>>>>>> alignment is
>> > > >>>>>>> > > >> > >> triggered. checkpoint is completed. job failed and
>> > > >>>>>>> restored from
>> > > >>>>>>> > > the
>> > > >>>>>>> > > >> > last
>> > > >>>>>>> > > >> > >> checkpoint. because alignment decision is not
>> > > >>>>>>> checkpointed,
>> > > >>>>>>> > > initially
>> > > >>>>>>> > > >> > >> alignment won't be enforced until we get a cycle
>> of
>> > > >>>>>>> watermark
>> > > >>>>>>> > > >> > aggregation
>> > > >>>>>>> > > >> > >> and propagation, right? Not saying this corner is
>> a
>> > > >>>>>>> problem. Just
>> > > >>>>>>> > > >> want
>> > > >>>>>>> > > >> > to
>> > > >>>>>>> > > >> > >> understand it more.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <
>> > > >>>>>>> thw@apache.org> <
>> > > >>>>>>> > > >> > thw@apache.org> <th...@apache.org> <th...@apache.org> <
>> > > >>>>>>> > > >> > >> thw@apache.org> <
>> > > >>>>>>> > > >> > >> thw@apache.org> wrote:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Thanks for working on this!
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in
>> > > >>>>>>> SourceReaderBase and
>> > > >>>>>>> > > then
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> doing
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> nothing if the split reader does not implement
>> > > >>>>>>> AlignedSplitReader
>> > > >>>>>>> > > >> could
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> be
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can
>> instead
>> > be
>> > > >>>>>>> added to
>> > > >>>>>>> > the
>> > > >>>>>>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to
>> > make
>> > > >>>>>>> it
>> > > >>>>>>> > explicit
>> > > >>>>>>> > > >> that
>> > > >>>>>>> > > >> > >> the source actually supports it.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Thanks,
>> > > >>>>>>> > > >> > >> Thomas
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <
>> > > >>>>>>> > > knaufk@apache.org>
>> > > >>>>>>> > > >> <
>> > > >>>>>>> > > >> > knaufk@apache.org> <kn...@apache.org> <
>> > knaufk@apache.org
>> > > >
>> > > >>>>>>> <
>> > > >>>>>>> > > >> > >> knaufk@apache.org> <
>> > > >>>>>>> > > >> > >> knaufk@apache.org>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> wrote:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader`
>> > > interface
>> > > >>>>>>> (aka the
>> > > >>>>>>> > > >> stop
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> &
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> resume behavior) will be implemented for Kafka and
>> > > >>>>>>> Pulsar only,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> correct?
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> +1 in general. I believe it is valuable to
>> complete
>> > the
>> > > >>>>>>> watermark
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> aligned
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> story with this FLIP.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Cheers,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Konstantin
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz
>> <
>> > > >>>>>>> > > >> > dwysakowicz@apache.org> <dw...@apache.org>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> wrote:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> To be explicit, having worked on it, I support it
>> ;)
>> > I
>> > > >>>>>>> think we
>> > > >>>>>>> > can
>> > > >>>>>>> > > >> > >> start a vote thread soonish, as there are no
>> concerns
>> > > so
>> > > >>>>>>> far.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Best,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Dawid
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Dear Flink developers,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> I would like to open a discussion on FLIP 217 [1]
>> for
>> > > an
>> > > >>>>>>> > extension
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> of
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Watermark Alignment to perform alignment also in
>> > > >>>>>>> SplitReaders. To
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> do
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> so,
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> SplitReaders must be able to suspend and resume
>> > reading
>> > > >>>>>>> from
>> > > >>>>>>> > split
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> sources
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> where the SourceOperator coordinates and controls
>> > > >>>>>>> suspend and
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> resume.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> To
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> gather information about current watermarks of the
>> > > >>>>>>> SplitReaders,
>> > > >>>>>>> > we
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> extend
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> the internal WatermarkOutputMultiplexer and report
>> > > >>>>>>> watermarks to
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> the
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> SourceOperator.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> There is a PoC for this FLIP [2], prototyped by
>> Arvid
>> > > >>>>>>> Heise and
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> revised
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> and
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> reworked by Dawid Wysakowicz (He did most of the
>> > work.)
>> > > >>>>>>> and me.
>> > > >>>>>>> > The
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> changes
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> are backwards compatible in a way that if affected
>> > > >>>>>>> components do
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> not
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> support split alignment the behavior is as before.
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Best,
>> > > >>>>>>> > > >> > >> Sebastian
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> [1]
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> >
>> > > >>>>>>> > > >>
>> > > >>>>>>> > >
>> > > >>>>>>> >
>> > > >>>>>>>
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> [2]
>> > > >>>>>>> https://github.com/dawidwys/flink/tree/aligned-splits
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> --
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >> Konstantin Knauf
>> > > >>>>>>> > > >> > >> https://twitter.com/snntrable
>> > > >>>>>>> > > >> > >> https://github.com/knaufk
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> > >>
>> > > >>>>>>> > > >> >
>> > > >>>>>>> > > >>
>> > > >>>>>>> > > >
>> > > >>>>>>> > >
>> > > >>>>>>> >
>> > > >>>>>>>
>> > > >>>>>>
>> > >
>> >
>>
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Becket Qin <be...@gmail.com>.
I had an offline discussion with Piotr and here is the summary. Please
correct me if I miss something, Piotr.

There are two things on which we would like to seek more opinions from the
community, so we can make progress on this FLIP.

1. The general pattern to add obligatory features to existing interfaces.
***********************************************************************************
Interfaces exposed to developers for implementation are intended to be
either *optional* or *obligatory*. While it is quite clear how to convey
that intention when creating an interface, there is less agreement on how
to do so when adding new features to an existing interface. In general,
Flink uses decorative interfaces when adding optional features to existing
interfaces. Both Piotr and I agree that this looks good.

Different opinions are mainly about how to add obligatory features to the
existing interfaces, probably due to different understandings of
"obligatory".

We have discussed four options:

*Option 1:*

   - Just add a new method to the existing interface.
   - For backwards compatibility, the method would have a default
   implementation throwing "UnsupportedOperationException".
   - In the next major version, remove the default implementation.
   - For the developers, any method with a default implementation throwing
   an "UnsupportedOperationException" should be taken as obligatory.

*Option 2:*

   - Always make the features optional by adding a decorative interface,
   just like ordinary optional features.
   - Inform the developers via documentation that this feature is
   obligatory, although it looks optional from the code.
   - In case the developers did not implement the decorative interface,
   throw an exception.
   - In the next major version, move the methods in the decorative
   interface to the base interface, and deprecate the decorative interface.

*Option 3:*

   - Always bump the major version when a new obligatory feature is added,
   even if we may have to do it frequently.

*Option 4:*

   - Add a V2, V3... of the interface affected by the new obligatory
   feature.
   - In the next major versions, deprecate old versions of the interfaces.

Both Piotr and I agreed that option 3 and option 4 have big side effects
and should be avoided. We have different preferences between option 1 and
option 2.

Personally I prefer option 1, the reasons are:
  a) simple and intuitive. Java 8 introduced default implementations in
interfaces exactly for interface evolution, and this is a common pattern in
many projects.
  b) prominent to the developers that the feature is expected to be
implemented, because it explicitly throws an exception in the default impl.
  c) low maintenance overhead - the Flink framework can always assume the
method exists, so no special handling logic is needed.
  d) communicate a clear semantic boundary between optional and obligatory
features in Flink to the developers.
      - Optional: Jobs still run without exception if these methods are not
implemented, e.g. all the SupportsXXXPushDown interfaces.
      - Obligatory: Jobs may fail if these methods are not implemented
properly, e.g. SourceReader#pauseOrResumeSplits(). This is a common pattern
in Java, e.g. Iterator.remove() by default throws
"UnsupportedOperationException", informing the implementer that things
may go wrong if this method is not implemented.
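
To make option 1 concrete, here is a minimal sketch of the pattern. It is
only an illustration under stated assumptions: SketchSourceReader,
LegacyReader, PausableReader and Option1Demo are hypothetical names, splits
are modeled as plain strings, and none of this is the actual Flink
SourceReader API.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for a reader interface; NOT the real Flink SourceReader.
interface SketchSourceReader {
    String pollNext();

    // Option 1: the new method gets a throwing default so existing implementations still compile.
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException(
                getClass().getSimpleName() + " does not support pausing or resuming splits");
    }
}

// A reader written before the method existed. It compiles unchanged.
class LegacyReader implements SketchSourceReader {
    @Override
    public String pollNext() {
        return "record";
    }
}

// A reader that has caught up and overrides the new method.
class PausableReader implements SketchSourceReader {
    final Set<String> pausedSplits = new HashSet<>();

    @Override
    public String pollNext() {
        return "record";
    }

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        pausedSplits.addAll(splitsToPause);
        pausedSplits.removeAll(splitsToResume);
    }
}

class Option1Demo {
    // The caller invokes the method unconditionally; a missing override only shows up at runtime.
    static boolean tryPause(SketchSourceReader reader, String split) {
        try {
            reader.pauseOrResumeSplits(
                    Collections.singletonList(split), Collections.<String>emptyList());
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("legacy reader pausable: " + tryPause(new LegacyReader(), "split-0"));
        System.out.println("new reader pausable: " + tryPause(new PausableReader(), "split-0"));
    }
}
```

Note that the caller needs no instanceof check, which is point c) above. The
cost is that a reader which has not overridden the method is only detected
when the method is actually called.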

As for option 2, although the API itself sounds clean, it misleads people
into thinking that an obligatory feature is optional: from the code the
feature is optional, but the documents say it is obligatory. We probably
should avoid such code-doc inconsistency, as people will be confused. And I
would actually be bewildered that sometimes not implementing an "optional"
feature is fine, but sometimes it causes the jobs to fail.

In response to the argument that the method with a default implementation
is always optional, if that is true, it actually means all the interfaces
should be immutable once they are created. If we want to add a method to an
existing interface, for backwards compatibility, we will have to provide a
default implementation. And the fact it has a default implementation
indicates the method is optional. If that method is optional, it should
reside in a separate decorative interface, otherwise it clogs that existing
interface. Therefore, people should never add a method to an existing
interface. I find this conclusion a bit extreme.

Piotr prefers option 2; his opinions are:
    a) Obligatory methods are the methods that fail the code compilation if
not implemented.
    b) All obligatory methods should reside in the base interface, without
a default implementation. And all the optional methods should be in
decorative interfaces. This is a clean API.
    c) due to b), there isn't a viable solution to add an obligatory method
to an existing interface in a backwards compatible way. Unless we are OK
with breaking backwards compatibility, all the interfaces should be treated
as immutable. As a compromise, we might as well just treat all the features
added later as optional features. This way we keep the API clean.
    d) based on b) and c), option 2 has a clean API, while option 1 does
not.
    e) It is OK that the code itself in option 2 indicates to the developers
that a feature is optional. We will rely on the documentation to correct
that and clarify that the feature is actually obligatory.
    f) Regarding the effectiveness of making people aware that the feature
is obligatory, Option 1 and Option 2 are similar. People who do not read
the release notes / documentation will mistake the feature for an optional
one anyway.

As for option 1: For developers, the feature is still optional due to the
default implementation in the interface, regardless of what the default
implementation does, because the code compiles without overriding these
methods. Another problem with this option is that users who do not know
the history of the interface may be confused by the default implementation
throwing an exception.
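
For comparison, option 2 could look roughly like the sketch below. Again
this is only an illustration under stated assumptions: BaseReaderSketch,
SupportsSplitPausingSketch, PlainReader, AlignedReader and Option2Demo are
hypothetical names and not the interfaces proposed in the FLIP.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical base interface holding only the methods every reader must implement.
interface BaseReaderSketch {
    String pollNext();
}

// Hypothetical decorative interface; implementing it is how a reader opts in.
interface SupportsSplitPausingSketch {
    void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

// A reader that only implements the base interface.
class PlainReader implements BaseReaderSketch {
    @Override
    public String pollNext() {
        return "record";
    }
}

// A reader that additionally implements the decorative interface.
class AlignedReader implements BaseReaderSketch, SupportsSplitPausingSketch {
    final Set<String> pausedSplits = new HashSet<>();

    @Override
    public String pollNext() {
        return "record";
    }

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        pausedSplits.addAll(splitsToPause);
        pausedSplits.removeAll(splitsToResume);
    }
}

class Option2Demo {
    // Framework-side check: fail if alignment is enabled but the reader did not opt in.
    static void validate(BaseReaderSketch reader, boolean alignmentEnabled) {
        if (alignmentEnabled && !(reader instanceof SupportsSplitPausingSketch)) {
            throw new IllegalStateException(
                    reader.getClass().getSimpleName()
                            + " does not implement SupportsSplitPausingSketch");
        }
    }

    public static void main(String[] args) {
        validate(new PlainReader(), false);   // fine: the feature is not used
        validate(new AlignedReader(), true);  // fine: the reader opted in
        try {
            validate(new PlainReader(), true);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Here the base interface stays small, but the framework has to carry the
instanceof check, and nothing in the code tells the developer of PlainReader
that the decorative interface is expected.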


2. For this particular FLIP, should it be optional or not?
***********************************************************************************
As mentioned in the previous email, I feel this FLIP should be obligatory,
for the following reasons:
1. The Flink framework exposes the watermark alignment API to the end
users. From the end users' perspective, the feature should be available
regardless of the implementation details in the pluggables. This is true
for any other methods exposed as the Flink API.
2. If a Source is not pausable, the end user should receive an exception
when enabling watermark alignment (both Piotr and I agree on this). In
that case, it meets my criteria for an obligatory feature, because not
implementing the feature causes a framework API to throw an exception and
fail the job.

On the other hand, Piotr does not have a strong opinion regarding whether
this feature should be optional or not.


Thanks for reading through this long email. So basically, in order to make
progress on this FLIP, we want to see how people feel about the above
two topics.

Thanks,

Jiangjie (Becket) Qin


On Thu, May 26, 2022 at 3:06 PM Piotr Nowojski <pn...@apache.org> wrote:

> Hi Becket,
>
> I still sustain what I wrote before:
> > I think I would still vote soft -1 on this option, but I wouldn't block
> it in case I am out-voted.
>
> > I think it might be helpful to agree on the definition of optional in our
> case.
>
> For me it doesn't matter whether we call a default method throwing an
> exception optional or non-optional. As long as we keep it this way, the effect
> is the same. It's effectively a method that a user doesn't have to
> implement. If interface/system allows some methods to be not implemented,
> some users will do just that, regardless if we call it and document as
> non-optional. And at the same time it's clogging the base interface.
>
> By the way, just the need for a java-doc/documentation explaining the
> existence of some construct is a bad smell (code should be self-documenting
> and default method throwing an UnsupportedOperationException is not).
>
> > Please note that so far we do not assume whether the feature is in
> > the original API or it is added later. A newly added feature can also be
> > non-optional, although it might take some time for all the pluggable
> > developers to catch up, and they should still work if the new feature is
> > not used until they catch up. In contrast, we may never expect an
> optional
> > feature to catch up, because leaving it unimplemented is also blessed.
> >
> > Let's take the checkpointing as an example. Imagine Flink did not support
> > checkpointing before release 1.16. And now we are trying to add
> > checkpointing to Flink. So we exposed the checkpoint configuration to the
> > end users. In the meantime, will we tell the pluggable (e.g. operators,
> > connectors) developers that methods like "snapshotState()" are optional?
> If
> > we do that, the availability of checkpointing in Flink would be severely
> > weakened. But apparently we should still allow the existing
> implementations
> > to work without checkpointing. It looks to me that adding the method to
> the
> > pluggable interfaces with a default implementation throwing
> > "UnsupportedOperationException" would be the solution here. Please note
> > that in this case, having the default implementation does not mean this
> is
> > optional. It is just the technique to support backwards compatibility in
> > the feature evolution. The fact that this method is in the base interface
> > suggests it is not optional, so the developers SHOULD implement it.
>
> I would soft vote -1 for having the default method throwing
> UnsupportedOperationException as a one-off thing for this (FLIP-217) special
> circumstance.
>
> At the moment, without thinking this over too much, I would vote harder -1
> for having this as a general rule when adding new features. If we ever end
> up with an API that is littered with default methods throwing
> UnsupportedOperationException that are documented as "non optional" it
> would be IMO a big design failure. I would be ok-ish with that, only if
> that was a temporary thing and we had an aggressive plan to release more
> often new major Flink versions (2.x.y, 3.x.y, ...) breaking API
> compatibility, that would get rid of those default methods. Adding
> checkpointing and methods like "snapshotState()" would IMO easily justify a
> new major Flink release. In that case we could add those methods with
> default implementation for some transition period, one or two minor
> releases, followed by a clean up in a major release. However I would still
> argue that it would be cleaner/better to add a decorative interface like
> `CheckpointedOperator` interface instead of adding those default methods to
> the base `Operator` interface.
>
> I think I can sum up our disagreement here is that I would like to keep the
> interfaces simpler, with only obligatory methods/features on one side and
> clearly optional features on the other. While you would like to add an
> extra third state in between those two?
>
> Best,
> Piotrek
>
>
>
> czw., 12 maj 2022 o 04:25 Becket Qin <be...@gmail.com> napisał(a):
>
> > Thanks for the clarification, Piotr and Sebastian.
> >
> > It looks like the key problem is still whether the implementation of
> > pausable splits in the Sources should be optional or not.
> >
> > I think it might be helpful to agree on the definition of optional in our
> > case. To me:
> > Optional = "You CAN leave the method unimplemented, and that is fine."
> > Non-Optional = "You CAN leave the method unimplemented, but you SHOULD
> > NOT, because people assume this works."
> >
> > I think one sufficient condition of a Non-Optional feature is that if the
> > feature is exposed through the framework API, Flink should expect the
> > pluggables to support this feature by default. Otherwise the availability
> > of that feature becomes undefined.
> >
> > Please note that so far we do not assume whether the feature is in
> > the original API or it is added later. A newly added feature can also be
> > non-optional, although it might take some time for all the pluggable
> > developers to catch up, and they should still work if the new feature is
> > not used until they catch up. In contrast, we may never expect an
> > optional feature to catch up, because leaving it unimplemented is also
> > blessed.
> >
> > Let's take checkpointing as an example. Imagine Flink did not support
> > checkpointing before release 1.16, and now we are trying to add
> > checkpointing to Flink. So we expose the checkpoint configuration to the
> > end users. In the meantime, will we tell the pluggable (e.g. operators,
> > connectors) developers that methods like "snapshotState()" are optional?
> > If we do that, the availability of checkpointing in Flink would be
> > severely weakened. But apparently we should still allow the existing
> > implementations to work without checkpointing. It looks to me that adding
> > the method to the pluggable interfaces with a default implementation
> > throwing "UnsupportedOperationException" would be the solution here.
> > Please note that in this case, having the default implementation does not
> > mean this is optional. It is just the technique to support backwards
> > compatibility in the feature evolution. The fact that this method is in
> > the base interface suggests it is not optional, so the developers SHOULD
> > implement it.
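
The evolution path described above could look roughly like the following Java sketch. The names (`SketchSourceReader`, `SketchLegacyReader`, `SketchEvolutionDemo`) are hypothetical, and the method signature is only an assumption made for the example, not the final FLIP-217 API. The point is that a reader written before the feature existed keeps compiling and keeps working until somebody enables the feature on it.

```java
import java.util.Collection;
import java.util.Collections;

// Hypothetical stand-ins for illustration only; these are not Flink interfaces.

/** A pluggable interface that gained a new method after implementations already existed. */
interface SketchSourceReader {
    String pollNext();

    // Added later. The default keeps old implementations compiling, and the
    // exception routes them to an error path once the feature is switched on.
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException(
                getClass().getSimpleName() + " does not support pausing or resuming splits");
    }
}

/** Written before the feature existed; it needs no code change to keep compiling. */
final class SketchLegacyReader implements SketchSourceReader {
    @Override
    public String pollNext() {
        return "record";
    }
}

final class SketchEvolutionDemo {
    /** Returns the polled record, or a description of the error path that was hit. */
    static String run(SketchSourceReader reader, boolean alignmentEnabled) {
        try {
            if (alignmentEnabled) {
                // Only jobs that enable the feature ever touch the new method.
                reader.pauseOrResumeSplits(
                        Collections.singletonList("split-0"), Collections.emptyList());
            }
            return reader.pollNext();
        } catch (UnsupportedOperationException e) {
            return "error: " + e.getMessage();
        }
    }
}
```

In this sketch the legacy reader behaves as before when alignment is off and ends up on the error path only when alignment is turned on.
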
> >
> > When it comes to this FLIP, I think it meets the criteria of non-optional
> > features, so we should just use the evolution path of non-optional
> > features.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> > On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski <pn...@apache.org>
> > wrote:
> >
> > > Hi,
> > >
> > > Actually previously I thought about having a decorative interface and
> > > whenever watermark alignment is enabled, checking that the source
> > > implements the decorative interface. If not, throwing an exception.
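
The check described above (verify the decorative interface when watermark alignment is enabled, otherwise throw) might be sketched in Java as follows. `SketchSource`, `SketchPausableSource` and `SketchAlignmentValidator` are hypothetical names used only for this example and do not exist in Flink.

```java
import java.util.Collection;

// Hypothetical stand-ins for illustration only; these are not Flink interfaces.

/** Base interface with only the obligatory methods. */
interface SketchSource {
    String name();
}

/** Decorative interface: implementing it declares that splits can be paused. */
interface SketchPausableSource extends SketchSource {
    void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume);
}

final class SketchAlignmentValidator {
    /** Fails at job setup, before any pause request is ever sent to the source. */
    static void validate(SketchSource source, boolean alignmentEnabled) {
        if (alignmentEnabled && !(source instanceof SketchPausableSource)) {
            throw new UnsupportedOperationException(
                    "Source '" + source.name() + "' cannot be used with watermark alignment");
        }
    }
}
```

The design choice here is that the failure happens once, during validation, instead of later when the first pause request reaches a reader that cannot honor it.
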
> > >
> > > The option with default methods in the source interfaces throwing
> > > `UnsupportedOperationException` I think still suffers from the same
> > > problems I mentioned before. It's still an optional implementation and
> > > at the same time it's clogging the base interface. I think I would
> > > still vote soft -1 on this option, but I wouldn't block it in case I am
> > > out-voted.
> > >
> > > Best,
> > > Piotrek
> > >
> > > On Wed, May 11, 2022 at 14:22, Sebastian Mattheis <se...@ververica.com>
> > > wrote:
> > >
> > > > Hi Becket,
> > > >
> > > > > Thanks a lot for your fast and detailed response. For me, the
> > > > > discussion converges, and dropping the supportsX method sounds very
> > > > > reasonable to me. (Side note: I think we misunderstood each other
> > > > > about "pausable splits" being enabled as "default". As you describe it
> > > > > now, I understand "default" to mean that it should be the new
> > > > > recommended way of implementation, and I think that is fully valid.
> > > > > Before, I understood "default" here as the default implementation,
> > > > > i.e., throwing UnsupportedOperationException, which is the exact
> > > > > opposite. :) )
> > > >
> > > > > Nevertheless: As mentioned, an open question for me is if watermark
> > > > > alignment should enforce pausable splits. For clarification, the
> > > > > current documentation [1] says:
> > > >
> > > > >> *Note:* As of 1.15, Flink supports aligning across tasks of the same
> > > > >> source and/or different sources. It does not support aligning
> > > > >> splits/partitions/shards in the same task.
> > > > >>
> > > > >> In a case where there are e.g. two Kafka partitions that produce
> > > > >> watermarks at different pace, that get assigned to the same task
> > > > >> watermark might not behave as expected. Fortunately, worst case it
> > > > >> should not perform worse than without alignment.
> > > > >>
> > > > >> Given the limitation above, we suggest applying watermark alignment
> > > > >> in two situations:
> > > > >>
> > > > >>    1. You have two different sources (e.g. Kafka and File) that
> > > > >>    produce watermarks at different speeds
> > > > >>    2. You run your source with parallelism equal to the number of
> > > > >>    splits/shards/partitions, which results in every subtask being
> > > > >>    assigned a single unit of work.
> > > > >
> > > > > I personally see no issue in, and no reason against, implementing
> > > > > this dependency of watermark alignment on pausable splits. (I think
> > > > > this would even be a good path towards shaping watermark alignment in
> > > > > 1.16.) However, "I don't see" means that I would be happy to hear
> > > > > Dawid's and Piotrek's opinions as they implemented watermark alignment
> > > > > based on FLIP-182 [2] and I don't want to miss relevant
> > > > > rationale/background info from their side.
> > > >
> > > > *@Piotrek* *@Dawid *What do you think?
> > > >
> > > > Regards,
> > > > Sebastian
> > > >
> > > > [1]
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
> > > > [2]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
> > > >
> > > > > On Wed, May 11, 2022 at 1:30 PM Becket Qin <be...@gmail.com>
> > > > > wrote:
> > > >
> > > >> +dev
> > > >>
> > > >> Hi Sebastian,
> > > >>
> > > > >> Thank you for the summary. Please see the detailed replies inline.
> > > > >> As a recap of my suggestions:
> > > > >>
> > > > >> 1. Pausable splits API.
> > > > >>   a) Add default implementations to methods "pauseOrResumeSplits" in
> > > > >> both SourceReader and SplitReader where both default implementations
> > > > >> throw UnsupportedOperationException.
> > > > >>
> > > > >> 2. User story.
> > > > >>     a) We tell users to enable the watermark alignment as they like.
> > > > >> This is exactly what the current Flink API is.
> > > > >>     b) We tell the source developers, please implement pausable
> > > > >> splits, otherwise bad things may happen. Think of it like you are
> > > > >> expected to implement SourceReader#snapshotState() properly, otherwise
> > > > >> exceptions will be thrown when users enable checkpointing.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jiangjie (Becket) Qin
> > > >>
> > > >> On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis <
> > > >> sebastian@ververica.com> wrote:
> > > >>
> > > >>> Hi Becket, Hi everybody,
> > > >>>
> > > > >>> I'm sorry if I misread the messages but I could not derive an
> > > > >>> agreement from the mailing list. Nevertheless, if I understand you
> > > > >>> right the suggestion is:
> > > > >>>
> > > > >>> * Add default implementations to methods "pauseOrResumeSplits" in
> > > > >>> both SourceReader and SplitReader where both default implementations
> > > > >>> throw UnsupportedOperationException.
> > > >>>
> > > >> Yes.
> > > >>
> > > > >>> * Add "supportsPauseOrResumeSplits" to the Source interface. (In the
> > > > >>> following, I refer to supporting this as "pausable splits".)
> > > > >>>
> > > > >> We may no longer need this if pausable splits are expected to be
> > > > >> implemented by the source developers, i.e. non-optional. Having this
> > > > >> method would then be somewhat misleading as it looks like the sources
> > > > >> that do not support pausable splits are also acceptable in the long
> > > > >> term. So API wise, I'd say maybe we should remove this for this FLIP,
> > > > >> although I believe this supportXXX pattern itself is still attractive
> > > > >> for optional features.
> > > >>
> > > >>
> > > >>>
> > > >>> To make the conclusions explicit:
> > > >>>
> > > > >>> 1. The implementation of pauseOrResumeSplits in both interfaces
> > > > >>> SourceReader and SplitReader is optional where the default is that
> > > > >>> it doesn't support it. (--> This means that the implementation is
> > > > >>> still optional for the source developer.)
> > > > >>>
> > > > >> It is optional for backwards compatibility with existing sources, as
> > > > >> they may still compile without code change. But starting from this
> > > > >> FLIP, Flink will always optimistically assume that all the sources
> > > > >> support pausable splits. If a source does not support pausable splits,
> > > > >> it goes to an error handling path when watermark alignment is enabled
> > > > >> on it. This is different from a usual optional feature, where no error
> > > > >> is expected.
> > > >>
> > > >>
> > > > >>> 2. If watermark alignment is enabled in the application code by
> > > > >>> adding withWatermarkAlignment to the WatermarkStrategy while
> > > > >>> SourceReader or SplitReader do not support pausableSplits, we throw
> > > > >>> an UnsupportedOperationException.
> > > >>>
> > > >> Yes.
> > > >>
> > > >>
> > > >>> 3. With regard to your statement:
> > > >>>
> > > > >>>> [...] basically means watermark alignment is a non-optional feature
> > > > >>>> to the end users.
> > > > >>>
> > > > >>> You actually mean that "pausable splits" are non-optional for the
> > > > >>> app developer if watermark alignment is enabled. However, watermark
> > > > >>> alignment is optional and can be enabled/disabled.
> > > > >>>
> > > > >> Yes, watermark alignment can be enabled/disabled in individual
> > > > >> sources in Flink jobs, which basically means the code supporting
> > > > >> watermark alignment has to already be there. That again means the
> > > > >> Source developers are also expected to support pausable splits by
> > > > >> default. So this way we essentially tell the end users that you may
> > > > >> enable / disable this feature as you wish, and tell the source
> > > > >> developers that you SHOULD implement this because the end users may
> > > > >> turn it on/off at will. And if the source does not support pausable
> > > > >> splits, that goes to an error handling path when watermark alignment
> > > > >> is enabled on it. So users know they have to explicitly exclude this
> > > > >> source.
> > > >>
> > > >>
> > > >>>
> > > > >>> So far it's totally clear to me and I hope this is what you mean. I
> > > > >>> also agree with both statements:
> > > > >>>
> > > > >>>> So making that expectation aligned with the source developers seems
> > > > >>>> reasonable.
> > > > >>>
> > > > >>>> I think this is a simple and clean solution from both the end user
> > > > >>>> and source developers' standpoint.
> > > > >>>
> > > > >>> However, a last conclusion derives from 3. and is an open question
> > > > >>> for me:
> > > >>>
> > > > >>> 4. The feature of "pausable splits" is now tightly bound to
> > > > >>> watermark alignment, i.e., if sources do not support "pausable
> > > > >>> splits" one can not enable watermark alignment for these sources.
> > > > >>> This dependency is not the current status of watermark alignment
> > > > >>> implementation because it is/was implemented without pausable splits.
> > > > >>> Do we want to introduce this dependency? (This is an open question. I
> > > > >>> cannot judge that.)
> > > > >>>
> > > > >> The watermark alignment basically relies on the pausable splits,
> > > > >> right? So personally I found it quite reasonable that if the source
> > > > >> does not support pausable splits, end users cannot enable watermark
> > > > >> alignment on it.
> > > >>
> > > >>
> > > >>> If something is wrong, please correct me.
> > > >>>
> > > >>> Regards,
> > > >>> Sebastian
> > > >>>
> > > > >>> On Wed, May 11, 2022 at 9:05 AM Becket Qin <be...@gmail.com>
> > > > >>> wrote:
> > > >>>
> > > >>>> Hi Sebastian,
> > > >>>>
> > > > >>>> Thanks for the reply and patient discussion. I agree this is a
> > > > >>>> tricky decision.
> > > >>>>
> > > >>>>
> > > > >>>>> Nevertheless, Piotr has valid concerns about Option c) which I see
> > > > >>>>> as follows:
> > > > >>>>> (1) An interface with default NOOP implementation makes the
> > > > >>>>> implementation optional. And in my opinion, a default implementation
> > > > >>>>> is and will remain a way of making implementation optional because
> > > > >>>>> even in future a developer can decide to implement the "old flavor"
> > > > >>>>> without support for pausable splits.
> > > > >>>>> (2) It may not be too critical but I also find it suboptimal that
> > > > >>>>> with a NOOP default implementation there is no way to check at
> > > > >>>>> runtime if SourceReader or SplitReader actually support pausing. (To
> > > > >>>>> do so, one would need a supportsX method which makes it again more
> > > > >>>>> complicated.)
> > > >>>>
> > > >>>>
> > > > >>>> Based on the last few messages in the mailing list, Piotr and I
> > > > >>>> agreed that the default implementation should just throw an
> > > > >>>> UnsupportedOperationException if the source is unpausable. So this
> > > > >>>> basically tells the Source developers that this feature is expected
> > > > >>>> to be supported. Because we cannot prevent end users from putting an
> > > > >>>> unpausable source into the watermark alignment group, that basically
> > > > >>>> means watermark alignment is a non-optional feature to the end
> > > > >>>> users. So making that expectation aligned with the source developers
> > > > >>>> seems reasonable. And if a source does not support this feature, the
> > > > >>>> end users should explicitly remove that source from the watermark
> > > > >>>> alignment group.
> > > > >>>>
> > > > >>>> Personally speaking I think this is a simple and clean solution
> > > > >>>> from both the end user and source developers' standpoint.
> > > >>>>
> > > >>>> Does this address your concerns?
> > > >>>>
> > > >>>> Thanks,
> > > >>>>
> > > >>>> Jiangjie (Becket) Qin
> > > >>>>
> > > >>>> On Wed, May 11, 2022 at 2:52 PM Sebastian Mattheis <
> > > >>>> sebastian@ververica.com> wrote:
> > > >>>>
> > > >>>>> Hi Piotr, Hi Becket, Hi everybody,
> > > >>>>>
> > > > >>>>> we, Dawid and I, discussed the various suggestions/options and we
> > > > >>>>> would be okay either way because we find neither solution is
> > > > >>>>> perfect just because of the already present complexity.
> > > > >>>>>
> > > > >>>>> Option c) Adding methods to the interfaces of SourceReader and
> > > > >>>>> SplitReader
> > > > >>>>> Option a) Adding decorative interfaces to be used by SourceReader
> > > > >>>>> and SplitReader
> > > > >>>>>
> > > > >>>>> As of the current status (v. 12) of the FLIP [1], it is based on
> > > > >>>>> Option c) which we find acceptable because the complexity added is
> > > > >>>>> only a single method.
> > > > >>>>>
> > > > >>>>> Nevertheless, Piotr has valid concerns about Option c) which I see
> > > > >>>>> as follows:
> > > > >>>>> (1) An interface with default NOOP implementation makes the
> > > > >>>>> implementation optional. And in my opinion, a default implementation
> > > > >>>>> is and will remain a way of making implementation optional because
> > > > >>>>> even in future a developer can decide to implement the "old flavor"
> > > > >>>>> without support for pausable splits.
> > > > >>>>> (2) It may not be too critical but I also find it suboptimal that
> > > > >>>>> with a NOOP default implementation there is no way to check at
> > > > >>>>> runtime if SourceReader or SplitReader actually support pausing. (To
> > > > >>>>> do so, one would need a supportsX method which makes it again more
> > > > >>>>> complicated.)
> > > >>>>>
> > > > >>>>> However, we haven't changed it because Option a) is also not
> > > > >>>>> optimal or straight-forward:
> > > > >>>>> (1) We need to add two distinct yet similar decorative interfaces
> > > > >>>>> since, as mentioned, the signatures of the methods are different.
> > > > >>>>> For example, we would need decorative interfaces like
> > > > >>>>> `SplitReaderWithPausableSplits` and
> > > > >>>>> `SourceReaderWithPausableSplits`.
> > > > >>>>> (2) As a consequence, we would need to somehow document how/where
> > > > >>>>> to implement both interfaces and how this relates to each other.
> > > > >>>>> This we could solve by adding a note in the interface of
> > > > >>>>> SourceReader and SplitReader and reference to the decorative
> > > > >>>>> interfaces but it still increases complexity too.
> > > > >>>>>
> > > > >>>>> In summary, we see both as acceptable and preferred over other
> > > > >>>>> options. The question is if we can find a solution or compromise
> > > > >>>>> that is acceptable for everybody to reach consensus.
> > > > >>>>>
> > > > >>>>> Please let us know what you think because we would be happy if we
> > > > >>>>> can conclude the discussion to avoid dropping the initiative on this
> > > > >>>>> FLIP.
> > > >>>>>
> > > >>>>> Regards,
> > > >>>>> Sebastian
> > > >>>>>
> > > >>>>> [1]
> > > >>>>>
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
> > > >>>>> (v. 12)
> > > >>>>>
> > > > >>>>> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski
> > > > >>>>> <pnowojski@apache.org> wrote:
> > > >>>>>
> > > >>>>>> Hi Guowei,
> > > >>>>>>
> > > >>>>>> as Dawid wrote a couple of messages back:
> > > >>>>>>
> > > > >>>>>> > This is covered in the previous FLIP[1] which has been already
> > > > >>>>>> > implemented in 1.15. In short, it must be enabled with the
> > > > >>>>>> > watermark strategy which also configures drift and update interval
> > > > >>>>>>
> > > > >>>>>> So by default watermark alignment is disabled, regardless of
> > > > >>>>>> whether a source supports it or not.
> > > >>>>>>
> > > >>>>>> Best,
> > > >>>>>> Piotrek
> > > >>>>>>
> > > > >>>>>> On Thu, May 5, 2022 at 09:56, Guowei Ma <gu...@gmail.com> wrote:
> > > >>>>>>
> > > >>>>>>> Hi,
> > > >>>>>>>
> > > > >>>>>>> We know that in the case of Bounded input Flink supports the
> > > > >>>>>>> Batch execution mode. Currently in Batch execution mode, Flink is
> > > > >>>>>>> executed on a stage-by-stage basis. In this way, perhaps watermark
> > > > >>>>>>> alignment might not gain much.
> > > > >>>>>>>
> > > > >>>>>>> So my question is: Is watermark alignment the default behavior
> > > > >>>>>>> (for implemented sources only)? If so, have you considered
> > > > >>>>>>> evaluating the impact of this behavior on the Batch execution
> > > > >>>>>>> mode? Or do you think it is not necessary?
> > > >>>>>>>
> > > >>>>>>> Correct me if I miss something.
> > > >>>>>>>
> > > >>>>>>> Best,
> > > >>>>>>> Guowei
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <
> > > >>>>>>> piotr.nowojski@gmail.com>
> > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>> > Hi Becket and Dawid,
> > > >>>>>>> >
> > > > >>>>>>> > > I feel that no matter which option we choose this can not be
> > > > >>>>>>> > > solved entirely in either of the options, because of the point
> > > > >>>>>>> > > above and because the signature of
> > > > >>>>>>> > > SplitReader#pauseOrResumeSplits and
> > > > >>>>>>> > > SourceReader#pauseOrResumeSplits are slightly different (one
> > > > >>>>>>> > > identifies splits with splitId the other one passes the splits
> > > > >>>>>>> > > directly).
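
The signature mismatch mentioned above can be illustrated with a small Java sketch. The interfaces below are hypothetical stand-ins, and the exact parameter types are an assumption based only on the description in this thread (one method receives split IDs, the other receives split objects). The sketch shows why some glue code is needed between the two levels, whichever API shape is chosen.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; the signatures are an assumption, not the real Flink API.

/** Reader-level API: splits are identified by their IDs. */
interface SketchIdBasedReader {
    void pauseOrResumeSplits(
            Collection<String> splitIdsToPause, Collection<String> splitIdsToResume);
}

/** Split-reader-level API: the split objects themselves are passed. */
interface SketchSplitBasedReader<SplitT> {
    void pauseOrResumeSplits(
            Collection<SplitT> splitsToPause, Collection<SplitT> splitsToResume);
}

/** Glue code: translate split IDs into split objects and delegate. */
final class SketchDelegatingReader<SplitT> implements SketchIdBasedReader {
    private final Map<String, SplitT> assignedSplits = new HashMap<>();
    private final SketchSplitBasedReader<SplitT> splitReader;

    SketchDelegatingReader(SketchSplitBasedReader<SplitT> splitReader) {
        this.splitReader = splitReader;
    }

    void addSplit(String splitId, SplitT split) {
        assignedSplits.put(splitId, split);
    }

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitIdsToPause, Collection<String> splitIdsToResume) {
        splitReader.pauseOrResumeSplits(lookup(splitIdsToPause), lookup(splitIdsToResume));
    }

    private List<SplitT> lookup(Collection<String> splitIds) {
        List<SplitT> splits = new ArrayList<>();
        for (String id : splitIds) {
            SplitT split = assignedSplits.get(id);
            if (split != null) { // IDs not assigned to this reader are ignored
                splits.add(split);
            }
        }
        return splits;
    }
}
```
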
> > > >>>>>>> >
> > > >>>>>>> > Yes, that's a good point in this case and for features that
> > need
> > > >>>>>>> to be
> > > >>>>>>> > implemented in more than one place.
> > > >>>>>>> >
> > > >>>>>>> > > Is there any reason for pausing reading from a split an
> > > optional
> > > >>>>>>> feature,
> > > >>>>>>> > > other than that this was not included in the original
> > > interface?
> > > >>>>>>> >
> > > >>>>>>> > An additional argument in favor of making it optional is to
> > > >>>>>>> simplify source
> > > >>>>>>> > implementation. But on its own I'm not sure if that would be
> > > >>>>>>> enough to
> > > >>>>>>> > justify making this feature optional. Maybe.
> > > >>>>>>> >
> > > >>>>>>> > > I think it would be way simpler and clearer to just let end
> > > >>>>>>> users and
> > > >>>>>>> > Flink
> > > >>>>>>> > > assume all the connectors will implement this feature.
> > > >>>>>>> >
> > > >>>>>>> > As I wrote above that would be an interesting choice to make
> > > (ease
> > > >>>>>>> of
> > > >>>>>>> > implementation for new users, vs system consistency).
> > Regardless
> > > >>>>>>> of that,
> > > >>>>>>> > yes, for me the main argument is the API backward
> > compatibility.
> > > >>>>>>> But let's
> > > >>>>>>> > clear a couple of points:
> > > >>>>>>> > - The current proposal adding methods to the base interface
> > with
> > > >>>>>>> default
> > > >>>>>>> > implementations is an OPTIONAL feature. Same as the
> decorative
> > > >>>>>>> version
> > > >>>>>>> > would be.
> > > >>>>>>> > - Decorative version could implement "throw
> > > >>>>>>> UnsupportedOperationException"
> > > >>>>>>> > if user enabled watermark alignment just as well and I agree
> > > >>>>>>> that's a
> > > >>>>>>> > better option compared to logging a warning.
> > > >>>>>>> >
> > > >>>>>>> > Best,
> > > >>>>>>> > Piotrek
> > > >>>>>>> >
> > > >>>>>>> >
> > > > >>>>>>> > On Wed, May 4, 2022 at 15:40, Becket Qin <be...@gmail.com> wrote:
> > > >>>>>>> >
> > > >>>>>>> > > Thanks for the reply and patient discussion, Piotr and
> Dawid.
> > > >>>>>>> > >
> > > >>>>>>> > > Is there any reason for pausing reading from a split an
> > > optional
> > > >>>>>>> feature,
> > > >>>>>>> > > other than that this was not included in the original
> > > interface?
> > > >>>>>>> > >
> > > > >>>>>>> > > To be honest I am really worried about the complexity of the
> > > > >>>>>>> > > user story here. Optional features like this have a high
> > > > >>>>>>> > > overhead. Imagine this feature is optional, now a user enabled
> > > > >>>>>>> > > watermark alignment and defined a few watermark groups. Would it
> > > > >>>>>>> > > work? Hmm, that depends on whether the involved Source has
> > > > >>>>>>> > > implemented this feature. If the Sources are well documented,
> > > > >>>>>>> > > good luck. Otherwise end users may have to look into the code of
> > > > >>>>>>> > > the Source to see whether the feature is supported. Which is
> > > > >>>>>>> > > something they shouldn't have to do.
> > > >>>>>>> > >
> > > > >>>>>>> > > I think it would be way simpler and clearer to just let end
> > > > >>>>>>> > > users and Flink assume all the connectors will implement this
> > > > >>>>>>> > > feature. After all the watermark group is not optional to the
> > > > >>>>>>> > > end users. If in some rare cases, the feature cannot be
> > > > >>>>>>> > > supported, a clear UnsupportedOperationException will be thrown
> > > > >>>>>>> > > to tell users to explicitly remove this Source from the
> > > > >>>>>>> > > watermark group. I don't think we should have a warning message
> > > > >>>>>>> > > here, as they tend to be ignored in many cases. If we do this,
> > > > >>>>>>> > > we don't even need the supportXXX method in the Source for this
> > > > >>>>>>> > > feature. In fact this is exactly how many interfaces work today.
> > > > >>>>>>> > > For example, SplitEnumerator#addSplitsBack() is not supported by
> > > > >>>>>>> > > Pravega source because it does not support partial failover. In
> > > > >>>>>>> > > that case, it simply throws an exception to trigger a global
> > > > >>>>>>> > > recovery.
> > > >>>>>>> > >
> > > > >>>>>>> > > The reason we add a default implementation in this case would
> > > > >>>>>>> > > just be for the sake of backwards compatibility so the old
> > > > >>>>>>> > > source can still compile. Sure, in short term, this feature
> > > > >>>>>>> > > might not be supported by many existing sources. That is OK, and
> > > > >>>>>>> > > it is quite visible to the source developers that they did not
> > > > >>>>>>> > > override the default impl which throws an
> > > > >>>>>>> > > UnsupportedOperationException.
> > > >>>>>>> > >
> > > >>>>>>> > > @Dawid,
> > > >>>>>>> > >
> > > >>>>>>> > > the Java doc of the SupportXXX() method in the Source would
> > be
> > > >>>>>>> the single
> > > >>>>>>> > > >> source of truth regarding how to implement this feature.
> > > >>>>>>> > > >
> > > >>>>>>> > > >
> > > >>>>>>> > >
> > > >>>>>>> > > I also don't find it entirely true. Half of the classes are
> > > >>>>>>> theoretically
> > > >>>>>>> > > > optional and are utility classes from the point of view
> how
> > > the
> > > >>>>>>> > > interfaces
> > > >>>>>>> > > > are organized. Theoretically users do not need to use any
> > of
> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to list
> > their
> > > >>>>>>> methods in
> > > >>>>>>> > > the
> > > >>>>>>> > > > Source interface.
> > > >>>>>>> > >
> > > > >>>>>>> > > I think the ultimate goal of java docs is to guide users to
> > > > >>>>>>> > > implement the Source. If SourceReaderBase is the preferred way
> > > > >>>>>>> > > to implement a SourceReader, it seems worth mentioning that.
> > > > >>>>>>> > > Even the Java language documentation of interfaces lists the
> > > > >>>>>>> > > known implementations [1] so people can leverage them. But for
> > > > >>>>>>> > > this particular case, if we make the feature non-optional, we
> > > > >>>>>>> > > don't even need the supportXXX() method for now.
> > > >>>>>>> > >
> > > >>>>>>> > > Thanks,
> > > >>>>>>> > >
> > > >>>>>>> > > Jiangjie (Becket) Qin
> > > >>>>>>> > >
> > > >>>>>>> > >
> > > >>>>>>> > >
> > > >>>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <
> > > >>>>>>> dwysakowicz@apache.org>
> > > >>>>>>> > > wrote:
> > > >>>>>>> > >
> > > >>>>>>> > > > Hey Piotr and Becket,
> > > >>>>>>> > > >
> > > >>>>>>> > > > First of all, let me say I am happy with whichever option
> > is
> > > >>>>>>> agreed in
> > > >>>>>>> > > the
> > > >>>>>>> > > > discussion.
> > > >>>>>>> > > >
> > > >>>>>>> > > > I wanted to clarify a few points from the discussion
> > though:
> > > >>>>>>> > > >
> > > >>>>>>> > > > @Becket:
> > > >>>>>>> > > >
> > > >>>>>>> > > > The main argument for adding the methods to the
> > SourceReader
> > > >>>>>>> is that
> > > >>>>>>> > > these
> > > >>>>>>> > > > methods are effectively NON-OPTIONAL to SourceReader
> impl,
> > > i.e.
> > > >>>>>>> > starting
> > > >>>>>>> > > > from this FLIP, all the SourceReaders impl are expected
> to
> > > >>>>>>> support this
> > > >>>>>>> > > > method, although some old implementations may not have
> > > >>>>>>> implemented this
> > > >>>>>>> > > > feature. I think we should distinguish the new features
> > from
> > > >>>>>>> the
> > > >>>>>>> > optional
> > > >>>>>>> > > > features. While the public decorative interface is a
> > solution
> > > >>>>>>> to the
> > > >>>>>>> > > > optional features. We should not use it for the features
> > that
> > > >>>>>>> are
> > > >>>>>>> > > > non-optional.
> > > >>>>>>> > > >
> > > >>>>>>> > > > I don't think that this feature is NON-OPTIONAL. Even
> > though
> > > >>>>>>> > preferred, I
> > > >>>>>>> > > > still think it can be simply optional.
> > > >>>>>>> > > >
> > > >>>>>>> > > > the Java doc of the SupportXXX() method in the Source
> would
> > > be
> > > >>>>>>> the
> > > >>>>>>> > single
> > > >>>>>>> > > > source of truth regarding how to implement this feature.
> > > >>>>>>> > > >
> > > >>>>>>> > > > I also don't find it entirely true. Half of the classes
> are
> > > >>>>>>> > theoretically
> > > >>>>>>> > > > optional and are utility classes from the point of view
> how
> > > the
> > > >>>>>>> > > interfaces
> > > >>>>>>> > > > are organized. Theoretically users do not need to use any
> > of
> > > >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to list
> > their
> > > >>>>>>> methods in
> > > >>>>>>> > > the
> > > >>>>>>> > > > Source interface.
> > > >>>>>>> > > >
> > > >>>>>>> > > > @Piotr
> > > >>>>>>> > > >
> > > >>>>>>> > > > If we have all of the methods with default implementation
> > in
> > > >>>>>>> the base
> > > >>>>>>> > > > interface, the API doesn't give any clue to the user
> which
> > > set
> > > >>>>>>> of
> > > >>>>>>> > methods
> > > >>>>>>> > > > are required to be implemented at the same time.
> > > >>>>>>> > > >
> > > >>>>>>> > > > I feel that no matter which option we choose this can not
> > be
> > > >>>>>>> solved
> > > >>>>>>> > > > entirely in either of the options, because of the point
> > above
> > > >>>>>>> and
> > > >>>>>>> > because
> > > >>>>>>> > > > the signature of SplitReader#pauseOrResumeSplits and
> > > >>>>>>> > > > SourceReader#pauseOrResumeSplits are slightly different
> > (one
> > > >>>>>>> identifies
> > > >>>>>>> > > > splits with splitId the other one passes the splits
> > > directly).
> > > >>>>>>> > > >
> > > >>>>>>> > > > Best,
> > > >>>>>>> > > >
> > > >>>>>>> > > > Dawid
> > > >>>>>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
> > > >>>>>>> > > >
> > > >>>>>>> > > > Hi Piotr,
> > > >>>>>>> > > >
> > > >>>>>>> > > > Thanks for the comment.
> > > >>>>>>> > > >
> > > >>>>>>> > > > Just to clarify, I am not against the decorative
> > interfaces,
> > > >>>>>>> but I do
> > > >>>>>>> > > > think we should use them with caution. The main argument
> > for
> > > >>>>>>> adding the
> > > >>>>>>> > > > methods to the SourceReader is that these methods are
> > > >>>>>>> > > > effectively NON-OPTIONAL to SourceReader impl, i.e.
> > starting
> > > >>>>>>> from this
> > > >>>>>>> > > > FLIP, all the SourceReaders impl are expected to support
> > this
> > > >>>>>>> > > > method, although some old implementations may not have
> > > >>>>>>> implemented this
> > > >>>>>>> > > > feature. I think we should distinguish the new features
> > from
> > > >>>>>>> the
> > > >>>>>>> > optional
> > > >>>>>>> > > > features. While the public decorative interface is a
> > solution
> > > >>>>>>> to the
> > > >>>>>>> > > > optional features. We should not use it for the features
> > that
> > > >>>>>>> are
> > > >>>>>>> > > > non-optional.
> > > >>>>>>> > > >
> > > >>>>>>> > > > That said, this feature is optional for SplitReaders.
> > > Arguably
> > > >>>>>>> we can
> > > >>>>>>> > > have
> > > >>>>>>> > > > a decorative interface for that, but for simplicity and
> > > >>>>>>> symmetry of the
> > > >>>>>>> > > > interface, personally I prefer just adding a new method.
> > > >>>>>>> > > >
> > > >>>>>>> > > > Regarding the advantages you mentioned about the
> decorative
> > > >>>>>>> interfaces,
> > > >>>>>>> > > > they would make sense if:
> > > >>>>>>> > > > 1. The feature is optional.
> > > >>>>>>> > > > 2. There is only one decorative interface involved for a
> > > >>>>>>> feature.
> > > >>>>>>> > > > Otherwise the argument that all the methods are grouped
> > > >>>>>>> together will
> > > >>>>>>> > not
> > > >>>>>>> > > > stand.
> > > >>>>>>> > > >
> > > >>>>>>> > > > Compared with that, I think the current solution works
> fine
> > > in
> > > >>>>>>> all
> > > >>>>>>> > cases,
> > > >>>>>>> > > > i.e. "having supportXXX() method in Source, and default
> > > >>>>>>> methods /
> > > >>>>>>> > > > decorative interfaces in base interfaces.".
> > > >>>>>>> > > >
> > > >>>>>>> > > > The advantages are:
> > > >>>>>>> > > >> - clean and easy to implement base interface
> > > >>>>>>> > > >
> > > >>>>>>> > > > In the current approach, the Java doc of the SupportXXX()
> > > >>>>>>> method in the
> > > >>>>>>> > > > Source would be the single source of truth regarding how
> to
> > > >>>>>>> implement
> > > >>>>>>> > > this
> > > >>>>>>> > > > feature. It lists the method that has to be implemented
> to
> > > >>>>>>> support this
> > > >>>>>>> > > > feature, regardless of how many classes / interfaces are
> > > >>>>>>> involved.
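To make the "supportsXXX() in Source plus default methods in the base interfaces" idea concrete, here is a minimal sketch. All types below (SketchSource, SketchSplitReader, CapabilitySketch and so on) are hypothetical, simplified stand-ins and not the real Flink interfaces; split ids are plain strings purely for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-ins; these are NOT the real Flink interfaces.
interface SketchSplitReader {
    // Default impl, so existing readers compile unchanged and need not override it.
    default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        throw new UnsupportedOperationException("pausing splits is not supported");
    }
}

interface SketchSource {
    // Single source of truth: its Javadoc would list every method required for the feature.
    default boolean supportsPausingSplits() {
        return false;
    }

    SketchSplitReader createSplitReader();
}

// An existing source that knows nothing about the new feature.
class LegacySource implements SketchSource {
    @Override
    public SketchSplitReader createSplitReader() {
        return new SketchSplitReader() {};
    }
}

class PausingSplitReader implements SketchSplitReader {
    final Set<String> paused = new HashSet<>();

    @Override
    public void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        paused.addAll(toPause);
        paused.removeAll(toResume);
    }
}

class PausingSource implements SketchSource {
    final PausingSplitReader reader = new PausingSplitReader();

    @Override
    public boolean supportsPausingSplits() {
        return true;
    }

    @Override
    public SketchSplitReader createSplitReader() {
        return reader;
    }
}

final class CapabilitySketch {
    // Framework side: consult the Source first, and only then call the optional method.
    static boolean tryPause(SketchSource source, Collection<String> splitIds) {
        if (!source.supportsPausingSplits()) {
            return false;
        }
        source.createSplitReader().pauseOrResumeSplits(splitIds, Collections.<String>emptyList());
        return true;
    }

    public static void main(String[] args) {
        System.out.println(tryPause(new LegacySource(), Arrays.asList("split-0")));
        System.out.println(tryPause(new PausingSource(), Arrays.asList("split-0")));
    }
}
```

In this sketch the legacy source compiles without touching the new method, and the framework never reaches the throwing default because it asks the Source first.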
> > > >>>>>>> > > >
> > > >>>>>>> > > > When implementing the base interface, users do not need
> to
> > > >>>>>>> implement a
> > > >>>>>>> > > > method with default implementation. If they are curious
> > what
> > > >>>>>>> the method
> > > >>>>>>> > > is
> > > >>>>>>> > > > for, the java doc of that method simply points users to
> the
> > > >>>>>>> > SupportXXX()
> > > >>>>>>> > > > method in the Source. This does not seem to add work for the
> users
> > > >>>>>>> compared
> > > >>>>>>> > with
> > > >>>>>>> > > > decorative interfaces, but gives much better
> > discoverability.
> > > >>>>>>> > > >
> > > >>>>>>> > > > - all of the methods from a single feature are grouped
> in a
> > > >>>>>>> single
> > > >>>>>>> > > >> decorator interface, together with their dedicated java
> > doc.
> > > >>>>>>> It's also
> > > >>>>>>> > > >> easier to google search for help using the decorator
> name
> > > >>>>>>> > > >
> > > >>>>>>> > > > - if an optional feature requires two methods to be
> > > >>>>>>> implemented at
> > > >>>>>>> > once,
> > > >>>>>>> > > >> decorator can guarantee that
> > > >>>>>>> > > >
> > > >>>>>>> > > > These two points are not true when multiple components
> and
> > > >>>>>>> classes are
> > > >>>>>>> > > > involved collaboratively to provide a feature. In our
> case,
> > > we
> > > >>>>>>> have
> > > >>>>>>> > both
> > > >>>>>>> > > > SourceReader and SplitReader involved. And there might be
> > > other
> > > >>>>>>> > > interfaces
> > > >>>>>>> > > > on the JM side involved for some future features. So the
> > > >>>>>>> relevant
> > > >>>>>>> > methods
> > > >>>>>>> > > > can actually be scattered over the places. That said, we
> > may
> > > >>>>>>> still use
> > > >>>>>>> > > > decorative interfaces for each component, if the feature
> is
> > > >>>>>>> optional,
> > > >>>>>>> > > given
> > > >>>>>>> > > > there is a single source of truth for the feature.
> > > >>>>>>> > > >
> > > >>>>>>> > > > Here I would strongly lean towards making life easier for
> > new
> > > >>>>>>> users,
> > > >>>>>>> > > >> lowering the entry barrier, at the (imo) slight expense
> > for
> > > >>>>>>> the power
> > > >>>>>>> > > >> users.
> > > >>>>>>> > > >
> > > >>>>>>> > > > I actually think the current approach is simpler, more
> > > >>>>>>> extensible and
> > > >>>>>>> > > more
> > > >>>>>>> > > > general for all the users. Can you articulate a bit more
> on
> > > >>>>>>> which part
> > > >>>>>>> > > you
> > > >>>>>>> > > > think is harder for users to understand?
> > > >>>>>>> > > >
> > > >>>>>>> > > >
> > > >>>>>>> > > > There is another benefit of the decorative interfaces
> which
> > > is
> > > >>>>>>> not
> > > >>>>>>> > > > mentioned, but might be worth considering here. Usually
> the
> > > >>>>>>> decorative
> > > >>>>>>> > > > interfaces give slightly better backwards compatibility
> > than
> > > >>>>>>> the new
> > > >>>>>>> > > > default method in the interfaces. That is when users are
> > > using
> > > >>>>>>> a jar
> > > >>>>>>> > that
> > > >>>>>>> > > > was compiled with an older version of Flink which does
> not
> > > >>>>>>> have the
> > > >>>>>>> > > default
> > > >>>>>>> > > > method in the interfaces in question. A decorative
> > interface
> > > >>>>>>> may still
> > > >>>>>>> > > > provide backwards compatibility in that case, while
> default
> > > >>>>>>> method impl
> > > >>>>>>> > > > cannot.
> > > >>>>>>> > > >
> > > >>>>>>> > > > I think in Flink we in general do not guarantee custom
> > > >>>>>>> components
> > > >>>>>>> > > compiled
> > > >>>>>>> > > > with an older version can run with a newer version of
> > Flink.
> > > A
> > > >>>>>>> > recompile
> > > >>>>>>> > > > with a newer version would be required. That said, if we
> do
> > > >>>>>>> care about
> > > >>>>>>> > > > this, we can just change the "supportXXX()" method in the
> > > >>>>>>> Source
> > > >>>>>>> > > interface
> > > >>>>>>> > > > to use decorative interfaces, and leave the other parts
> > > >>>>>>> unchanged.
> > > >>>>>>> > > >
> > > >>>>>>> > > > Thanks,
> > > >>>>>>> > > >
> > > >>>>>>> > > > Jiangjie (Becket) Qin
> > > >>>>>>> > > >
> > > >>>>>>> > > >
> > > >>>>>>> > > >
> > > >>>>>>> > > >
> > > >>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <
> > > >>>>>>> pnowojski@apache.org>
> > > >>>>>>> > > > wrote:
> > > >>>>>>> > > >
> > > >>>>>>> > > >> Hi,
> > > >>>>>>> > > >>
> > > >>>>>>> > > >> Sorry for chipping in so late, but I was OoO for the
> last
> > > two
> > > >>>>>>> weeks.
> > > >>>>>>> > > >> Regarding the interfaces, I would be actually against
> > adding
> > > >>>>>>> those
> > > >>>>>>> > > methods
> > > >>>>>>> > > >> to the base interfaces for the reasons mentioned above.
> > > >>>>>>> Clogging the
> > > >>>>>>> > > base
> > > >>>>>>> > > >> interface for new users with tons of methods that they
> do
> > > not
> > > >>>>>>> need, do
> > > >>>>>>> > > not
> > > >>>>>>> > > >> understand and do not know what to do with them.
> Moreover,
> > > >>>>>>> such
> > > >>>>>>> > > decorative
> > > >>>>>>> > > >> interfaces are solving a problem if a feature requires
> two
> > > or
> > > >>>>>>> more
> > > >>>>>>> > > methods
> > > >>>>>>> > > >> to be implemented at the same time. If we have all of
> the
> > > >>>>>>> methods with
> > > >>>>>>> > > >> default implementation in the base interface, the API
> > > doesn't
> > > >>>>>>> give any
> > > >>>>>>> > > >> clue
> > > >>>>>>> > > >> to the user which set of methods are required to be
> > > >>>>>>> implemented at the
> > > >>>>>>> > > >> same
> > > >>>>>>> > > >> time.
> > > >>>>>>> > > >>
> > > >>>>>>> > > >> > a) I feel the biggest drawback of decorative
> interfaces
> > is
> > > >>>>>>> which
> > > >>>>>>> > > >> interface
> > > >>>>>>> > > >> > they can decorate and which combinations of multiple
> > > >>>>>>> decorative
> > > >>>>>>> > > >> interfaces
> > > >>>>>>> > > >> > are valid. (...)
> > > >>>>>>> > > >> > In the future, if there is a new feature added
> > > >>>>>>> > > >> > (e.g. sorted or pre-partitioned data aware), are we
> > going
> > > >>>>>>> to create
> > > >>>>>>> > > >> another
> > > >>>>>>> > > >> > interface of SplitReader such as SortedSplitReader or
> > > >>>>>>> > > >> PrePartitionedAware?
> > > >>>>>>> > > >> > Can they be combined? So I think the additional
> > decorative
> > > >>>>>>> interface
> > > >>>>>>> > > >> like
> > > >>>>>>> > > >> > withSplitsAlignment actually increases the
> understanding
> > > >>>>>>> cost of
> > > >>>>>>> > users
> > > >>>>>>> > > >> > because they have to know what decorative interfaces
> are
> > > >>>>>>> there,
> > > >>>>>>> > which
> > > >>>>>>> > > >> > interface they can decorate and which combinations of
> > the
> > > >>>>>>> decorative
> > > >>>>>>> > > >> > interfaces are valid and which are not. Ideally we
> want
> > to
> > > >>>>>>> avoid
> > > >>>>>>> > that.
> > > >>>>>>> > > >>
> > > >>>>>>> > > >> I'm not sure if I understand how embedding default
> methods
> > > in
> > > >>>>>>> the base
> > > >>>>>>> > > >> interface is solving the problem: what can be combined
> or
> > > >>>>>>> not? If
> > > >>>>>>> > there
> > > >>>>>>> > > >> are
> > > >>>>>>> > > >> two conflicting features, having decorative interfaces
> > that
> > > >>>>>>> can not be
> > > >>>>>>> > > >> mixed together actually makes much more sense to me
> rather
> > > >>>>>>> than having
> > > >>>>>>> > > >> them
> > > >>>>>>> > > >> all in one base class. How would you allow users to
> > > implement
> > > >>>>>>> only one
> > > >>>>>>> > > of
> > > >>>>>>> > > >> those two features?
> > > >>>>>>> > > >>
> > > >>>>>>> > > >> To reiterate on the issue. Yes, there are drawbacks:
> > > >>>>>>> > > >> - how can a user discover what decorators are there?
> > > >>>>>>> > > >> - how can a user know where the decorator can be applied
> > to?
> > > >>>>>>> > > >>
> > > >>>>>>> > > >> However those are drawbacks for more power users, that
> can
> > > be
> > > >>>>>>> > mitigated
> > > >>>>>>> > > by
> > > >>>>>>> > > >> the documentation. For example listing all of the
> > decorators
> > > >>>>>>> with
> > > >>>>>>> > > >> detailed explanation both in the docs and in the java
> > docs.
> > > >>>>>>> More
> > > >>>>>>> > > >> experienced users will be able to deal with those issues
> > > >>>>>>> easier, as
> > > >>>>>>> > they
> > > >>>>>>> > > >> will already have some basic understanding of Flink.
> Also
> > if
> > > >>>>>>> a user has
> > > >>>>>>> > a
> > > >>>>>>> > > >> problem that he wants to solve, he will google search a
> > > >>>>>>> potential
> > > >>>>>>> > > solution
> > > >>>>>>> > > >> to his problem anyway, and while doing that he is very
> > > likely
> > > >>>>>>> to
> > > >>>>>>> > > discover
> > > >>>>>>> > > >> the decorator that he needs anyway in the docs.
> > > >>>>>>> > > >>
> > > >>>>>>> > > >> The advantages are:
> > > >>>>>>> > > >> - clean and easy to implement base interface
> > > >>>>>>> > > >> - all of the methods from a single feature are grouped
> in
> > a
> > > >>>>>>> single
> > > >>>>>>> > > >> decorator interface, together with their dedicated java
> > doc.
> > > >>>>>>> It's also
> > > >>>>>>> > > >> easier to google search for help using the decorator
> name
> > > >>>>>>> > > >> - if an optional feature requires two methods to be
> > > >>>>>>> implemented at
> > > >>>>>>> > once,
> > > >>>>>>> > > >> decorator can guarantee that
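The last advantage in the list above (a decorator guaranteeing that two methods are implemented together) could look roughly like the sketch below. LeanReader, SupportsSplitPausing and DecoratorSketch are hypothetical names invented for illustration; only pauseSplit/resumeSplit echo names floated earlier in this thread, and nothing here is the actual Flink API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a lean base interface; not the real Flink SourceReader.
interface LeanReader {
    String pollNext();
}

// Hypothetical decorative interface: implementing it forces BOTH methods at compile time.
interface SupportsSplitPausing {
    void pauseSplit(String splitId);

    void resumeSplit(String splitId);
}

// New users implement only the base interface and never see the pausing methods.
class PlainReader implements LeanReader {
    @Override
    public String pollNext() {
        return "record";
    }
}

class PausableReader implements LeanReader, SupportsSplitPausing {
    private final Set<String> paused = new HashSet<>();

    @Override
    public String pollNext() {
        return "record";
    }

    @Override
    public void pauseSplit(String splitId) {
        paused.add(splitId);
    }

    @Override
    public void resumeSplit(String splitId) {
        paused.remove(splitId);
    }

    boolean isPaused(String splitId) {
        return paused.contains(splitId);
    }
}

final class DecoratorSketch {
    // Framework side: the capability is discovered through the type, not through a flag.
    static boolean pauseIfSupported(LeanReader reader, String splitId) {
        if (reader instanceof SupportsSplitPausing) {
            ((SupportsSplitPausing) reader).pauseSplit(splitId);
            return true;
        }
        return false;
    }
}
```

A class that declares SupportsSplitPausing but omits resumeSplit would not compile, which is the guarantee referred to above; the cost is that the framework has to discover the capability with an instanceof check.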
> > > >>>>>>> > > >>
> > > >>>>>>> > > >> Here I would strongly lean towards making life easier
> for
> > > new
> > > >>>>>>> users,
> > > >>>>>>> > > >> lowering the entry barrier, at the (imo) slight expense
> > for
> > > >>>>>>> the power
> > > >>>>>>> > > >> users.
> > > >>>>>>> > > >>
> > > >>>>>>> > > >> Best,
> > > >>>>>>> > > >> Piotrek
> > > >>>>>>> > > >>
> > > >>>>>>> > > >>
> > > >>>>>>> > > >> wt., 26 kwi 2022 o 15:32 Becket Qin <
> becket.qin@gmail.com
> > >
> > > >>>>>>> > napisał(a):
> > > >>>>>>> > > >>
> > > >>>>>>> > > >> > Thanks for the reply Sebastian and Dawid,
> > > >>>>>>> > > >> >
> > > >>>>>>> > > >> > I think Sebastian has a good summary. This is a really
> > > >>>>>>> helpful
> > > >>>>>>> > > >> discussion.
> > > >>>>>>> > > >> >
> > > >>>>>>> > > >> > Thinking a bit more, I feel that it might still be
> > better
> > > >>>>>>> to add the
> > > >>>>>>> > > >> > supportsXXX() method in the Source rather than
> > > SourceReader.
> > > >>>>>>> > > >> >
> > > >>>>>>> > > >> > Generally speaking, what we are trying to do here is
> to
> > > let
> > > >>>>>>> the
> > > >>>>>>> > Flink
> > > >>>>>>> > > >> > framework know what the Source is capable of. In this
> > > FLIP,
> > > >>>>>>> it
> > > >>>>>>> > happens
> > > >>>>>>> > > >> to
> > > >>>>>>> > > >> > be the capability that only involves SourceReader. But
> > in
> > > >>>>>>> the
> > > >>>>>>> > future,
> > > >>>>>>> > > >> it is
> > > >>>>>>> > > >> > possible that another functionality involves both the
> > > >>>>>>> > SplitEnumerator
> > > >>>>>>> > > >> and
> > > >>>>>>> > > >> > SourceReader. In that case, following the current
> > > approach,
> > > >>>>>>> we
> > > >>>>>>> > should
> > > >>>>>>> > > >> put
> > > >>>>>>> > > >> > the "supportsXXX()" method in both SplitEnumerator and
> > > >>>>>>> SourceReader.
> > > >>>>>>> > > >> > Because if we only put this in the SourceReader, then
> > the
> > > >>>>>>> JM would
> > > >>>>>>> > > have
> > > >>>>>>> > > >> to
> > > >>>>>>> > > >> > create a SourceReader in order to know whether this
> > > feature
> > > >>>>>>> is
> > > >>>>>>> > > >> supported,
> > > >>>>>>> > > >> > which is a little ugly. But if we put the
> > "supportsXXX()"
> > > >>>>>>> method in
> > > >>>>>>> > > the
> > > >>>>>>> > > >> > Source, we will break the "symmetric" design because
> > this
> > > >>>>>>> FLIP
> > > >>>>>>> > chose a
> > > >>>>>>> > > >> > different way.
> > > >>>>>>> > > >> >
> > > >>>>>>> > > >> > This is also why I think supportsXXX() method seems a
> > good
> > > >>>>>>> thing to
> > > >>>>>>> > > >> have,
> > > >>>>>>> > > >> > because when there are a few interfaces / methods that
> > are
> > > >>>>>>> expected
> > > >>>>>>> > to
> > > >>>>>>> > > >> be
> > > >>>>>>> > > >> > implemented at the same time in order to deliver a
> > > feature,
> > > >>>>>>> it is
> > > >>>>>>> > > always
> > > >>>>>>> > > >> > good to have a single source of truth to tell the
> > > framework
> > > >>>>>>> what to
> > > >>>>>>> > > do,
> > > >>>>>>> > > >> so
> > > >>>>>>> > > >> > the framework can do consistent things in different
> > parts.
> > > >>>>>>> > > >> >
> > > >>>>>>> > > >> > @Sebastian Mattheis <se...@ververica.com>
> > > >>>>>>> > > >> >
> > > >>>>>>> > > >> > Regarding interface flavor b), i.e.
> AlignedSourceReader
> > +
> > > >>>>>>> > > >> > AlignedSplitReader, what I feel awkward about is that
> we
> > > are
> > > >>>>>>> > > essentially
> > > >>>>>>> > > >> > expecting almost all the SourceReader implementations
> to
> > > >>>>>>> extend
> > > >>>>>>> > > >> > SourceReaderBase, which effectively makes the
> > SourceReader
> > > >>>>>>> interface
> > > >>>>>>> > > >> > without the pausing support useless. So this indicates
> > > that
> > > >>>>>>> public
> > > >>>>>>> > > >> > decorative interfaces (or sub-interfaces for the same
> > > >>>>>>> purpose) only
> > > >>>>>>> > > >> > make sense if the original interface is also expected
> to
> > > be
> > > >>>>>>> used.
> > > >>>>>>> > > >> > Otherwise, it seems to make more sense to add the method
> > to
> > > >>>>>>> the
> > > >>>>>>> > original
> > > >>>>>>> > > >> > interface itself.
> > > >>>>>>> > > >> >
> > > >>>>>>> > > >> > Cheers,
> > > >>>>>>> > > >> >
> > > >>>>>>> > > >> > Jiangjie (Becket) Qin
> > > >>>>>>> > > >> >
> > > >>>>>>> > > >> >
> > > >>>>>>> > > >> >
> > > >>>>>>> > > >> >
> > > >>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <
> > > >>>>>>> > > >> dwysakowicz@apache.org>
> > > >>>>>>> > > >> > wrote:
> > > >>>>>>> > > >> >
> > > >>>>>>> > > >> > > Thanks @Sebastian for the nice summary.
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > I think most of your points aligned with the
> > suggestions
> > > >>>>>>> I made to
> > > >>>>>>> > > the
> > > >>>>>>> > > >> > > FLIP, while you were writing your reply (I believe
> we
> > > hit
> > > >>>>>>> enter
> > > >>>>>>> > > >> nearly at
> > > >>>>>>> > > >> > > the same time ;) )
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > Two points after we synced offline
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > 1. I changed also the
> supportsWatermarksSplitAlignment
> > > to
> > > >>>>>>> > > >> > > supportsPausingSplits to express the general
> > capability
> > > of
> > > >>>>>>> > pausing.
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > 2. As for whether we should add
> > > >>>>>>> PausingSourceReader/PausingSplitReader
> > > >>>>>>> > > (option
> > > >>>>>>> > > >> b)
> > > >>>>>>> > > >> > > or if we should just add the methods (option c), I
> > > >>>>>>> suggest to
> > > >>>>>>> > simply
> > > >>>>>>> > > >> add
> > > >>>>>>> > > >> > > the two methods, as I felt this is the approach much
> > > >>>>>>> > > >> > > preferred by Becket, to which others do not object.
> > > >>>>>>> > > >> > > Unless there is opposition,
> > > let's
> > > >>>>>>> go with
> > > >>>>>>> > > this
> > > >>>>>>> > > >> > > option c.
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > Best,
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > Dawid
> > > >>>>>>> > > >> > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > Hi folks,
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > Sorry for being a bit silent. Many thanks for all
> the
> > > >>>>>>> input and
> > > >>>>>>> > > >> > > suggestions. As I'm a bit new, I needed some time to
> > > >>>>>>> catch up and
> > > >>>>>>> > > >> > structure
> > > >>>>>>> > > >> > > (for myself) the discussion and I wanted to find a
> way
> > > to
> > > >>>>>>> > structure
> > > >>>>>>> > > >> the
> > > >>>>>>> > > >> > > conclusions. (Also because I had the feeling that
> some
> > > >>>>>>> concerns
> > > >>>>>>> > got
> > > >>>>>>> > > >> lost
> > > >>>>>>> > > >> > in
> > > >>>>>>> > > >> > > the discussion.) This is my attempt and please
> correct
> > > me
> > > >>>>>>> if
> > > >>>>>>> > > >> something is
> > > >>>>>>> > > >> > > wrong or misunderstood. I tried to collect and
> > assemble
> > > >>>>>>> the
> > > >>>>>>> > > opinions,
> > > >>>>>>> > > >> > > suggestions, and conclusions (to the best of my
> > > >>>>>>> knowledge):
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > # Top A: Should split alignment (pause/resume
> > behavior)
> > > >>>>>>> be a
> > > >>>>>>> > general
> > > >>>>>>> > > >> > > capability?
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > I personally don't see any reason not to have it a
> > > general
> > > >>>>>>> > capability
> > > >>>>>>> > > >> > > because for the alignSplit method it is actually
> > > >>>>>>> independent of
> > > >>>>>>> > the
> > > >>>>>>> > > >> > > watermarks. If we agree here to have it a general
> > > >>>>>>> capability, we
> > > >>>>>>> > > >> should
> > > >>>>>>> > > >> > > also agree on the right wording. Does
> > > >>>>>>> "alignSplits(splitsToResume,
> > > >>>>>>> > > >> > > splitsToPause)" refer to what is then actually
> meant?
> > (I
> > > >>>>>>> see it as
> > > >>>>>>> > > >> okay.
> > > >>>>>>> > > >> > I
> > > >>>>>>> > > >> > > don't have any better idea whilst Arvid suggested
> > > >>>>>>> > > >> "pauseOrResumeSplits".)
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > # Top B: Should it be possible to enable/disable
> split
> > > >>>>>>> alignment?
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > I would personally not disable the split alignment
> on
> > > the
> > > >>>>>>> source
> > > >>>>>>> > > >> reader
> > > >>>>>>> > > >> > > side because if split alignment is used for some
> other
> > > >>>>>>> use case
> > > >>>>>>> > (see
> > > >>>>>>> > > >> A)
> > > >>>>>>> > > >> > it
> > > >>>>>>> > > >> > > could have nasty side effects on other/future use
> > cases.
> > > >>>>>>> Instead,
> > > >>>>>>> > I
> > > >>>>>>> > > >> would
> > > >>>>>>> > > >> > > disable "watermark split alignment" where I think it
> > > >>>>>>> should
> > > >>>>>>> > disable
> > > >>>>>>> > > >> the
> > > >>>>>>> > > >> > > watermark-dependent trigger for split alignment.
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > # Top C: Should we add a supportsX method?
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > I find it difficult to define the scope of a
> supportsX
> > > >>>>>>> method
> w.r.t.
> > > >>>>>>> > > >> > > the following questions: a) Where is it used? and b)
> > > What
> > > >>>>>>> is the
> > > >>>>>>> > > >> expected
> > > >>>>>>> > > >> > > output? To b), it's not straight-forward to provide
> a
> > > >>>>>>> meaningful
> > > >>>>>>> > > >> output,
> > > >>>>>>> > > >> > > e.g., if SourceReader supports split alignment but
> > > >>>>>>> SplitReader
> > > >>>>>>> > not.
> > > >>>>>>> > > >> This
> > > >>>>>>> > > >> > is
> > > >>>>>>> > > >> > > because with the current implementation, we can
> > > determine
> > > >>>>>>> whether
> > > >>>>>>> > > >> split
> > > >>>>>>> > > >> > > alignment is fully supported only during runtime and
> > > >>>>>>> specifically
> > > >>>>>>> > > >> > actually
> > > >>>>>>> > > >> > > only when calling alignSplits down the call
> hierarchy
> > up
> > > >>>>>>> to the
> > > >>>>>>> > > actual
> > > >>>>>>> > > >> > > SplitReaders.
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > Therefore, I would suggest to either raise an error
> or
> > > >>>>>>> warning if
> > > >>>>>>> > > the
> > > >>>>>>> > > >> > > alignment is called but not supported at some
> point. I
> > > >>>>>>> know we
> > > >>>>>>> > > should
> > > >>>>>>> > > >> > > carefully think about when this could be the case
> > > because
> > > >>>>>>> we don't
> > > >>>>>>> > > >> want
> > > >>>>>>> > > >> > to
> > > >>>>>>> > > >> > > flood anybody with such warnings. However, warnings
> > > could
> > > >>>>>>> be an
> > > >>>>>>> > > >> indicator
> > > >>>>>>> > > >> > > for the user that for watermark split alignment use
> > case
> > > >>>>>>> split
> > > >>>>>>> > > >> reading is
> > > >>>>>>> > > >> > > imbalanced with the conclusion to either disable the
> > > >>>>>>> trigger for
> > > >>>>>>> > > >> > watermark
> > > >>>>>>> > > >> > > split alignment (see Top B) or to use/implement a
> > source
> > > >>>>>>> and
> > > >>>>>>> > reader
> > > >>>>>>> > > >> that
> > > >>>>>>> > > >> > > fully supports split alignment.
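One way to read the "raise an error or warning, but do not flood anybody" suggestion is a warn-once guard around the alignment call. The sketch below is only an illustration under that assumption: AlignableReader and AlignmentGuard are hypothetical names, split ids are plain strings, and System.err stands in for a real logger.

```java
import java.util.Collection;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical capability interface; the parameter order follows the
// "alignSplits(splitsToResume, splitsToPause)" wording used in this thread.
interface AlignableReader {
    void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause);
}

final class AlignmentGuard {
    private final AtomicBoolean warned = new AtomicBoolean(false);
    private int warningCount = 0;

    // Returns true if the request reached a reader that supports alignment.
    boolean align(Object splitReader, Collection<String> toResume, Collection<String> toPause) {
        if (splitReader instanceof AlignableReader) {
            ((AlignableReader) splitReader).alignSplits(toResume, toPause);
            return true;
        }
        // Warn only on the first unsupported call so that logs are not flooded.
        if (warned.compareAndSet(false, true)) {
            warningCount++;
            System.err.println("WARN: split alignment requested, but "
                    + splitReader.getClass().getName() + " does not support it");
        }
        return false;
    }

    int warningCount() {
        return warningCount;
    }
}
```

Whether unsupported alignment should warn once, warn periodically, or fail the job is exactly the open question of Top C; the guard above only shows the warn-once variant.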
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > # Top D: How to design interfaces?
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > Thanks for structuring the discussion with the
> > > various
> > > >>>>>>> > > >> possibilities
> > > >>>>>>> > > >> > > (a-d). From the discussion and emails, I would like
> to
> > > >>>>>>> summarize
> > > >>>>>>> > the
> > > >>>>>>> > > >> > > following requirements:
> > > >>>>>>> > > >> > > - Interfaces should be consistent ("symmetric"),
> i.e.,
> > > >>>>>>> similar
> > > >>>>>>> > > >> semantics
> > > >>>>>>> > > >> > > should have similar interfaces with similar usage.
> > > >>>>>>> > > >> > > - Make explicit which implementations implement
> > > >>>>>>> interfaces/support
> > > >>>>>>> > > >> > > behavior.
> > > >>>>>>> > > >> > > - Make clear what are default implementations and
> how
> > to
> > > >>>>>>> implement
> > > >>>>>>> > > >> > > interfaces with desired behavior.
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > This is a simplified view of the relations between
> > > >>>>>>> relevant
> > > >>>>>>> > classes
> > > >>>>>>> > > of
> > > >>>>>>> > > >> > the
> > > >>>>>>> > > >> > > PoC implementation:
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > SourceReader (Public) <|-- SourceReaderBase
> (Internal)
> > > >>>>>>> <|-- ..
> > > >>>>>>> > <|--
> > > >>>>>>> > > >> > > MySourceReader
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > MySourceReader <>-- SplitFetcherManager (Internal)
> > <>--
> > > >>>>>>> > SplitFetcher
> > > >>>>>>> > > >> > > (Internal) <>-- SplitReader (Public) <|--
> > MySplitReader
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > (A <|-- B: B inherits from A; A <>-- B: A "has a" B)
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > Note that SourceReaderBase and SplitFetcherManager
> > > >>>>>>> implement most
> > > >>>>>>> > of
> > > >>>>>>> > > >> the
> > > >>>>>>> > > >> > > "logic" for split alignment just because we wanted
> to
> > > >>>>>>> implement
> > > >>>>>>> > > split
> > > >>>>>>> > > >> > > alignment and wanted it to be available as kind of a
> > > >>>>>>> default. As a
> > > >>>>>>> > > >> > > consequence, we have a "default implementation" for
> > > >>>>>>> SourceReader
> > > >>>>>>> > > that
> > > >>>>>>> > > >> > > implements the actual logic for split alignment. For
> > > that
> > > >>>>>>> reason,
> > > >>>>>>> > I
> > > >>>>>>> > > >> find
> > > >>>>>>> > > >> > it
> > > >>>>>>> > > >> > > very confusing to have a NOOP default implementation
> > in
> > > >>>>>>> the
> > > >>>>>>> > > interface
> > > >>>>>>> > > >> for
> > > >>>>>>> > > >> > > the SourceReader. As a consequence, interface
> strategy
> > > c)
> > > >>>>>>> is
> > > >>>>>>> > > difficult
> > > >>>>>>> > > >> > > because this would require NOOP default
> > implementations
> > > >>>>>>> in the
> > > >>>>>>> > > public
> > > >>>>>>> > > >> > > interfaces of SourceReader and SplitReader. This is
> > the
> > > >>>>>>> same for
> > > >>>>>>> > > >> strategy
> > > >>>>>>> > > >> > > d) because it would require NOOP default
> > implementation
> > > >>>>>>> in the
> > > >>>>>>> > > >> > > SourceReader. Further, as Dawid described method
> > > >>>>>>> signatures of
> > > >>>>>>> > > >> alignSplit
> > > >>>>>>> > > >> > > for SourceReader and SplitReader differ and it would
> > be
> > > >>>>>>> extremely
> > > >>>>>>> > > >> > difficult
> > > >>>>>>> > > >> > > to make the signatures the same (with even potential
> > > >>>>>>> performance
> > > >>>>>>> > > >> impact
> > > >>>>>>> > > >> > > because of additional look-ups of split ids).
> > Therefore,
> > > >>>>>>> having a
> > > >>>>>>> > > >> > symmetric
> > > >>>>>>> > > >> > > decorative interface as of strategy a) is actually
> not
> > > >>>>>>> possible
> > > >>>>>>> > and
> > > >>>>>>> > > >> > having
> > > >>>>>>> > > >> > > two decorative interfaces with different method
> > > >>>>>>> signatures is
> > > >>>>>>> > > >> confusing.
> > > >>>>>>> > > >> > My
> > > >>>>>>> > > >> > > conclusion is that we are best with strategy b)
> which
> > > >>>>>>> means to
> > > >>>>>>> > have
> > > >>>>>>> > > >> > > specializing sub-interfaces that inherit from the
> > parent
> > > >>>>>>> > interface:
> > > >>>>>>> > > >> > > SourceReader <|-- AlignedSourceReader, SplitReader
> > <|--
> > > >>>>>>> > > >> > AlignedSplitReader
> > > >>>>>>> > > >> > > With this option, I'm not 100% sure what the
> > > implications
> > > >>>>>>> are and
> > > >>>>>>> > if
> > > >>>>>>> > > >> this
> > > >>>>>>> > > >> > > could get nasty. I would suggest that Dawid and I
> just
> > > >>>>>>> try to
> > > >>>>>>> > > >> implement
> > > >>>>>>> > > >> > and
> > > >>>>>>> > > >> > > see if we like it. :)
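For strategy b), a rough sketch of the two specializing sub-interfaces and of the id-to-split look-up mentioned above might look as follows. Everything here is an assumption made for illustration: the Mini* types are hypothetical and not the real Flink interfaces, and the choice that the source-reader side takes split ids while the split-reader side takes split objects is a guess at why the signatures differ.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical parent interfaces, left empty for brevity.
interface MiniSourceReader {}

interface MiniSplitReader<SplitT> {}

// Strategy b): specializing sub-interfaces that inherit from the parent interface.
interface MiniAlignedSourceReader extends MiniSourceReader {
    // Assumption: this level only knows split ids.
    void alignSplits(Collection<String> splitIdsToResume, Collection<String> splitIdsToPause);
}

interface MiniAlignedSplitReader<SplitT> extends MiniSplitReader<SplitT> {
    // Assumption: this level works on the split objects themselves.
    void alignSplits(Collection<SplitT> splitsToResume, Collection<SplitT> splitsToPause);
}

final class MiniSplit {
    final String id;

    MiniSplit(String id) {
        this.id = id;
    }
}

final class RecordingSplitReader implements MiniAlignedSplitReader<MiniSplit> {
    final Set<String> pausedIds = new HashSet<>();

    @Override
    public void alignSplits(Collection<MiniSplit> splitsToResume, Collection<MiniSplit> splitsToPause) {
        for (MiniSplit split : splitsToPause) {
            pausedIds.add(split.id);
        }
        for (MiniSplit split : splitsToResume) {
            pausedIds.remove(split.id);
        }
    }
}

final class LookupSourceReader implements MiniAlignedSourceReader {
    private final Map<String, MiniSplit> assigned = new HashMap<>();
    private final MiniAlignedSplitReader<MiniSplit> splitReader;

    LookupSourceReader(MiniAlignedSplitReader<MiniSplit> splitReader, List<MiniSplit> splits) {
        this.splitReader = splitReader;
        for (MiniSplit split : splits) {
            assigned.put(split.id, split);
        }
    }

    @Override
    public void alignSplits(Collection<String> splitIdsToResume, Collection<String> splitIdsToPause) {
        // The extra id-to-split look-up is what bridges the two differing signatures.
        splitReader.alignSplits(lookUp(splitIdsToResume), lookUp(splitIdsToPause));
    }

    private List<MiniSplit> lookUp(Collection<String> ids) {
        List<MiniSplit> result = new ArrayList<>();
        for (String id : ids) {
            MiniSplit split = assigned.get(id);
            if (split != null) {
                result.add(split);
            }
        }
        return result;
    }
}
```

The look-up in LookupSourceReader is the kind of extra work that forcing identical signatures on both levels would move into every call, which is the performance concern raised above.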
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > # Summary
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > In conclusion, please let me know your perspectives.
> > > >>>>>>> Please
> > > >>>>>>> > correct
> > > >>>>>>> > > >> me,
> > > >>>>>>> > > >> > if
> > > >>>>>>> > > >> > > something is wrong or if I misunderstood something.
> My
> > > >>>>>>> perspective
> > > >>>>>>> > > >> would
> > > >>>>>>> > > >> > be:
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > Top A: Yes
> > > >>>>>>> > > >> > > Top B: Yes (but disable watermark trigger for split
> > > >>>>>>> alignment)
> > > >>>>>>> > > >> > > Top C: No
> > > >>>>>>> > > >> > > Top D: b)
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > Best,
> > > >>>>>>> > > >> > > Sebastian
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <
> > > >>>>>>> > > >> dwysakowicz@apache.org
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > > wrote:
> > > >>>>>>> > > >> > >
> > > >>>>>>> > > >> > >> @Arvid:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> While I also like Becket's capability approach, I
> > fear
> > > >>>>>>> that it
> > > >>>>>>> > > >> doesn't
> > > >>>>>>> > > >> > work
> > > >>>>>>> > > >> > >> for this particular use case: Sources can always be
> > > >>>>>>> aligned
> > > >>>>>>> > > >> cross-task
> > > >>>>>>> > > >> > and
> > > >>>>>>> > > >> > >> this is just about intra-task alignment. So it's
> > > >>>>>>> plausible to put
> > > >>>>>>> > > >> > sources
> > > >>>>>>> > > >> > >> into an alignment group even though they do not use
> > any
> > > >>>>>>> of the
> > > >>>>>>> > > >> presented
> > > >>>>>>> > > >> > >> API of FLIP-217. They should just issue a warning,
> if
> > > >>>>>>> they handle
> > > >>>>>>> > > >> > multiple
> > > >>>>>>> > > >> > >> splits (see motivation section).
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Yes, but the "supportXXX" method would be for
> telling
> > > if
> > > >>>>>>> it
> > > >>>>>>> > > supports
> > > >>>>>>> > > >> > that
> > > >>>>>>> > > >> > >> intra-task alignment. Cross-task alignment would
> > always
> > > >>>>>>> be
> > > >>>>>>> > > supported.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> I updated interfaces to what I believe to be
> closest
> > > to a
> > > >>>>>>> > consensus
> > > >>>>>>> > > >> > >> between all participants. Do you mind taking a
> look?
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> @Sebastian Do you mind addressing the nits?
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Best,
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Dawid
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Thanks for pushing this effort.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> I'd actually be in favor of 1b). I fully agree that
> > > >>>>>>> decorator
> > > >>>>>>> > > >> interfaces
> > > >>>>>>> > > >> > >> should be avoided but I'm also not a big fan of
> > > >>>>>>> overloading the
> > > >>>>>>> > > base
> > > >>>>>>> > > >> > >> interfaces (they are hard to implement as is). The
> > > usual
> > > >>>>>>> feedback
> > > >>>>>>> > > to
> > > >>>>>>> > > >> > >> Source-related interfaces is always that they are
> > > >>>>>>> overwhelming
> > > >>>>>>> > and
> > > >>>>>>> > > >> too
> > > >>>>>>> > > >> > >> hard to implement. However, I'd also not oppose 1c)
> > as
> > > >>>>>>> scattered
> > > >>>>>>> > > >> > interfaces
> > > >>>>>>> > > >> > >> also have drawbacks. I'd just dislike 1a) and 1d).
> > > >>>>>>> > > >> > >> While I also like Becket's capability approach, I
> > fear
> > > >>>>>>> that it
> > > >>>>>>> > > >> doesn't
> > > >>>>>>> > > >> > work
> > > >>>>>>> > > >> > >> for this particular use case: Sources can always be
> > > >>>>>>> aligned
> > > >>>>>>> > > >> cross-task
> > > >>>>>>> > > >> > and
> > > >>>>>>> > > >> > >> this is just about intra-task alignment. So it's
> > > >>>>>>> plausible to put
> > > >>>>>>> > > >> > sources
> > > >>>>>>> > > >> > >> into an alignment group even though they do not use
> > any
> > > >>>>>>> of the
> > > >>>>>>> > > >> presented
> > > >>>>>>> > > >> > >> API of FLIP-217. They should just issue a warning,
> if
> > > >>>>>>> they handle
> > > >>>>>>> > > >> > multiple
> > > >>>>>>> > > >> > >> splits (see motivation section).
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> I think renaming alignSplits to facilitate future
> use
> > > >>>>>>> cases makes
> > > >>>>>>> > > >> sense
> > > >>>>>>> > > >> > but
> > > >>>>>>> > > >> > >> then all interfaces (if 1c) is chosen) should be
> > > adjusted
> > > >>>>>>> > > >> accordingly.
> > > >>>>>>> > > >> > >> AlignedSourceReader could be PausingSourceReader
> and
> > > I'd
> > > >>>>>>> go for
> > > >>>>>>> > > >> > >> pauseOrResumeSplits (Becket's proposal afaik). We
> > could
> > > >>>>>>> also
> > > >>>>>>> > split
> > > >>>>>>> > > it
> > > >>>>>>> > > >> > into
> > > >>>>>>> > > >> > >> pauseSplit and resumeSplit. While
> pauseOrResumeSplits
> > > >>>>>>> may allow
> > > >>>>>>> > > >> Sources
> > > >>>>>>> > > >> > to
> > > >>>>>>> > > >> > >> just use 1 instead of 2 library calls (as written
> in
> > > the
> > > >>>>>>> > Javadoc),
> > > >>>>>>> > > >> both
> > > >>>>>>> > > >> > >> Kafka and Pulsar can't use it and I'm not sure if
> > there
> > > >>>>>>> is a
> > > >>>>>>> > system
> > > >>>>>>> > > >> that
> > > >>>>>>> > > >> > >> can.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Some nits for the FLIP:
> > > >>>>>>> > > >> > >> - Please replace "stop" with "pause".
> > > >>>>>>> > > >> > >> - Not sure if it's worth it in the capability
> > section:
> > > >>>>>>> Sources
> > > >>>>>>> > that
> > > >>>>>>> > > >> > adopt
> > > >>>>>>> > > >> > >> this interface cannot be used in earlier versions.
> So
> > > it
> > > >>>>>>> feels
> > > >>>>>>> > like
> > > >>>>>>> > > >> we
> > > >>>>>>> > > >> > are
> > > >>>>>>> > > >> > >> only forward compatible (old sources can be used
> > after
> > > >>>>>>> the
> > > >>>>>>> > change);
> > > >>>>>>> > > >> but
> > > >>>>>>> > > >> > I
> > > >>>>>>> > > >> > >> guess this holds for any API addition.
> > > >>>>>>> > > >> > >> - You might want to add what happens when all
> splits
> > > are
> > > >>>>>>> paused.
> > > >>>>>>> > > >> > >> - You may want to describe how the 3 flavors of
> > > >>>>>>> SourceReaderBase
> > > >>>>>>> > > >> > interact
> > > >>>>>>> > > >> > >> with the interface.
> > > >>>>>>> > > >> > >> - I'm not sure if it makes sense to include Kafka
> and
> > > >>>>>>> Pulsar in
> > > >>>>>>> > the
> > > >>>>>>> > > >> > FLIP.
> > > >>>>>>> > > >> > >> For me, this is rather immediate follow-up work.
> > (could
> > > >>>>>>> be in the
> > > >>>>>>> > > >> same
> > > >>>>>>> > > >> > >> umbrella ticket)
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Best,
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Arvid
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz
> > > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
> > > >>>>>>> > > >> > >> WithSplitsAlignment", along with "MySplitReader implements
> > > >>>>>>> > > >> > >> SplitReader, WithSplitsAlignment", or
> > > >>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
> > > >>>>>>> > > >> > >> "MySplitReader implements AlignedSplitReader", or
> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
> > > >>>>>>> > > >> > >> "MySplitReader implements SplitReader".
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> I think the latest proposal according to Dawid would be:
> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
> > > >>>>>>> > > >> > >> "MySplitReader implements AlignedSplitReader".
> > > >>>>>>> > > >> > >> I am fine with this API, although personally speaking I think
> > > >>>>>>> > > >> > >> it is simpler to just add a new method to the split reader
> > > >>>>>>> > > >> > >> with default impl.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> I think that is a good idea to have it aligned as much as
> > > >>>>>>> > > >> > >> possible. I'd be +1 for your option c). We can merge
> > > >>>>>>> > > >> > >> AlignedSplitReader with SplitReader. We will update the FLIP
> > > >>>>>>> > > >> > >> shortly.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Best,
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Dawid
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Thanks for the comment, Jark.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Can the interface be used to align other things in the
> > > >>>>>>> > > >> > >> future? For example, align read speed, I have seen users
> > > >>>>>>> > > >> > >> requesting global rate limits. This feature may also need an
> > > >>>>>>> > > >> > >> interface like this.
> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to support aligning
> > > >>>>>>> > > >> > >> other things, I suggest explicitly declaring the purpose of
> > > >>>>>>> > > >> > >> the methods, such as `alignWatermarksForSplits` instead of
> > > >>>>>>> > > >> > >> `alignSplits`.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> This is a good point. Naming wise, it would usually be more
> > > >>>>>>> > > >> > >> extensible to just describe what the method actually does,
> > > >>>>>>> > > >> > >> instead of assuming the purpose of doing this. For example,
> > > >>>>>>> > > >> > >> in this case, pauseOrResumeSplits() would be more extensible
> > > >>>>>>> > > >> > >> because this can be used for any kind of flow control, be it
> > > >>>>>>> > > >> > >> watermark alignment or simple rate limiting.
> > > >>>>>>> > > >> > >>
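To make the naming argument concrete, here is a minimal sketch of a purpose-neutral pause/resume hook driven by two different callers. Everything in it is an assumption made for illustration: `PausableReader`, `RecordingReader`, `WatermarkAligner` and `SimpleRateLimiter` are hypothetical names, split ids are plain strings, and none of this is Flink's actual interface.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a reader; this is NOT Flink's actual interface.
interface PausableReader {
    void pauseOrResumeSplits(Collection<String> splitsToPause,
                             Collection<String> splitsToResume);
}

// Toy reader that only records which split ids are currently paused.
class RecordingReader implements PausableReader {
    final Set<String> paused = new LinkedHashSet<>();

    @Override
    public void pauseOrResumeSplits(Collection<String> splitsToPause,
                                    Collection<String> splitsToResume) {
        paused.addAll(splitsToPause);
        paused.removeAll(splitsToResume);
    }
}

// Caller 1: watermark alignment pauses splits that ran ahead of the limit
// and resumes all others.
class WatermarkAligner {
    static void align(PausableReader reader, Map<String, Long> splitWatermarks,
                      long maxDesiredWatermark) {
        List<String> toPause = new ArrayList<>();
        List<String> toResume = new ArrayList<>();
        for (Map.Entry<String, Long> e : splitWatermarks.entrySet()) {
            (e.getValue() > maxDesiredWatermark ? toPause : toResume).add(e.getKey());
        }
        reader.pauseOrResumeSplits(toPause, toResume);
    }
}

// Caller 2: a naive rate limiter pauses or resumes every split at once.
class SimpleRateLimiter {
    static void throttle(PausableReader reader, Collection<String> splits,
                         boolean overBudget) {
        if (overBudget) {
            reader.pauseOrResumeSplits(splits, List.of());
        } else {
            reader.pauseOrResumeSplits(List.of(), splits);
        }
    }
}

class PauseOrResumeDemo {
    public static void main(String[] args) {
        RecordingReader reader = new RecordingReader();
        Map<String, Long> watermarks = new LinkedHashMap<>();
        watermarks.put("split-0", 100L);
        watermarks.put("split-1", 500L);
        // Only split-1 is ahead of the desired maximum of 200.
        WatermarkAligner.align(reader, watermarks, 200L);
        System.out.println(reader.paused);
        // The rate limiter reuses the same hook to resume everything.
        SimpleRateLimiter.throttle(reader, watermarks.keySet(), false);
        System.out.println(reader.paused);
    }
}
```

The reader never learns why a split is paused, which is the extensibility being argued for; a name such as `alignWatermarksForSplits` would tie the same hook to a single caller.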
> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think they have
> > > >>>>>>> > > >> > >> their own advantages.
> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending
> > > >>>>>>> > > >> > >> abilities (SupportsXxxx) for TableSource/TableSink,
> > > >>>>>>> > > >> > >> and I prefer Interfaces rather than methods in this case.
> > > >>>>>>> > > >> > >> When you have a bunch of abilities and each ability has more
> > > >>>>>>> > > >> > >> than one method, Interfaces can help to organize them and
> > > >>>>>>> > > >> > >> make it clear to users which methods they need to implement
> > > >>>>>>> > > >> > >> when they want to have an ability.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> I am OK with decorative interfaces if this is a general
> > > >>>>>>> > > >> > >> design pattern in the other components in Flink. But it looks
> > > >>>>>>> > > >> > >> like the current API proposal is not symmetric.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> The current proposal is essentially "MySourceReader
> > > >>>>>>> > > >> > >> implements SourceReader, WithSplitsAlignment", along with
> > > >>>>>>> > > >> > >> "MySplitReader implements AlignedSplitReader".
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Should we make the API symmetric? I'd consider any one of
> > > >>>>>>> > > >> > >> the following as symmetric.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
> > > >>>>>>> > > >> > >> WithSplitsAlignment", along with "MySplitReader implements
> > > >>>>>>> > > >> > >> SplitReader, WithSplitsAlignment", or
> > > >>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
> > > >>>>>>> > > >> > >> "MySplitReader implements AlignedSplitReader", or
> > > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
> > > >>>>>>> > > >> > >> "MySplitReader implements SplitReader".
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> I think the latest proposal according to Dawid would be:
> > > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
> > > >>>>>>> > > >> > >> "MySplitReader implements AlignedSplitReader".
> > > >>>>>>> > > >> > >> I am fine with this API, although personally speaking I think
> > > >>>>>>> > > >> > >> it is simpler to just add a new method to the split reader
> > > >>>>>>> > > >> > >> with default impl.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> @Dawid Wysakowicz <dwysakowicz@apache.org>, thanks for the
> > > >>>>>>> > > >> > >> reply.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Having said that, as I don't have a preference and I agree
> > > >>>>>>> > > >> > >> most of the sources will support the alignment I am fine
> > > >>>>>>> > > >> > >> following your suggestion to have the SourceReader extending
> > > >>>>>>> > > >> > >> from WithWatermarksSplitsAlignment, but would put the
> > > >>>>>>> > > >> > >> "supportsXXX" there, not in the Source to keep the two
> > > >>>>>>> > > >> > >> methods together.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> One benefit of having the "supportsXXX" in Source is that
> > > >>>>>>> > > >> > >> this allows some compile time check. For example, if a user
> > > >>>>>>> > > >> > >> enabled watermark alignment while it is not supported by the
> > > >>>>>>> > > >> > >> Source, an exception can be thrown at compile time. It seems
> > > >>>>>>> > > >> > >> in general useful. That said, I agree that API cleanliness
> > > >>>>>>> > > >> > >> wise it is better to put the two methods together.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Thanks,
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <imjark@gmail.com>
> > > >>>>>>> > > >> > >> wrote:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Thanks Dawid for the reminder on FLIP-182. Sorry I did miss
> > > >>>>>>> > > >> > >> it.
> > > >>>>>>> > > >> > >> I don't have other concerns then.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Best,
> > > >>>>>>> > > >> > >> Jark
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz
> > > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> @Jark:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> 1. Will the framework always align with watermarks when the
> > > >>>>>>> > > >> > >> source implements the interface?
> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment even if
> > > >>>>>>> > > >> > >> Kafka implements the interface, and this will affect the
> > > >>>>>>> > > >> > >> throughput somehow. I agree with Becket we may need a
> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to configure the
> > > >>>>>>> > > >> > >> source to enable/disable the alignment.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly
> > > >>>>>>> > > >> > >> affect throughput if the reader is constantly switching
> > > >>>>>>> > > >> > >> between pause and resume. Can users configure the alignment
> > > >>>>>>> > > >> > >> offset?
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> This is covered in the previous FLIP[1] which has been
> > > >>>>>>> > > >> > >> already implemented in 1.15. In short, it must be enabled
> > > >>>>>>> > > >> > >> with the watermark strategy which also configures drift and
> > > >>>>>>> > > >> > >> update interval.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to support aligning
> > > >>>>>>> > > >> > >> other things, I suggest explicitly declaring the purpose of
> > > >>>>>>> > > >> > >> the methods, such as `alignWatermarksForSplits` instead of
> > > >>>>>>> > > >> > >> `alignSplits`.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Sure let's rename it.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> @Becket:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> I understand your point. On the other hand putting all
> > > >>>>>>> > > >> > >> methods, even with "supportsXXX" methods for enabling certain
> > > >>>>>>> > > >> > >> features, makes the entry threshold for writing a new source
> > > >>>>>>> > > >> > >> higher. Instead of focusing on the basic and required
> > > >>>>>>> > > >> > >> properties of the Source, the person implementing a source
> > > >>>>>>> > > >> > >> must bother with and need to figure out what all of the extra
> > > >>>>>>> > > >> > >> features are about and how to deal with them. It makes it
> > > >>>>>>> > > >> > >> also harder to organize methods in coupled groups as Jark
> > > >>>>>>> > > >> > >> said.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Having said that, as I don't have a preference and I agree
> > > >>>>>>> > > >> > >> most of the sources will support the alignment I am fine
> > > >>>>>>> > > >> > >> following your suggestion to have the SourceReader extending
> > > >>>>>>> > > >> > >> from WithWatermarksSplitsAlignment, but would put the
> > > >>>>>>> > > >> > >> "supportsXXX" there, not in the Source to keep the two
> > > >>>>>>> > > >> > >> methods together.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Lastly, I agree it is really unfortunate the "alignSplits"
> > > >>>>>>> > > >> > >> methods differ slightly for SourceReader and SplitReader. The
> > > >>>>>>> > > >> > >> reason for that is SourceReaderBase deals only with SplitIds,
> > > >>>>>>> > > >> > >> whereas SplitReader needs the actual splits to pause them. I
> > > >>>>>>> > > >> > >> found the discrepancy acceptable for the sake of simplifying
> > > >>>>>>> > > >> > >> changes significantly, especially as they would highly
> > > >>>>>>> > > >> > >> likely impact performance as we would have to perform
> > > >>>>>>> > > >> > >> additional lookups. Moreover the SplitReader is a secondary
> > > >>>>>>> > > >> > >> interface.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Best,
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Dawid
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Thanks for the effort, Dawid and Sebastian!
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> I just have some minor questions (maybe I missed something).
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> 1. Will the framework always align with watermarks when the
> > > >>>>>>> > > >> > >> source implements the interface?
> > > >>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment even if
> > > >>>>>>> > > >> > >> Kafka implements the interface, and this will affect the
> > > >>>>>>> > > >> > >> throughput somehow. I agree with Becket we may need a
> > > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to configure the
> > > >>>>>>> > > >> > >> source to enable/disable the alignment.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
> > > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly
> > > >>>>>>> > > >> > >> affect throughput if the reader is constantly switching
> > > >>>>>>> > > >> > >> between pause and resume. Can users configure the alignment
> > > >>>>>>> > > >> > >> offset?
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > > >>>>>>> > > >> > >> Can the interface be used to align other things in the
> > > >>>>>>> > > >> > >> future? For example, align read speed, I have seen users
> > > >>>>>>> > > >> > >> requesting global rate limits. This feature may also need an
> > > >>>>>>> > > >> > >> interface like this.
> > > >>>>>>> > > >> > >> If we don't plan to extend this interface to support aligning
> > > >>>>>>> > > >> > >> other things, I suggest explicitly declaring the purpose of
> > > >>>>>>> > > >> > >> the methods, such as `alignWatermarksForSplits` instead of
> > > >>>>>>> > > >> > >> `alignSplits`.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> 4. Interface or Method.
> > > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think they have
> > > >>>>>>> > > >> > >> their own advantages.
> > > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending
> > > >>>>>>> > > >> > >> abilities (SupportsXxxx) for TableSource/TableSink,
> > > >>>>>>> > > >> > >> and I prefer Interfaces rather than methods in this case.
> > > >>>>>>> > > >> > >> When you have a bunch of abilities and each ability has more
> > > >>>>>>> > > >> > >> than one method, Interfaces can help to organize them and
> > > >>>>>>> > > >> > >> make it clear to users which methods they need to implement
> > > >>>>>>> > > >> > >> when they want to have an ability.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Best,
> > > >>>>>>> > > >> > >> Jark
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin
> > > >>>>>>> > > >> > >> <becket.qin@gmail.com> wrote:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Hi Dawid,
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Thanks for the explanation. Apologies that I somehow misread
> > > >>>>>>> > > >> > >> a bunch of "align" and thought they were "assign".
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Regarding 1, by default implementation, I was thinking of the
> > > >>>>>>> > > >> > >> default no-op implementation. I am a little worried about the
> > > >>>>>>> > > >> > >> proliferation of decorative interfaces. I think the most
> > > >>>>>>> > > >> > >> important thing about interfaces is that they are easy to
> > > >>>>>>> > > >> > >> understand. In this case, I prefer adding a new method to the
> > > >>>>>>> > > >> > >> existing interface for the following reasons:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> a) I feel the biggest drawback of decorative interfaces is
> > > >>>>>>> > > >> > >> which interface they can decorate and which combinations of
> > > >>>>>>> > > >> > >> multiple decorative interfaces are valid. In the current
> > > >>>>>>> > > >> > >> FLIP, the withSplitsAlignment interface is only applicable to
> > > >>>>>>> > > >> > >> the SourceReader which means it can't decorate any other
> > > >>>>>>> > > >> > >> interface. From an interface design perspective, a natural
> > > >>>>>>> > > >> > >> question is why not let "AlignedSplitReader" extend
> > > >>>>>>> > > >> > >> "withSplitsAlignment"? And it is also natural to assume that
> > > >>>>>>> > > >> > >> a split reader implementing both SplitReader and
> > > >>>>>>> > > >> > >> WithSplitAlignment would work, because a source reader
> > > >>>>>>> > > >> > >> implementing SourceReader and withSplitsAlignment works. So
> > > >>>>>>> > > >> > >> why isn't there an interface of AlignedSourceReader? In the
> > > >>>>>>> > > >> > >> future, if there is a new feature added (e.g. sorted or
> > > >>>>>>> > > >> > >> pre-partitioned data aware), are we going to create another
> > > >>>>>>> > > >> > >> interface of SplitReader such as SortedSplitReader or
> > > >>>>>>> > > >> > >> PrePartitionedAware? Can they be combined? So I think the
> > > >>>>>>> > > >> > >> additional decorative interface like withSplitsAlignment
> > > >>>>>>> > > >> > >> actually increases the understanding cost of users because
> > > >>>>>>> > > >> > >> they have to know what decorative interfaces are there, which
> > > >>>>>>> > > >> > >> interface they can decorate and which combinations of the
> > > >>>>>>> > > >> > >> decorative interfaces are valid and which are not. Ideally we
> > > >>>>>>> > > >> > >> want to avoid that. To be clear, I am not opposing having an
> > > >>>>>>> > > >> > >> interface of withSplitsAlignment, it is completely OK to have
> > > >>>>>>> > > >> > >> it as an internal interface and let SourceReader and
> > > >>>>>>> > > >> > >> SplitReader both extend it.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> b) Adding a new method to the SourceReader with a default
> > > >>>>>>> > > >> > >> implementation of no-op would help avoid logic branching in
> > > >>>>>>> > > >> > >> the source logic, especially given that we agree that the
> > > >>>>>>> > > >> > >> vast majority of the SourceReader implementations, if not
> > > >>>>>>> > > >> > >> all, would just extend from the SourceReaderBase. That means
> > > >>>>>>> > > >> > >> adding a new method to the interface would effectively give
> > > >>>>>>> > > >> > >> the same user experience, but simpler.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> c) A related design principle that may be worth discussing is
> > > >>>>>>> > > >> > >> how do we let the Source implementations tell Flink what
> > > >>>>>>> > > >> > >> capability is supported and what is not. Personally speaking
> > > >>>>>>> > > >> > >> I feel the most intuitive place to me is in the Source
> > > >>>>>>> > > >> > >> itself, because that is the entrance of the entire Source
> > > >>>>>>> > > >> > >> connector logic.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Based on the above thoughts, I am wondering if the following
> > > >>>>>>> > > >> > >> interface would be easier to understand by the users.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> - Change "withSplitsAlignment" to internal interface, let
> > > >>>>>>> > > >> > >> both SourceReader and SplitReader extend from it, with a
> > > >>>>>>> > > >> > >> default no-op implementation.
> > > >>>>>>> > > >> > >> - Add a new method "boolean supportSplitsAlignment()" to the
> > > >>>>>>> > > >> > >> Source interface, with a default implementation returning
> > > >>>>>>> > > >> > >> false. Sources that have implemented the alignment logic can
> > > >>>>>>> > > >> > >> change this to return true, and override the alignSplits()
> > > >>>>>>> > > >> > >> methods in the SourceReader / SplitReader if needed.
> > > >>>>>>> > > >> > >> - In the future, if a new optional feature is going to be
> > > >>>>>>> > > >> > >> added to the Source, and that feature requires the awareness
> > > >>>>>>> > > >> > >> from Flink, we can add more such methods to the Source.
> > > >>>>>>> > > >> > >>
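The proposal in the list above might look roughly like the following sketch. It is only an illustration under stated assumptions: `SketchSource`, `SketchSourceReader`, `PlainSource`, `AligningSource`, `AligningReader` and `AlignmentValidator` are hypothetical names, the `alignSplits(long)` signature is invented for brevity, and the validator merely stands in for whatever check the framework would run while the job is being built.

```java
// Hypothetical reader interface: the alignment hook has a no-op default.
interface SketchSourceReader {
    default void alignSplits(long maxDesiredWatermark) {
        // no-op unless a connector overrides it
    }
}

// Hypothetical source interface: the capability flag defaults to false.
interface SketchSource {
    SketchSourceReader createReader();

    default boolean supportSplitsAlignment() {
        return false;
    }
}

// A source that ignores alignment has nothing extra to implement.
class PlainSource implements SketchSource {
    @Override
    public SketchSourceReader createReader() {
        return new SketchSourceReader() {};
    }
}

// A reader that opts in by overriding the hook.
class AligningReader implements SketchSourceReader {
    long lastMaxDesiredWatermark = Long.MIN_VALUE;

    @Override
    public void alignSplits(long maxDesiredWatermark) {
        lastMaxDesiredWatermark = maxDesiredWatermark;
    }
}

// A source that opts in by flipping the flag.
class AligningSource implements SketchSource {
    @Override
    public SketchSourceReader createReader() {
        return new AligningReader();
    }

    @Override
    public boolean supportSplitsAlignment() {
        return true;
    }
}

// Stand-in for a framework check performed before the job runs.
class AlignmentValidator {
    static void validate(SketchSource source, boolean alignmentEnabled) {
        if (alignmentEnabled && !source.supportSplitsAlignment()) {
            throw new IllegalStateException(
                    "Watermark alignment is enabled but the source does not support it");
        }
    }
}

class CapabilityFlagDemo {
    public static void main(String[] args) {
        AlignmentValidator.validate(new AligningSource(), true); // accepted
        try {
            AlignmentValidator.validate(new PlainSource(), true);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The sketch also shows the cost raised elsewhere in the thread: the flag and the hook live on different types, so nothing forces an implementer to change both together.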
> > > >>>>>>> > > >> > >> What do you think?
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Thanks,
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz
> > > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> @Konstantin:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka
> > > >>>>>>> > > >> > >> the stop & resume behavior) will be implemented for Kafka and
> > > >>>>>>> > > >> > >> Pulsar only, correct?
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Correct, as far as I know though, those are the only sources
> > > >>>>>>> > > >> > >> which consume concurrently from multiple splits and thus
> > > >>>>>>> > > >> > >> alignment applies.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> @Thomas:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase
> > > >>>>>>> > > >> > >> and then doing nothing if the split reader does not implement
> > > >>>>>>> > > >> > >> AlignedSplitReader could be misleading? Perhaps
> > > >>>>>>> > > >> > >> WithSplitsAlignment can instead be added to the specific
> > > >>>>>>> > > >> > >> source reader (i.e. KafkaSourceReader) to make it explicit
> > > >>>>>>> > > >> > >> that the source actually supports it.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> I understand your concern. Hmm, I think we could actually do
> > > >>>>>>> > > >> > >> that. Given the actual implementation of the
> > > >>>>>>> > > >> > >> SourceReaderBase#alignSplits is rather short (just a forward
> > > >>>>>>> > > >> > >> to the corresponding method of SplitFetcher), we could
> > > >>>>>>> > > >> > >> reimplement it in the actual source implementations. This
> > > >>>>>>> > > >> > >> solution has a downside though. Authors of new sources would
> > > >>>>>>> > > >> > >> have to do two things: extend from AlignedSplitReader and
> > > >>>>>>> > > >> > >> implement WithSplitsAlignment, instead of just extending
> > > >>>>>>> > > >> > >> AlignedSplitReader. I would be fine with such a tradeoff
> > > >>>>>>> > > >> > >> though. What do others think?
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> @Steven:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> For this part from the motivation section, is it accurate?
> > > >>>>>>> > > >> > >> Let's assume one source task consumes from 3 partitions and
> > > >>>>>>> > > >> > >> one of the partitions is significantly slower. In this
> > > >>>>>>> > > >> > >> situation, watermark for this source task won't hold back as
> > > >>>>>>> > > >> > >> it is reading recent data from other two Kafka partitions. As
> > > >>>>>>> > > >> > >> a result, it won't hold back the overall watermark. I thought
> > > >>>>>>> > > >> > >> the problem is that we may have late data for this slow
> > > >>>>>>> > > >> > >> partition.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> It will hold back the watermark. Watermark of an operator is
> > > >>>>>>> > > >> > >> the minimum of watermarks of all splits[1]
> > > >>>>>>> > > >> > >>
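A tiny worked example of the "minimum over all splits" rule for the three-partition scenario above. The class and method names are hypothetical, the watermark values are made up, and returning `Long.MIN_VALUE` for an empty map is an assumption of this sketch rather than documented Flink behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SplitWatermarks {
    // Combined watermark of a reader = minimum of its per-split watermarks.
    // Long.MIN_VALUE for "no splits" is an assumption of this sketch.
    static long combined(Map<String, Long> perSplitWatermark) {
        return perSplitWatermark.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(Long.MIN_VALUE);
    }
}

class SplitWatermarksDemo {
    public static void main(String[] args) {
        Map<String, Long> watermarks = new LinkedHashMap<>();
        watermarks.put("partition-0", 1_000L); // recent data
        watermarks.put("partition-1", 990L);   // recent data
        watermarks.put("partition-2", 100L);   // the slow partition
        // The slow partition determines the result: prints 100.
        System.out.println(SplitWatermarks.combined(watermarks));
    }
}
```

So the slow partition does hold the task's watermark back; the two fast partitions simply run ahead of it, which is the gap that split-level alignment is meant to bound.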
> > > >>>>>>> > > >> > >> I have another question about the restart. Say split
> > > >>>>>>> > > >> > >> alignment is triggered. checkpoint is completed. job failed
> > > >>>>>>> > > >> > >> and restored from the last checkpoint. because alignment
> > > >>>>>>> > > >> > >> decision is not checkpointed, initially alignment won't be
> > > >>>>>>> > > >> > >> enforced until we get a cycle of watermark aggregation and
> > > >>>>>>> > > >> > >> propagation, right? Not saying this corner is a problem. Just
> > > >>>>>>> > > >> > >> want to understand it more.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Your understanding is correct.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> @Becket:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general
> > use
> > > >>>>>>> case, so
> > > >>>>>>> > > >> should
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> we
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> just add the related methods to SourceReader
> directly
> > > >>>>>>> instead of
> > > >>>>>>> > > >> > >> introducing the new interface of
> WithSplitAssignment?
> > > We
> > > >>>>>>> can
> > > >>>>>>> > > provide
> > > >>>>>>> > > >> > >> default implementations, so backwards compatibility
> > > >>>>>>> won't be an
> > > >>>>>>> > > >> issue.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> I don't think we can provide a default
> > implementation.
> > > >>>>>>> How would
> > > >>>>>>> > we
> > > >>>>>>> > > >> do
> > > >>>>>>> > > >> > >> that? Would it be just a no-op? Is it better than
> > > having
> > > >>>>>>> an
> > > >>>>>>> > opt-in
> > > >>>>>>> > > >> > >> interface? The default implementation would have to
> > be
> > > >>>>>>> added
> > > >>>>>>> > > >> exclusively
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> in
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> a *Public* SourceReader interface. By the way
> notice
> > > >>>>>>> > > SourceReaderBase
> > > >>>>>>> > > >> > >> does extend from WithSplitsAlignment, so
> effectively
> > > all
> > > >>>>>>> > > >> implementations
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> do
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> handle the alignment case. To be honest I think it
> is
> > > >>>>>>> impossible
> > > >>>>>>> > to
> > > >>>>>>> > > >> > >> implement the SourceReader interface directly by
> end
> > > >>>>>>> users.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface
> > probably
> > > >>>>>>> also
> > > >>>>>>> > needs
> > > >>>>>>> > > >> some
> > > >>>>>>> > > >> > >> change to support throttling at the split
> > granularity.
> > > >>>>>>> Can you
> > > >>>>>>> > add
> > > >>>>>>> > > >> that
> > > >>>>>>> > > >> > >> interface change into the public interface section
> as
> > > >>>>>>> well?
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> It has been added from the beginning. See
> > > >>>>>>> *AlignedSplitReader.*
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name
> > assignSplits
> > > >>>>>>> here,
> > > >>>>>>> > given
> > > >>>>>>> > > >> that
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> it
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> is not actually changing the split assignments? It
> > > seems
> > > >>>>>>> > something
> > > >>>>>>> > > >> like
> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling()
> is
> > > >>>>>>> more
> > > >>>>>>> > > accurate.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> The method's called *alignSplits*, not assign. Do
> you
> > > >>>>>>> still
> > > >>>>>>> > prefer
> > > >>>>>>> > > a
> > > >>>>>>> > > >> > >> different name for that? Personally, I am open for
> > > >>>>>>> suggestions
> > > >>>>>>> > > here.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Best,
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Dawid
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> [1]
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> >
> > > >>>>>>> > > >>
> > > >>>>>>> > >
> > > >>>>>>> >
> > > >>>>>>>
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Thanks for driving the effort, Sebastian. I think
> the
> > > >>>>>>> motivation
> > > >>>>>>> > > >> makes a
> > > >>>>>>> > > >> > >> lot of sense. Just a few suggestions / questions.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general
> > use
> > > >>>>>>> case, so
> > > >>>>>>> > > >> should
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> we
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> just add the related methods to SourceReader
> directly
> > > >>>>>>> instead of
> > > >>>>>>> > > >> > >> introducing the new interface of
> WithSplitAssignment?
> > > We
> > > >>>>>>> can
> > > >>>>>>> > > provide
> > > >>>>>>> > > >> > >> default implementations, so backwards compatibility
> > > >>>>>>> won't be an
> > > >>>>>>> > > >> issue.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface
> > probably
> > > >>>>>>> also
> > > >>>>>>> > needs
> > > >>>>>>> > > >> some
> > > >>>>>>> > > >> > >> change to support throttling at the split
> > granularity.
> > > >>>>>>> Can you
> > > >>>>>>> > add
> > > >>>>>>> > > >> that
> > > >>>>>>> > > >> > >> interface change into the public interface section
> as
> > > >>>>>>> well?
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name
> > assignSplits
> > > >>>>>>> here,
> > > >>>>>>> > given
> > > >>>>>>> > > >> that
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> it
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> is not actually changing the split assignments? It
> > > seems
> > > >>>>>>> > something
> > > >>>>>>> > > >> like
> > > >>>>>>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling()
> is
> > > >>>>>>> more
> > > >>>>>>> > > accurate.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Thanks,
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <stevenz3wu@gmail.com> wrote:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> However, a single source operator may read data
> from
> > > >>>>>>> multiple
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> splits/partitions, e.g., multiple Kafka partitions,
> > > such
> > > >>>>>>> that
> > > >>>>>>> > even
> > > >>>>>>> > > >> with
> > > >>>>>>> > > >> > >> watermark alignment the source operator may need to
> > > >>>>>>> buffer
> > > >>>>>>> > > excessive
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> amount
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> of data if one split emits data faster than
> another.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> For this part from the motivation section, is it
> > > >>>>>>> accurate? Let's
> > > >>>>>>> > > >> assume
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> one
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> source task consumes from 3 partitions and one of
> the
> > > >>>>>>> partition
> > > >>>>>>> > is
> > > >>>>>>> > > >> > >> significantly slower. In this situation, watermark
> > for
> > > >>>>>>> this
> > > >>>>>>> > source
> > > >>>>>>> > > >> task
> > > >>>>>>> > > >> > >> won't hold back as it is reading recent data from
> > other
> > > >>>>>>> two Kafka
> > > >>>>>>> > > >> > >> partitions. As a result, it won't hold back the
> > overall
> > > >>>>>>> > watermark.
> > > >>>>>>> > > I
> > > >>>>>>> > > >> > >> thought the problem is that we may have late data
> for
> > > >>>>>>> this slow
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> partition.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> I have another question about the restart. Say
> split
> > > >>>>>>> alignment is
> > > >>>>>>> > > >> > >> triggered. checkpoint is completed. job failed and
> > > >>>>>>> restored from
> > > >>>>>>> > > the
> > > >>>>>>> > > >> > last
> > > >>>>>>> > > >> > >> checkpoint. because alignment decision is not
> > > >>>>>>> checkpointed,
> > > >>>>>>> > > initially
> > > >>>>>>> > > >> > >> alignment won't be enforced until we get a cycle of
> > > >>>>>>> watermark
> > > >>>>>>> > > >> > aggregation
> > > >>>>>>> > > >> > >> and propagation, right? Not saying this corner is a
> > > >>>>>>> problem. Just
> > > >>>>>>> > > >> want
> > > >>>>>>> > > >> > to
> > > >>>>>>> > > >> > >> understand it more.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <thw@apache.org> wrote:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Thanks for working on this!
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in
> > > >>>>>>> SourceReaderBase and
> > > >>>>>>> > > then
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> doing
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> nothing if the split reader does not implement
> > > >>>>>>> AlignedSplitReader
> > > >>>>>>> > > >> could
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> be
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead
> > be
> > > >>>>>>> added to
> > > >>>>>>> > the
> > > >>>>>>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to
> > make
> > > >>>>>>> it
> > > >>>>>>> > explicit
> > > >>>>>>> > > >> that
> > > >>>>>>> > > >> > >> the source actually supports it.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Thanks,
> > > >>>>>>> > > >> > >> Thomas
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <knaufk@apache.org> wrote:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader`
> > > interface
> > > >>>>>>> (aka the
> > > >>>>>>> > > >> stop
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> &
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> resume behavior) will be implemented for Kafka and
> > > >>>>>>> Pulsar only,
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> correct?
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> +1 in general. I believe it is valuable to complete
> > the
> > > >>>>>>> watermark
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> aligned
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> story with this FLIP.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Cheers,
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Konstantin
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> To be explicit, having worked on it, I support it
> ;)
> > I
> > > >>>>>>> think we
> > > >>>>>>> > can
> > > >>>>>>> > > >> > >> start a vote thread soonish, as there are no
> concerns
> > > so
> > > >>>>>>> far.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Best,
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Dawid
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Dear Flink developers,
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> I would like to open a discussion on FLIP 217 [1]
> for
> > > an
> > > >>>>>>> > extension
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> of
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Watermark Alignment to perform alignment also in
> > > >>>>>>> SplitReaders. To
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> do
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> so,
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> SplitReaders must be able to suspend and resume
> > reading
> > > >>>>>>> from
> > > >>>>>>> > split
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> sources
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> where the SourceOperator coordinates and controls
> > > >>>>>>> suspend and
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> resume.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> To
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> gather information about current watermarks of the
> > > >>>>>>> SplitReaders,
> > > >>>>>>> > we
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> extend
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> the internal WatermarkOutputMultiplexer and report
> > > >>>>>>> watermarks to
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> the
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> SourceOperator.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> There is a PoC for this FLIP [2], prototyped by
> Arvid
> > > >>>>>>> Heise and
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> revised
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> and
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> reworked by Dawid Wysakowicz (He did most of the
> > work.)
> > > >>>>>>> and me.
> > > >>>>>>> > The
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> changes
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> are backwards compatible in a way that if affected
> > > >>>>>>> components do
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> not
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> support split alignment the behavior is as before.
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Best,
> > > >>>>>>> > > >> > >> Sebastian
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> [1]
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> >
> > > >>>>>>> > > >>
> > > >>>>>>> > >
> > > >>>>>>> >
> > > >>>>>>>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> [2]
> > > >>>>>>> https://github.com/dawidwys/flink/tree/aligned-splits
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> --
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >> Konstantin Knauf
> > > >>>>>>> > > >> > >> https://twitter.com/snntrable
> > > >>>>>>> > > >> > >> https://github.com/knaufk
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> > >>
> > > >>>>>>> > > >> >
> > > >>>>>>> > > >>
> > > >>>>>>> > > >
> > > >>>>>>> > >
> > > >>>>>>> >
> > > >>>>>>>
> > > >>>>>>
> > >
> >
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Piotr Nowojski <pn...@apache.org>.
Hi Becket,

I still stand by what I wrote before:
> I think I would still vote soft -1 on this option, but I wouldn't block
> it in case I am out-voted.

> I think it might be helpful to agree on the definition of optional in our
> case.

For me it doesn't matter whether we call a default method that throws an
exception optional or non-optional. As long as we keep it this way, the effect
is the same. It's effectively a method that a user doesn't have to
implement. If the interface/system allows some methods to be left
unimplemented, some users will do just that, regardless of whether we call
and document them as non-optional. And at the same time it's clogging the
base interface.

By the way, the mere need for Javadoc/documentation explaining the
existence of some construct is a bad smell (code should be self-documenting,
and a default method throwing an UnsupportedOperationException is not).
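
To make the point concrete, here is a minimal sketch of what I mean. The
interface below is a simplified, hypothetical stand-in (SketchReader,
LegacyReader and the method signature are made up for illustration, this is
not the actual SourceReader API). It only shows that a reader which never
implements the "non-optional" method still compiles, and the problem surfaces
only when the method is called:

```java
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified stand-in for a reader interface; NOT Flink's real API.
interface SketchReader {
    String pollNext();

    // "Non-optional" by documentation only: the compiler accepts readers that skip it.
    default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        throw new UnsupportedOperationException("pausing splits is not supported");
    }
}

// Compiles without any warning although it never implements pausing.
class LegacyReader implements SketchReader {
    @Override
    public String pollNext() {
        return "record";
    }
}

class DefaultMethodDemo {
    // Returns true if the missing implementation is only discovered at call time.
    static boolean failsAtRuntime(SketchReader reader) {
        try {
            reader.pauseOrResumeSplits(List.of("split-0"), List.of());
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsAtRuntime(new LegacyReader()));
    }
}
```

Nothing in the type of LegacyReader tells the framework or the user whether
pausing is supported; only the call itself does.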

> Please note that so far we do not assume whether the feature is in
> the original API or it is added later. A newly added feature can also be
> non-optional, although it might take some time for all the pluggable
> developers to catch up, and they should still work if the new feature is
> not used until they catch up. In contrast, we may never expect an optional
> feature to catch up, because leaving it unimplemented is also blessed.
>
> Let's take the checkpointing as an example. Imagine Flink did not support
> checkpointing before release 1.16. And now we are trying to add
> checkpointing to Flink. So we exposed the checkpoint configuration to the
> end users. In the meantime, will we tell the pluggable (e.g. operators,
> connectors) developers that methods like "snapshotState()" are optional? If
> we do that, the availability of checkpointing in Flink would be severely
> weakened. But apparently we should still allow the existing implementations
> to work without checkpointing. It looks to me that adding the method to the
> pluggable interfaces with a default implementation throwing
> "UnsupportedOperationException" would be the solution here. Please note
> that in this case, having the default implementation does not mean this is
> optional. It is just the technique to support backwards compatibility in
> the feature evolution. The fact that this method is in the base interface
> suggests it is not optional, so the developers SHOULD implement it.

I would vote soft -1 for having the default method throwing
UnsupportedOperationException as a one-off thing for these special
(FLIP-217) circumstances.

At the moment, without thinking this over too much, I would vote a harder -1
for having this as a general rule when adding new features. If we ever end
up with an API that is littered with default methods throwing
UnsupportedOperationException that are documented as "non optional", it
would be IMO a big design failure. I would be ok-ish with that only if
that was a temporary thing and we had an aggressive plan to release new
major Flink versions (2.x.y, 3.x.y, ...) more often, breaking API
compatibility, which would get rid of those default methods. Adding
checkpointing and methods like "snapshotState()" would IMO easily justify a
new major Flink release. In that case we could add those methods with
a default implementation for some transition period, one or two minor
releases, followed by a clean-up in a major release. However, I would still
argue that it would be cleaner/better to add a decorative interface like
`CheckpointedOperator` instead of adding those default methods to
the base `Operator` interface.
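
As a rough sketch of the decorative-interface alternative (all names and
signatures here are hypothetical and simplified, including SketchOperator and
JobValidator; this is not Flink's operator API), support becomes visible in
the type, and the framework can check it once, when the feature is enabled:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical base interface with obligatory methods only; NOT Flink's real API.
interface SketchOperator {
    void processElement(String element);
}

// Opt-in ("decorative") interface: implementing it is the declaration of support.
interface CheckpointedOperator {
    byte[] snapshotState();

    void restoreState(byte[] state);
}

// An operator that does not opt in; the base interface stays uncluttered.
class StatelessOperator implements SketchOperator {
    @Override
    public void processElement(String element) {
        // nothing to remember
    }
}

// An operator that opts in and must therefore implement both methods together.
class CountingOperator implements SketchOperator, CheckpointedOperator {
    private long count;

    @Override
    public void processElement(String element) {
        count++;
    }

    @Override
    public byte[] snapshotState() {
        return Long.toString(count).getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void restoreState(byte[] state) {
        count = Long.parseLong(new String(state, StandardCharsets.UTF_8));
    }

    long count() {
        return count;
    }
}

class JobValidator {
    // Rejects the job up front if the feature is enabled for an operator that did not opt in.
    static void validate(SketchOperator operator, boolean checkpointingEnabled) {
        if (checkpointingEnabled && !(operator instanceof CheckpointedOperator)) {
            throw new IllegalStateException(
                    operator.getClass().getSimpleName() + " does not support checkpointing");
        }
    }

    public static void main(String[] args) {
        validate(new CountingOperator(), true);   // accepted
        validate(new StatelessOperator(), false); // accepted, feature is off
    }
}
```

The same shape would apply to a decorative interface for pausable splits:
whenever watermark alignment is enabled, check that the source implements it
and throw otherwise.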

I think I can sum up our disagreement like this: I would like to keep the
interfaces simpler, with only obligatory methods/features on one side and
clearly optional features on the other, while you would like to add an
extra third state in between those two?

Best,
Piotrek



czw., 12 maj 2022 o 04:25 Becket Qin <be...@gmail.com> napisał(a):

> Thanks for the clarification, Piotr and Sebastian.
>
> It looks like the key problem is still whether the implementation of
> pausable splits in the Sources should be optional or not.
>
> I think it might be helpful to agree on the definition of optional in our
> case. To me:
> Optional = "You CAN leave the method unimplemented, and that is fine."
> Non-Optional = "You CAN leave the method unimplemented, but you SHOULD NOT,
> because people assume this works."
>
> I think one sufficient condition of a Non-Optional feature is that if the
> feature is exposed through the framework API, Flink should expect the
> pluggables to support this feature by default. Otherwise the availability
> of that feature becomes undefined.
>
> Please note that so far we do not assume whether the feature is in
> the original API or it is added later. A newly added feature can also be
> non-optional, although it might take some time for all the pluggable
> developers to catch up, and they should still work if the new feature is
> not used until they catch up. In contrast, we may never expect an optional
> feature to catch up, because leaving it unimplemented is also blessed.
>
> Let's take the checkpointing as an example. Imagine Flink did not support
> checkpointing before release 1.16. And now we are trying to add
> checkpointing to Flink. So we exposed the checkpoint configuration to the
> end users. In the meantime, will we tell the pluggable (e.g. operators,
> connectors) developers that methods like "snapshotState()" is optional? If
> we do that, the availability of checkpointing in Flink would be severely
> weakened. But apparently we should still allow the existing implementations
> to work without checkpointing. It looks to me that adding the method to the
> pluggable interfaces with a default implementation throwing
> "UnsupportedOperationException" would be the solution here. Please note
> that in this case, having the default implementation does not mean this is
> optional. It is just the technique to support backwards compatibility in
> the feature evolution. The fact that this method is in the base interface
> suggests it is not optional, so the developers SHOULD implement it.
>
> When it comes to this FLIP, I think it meets the criteria of non-optional
> features, so we should just use the evolution path of non-optional
> features.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski <pn...@apache.org>
> wrote:
>
> > Hi,
> >
> > Actually previously I thought about having a decorative interface and
> > whenever watermark alignment is enabled, checking that the source
> > implements the decorative interface. If not, throwing an exception.
> >
> > The option with default methods in the source interfaces throwing
> > `UnsupportedOperationException` I think still suffers from the same
> > problems I mentioned before. It's still an optional implementation and at
> > the same time it's clogging the base interface. I think I would still vote
> > soft -1 on this option, but I wouldn't block it in case I am out-voted.
> >
> > Best,
> > Piotrek
> >
> > śr., 11 maj 2022 o 14:22 Sebastian Mattheis <se...@ververica.com>
> > napisał(a):
> >
> > > Hi Becket,
> > >
> > > Thanks a lot for your fast and detailed response. For me, it converges and
> > > dropping the supportsX method sounds very reasonable to me. (Side note:
> > > With "pausable splits" enabled as "default" I think we misunderstood. As
> > > you described now "default" I understand as that it should be the new
> > > recommended way of implementation, and I think that is fully valid. Before,
> > > I understood "default" here as the default implementation, i.e., throwing
> > > UnsupportedOperationException, which is the exact opposite. :) )
> > >
> > > Nevertheless: As mentioned, an open question for me is if watermark
> > > alignment should enforce pausable splits. For clarification, the
> current
> > > documentation [1] says:
> > >
> > > *Note:* As of 1.15, Flink supports aligning across tasks of the same
> > >> source and/or different sources. It does not support aligning
> > >> splits/partitions/shards in the same task.
> > >>
> > >> In a case where there are e.g. two Kafka partitions that produce
> > >> watermarks at different pace, that get assigned to the same task
> > watermark
> > >> might not behave as expected. Fortunately, worst case it should not
> > perform
> > >> worse than without alignment.
> > >>
> > >> Given the limitation above, we suggest applying watermark alignment in
> > >> two situations:
> > >>
> > >>    1. You have two different sources (e.g. Kafka and File) that
> produce
> > >>    watermarks at different speeds
> > >>    2. You run your source with parallelism equal to the number of
> > >>    splits/shards/partitions, which results in every subtask being
> > assigned a
> > >>    single unit of work.
> > >>
> > > I personally see no issue in implementing and I see no reason against
> > > implementing this dependency of watermark alignment and pausable splits.
> > > (I think this would even be a good path towards shaping watermark
> > > alignment in 1.16.) However, "I don't see" means that I would be happy to
> > > hear Dawid's and Piotrek's opinions as they implemented watermark
> > > alignment based on FLIP-182 [2] and I don't want to miss relevant
> > > rationale/background info from their side.
> > >
> > > *@Piotrek* *@Dawid *What do you think?
> > >
> > > Regards,
> > > Sebastian
> > >
> > > [1]
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
> > > [2]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
> > >
> > > On Wed, May 11, 2022 at 1:30 PM Becket Qin <be...@gmail.com>
> wrote:
> > >
> > >> +dev
> > >>
> > >> Hi Sebastian,
> > >>
> > >> Thank you for the summary. Please see the detailed replies inline. As
> a
> > >> recap of my suggestions.
> > >>
> > >> 1. Pausable splits API.
> > >>   a) Add default implementations to methods "pauseOrResumeSplits" in
> > both
> > >> SourceReader and SplitReader where both default implementations throw
> > >>  UnsupportedOperationException.
> > >>
> > >> 2. User story.
> > >>     a) We tell users to enable the watermark alignment as they like.
> > This
> > >> is exactly what the current Flink API is.
> > >>     b) We tell the source developers, please implement pausable
> splits,
> > >> otherwise bad things may happen. Think of it like you are expected to
> > >> implement SourceReader#snapshotState() properly, otherwise exceptions
> > will
> > >> be thrown when users enable checkpointing.
> > >>
> > >> Thanks,
> > >>
> > >> Jiangjie (Becket) Qin
> > >>
> > >> On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis <
> > >> sebastian@ververica.com> wrote:
> > >>
> > >>> Hi Becket, Hi everybody,
> > >>>
> > >>> I'm sorry if I misread the messages but I could not derive an
> agreement
> > >>> from the mailing list. Nevertheless, if I understand you right the
> > >>> suggestion is:
> > >>>
> > >>> * Add default implementations to methods "pauseOrResumeSplits" in
> both
> > >>> SourceReader and SplitReader where both default implementations throw
> > >>> UnsupportedOperationException.
> > >>>
> > >> Yes.
> > >>
> > >> * Add "supportsPauseOrResumeSplits" to the Source interface. (In the
> > >>> following, I refer to supporting this as "pausable splits".)
> > >>>
> > >> We may no longer need this if pausable splits are expected to be
> > >> implemented by the source developers, i.e. non-optional. Having this
> > method
> > >> would then be somewhat misleading as it looks like the sources that do
> > not
> > >> support pausable splits are also acceptable in the long term. So API
> > wise,
> > >> I'd say maybe we should remove this for this FLIP, although I believe
> > this
> > >> supportXXX pattern itself is still attractive for optional features.
> > >>
> > >>
> > >>>
> > >>> To make the conclusions explicit:
> > >>>
> > >>> 1. The implementation of pauseOrResumeSplits in both interfaces
> > >>> SourceReader and SplitReader are optional where the default is that
> it
> > >>> doesn't support it. (--> This means that the implementation is still
> > >>> optional for the source developer.)
> > >>>
> > >> It is optional for backwards compatibility with existing sources, as
> > they
> > >> may still compile without code change. But starting from this FLIP,
> > Flink
> > >> will always optimistically assume that all the sources support
> pausable
> > >> splits. If a source does not support pausable splits, it goes to an
> > error
> > >> handling path when watermark alignment is enabled on it. This is
> > different
> > >> from a usual optional feature, where no error is expected.
> > >>
> > >>
> > >>> 2. If watermark alignment is enabled in the application code by
> adding
> > >>> withWatermarkAlignment to the WatermarkStrategy while SourceReader or
> > >>> SplitReader do not support pausableSplits, we throw an
> > >>> UnsupportedOperationException.
> > >>>
> > >> Yes.
> > >>
> > >>
> > >>> 3. With regard to your statement:
> > >>>
> > >>>> [...] basically means watermark alignment is a non-optional feature
> > to
> > >>>> the end users.
> > >>>
> > >>> You actually mean that "pausable splits" are non-optional for the app
> > >>> developer if watermark alignment is enabled. However, watermark
> > alignment
> > >>> is optional and can be enabled/disabled.
> > >>>
> > >> Yes, watermark alignment can be enabled/disabled in individual sources
> > in
> > >> Flink jobs, which basically means the code supporting watermark
> > alignment
> > >> has to already be there. That again means the Source developers are
> also
> > >> expected to support pausable splits by default. So this way we
> > essentially
> > >> tell the end users that you may enable / disable this feature as you
> > wish,
> > >> and tell the source developers that you SHOULD implement this because
> > the
> > >> end users may turn it on/off at will. And if the source does not
> support
> > >> pausable splits, that goes to an error handling path when watermark
> > >> alignment is enabled on it. So users know they have to explicitly
> > exclude
> > >> this source.
> > >>
> > >>
> > >>>
> > >>> So far it's totally clear to me and I hope this is what you mean. I
> > also
> > >>> agree with both statements:
> > >>>
> > >>> So making that expectation aligned with the source developers seems
> > >>>> reasonable.
> > >>>>
> > >>>
> > >>> I think this is a simple and clean solution from both the end user
> and
> > >>>> source developers' standpoint.
> > >>>>
> > >>>
> > >>> However, a last conclusion derives from 3. and is an open question
> for
> > >>> me:
> > >>>
> > >>> 4. The feature of "pausable splits" is now tightly bound to watermark
> > >>> alignment, i.e., if sources do not support "pausable splits" one can
> > not
> > >>> enable watermark alignment for these sources. This dependency is not
> > the
> > >>> current status of watermark alignment implementation because it
> is/was
> > >>> implemented without pausable splits. Do we want to introduce this
> > >>> dependency? (This is an open question. I cannot judge that.)
> > >>>
> > >> The watermark alignment basically relies on the pausable splits,
> right?
> > >> So personally I found it quite reasonable that if the source does not
> > >> support pausable splits, end users cannot enable watermark alignment
> on
> > it.
> > >>
> > >>
> > >>> If something is wrong, please correct me.
> > >>>
> > >>> Regards,
> > >>> Sebastian
> > >>>
> > >>> On Wed, May 11, 2022 at 9:05 AM Becket Qin <be...@gmail.com>
> > wrote:
> > >>>
> > >>>> Hi Sebastian,
> > >>>>
> > >>>> Thanks for the reply and patient discussion. I agree this is a
> tricky
> > >>>> decision.
> > >>>>
> > >>>>
> > >>>>> Nevertheless, Piotr has valid concerns about Option c) which I see
> as
> > >>>>> follows:
> > >>>>> (1) An interface with default NOOP implementation makes the
> > >>>>> implementation optional. And in my opinion, a default
> implementation
> > is and
> > >>>>> will remain a way of making implementation optional because even in
> > future
> > >>>>> a developer can decide to implement the "old flavor" without
> support
> > for
> > >>>>> pausable splits.
> > >>>>> (2) It may not be too critical but I also find it suboptimal that
> > with
> > >>>>> a NOOP default implementation there is no way to check at runtime
> if
> > >>>>> SourceReader or SplitReader actually support pausing. (To do so,
> one
> > would
> > >>>>> need a supportsX method which makes it again more complicated.)
> > >>>>
> > >>>>
> > >>>> Based on the last few messages in the mailing list, Piotr and I
> > agreed
> > >>>> that the default implementation should just throw an
> > >>>> UnsupportedOperationException if the source is unpausable. So this
> > >>>> basically tells the Source developers that this feature is expected
> > to be
> > >>>> supported. Because we cannot prevent end users from putting an
> > unpausable
> > >>>> source into the watermark alignment group, that basically means
> > watermark
> > >>>> alignment is a non-optional feature for the end users. So making
> that
> > >>>> expectation aligned with the source developers seems reasonable.
> And
> > if a
> > >>>> source does not support this feature, the end users should
> explicitly
> > >>>> remove that source from the watermark alignment group.
> > >>>>
> > >>>> Personally speaking I think this is a simple and clean solution from
> > >>>> both the end user and source developers' standpoint.
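The contract described above (a default method that throws unless the
source overrides it) can be sketched in plain Java. This is only an
illustration under stated assumptions: SketchSourceReader, LegacyReader,
PausableReader and tryAlign are hypothetical names, and the method
signature is a simplified stand-in, not the actual FLIP-217 API.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class DefaultMethodSketch {

    /** Hypothetical stand-in for SourceReader; NOT the real Flink interface. */
    interface SketchSourceReader {
        String pollNext();

        // The default keeps old sources compiling, but fails loudly as soon as
        // the framework needs to pause a split for watermark alignment.
        default void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume) {
            throw new UnsupportedOperationException(
                    "This source does not support pausing splits; "
                            + "remove it from the watermark alignment group.");
        }
    }

    /** An "old flavor" source that never overrides the default. */
    static final class LegacyReader implements SketchSourceReader {
        @Override
        public String pollNext() {
            return "record";
        }
    }

    /** A source that opts in by overriding the default. */
    static final class PausableReader implements SketchSourceReader {
        final Set<String> paused = new HashSet<>();

        @Override
        public String pollNext() {
            return "record";
        }

        @Override
        public void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume) {
            paused.addAll(splitsToPause);
            paused.removeAll(splitsToResume);
        }
    }

    /** Mimics the framework asking a reader to pause one split. */
    static String tryAlign(SketchSourceReader reader) {
        try {
            reader.pauseOrResumeSplits(
                    Collections.singletonList("split-0"), Collections.<String>emptyList());
            return "aligned";
        } catch (UnsupportedOperationException e) {
            return "unsupported";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryAlign(new LegacyReader()));
        System.out.println(tryAlign(new PausableReader()));
    }
}
```

Note that in this sketch the only way to learn that a source is
unpausable is to call the method and catch the exception, which is the
runtime-check limitation raised earlier in the thread.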
> > >>>>
> > >>>> Does this address your concerns?
> > >>>>
> > >>>> Thanks,
> > >>>>
> > >>>> Jiangjie (Becket) Qin
> > >>>>
> > >>>> On Wed, May 11, 2022 at 2:52 PM Sebastian Mattheis <
> > >>>> sebastian@ververica.com> wrote:
> > >>>>
> > >>>>> Hi Piotr, Hi Becket, Hi everybody,
> > >>>>>
> > >>>>> we, Dawid and I, discussed the various suggestions/options and we
> > >>>>> would be okay either way because we find neither solution is
> perfect
> > just
> > >>>>> because of the already present complexity.
> > >>>>>
> > >>>>> Option c) Adding methods to the interfaces of SourceReader and
> > >>>>> SplitReader
> > >>>>> Option a) Adding decorative interfaces to be used by SourceReader
> and
> > >>>>> SplitReader
> > >>>>>
> > >>>>> As of the current status (v. 12) of the FLIP [1], it is based on
> > >>>>> Option c) which we find acceptable because the complexity added is
> > only a
> > >>>>> single method.
> > >>>>>
> > >>>>> Nevertheless, Piotr has valid concerns about Option c) which I see
> as
> > >>>>> follows:
> > >>>>> (1) An interface with default NOOP implementation makes the
> > >>>>> implementation optional. And in my opinion, a default
> implementation
> > is and
> > >>>>> will remain a way of making implementation optional because even in
> > future
> > >>>>> a developer can decide to implement the "old flavor" without
> support
> > for
> > >>>>> pausable splits.
> > >>>>> (2) It may not be too critical but I also find it suboptimal that
> > with
> > >>>>> a NOOP default implementation there is no way to check at runtime
> if
> > >>>>> SourceReader or SplitReader actually support pausing. (To do so,
> one
> > would
> > >>>>> need a supportsX method which makes it again more complicated.)
> > >>>>>
> > >>>>> However, we haven't changed it because Option a) is also not
> optimal
> > >>>>> or straight-forward:
> > >>>>> (1) We need to add two distinct yet similar decorative interfaces
> > >>>>> since, as mentioned, the signatures of the methods are different.
> For
> > >>>>> example, we would need decorative interfaces like
> > >>>>> `SplitReaderWithPausableSplits` and
> `SourceReaderWithPausableSplits`.
> > >>>>> (2) As a consequence, we would need to somehow document how/where
> to
> > >>>>> implement both interfaces and how this relates to each other. This
> > we could
> > >>>>> solve by adding a note in the interface of SourceReader and
> > SplitReader and
> > >>>>> reference to the decorative interfaces but it still increases
> > complexity
> > >>>>> too.
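For comparison, Option a) can be sketched as follows. All type names
here are hypothetical stand-ins (the real interfaces were never
finalized in this form), and which side identifies splits by id versus
passing split objects is an assumption made only for illustration.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class DecorativeInterfaceSketch {

    /** Hypothetical stand-in for the base SourceReader; NOT the real Flink interface. */
    interface SketchSourceReader {
        String pollNext();
    }

    /** Hypothetical decorative interface for readers; assumes splits are identified by id. */
    interface SketchSourceReaderWithPausableSplits {
        void pauseOrResumeSplits(Collection<String> idsToPause, Collection<String> idsToResume);
    }

    /**
     * Hypothetical decorative interface for split readers; assumes splits are passed
     * directly. Declared only to show that two similar interfaces would be needed.
     */
    interface SketchSplitReaderWithPausableSplits<SplitT> {
        void pauseOrResumeSplits(Collection<SplitT> toPause, Collection<SplitT> toResume);
    }

    /** A reader that implements only the base interface. */
    static final class PlainReader implements SketchSourceReader {
        @Override
        public String pollNext() {
            return "record";
        }
    }

    /** A reader that opts in by also implementing the decorative interface. */
    static final class PausableReader
            implements SketchSourceReader, SketchSourceReaderWithPausableSplits {
        final Set<String> paused = new HashSet<>();

        @Override
        public String pollNext() {
            return "record";
        }

        @Override
        public void pauseOrResumeSplits(
                Collection<String> idsToPause, Collection<String> idsToResume) {
            paused.addAll(idsToPause);
            paused.removeAll(idsToResume);
        }
    }

    /** The capability can be checked up front, without calling the method. */
    static boolean supportsPausing(SketchSourceReader reader) {
        return reader instanceof SketchSourceReaderWithPausableSplits;
    }

    /** Mimics the framework pausing one split, rejecting readers that did not opt in. */
    static void pauseForAlignment(SketchSourceReader reader, String splitId) {
        if (!supportsPausing(reader)) {
            throw new UnsupportedOperationException("Reader does not support pausing splits.");
        }
        ((SketchSourceReaderWithPausableSplits) reader)
                .pauseOrResumeSplits(
                        Collections.singletonList(splitId), Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        System.out.println(supportsPausing(new PlainReader()));
        System.out.println(supportsPausing(new PausableReader()));
    }
}
```

The trade-off is visible in the sketch: the instanceof check answers the
"does it support pausing" question at runtime, at the cost of two extra
interfaces whose relationship has to be documented somewhere.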
> > >>>>>
> > >>>>> In summary, we see both as acceptable and preferred over other
> > >>>>> options. The question is if we can find a solution or compromise
> > that is
> > >>>>> acceptable for everybody to reach consensus.
> > >>>>>
> > >>>>> Please let us know what you think because we would be happy if we
> can
> > >>>>> conclude the discussion to avoid dropping the initiative on this
> > FLIP.
> > >>>>>
> > >>>>> Regards,
> > >>>>> Sebastian
> > >>>>>
> > >>>>> [1]
> > >>>>>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
> > >>>>> (v. 12)
> > >>>>>
> > >>>>> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski <
> pnowojski@apache.org
> > >
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Hi Guowei,
> > >>>>>>
> > >>>>>> as Dawid wrote a couple of messages back:
> > >>>>>>
> > >>>>>> > This is covered in the previous FLIP[1] which has been already
> > >>>>>> implemented in 1.15. In short, it must be enabled with the
> watermark
> > >>>>>> strategy which also configures drift and update interval
> > >>>>>>
> > >>>>>> So by default watermark alignment is disabled, regardless if a
> > source
> > >>>>>> supports it or not.
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Piotrek
> > >>>>>>
> > >>>>>> czw., 5 maj 2022 o 09:56 Guowei Ma <gu...@gmail.com>
> > napisał(a):
> > >>>>>>
> > >>>>>>> Hi,
> > >>>>>>>
> > >>>>>>> We know that in the case of Bounded input Flink supports the
> Batch
> > >>>>>>> execution mode. Currently in Batch execution mode, Flink is
> > executed
> > >>>>>>> on a
> > >>>>>>> stage-by-stage basis. In this way, perhaps watermark alignment
> > might
> > >>>>>>> not
> > >>>>>>> gain much.
> > >>>>>>>
> > >>>>>>> So my question is: Is watermark alignment the default
> behavior (for
> > >>>>>>> implemented source only)? If so, have you considered evaluating
> the
> > >>>>>>> impact
> > >>>>>>> of this behavior on the Batch execution mode? Or do you think it is not
> > >>>>>>> necessary?
> > >>>>>>>
> > >>>>>>> Correct me if I miss something.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Guowei
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <
> > >>>>>>> piotr.nowojski@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>> > Hi Becket and Dawid,
> > >>>>>>> >
> > >>>>>>> > > I feel that no matter which option we choose this can not be
> > >>>>>>> solved
> > >>>>>>> > entirely in either of the options, because of the point above
> and
> > >>>>>>> because
> > >>>>>>> > the signature of SplitReader#pauseOrResumeSplits and
> > >>>>>>> > SourceReader#pauseOrResumeSplits are slightly different (one
> > >>>>>>> identifies
> > >>>>>>> > splits with splitId the other one passes the splits directly).
> > >>>>>>> >
> > >>>>>>> > Yes, that's a good point in this case and for features that
> need
> > >>>>>>> to be
> > >>>>>>> > implemented in more than one place.
> > >>>>>>> >
> > >>>>>>> > > Is there any reason for pausing reading from a split to be an
> > optional
> > >>>>>>> feature,
> > >>>>>>> > > other than that this was not included in the original
> > interface?
> > >>>>>>> >
> > >>>>>>> > An additional argument in favor of making it optional is to
> > >>>>>>> simplify source
> > >>>>>>> > implementation. But on its own I'm not sure if that would be
> > >>>>>>> enough to
> > >>>>>>> > justify making this feature optional. Maybe.
> > >>>>>>> >
> > >>>>>>> > > I think it would be way simpler and clearer to just let end
> > >>>>>>> users and
> > >>>>>>> > Flink
> > >>>>>>> > > assume all the connectors will implement this feature.
> > >>>>>>> >
> > >>>>>>> > As I wrote above that would be an interesting choice to make
> > (ease
> > >>>>>>> of
> > >>>>>>> > implementation for new users, vs system consistency).
> Regardless
> > >>>>>>> of that,
> > >>>>>>> > yes, for me the main argument is the API backward
> compatibility.
> > >>>>>>> But let's
> > >>>>>>> > clear a couple of points:
> > >>>>>>> > - The current proposal adding methods to the base interface
> with
> > >>>>>>> default
> > >>>>>>> > implementations is an OPTIONAL feature. Same as the decorative
> > >>>>>>> version
> > >>>>>>> > would be.
> > >>>>>>> > - Decorative version could implement "throw
> > >>>>>>> UnsupportedOperationException"
> > >>>>>>> > if user enabled watermark alignment just as well and I agree
> > >>>>>>> that's a
> > >>>>>>> > better option compared to logging a warning.
> > >>>>>>> >
> > >>>>>>> > Best,
> > >>>>>>> > Piotrek
> > >>>>>>> >
> > >>>>>>> >
> > >>>>>>> > śr., 4 maj 2022 o 15:40 Becket Qin <be...@gmail.com>
> > >>>>>>> napisał(a):
> > >>>>>>> >
> > >>>>>>> > > Thanks for the reply and patient discussion, Piotr and Dawid.
> > >>>>>>> > >
> > >>>>>>> > > Is there any reason for pausing reading from a split to be an
> > optional
> > >>>>>>> feature,
> > >>>>>>> > > other than that this was not included in the original
> > interface?
> > >>>>>>> > >
> > >>>>>>> > > To be honest I am really worried about the complexity of the
> > >>>>>>> user story
> > >>>>>>> > > here. Optional features like this have a high overhead.
> Imagine
> > >>>>>>> this
> > >>>>>>> > > feature is optional, now a user enabled watermark alignment
> and
> > >>>>>>> defined a
> > >>>>>>> > > few watermark groups. Would it work? Hmm, that depends on
> > >>>>>>> whether the
> > >>>>>>> > > involved Source has implemented this feature. If the Sources
> > >>>>>>> are well
> > >>>>>>> > > documented, good luck. Otherwise end users may have to look
> > into
> > >>>>>>> the code
> > >>>>>>> > > of the Source to see whether the feature is supported. Which
> is
> > >>>>>>> something
> > >>>>>>> > > they shouldn't have to do.
> > >>>>>>> > >
> > >>>>>>> > > I think it would be way simpler and clearer to just let end
> > >>>>>>> users and
> > >>>>>>> > Flink
> > >>>>>>> > > assume all the connectors will implement this feature. After
> > all
> > >>>>>>> the
> > >>>>>>> > > watermark group is not optional to the end users. If in some
> > >>>>>>> rare cases,
> > >>>>>>> > > the feature cannot be supported, a clear
> > >>>>>>> UnsupportedOperationException
> > >>>>>>> > will
> > >>>>>>> > > be thrown to tell users to explicitly remove this Source from
> > the
> > >>>>>>> > watermark
> > >>>>>>> > > group. I don't think we should have a warning message here,
> as
> > >>>>>>> they tend
> > >>>>>>> > to
> > >>>>>>> > > be ignored in many cases. If we do this, we don't even need
> the
> > >>>>>>> > supportXXX
> > >>>>>>> > > method in the Source for this feature. In fact this is
> exactly
> > >>>>>>> how many
> > >>>>>>> > > interfaces work today. For example,
> > >>>>>>> SplitEnumerator#addSplitsBack() is
> > >>>>>>> > not
> > >>>>>>> > > supported by Pravega source because it does not support
> partial
> > >>>>>>> failover.
> > >>>>>>> > > In that case, it simply throws an exception to trigger a
> global
> > >>>>>>> recovery.
> > >>>>>>> > >
> > >>>>>>> > > The reason we add a default implementation in this case would be
> > >>>>>>> just for
> > >>>>>>> > the
> > >>>>>>> > > sake of backwards compatibility so the old source can still
> > >>>>>>> compile.
> > >>>>>>> > Sure,
> > >>>>>>> > > in the short term, this feature might not be supported by many
> > >>>>>>> existing
> > >>>>>>> > > sources. That is OK, and it is quite visible to the source
> > >>>>>>> developers
> > >>>>>>> > that
> > >>>>>>> > > they did not override the default impl which throws an
> > >>>>>>> > > UnsupportedOperationException.
> > >>>>>>> > >
> > >>>>>>> > > @Dawid,
> > >>>>>>> > >
> > >>>>>>> > > the Java doc of the SupportXXX() method in the Source would
> be
> > >>>>>>> the single
> > >>>>>>> > > >> source of truth regarding how to implement this feature.
> > >>>>>>> > > >
> > >>>>>>> > > >
> > >>>>>>> > >
> > >>>>>>> > > I also don't find it entirely true. Half of the classes are
> > >>>>>>> theoretically
> > >>>>>>> > > > optional and are utility classes from the point of view how
> > the
> > >>>>>>> > > interfaces
> > >>>>>>> > > > are organized. Theoretically users do not need to use any
> of
> > >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to list
> their
> > >>>>>>> methods in
> > >>>>>>> > > the
> > >>>>>>> > > > Source interface.
> > >>>>>>> > >
> > >>>>>>> > > I think the ultimate goal of java docs is to guide users to
> > >>>>>>> implement the
> > >>>>>>> > > Source. If SourceReaderBase is the preferred way to
> implement a
> > >>>>>>> > > SourceReader, it seems worth mentioning that. Even the Java
> > >>>>>>> language
> > >>>>>>> > > documentation for interfaces lists the known implementations [1]
> so
> > >>>>>>> people
> > >>>>>>> > can
> > >>>>>>> > > leverage them. But for this particular case, if we make the
> > >>>>>>> feature
> > >>>>>>> > > non-optional, we don't even need the supportXXX() method for
> > now.
> > >>>>>>> > >
> > >>>>>>> > > Thanks,
> > >>>>>>> > >
> > >>>>>>> > > Jiangjie (Becket) Qin
> > >>>>>>> > >
> > >>>>>>> > >
> > >>>>>>> > >
> > >>>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <
> > >>>>>>> dwysakowicz@apache.org>
> > >>>>>>> > > wrote:
> > >>>>>>> > >
> > >>>>>>> > > > Hey Piotr and Becket,
> > >>>>>>> > > >
> > >>>>>>> > > > First of all, let me say I am happy with whichever option
> is
> > >>>>>>> agreed in
> > >>>>>>> > > the
> > >>>>>>> > > > discussion.
> > >>>>>>> > > >
> > >>>>>>> > > > I wanted to clarify a few points from the discussion
> though:
> > >>>>>>> > > >
> > >>>>>>> > > > @Becket:
> > >>>>>>> > > >
> > >>>>>>> > > > The main argument for adding the methods to the
> SourceReader
> > >>>>>>> is that
> > >>>>>>> > > these
> > >>>>>>> > > > methods are effectively NON-OPTIONAL to SourceReader impl,
> > i.e.
> > >>>>>>> > starting
> > >>>>>>> > > > from this FLIP, all the SourceReaders impl are expected to
> > >>>>>>> support this
> > >>>>>>> > > > method, although some old implementations may not have
> > >>>>>>> implemented this
> > >>>>>>> > > > feature. I think we should distinguish the new features
> from
> > >>>>>>> the
> > >>>>>>> > optional
> > >>>>>>> > > > features. While the public decorative interface is a
> solution
> > >>>>>>> to the
> > >>>>>>> > > > optional features, we should not use it for the features
> that
> > >>>>>>> are
> > >>>>>>> > > > non-optional.
> > >>>>>>> > > >
> > >>>>>>> > > > I don't think that this feature is NON-OPTIONAL. Even
> though
> > >>>>>>> > preferred, I
> > >>>>>>> > > > still think it can be simply optional.
> > >>>>>>> > > >
> > >>>>>>> > > > the Java doc of the SupportXXX() method in the Source would
> > be
> > >>>>>>> the
> > >>>>>>> > single
> > >>>>>>> > > > source of truth regarding how to implement this feature.
> > >>>>>>> > > >
> > >>>>>>> > > > I also don't find it entirely true. Half of the classes are
> > >>>>>>> > theoretically
> > >>>>>>> > > > optional and are utility classes from the point of view how
> > the
> > >>>>>>> > > interfaces
> > >>>>>>> > > > are organized. Theoretically users do not need to use any
> of
> > >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to list
> their
> > >>>>>>> methods in
> > >>>>>>> > > the
> > >>>>>>> > > > Source interface.
> > >>>>>>> > > >
> > >>>>>>> > > > @Piotr
> > >>>>>>> > > >
> > >>>>>>> > > > If we have all of the methods with default implementation
> in
> > >>>>>>> the base
> > >>>>>>> > > > interface, the API doesn't give any clue to the user which
> > set
> > >>>>>>> of
> > >>>>>>> > methods
> > >>>>>>> > > > are required to be implemented at the same time.
> > >>>>>>> > > >
> > >>>>>>> > > > I feel that no matter which option we choose this can not
> be
> > >>>>>>> solved
> > >>>>>>> > > > entirely in either of the options, because of the point
> above
> > >>>>>>> and
> > >>>>>>> > because
> > >>>>>>> > > > the signature of SplitReader#pauseOrResumeSplits and
> > >>>>>>> > > > SourceReader#pauseOrResumeSplits are slightly different
> (one
> > >>>>>>> identifies
> > >>>>>>> > > > splits with splitId the other one passes the splits
> > directly).
> > >>>>>>> > > >
> > >>>>>>> > > > Best,
> > >>>>>>> > > >
> > >>>>>>> > > > Dawid
> > >>>>>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
> > >>>>>>> > > >
> > >>>>>>> > > > Hi Piotr,
> > >>>>>>> > > >
> > >>>>>>> > > > Thanks for the comment.
> > >>>>>>> > > >
> > >>>>>>> > > > Just to clarify, I am not against the decorative
> interfaces,
> > >>>>>>> but I do
> > >>>>>>> > > > think we should use them with caution. The main argument
> for
> > >>>>>>> adding the
> > >>>>>>> > > > methods to the SourceReader is that these methods are
> > >>>>>>> > > > effectively NON-OPTIONAL to SourceReader impl, i.e.
> starting
> > >>>>>>> from this
> > >>>>>>> > > > FLIP, all the SourceReaders impl are expected to support
> this
> > >>>>>>> > > > method, although some old implementations may not have
> > >>>>>>> implemented this
> > >>>>>>> > > > feature. I think we should distinguish the new features
> from
> > >>>>>>> the
> > >>>>>>> > optional
> > >>>>>>> > > > features. While the public decorative interface is a
> solution
> > >>>>>>> to the
> > >>>>>>> > > > optional features, we should not use it for the features
> that
> > >>>>>>> are
> > >>>>>>> > > > non-optional.
> > >>>>>>> > > >
> > >>>>>>> > > > That said, this feature is optional for SplitReaders.
> > Arguably
> > >>>>>>> we can
> > >>>>>>> > > have
> > >>>>>>> > > > a decorative interface for that, but for simplicity and
> > >>>>>>> symmetry of the
> > >>>>>>> > > > interface, personally I prefer just adding a new method.
> > >>>>>>> > > >
> > >>>>>>> > > > Regarding the advantages you mentioned about the decorative
> > >>>>>>> interfaces,
> > >>>>>>> > > > they would make sense if:
> > >>>>>>> > > > 1. The feature is optional.
> > >>>>>>> > > > 2. There is only one decorative interface involved for a
> > >>>>>>> feature.
> > >>>>>>> > > > Otherwise the argument that all the methods are grouped
> > >>>>>>> together will
> > >>>>>>> > not
> > >>>>>>> > > > stand.
> > >>>>>>> > > >
> > >>>>>>> > > > Compared with that, I think the current solution works fine
> > in
> > >>>>>>> all
> > >>>>>>> > cases,
> > >>>>>>> > > > i.e. "having supportXXX() method in Source, and default
> > >>>>>>> methods /
> > >>>>>>> > > > decorative interfaces in base interfaces.".
> > >>>>>>> > > >
> > >>>>>>> > > > The advantages are:
> > >>>>>>> > > >> - clean and easy to implement base interface
> > >>>>>>> > > >
> > >>>>>>> > > > In the current approach, the Java doc of the SupportXXX()
> > >>>>>>> method in the
> > >>>>>>> > > > Source would be the single source of truth regarding how to
> > >>>>>>> implement
> > >>>>>>> > > this
> > >>>>>>> > > > feature. It lists the method that has to be implemented to
> > >>>>>>> support this
> > >>>>>>> > > > feature, regardless of how many classes / interfaces are
> > >>>>>>> involved.
> > >>>>>>> > > >
> > >>>>>>> > > > When implementing the base interface, users do not need to
> > >>>>>>> implement a
> > >>>>>>> > > > method with default implementation. If they are curious
> what
> > >>>>>>> the method
> > >>>>>>> > > is
> > >>>>>>> > > > for, the java doc of that method simply points users to the
> > >>>>>>> > SupportXXX()
> > >>>>>>> > > > method in the Source. It does not seem to add work for the users
> > >>>>>>> compared
> > >>>>>>> > with
> > >>>>>>> > > > decorative interfaces, but gives much better
> discoverability.
> > >>>>>>> > > >
> > >>>>>>> > > > - all of the methods from a single feature are grouped in a
> > >>>>>>> single
> > >>>>>>> > > >> decorator interface, together with their dedicated java
> doc.
> > >>>>>>> It's also
> > >>>>>>> > > >> easier to google search for help using the decorator name
> > >>>>>>> > > >
> > >>>>>>> > > > - if an optional feature requires two methods to be
> > >>>>>>> implemented at
> > >>>>>>> > once,
> > >>>>>>> > > >> decorator can guarantee that
> > >>>>>>> > > >
> > >>>>>>> > > > These two points are not true when multiple components and
> > >>>>>>> classes are
> > >>>>>>> > > > involved collaboratively to provide a feature. In our case,
> > we
> > >>>>>>> have
> > >>>>>>> > both
> > >>>>>>> > > > SourceReader and SplitReader involved. And there might be
> > other
> > >>>>>>> > > interfaces
> > >>>>>>> > > > on the JM side involved for some future features. So the
> > >>>>>>> relevant
> > >>>>>>> > methods
> > >>>>>>> > > > can actually be scattered over the places. That said, we
> may
> > >>>>>>> still use
> > >>>>>>> > > > decorative interfaces for each component, if the feature is
> > >>>>>>> optional,
> > >>>>>>> > > given
> > >>>>>>> > > > there is a single source of truth for the feature.
> > >>>>>>> > > >
> > >>>>>>> > > > Here I would strongly lean towards making life easier for
> new
> > >>>>>>> users,
> > >>>>>>> > > >> lowering the entry barrier, at the (imo) slight expense
> for
> > >>>>>>> the power
> > >>>>>>> > > >> users.
> > >>>>>>> > > >
> > >>>>>>> > > > I actually think the current approach is simpler, more
> > >>>>>>> extensible and
> > >>>>>>> > > more
> > >>>>>>> > > > general for all the users. Can you articulate a bit more on
> > >>>>>>> which part
> > >>>>>>> > > you
> > >>>>>>> > > > think makes it harder for users to understand?
> > >>>>>>> > > >
> > >>>>>>> > > >
> > >>>>>>> > > > There is another benefit of the decorative interfaces which
> > is
> > >>>>>>> not
> > >>>>>>> > > > mentioned, but might be worth considering here. Usually the
> > >>>>>>> decorative
> > >>>>>>> > > > interfaces give slightly better backwards compatibility
> than
> > >>>>>>> the new
> > >>>>>>> > > > default method in the interfaces. That is when users are
> > using
> > >>>>>>> a jar
> > >>>>>>> > that
> > >>>>>>> > > > was compiled with an older version of Flink which does not
> > >>>>>>> have the
> > >>>>>>> > > default
> > >>>>>>> > > > method in the interfaces in question. A decorative
> interface
> > >>>>>>> may still
> > >>>>>>> > > > provide backwards compatibility in that case, while default
> > >>>>>>> method impl
> > >>>>>>> > > > cannot.
> > >>>>>>> > > >
> > >>>>>>> > > > I think in Flink we in general do not guarantee custom
> > >>>>>>> components
> > >>>>>>> > > compiled
> > >>>>>>> > > > with an older version can run with a newer version of
> Flink.
> > A
> > >>>>>>> > recompile
> > >>>>>>> > > > with a newer version would be required. That said, if we do
> > >>>>>>> care about
> > >>>>>>> > > > this, we can just change the "supportXXX()" method in the
> > >>>>>>> Source
> > >>>>>>> > > interface
> > >>>>>>> > > > to use decorative interfaces, and leave the other parts
> > >>>>>>> unchanged.
> > >>>>>>> > > >
> > >>>>>>> > > > Thanks,
> > >>>>>>> > > >
> > >>>>>>> > > > Jiangjie (Becket) Qin
> > >>>>>>> > > >
> > >>>>>>> > > >
> > >>>>>>> > > >
> > >>>>>>> > > >
> > >>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <
> > >>>>>>> pnowojski@apache.org>
> > >>>>>>> > > > wrote:
> > >>>>>>> > > >
> > >>>>>>> > > >> Hi,
> > >>>>>>> > > >>
> > >>>>>>> > > >> Sorry for chipping in so late, but I was OoO for the last
> > two
> > >>>>>>> weeks.
> > >>>>>>> > > >> Regarding the interfaces, I would be actually against
> adding
> > >>>>>>> those
> > >>>>>>> > > methods
> > >>>>>>> > > >> to the base interfaces for the reasons mentioned above.
> > >>>>>>> Clogging the
> > >>>>>>> > > base
> > >>>>>>> > > >> interface for new users with tons of methods that they do
> > not
> > >>>>>>> need, do
> > >>>>>>> > > not
> > >>>>>>> > > >> understand and do not know what to do with them. Moreover,
> > >>>>>>> such
> > >>>>>>> > > decorative
> > >>>>>>> > > >> interfaces are solving a problem if a feature requires two
> > or
> > >>>>>>> more
> > >>>>>>> > > methods
> > >>>>>>> > > >> to be implemented at the same time. If we have all of the
> > >>>>>>> methods with
> > >>>>>>> > > >> default implementation in the base interface, the API
> > doesn't
> > >>>>>>> give any
> > >>>>>>> > > >> clue
> > >>>>>>> > > >> to the user which set of methods are required to be
> > >>>>>>> implemented at the
> > >>>>>>> > > >> same
> > >>>>>>> > > >> time.
> > >>>>>>> > > >>
> > >>>>>>> > > >> > a) I feel the biggest drawback of decorative interfaces
> is
> > >>>>>>> which
> > >>>>>>> > > >> interface
> > >>>>>>> > > >> > they can decorate and which combinations of multiple
> > >>>>>>> decorative
> > >>>>>>> > > >> interfaces
> > >>>>>>> > > >> > are valid. (...)
> > >>>>>>> > > >> > In the future, if there is a new feature added
> > >>>>>>> > > >> > (e.g. sorted or pre-partitioned data aware), are we
> going
> > >>>>>>> to create
> > >>>>>>> > > >> another
> > >>>>>>> > > >> > interface of SplitReader such as SortedSplitReader or
> > >>>>>>> > > >> PrePartitionedAware?
> > >>>>>>> > > >> > Can they be combined? So I think the additional
> decorative
> > >>>>>>> interface
> > >>>>>>> > > >> like
> > >>>>>>> > > >> > withSplitsAlignment actually increases the understanding
> > >>>>>>> cost of
> > >>>>>>> > users
> > >>>>>>> > > >> > because they have to know what decorative interfaces are
> > >>>>>>> there,
> > >>>>>>> > which
> > >>>>>>> > > >> > interface they can decorate and which combinations of
> the
> > >>>>>>> decorative
> > >>>>>>> > > >> > interfaces are valid and which are not. Ideally we want
> to
> > >>>>>>> avoid
> > >>>>>>> > that.
> > >>>>>>> > > >>
> > >>>>>>> > > >> I'm not sure if I understand how embedding default methods
> > in
> > >>>>>>> the base
> > >>>>>>> > > >> interface is solving the problem: what can be combined or
> > >>>>>>> not? If
> > >>>>>>> > there
> > >>>>>>> > > >> are
> > >>>>>>> > > >> two conflicting features, having decorative interfaces
> that
> > >>>>>>> can not be
> > >>>>>>> > > >> mixed together actually makes much more sense to me rather
> > >>>>>>> than having
> > >>>>>>> > > >> them
> > >>>>>>> > > >> all in one base class. How would you allow users to
> > implement
> > >>>>>>> only one
> > >>>>>>> > > of
> > >>>>>>> > > >> those two features?
> > >>>>>>> > > >>
> > >>>>>>> > > >> To reiterate on the issue. Yes, there are drawbacks:
> > >>>>>>> > > >> - how can a user discover what decorators are there?
> > >>>>>>> > > >> - how can a user know where the decorator can be applied
> to?
> > >>>>>>> > > >>
> > >>>>>>> > > >> However those are drawbacks for more power users, that can
> > be
> > >>>>>>> > mitigated
> > >>>>>>> > > by
> > >>>>>>> > > >> the documentation. For example listing all of the
> decorators
> > >>>>>>> with
> > >>>>>>> > > >> detailed explanation both in the docs and in the java
> docs.
> > >>>>>>> More
> > >>>>>>> > > >> experienced users will be able to deal with those issues
> > >>>>>>> easier, as
> > >>>>>>> > they
> > >>>>>>> > > >> will already have some basic understanding of Flink. Also
> if
> > >>>>>>> user has
> > >>>>>>> > a
> > >>>>>>> > > >> problem that he wants to solve, he will google search a
> > >>>>>>> potential
> > >>>>>>> > > solution
> > >>>>>>> > > >> to his problem anyway, and while doing that he is very
> > likely
> > >>>>>>> to
> > >>>>>>> > > discover
> > >>>>>>> > > >> the decorator that he needs anyway in the docs.
> > >>>>>>> > > >>
> > >>>>>>> > > >> The advantages are:
> > >>>>>>> > > >> - clean and easy to implement base interface
> > >>>>>>> > > >> - all of the methods from a single feature are grouped in
> a
> > >>>>>>> single
> > >>>>>>> > > >> decorator interface, together with their dedicated java
> doc.
> > >>>>>>> It's also
> > >>>>>>> > > >> easier to google search for help using the decorator name
> > >>>>>>> > > >> - if an optional feature requires two methods to be
> > >>>>>>> implemented at
> > >>>>>>> > once,
> > >>>>>>> > > >> decorator can guarantee that
> > >>>>>>> > > >>
> > >>>>>>> > > >> Here I would strongly lean towards making life easier for
> > new
> > >>>>>>> users,
> > >>>>>>> > > >> lowering the entry barrier, at the (imo) slight expense
> for
> > >>>>>>> the power
> > >>>>>>> > > >> users.
> > >>>>>>> > > >>
> > >>>>>>> > > >> Best,
> > >>>>>>> > > >> Piotrek
> > >>>>>>> > > >>
> > >>>>>>> > > >>
> > >>>>>>> > > >> wt., 26 kwi 2022 o 15:32 Becket Qin <becket.qin@gmail.com
> >
> > >>>>>>> > napisał(a):
> > >>>>>>> > > >>
> > >>>>>>> > > >> > Thanks for the reply Sebastian and Dawid,
> > >>>>>>> > > >> >
> > >>>>>>> > > >> > I think Sebastian has a good summary. This is a really
> > >>>>>>> helpful
> > >>>>>>> > > >> discussion.
> > >>>>>>> > > >> >
> > >>>>>>> > > >> > Thinking a bit more, I feel that it might still be
> better
> > >>>>>>> to add the
> > >>>>>>> > > >> > supportsXXX() method in the Source rather than
> > SourceReader.
> > >>>>>>> > > >> >
> > >>>>>>> > > >> > Generally speaking, what we are trying to do here is to
> > let
> > >>>>>>> the
> > >>>>>>> > Flink
> > >>>>>>> > > >> > framework know what the Source is capable of. In this
> > FLIP,
> > >>>>>>> it
> > >>>>>>> > happens
> > >>>>>>> > > >> to
> > >>>>>>> > > >> > be the capability that only involves SourceReader. But
> in
> > >>>>>>> the
> > >>>>>>> > future,
> > >>>>>>> > > >> it is
> > >>>>>>> > > >> > possible that another functionality involves both the
> > >>>>>>> > SplitEnumerator
> > >>>>>>> > > >> and
> > >>>>>>> > > >> > SourceReader. In that case, following the current
> > approach,
> > >>>>>>> we
> > >>>>>>> > should
> > >>>>>>> > > >> put
> > >>>>>>> > > >> > the "supportsXXX()" method in both SplitEnumerator and
> > >>>>>>> SourceReader.
> > >>>>>>> > > >> > Because if we only put this in the SourceReader, then
> the
> > >>>>>>> JM would
> > >>>>>>> > > have
> > >>>>>>> > > >> to
> > >>>>>>> > > >> > create a SourceReader in order to know whether this
> > feature
> > >>>>>>> is
> > >>>>>>> > > >> supported,
> > >>>>>>> > > >> > which is a little ugly. But if we put the
> "supportsXXX()"
> > >>>>>>> method in
> > >>>>>>> > > the
> > >>>>>>> > > >> > Source, we will break the "symmetric" design because
> this
> > >>>>>>> FLIP
> > >>>>>>> > chose a
> > >>>>>>> > > >> > different way.
> > >>>>>>> > > >> >
> > >>>>>>> > > >> > This is also why I think supportsXXX() method seems a
> good
> > >>>>>>> thing to
> > >>>>>>> > > >> have,
> > >>>>>>> > > >> > because when there are a few interfaces / methods that
> are
> > >>>>>>> expected
> > >>>>>>> > to
> > >>>>>>> > > >> be
> > >>>>>>> > > >> > implemented at the same time in order to deliver a
> > feature,
> > >>>>>>> it is
> > >>>>>>> > > always
> > >>>>>>> > > >> > good to have a single source of truth to tell the
> > framework
> > >>>>>>> what to
> > >>>>>>> > > do,
> > >>>>>>> > > >> so
> > >>>>>>> > > >> > the framework can do consistent things in different
> parts.
> > >>>>>>> > > >> >
> > >>>>>>> > > >> > @Sebastian Mattheis <se...@ververica.com>
> > >>>>>>> > > >> >
> > >>>>>>> > > >> > Regarding interface flavor b), i.e. AlignedSourceReader
> +
> > >>>>>>> > > >> > AlignedSplitReader, what I feel awkward about is that we
> > are
> > >>>>>>> > > essentially
> > >>>>>>> > > >> > expecting almost all the SourceReader implementations to
> > >>>>>>> extend
> > >>>>>>> > > >> > SourceReaderBase, which effectively makes the
> SourceReader
> > >>>>>>> interface
> > >>>>>>> > > >> > without the pausing support useless. So this indicates
> > that
> > >>>>>>> public
> > >>>>>>> > > >> > decorative interfaces (or sub-interfaces for the same
> > >>>>>>> purpose) only
> > >>>>>>> > > >> > make sense if the original interface is also expected to
> > be
> > >>>>>>> used.
> > >>>>>>> > > >> > Otherwise, it seems to make more sense to add the method
> to
> > >>>>>>> the
> > >>>>>>> > original
> > >>>>>>> > > >> > interface itself.
> > >>>>>>> > > >> >
> > >>>>>>> > > >> > Cheers,
> > >>>>>>> > > >> >
> > >>>>>>> > > >> > Jiangjie (Becket) Qin
> > >>>>>>> > > >> >
> > >>>>>>> > > >> >
> > >>>>>>> > > >> >
> > >>>>>>> > > >> >
> > >>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <
> > >>>>>>> > > >> dwysakowicz@apache.org>
> > >>>>>>> > > >> > wrote:
> > >>>>>>> > > >> >
> > >>>>>>> > > >> > > Thanks @Sebastian for the nice summary.
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > I think most of your points aligned with the
> suggestions
> > >>>>>>> I made to
> > >>>>>>> > > the
> > >>>>>>> > > >> > > FLIP, while you were writing your reply (I believe we
> > hit
> > >>>>>>> enter
> > >>>>>>> > > >> nearly at
> > >>>>>>> > > >> > > the same time ;) )
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > Two points after we synced offline
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > 1. I changed also the supportsWatermarksSplitAlignment
> > to
> > >>>>>>> > > >> > > supportsPausingSplits to express the general
> capability
> > of
> > >>>>>>> > pausing.
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > 2. As for whether we should have
> > >>>>>>> > > >> > > PausingSourceReader/PausingSplitReader (option b) or just add
> > >>>>>>> > > >> > > the methods (option c), I suggest simply adding the two
> > >>>>>>> > > >> > > methods, as I felt this is the approach much preferred by
> > >>>>>>> > > >> > > Becket, to which the others do not object. Unless there is
> > >>>>>>> > > >> > > opposition, let's go with this option c.
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > Best,
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > Dawid
> > >>>>>>> > > >> > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > Hi folks,
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > Sorry for being a bit silent. Many thanks for all the
> > >>>>>>> input and
> > >>>>>>> > > >> > > suggestions. As I'm a bit new, I needed some time to
> > >>>>>>> catch up and
> > >>>>>>> > > >> > structure
> > >>>>>>> > > >> > > (for myself) the discussion and I wanted to find a way
> > to
> > >>>>>>> > structure
> > >>>>>>> > > >> the
> > >>>>>>> > > >> > > conclusions. (Also because I had the feeling that some
> > >>>>>>> concerns
> > >>>>>>> > got
> > >>>>>>> > > >> lost
> > >>>>>>> > > >> > in
> > >>>>>>> > > >> > > the discussion.) This is my attempt and please correct
> > me
> > >>>>>>> if
> > >>>>>>> > > >> something is
> > >>>>>>> > > >> > > wrong or misunderstood. I tried to collect and
> assemble
> > >>>>>>> the
> > >>>>>>> > > opinions,
> > >>>>>>> > > >> > > suggestions, and conclusions (to the best of my
> > >>>>>>> knowledge):
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > # Top A: Should split alignment (pause/resume
> behavior)
> > >>>>>>> be a
> > >>>>>>> > general
> > >>>>>>> > > >> > > capability?
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > I personally don't see any reason not to make it a
> > general
> > >>>>>>> > capability
> > >>>>>>> > > >> > > because for the alignSplit method it is actually
> > >>>>>>> independent of
> > >>>>>>> > the
> > >>>>>>> > > >> > > watermarks. If we agree here to have it a general
> > >>>>>>> capability, we
> > >>>>>>> > > >> should
> > >>>>>>> > > >> > > also agree on the right wording. Does
> > >>>>>>> "alignSplits(splitsToResume,
> > >>>>>>> > > >> > > splitsToPause)" refer to what is then actually meant?
> (I
> > >>>>>>> see it as
> > >>>>>>> > > >> okay.
> > >>>>>>> > > >> > I
> > >>>>>>> > > >> > > don't have any better idea whilst Arvid suggested
> > >>>>>>> > > >> "pauseOrResumeSplits".)
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > # Top B: Should it be possible to enable/disable split
> > >>>>>>> alignment?
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > I would personally not disable the split alignment on
> > the
> > >>>>>>> source
> > >>>>>>> > > >> reader
> > >>>>>>> > > >> > > side because if split alignment is used for some other
> > >>>>>>> use case
> > >>>>>>> > (see
> > >>>>>>> > > >> A)
> > >>>>>>> > > >> > it
> > >>>>>>> > > >> > > could have nasty side effects on other/future use
> cases.
> > >>>>>>> Instead,
> > >>>>>>> > I
> > >>>>>>> > > >> would
> > >>>>>>> > > >> > > disable "watermark split alignment" where I think it
> > >>>>>>> should
> > >>>>>>> > disable
> > >>>>>>> > > >> the
> > >>>>>>> > > >> > > watermark-dependent trigger for split alignment.
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > # Top C: Should we add a supportsX method?
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > I find it difficult to define the scope of a supportsX
> > >>>>>>> method
> > >>>>>>> > w.r.t.
> > >>>>>>> > > >> > > the following questions: a) Where is it used? and b)
> > What
> > >>>>>>> is the
> > >>>>>>> > > >> expected
> > >>>>>>> > > >> > > output? To b), it's not straight-forward to provide a
> > >>>>>>> meaningful
> > >>>>>>> > > >> output,
> > >>>>>>> > > >> > > e.g., if SourceReader supports split alignment but
> > >>>>>>> SplitReader
> > >>>>>>> > not.
> > >>>>>>> > > >> This
> > >>>>>>> > > >> > is
> > >>>>>>> > > >> > > because with the current implementation, we can
> > determine
> > >>>>>>> whether
> > >>>>>>> > > >> split
> > >>>>>>> > > >> > > alignment is fully supported only during runtime and
> > >>>>>>> specifically
> > >>>>>>> > > >> > actually
> > >>>>>>> > > >> > > only when calling alignSplits down the call hierarchy
> up
> > >>>>>>> to the
> > >>>>>>> > > actual
> > >>>>>>> > > >> > > SplitReaders.
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > Therefore, I would suggest to either raise an error or
> > >>>>>>> warning if
> > >>>>>>> > > the
> > >>>>>>> > > >> > > alignment is called but not supported at some point. I
> > >>>>>>> know we
> > >>>>>>> > > should
> > >>>>>>> > > >> > > carefully think about when this could be the case
> > because
> > >>>>>>> we don't
> > >>>>>>> > > >> want
> > >>>>>>> > > >> > to
> > >>>>>>> > > >> > > flood anybody with such warnings. However, warnings
> > could
> > >>>>>>> be an
> > >>>>>>> > > >> indicator
> > >>>>>>> > > >> > > for the user that for watermark split alignment use
> case
> > >>>>>>> split
> > >>>>>>> > > >> reading is
> > >>>>>>> > > >> > > imbalanced with the conclusion to either disable the
> > >>>>>>> trigger for
> > >>>>>>> > > >> > watermark
> > >>>>>>> > > >> > > split alignment (see Top B) or to use/implement a
> source
> > >>>>>>> and
> > >>>>>>> > reader
> > >>>>>>> > > >> that
> > >>>>>>> > > >> > > fully supports split alignment.
> > >>>>>>> > > >> > >
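A minimal sketch of the Top C suggestion above (report an unsupported alignment call at runtime, without flooding the user with warnings). Everything here is hypothetical and not Flink API: the `PausableReader` stand-in, the `Aligner` class, and the assumption that an unsupported reader signals this by throwing `UnsupportedOperationException`. A list of messages stands in for a real logger.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch, not Flink API. */
final class WarnOnceSketch {

    /** Stand-in for a reader; the default signals "pausing is not supported". */
    interface PausableReader {
        default void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume) {
            throw new UnsupportedOperationException("pausing splits is not supported");
        }
    }

    /** Forwards alignment requests and reports a missing implementation only once. */
    static final class Aligner {
        private final PausableReader reader;
        private final AtomicBoolean warned = new AtomicBoolean(false);
        /** Stand-in for a logger: collected warning messages. */
        final List<String> warnings = new ArrayList<>();

        Aligner(PausableReader reader) {
            this.reader = reader;
        }

        /** Returns true if the reader applied the request, false if it is unsupported. */
        boolean align(Collection<String> splitsToPause, Collection<String> splitsToResume) {
            try {
                reader.pauseOrResumeSplits(splitsToPause, splitsToResume);
                return true;
            } catch (UnsupportedOperationException e) {
                // Warn on the first failure only, so repeated calls do not flood the log.
                if (warned.compareAndSet(false, true)) {
                    warnings.add("Split alignment requested but not supported: " + e.getMessage());
                }
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // A reader that keeps the throwing default, i.e. does not support pausing.
        Aligner aligner = new Aligner(new PausableReader() {});
        aligner.align(Arrays.asList("split-0"), Collections.emptyList());
        aligner.align(Arrays.asList("split-1"), Collections.emptyList());
        System.out.println(aligner.warnings.size()); // prints 1
    }
}
```

Whether the first unsupported call should instead fail the job is exactly the open question in the paragraph above; the sketch only shows the "warn once" variant.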
> > >>>>>>> > > >> > > # Top D: How to design interfaces?
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > Thanks for structuring the discussion with the
> > various
> > >>>>>>> > > >> possibilities
> > >>>>>>> > > >> > > (a-d). From the discussion and emails, I would like to
> > >>>>>>> summarize
> > >>>>>>> > the
> > >>>>>>> > > >> > > following requirements:
> > >>>>>>> > > >> > > - Interfaces should be consistent ("symmetric"), i.e.,
> > >>>>>>> similar
> > >>>>>>> > > >> semantics
> > >>>>>>> > > >> > > should have similar interfaces with similar usage.
> > >>>>>>> > > >> > > - Make explicit which implementations implement
> > >>>>>>> interfaces/support
> > >>>>>>> > > >> > > behavior.
> > >>>>>>> > > >> > > - Make clear what are default implementations and how
> to
> > >>>>>>> implement
> > >>>>>>> > > >> > > interfaces with desired behavior.
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > This is a simplified view of the relations between
> > >>>>>>> relevant
> > >>>>>>> > classes
> > >>>>>>> > > of
> > >>>>>>> > > >> > the
> > >>>>>>> > > >> > > PoC implementation:
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > SourceReader (Public) <|-- SourceReaderBase (Internal)
> > >>>>>>> <|-- ..
> > >>>>>>> > <|--
> > >>>>>>> > > >> > > MySourceReader
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > MySourceReader <>-- SplitFetcherManager (Internal)
> <>--
> > >>>>>>> > SplitFetcher
> > >>>>>>> > > >> > > (Internal) <>-- SplitReader (Public) <|--
> MySplitReader
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > (A <|-- B: B inherits from A; A <>-- B: A "has a" B)
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > Note that SourceReaderBase and SplitFetcherManager
> > >>>>>>> implement most
> > >>>>>>> > of
> > >>>>>>> > > >> the
> > >>>>>>> > > >> > > "logic" for split alignment just because we wanted to
> > >>>>>>> implement
> > >>>>>>> > > split
> > >>>>>>> > > >> > > alignment and wanted it to be available as kind of a
> > >>>>>>> default. As a
> > >>>>>>> > > >> > > consequence, we have a "default implementation" for
> > >>>>>>> SourceReader
> > >>>>>>> > > that
> > >>>>>>> > > >> > > implements the actual logic for split alignment. For
> > that
> > >>>>>>> reason,
> > >>>>>>> > I
> > >>>>>>> > > >> find
> > >>>>>>> > > >> > it
> > >>>>>>> > > >> > > very confusing to have a NOOP default implementation
> in
> > >>>>>>> the
> > >>>>>>> > > interface
> > >>>>>>> > > >> for
> > >>>>>>> > > >> > > the SourceReader. As a consequence, interface strategy
> > c)
> > >>>>>>> is
> > >>>>>>> > > difficult
> > >>>>>>> > > >> > > because this would require NOOP default
> implementations
> > >>>>>>> in the
> > >>>>>>> > > public
> > >>>>>>> > > >> > > interfaces of SourceReader and SplitReader. This is
> the
> > >>>>>>> same for
> > >>>>>>> > > >> strategy
> > >>>>>>> > > >> > > d) because it would require NOOP default
> implementation
> > >>>>>>> in the
> > >>>>>>> > > >> > > SourceReader. Further, as Dawid described method
> > >>>>>>> signatures of
> > >>>>>>> > > >> alignSplit
> > >>>>>>> > > >> > > for SourceReader and SplitReader differ and it would
> be
> > >>>>>>> extremely
> > >>>>>>> > > >> > difficult
> > >>>>>>> > > >> > > to make the signatures the same (with even potential
> > >>>>>>> performance
> > >>>>>>> > > >> impact
> > >>>>>>> > > >> > > because of additional lookups of split IDs).
> Therefore,
> > >>>>>>> having a
> > >>>>>>> > > >> > symmetric
> > >>>>>>> > > >> > > decorative interface as of strategy a) is actually not
> > >>>>>>> possible
> > >>>>>>> > and
> > >>>>>>> > > >> > having
> > >>>>>>> > > >> > > two decorative interfaces with different method
> > >>>>>>> signatures is
> > >>>>>>> > > >> confusing.
> > >>>>>>> > > >> > My
> > >>>>>>> > > >> > > conclusion is that we are best with strategy b) which
> > >>>>>>> means to
> > >>>>>>> > have
> > >>>>>>> > > >> > > specializing sub-interfaces that inherit from the
> parent
> > >>>>>>> > interface:
> > >>>>>>> > > >> > > SourceReader <|-- AlignedSourceReader, SplitReader
> <|--
> > >>>>>>> > > >> > AlignedSplitReader
> > >>>>>>> > > >> > > With this option, I'm not 100% sure what the
> > implications
> > >>>>>>> are and
> > >>>>>>> > if
> > >>>>>>> > > >> this
> > >>>>>>> > > >> > > could get nasty. I would suggest that Dawid and I just
> > >>>>>>> try to
> > >>>>>>> > > >> implement
> > >>>>>>> > > >> > and
> > >>>>>>> > > >> > > see if we like it. :)
> > >>>>>>> > > >> > >
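A rough sketch of what strategy b) from the paragraph above could look like, assuming specializing sub-interfaces whose method signatures differ: split IDs at the source-reader level, actual split objects at the split-reader level. All types below are hypothetical stand-ins, not Flink API, and the ID-to-split lookup is placed directly in the source reader only to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of interface strategy b); none of these types are Flink API. */
final class StrategyBSketch {

    interface SourceReader {}            // stand-in for the public base interface
    interface SplitReader<SplitT> {}     // stand-in for the public base interface

    /** Source-reader level: only split IDs are known here. */
    interface AlignedSourceReader extends SourceReader {
        void alignSplits(Collection<String> splitIdsToPause, Collection<String> splitIdsToResume);
    }

    /** Split-reader level: the reader needs the actual split objects. */
    interface AlignedSplitReader<SplitT> extends SplitReader<SplitT> {
        void alignSplits(Collection<SplitT> splitsToPause, Collection<SplitT> splitsToResume);
    }

    static final class FileSplit {
        final String id;
        final String path;

        FileSplit(String id, String path) {
            this.id = id;
            this.path = path;
        }
    }

    static final class MySplitReader implements AlignedSplitReader<FileSplit> {
        private final Set<FileSplit> paused = new LinkedHashSet<>();

        @Override
        public void alignSplits(Collection<FileSplit> toPause, Collection<FileSplit> toResume) {
            paused.removeAll(toResume);
            paused.addAll(toPause);
        }

        /** IDs of the currently paused splits, in the order they were paused. */
        List<String> pausedIds() {
            List<String> ids = new ArrayList<>();
            for (FileSplit s : paused) {
                ids.add(s.id);
            }
            return ids;
        }
    }

    static final class MySourceReader implements AlignedSourceReader {
        private final Map<String, FileSplit> assigned = new HashMap<>();
        final MySplitReader splitReader = new MySplitReader();

        void addSplit(FileSplit split) {
            assigned.put(split.id, split);
        }

        @Override
        public void alignSplits(Collection<String> idsToPause, Collection<String> idsToResume) {
            // The extra ID-to-split lookup that makes identical signatures costly.
            splitReader.alignSplits(resolve(idsToPause), resolve(idsToResume));
        }

        private List<FileSplit> resolve(Collection<String> ids) {
            List<FileSplit> splits = new ArrayList<>();
            for (String id : ids) {
                FileSplit split = assigned.get(id);
                if (split != null) {
                    splits.add(split);
                }
            }
            return splits;
        }
    }

    public static void main(String[] args) {
        MySourceReader reader = new MySourceReader();
        reader.addSplit(new FileSplit("s0", "/data/part-0"));
        reader.addSplit(new FileSplit("s1", "/data/part-1"));
        reader.alignSplits(Arrays.asList("s0"), Collections.emptyList());
        System.out.println(reader.splitReader.pausedIds()); // prints [s0]
    }
}
```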
> > >>>>>>> > > >> > > # Summary
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > In conclusion, please let me know your perspectives.
> > >>>>>>> Please
> > >>>>>>> > correct
> > >>>>>>> > > >> me,
> > >>>>>>> > > >> > if
> > >>>>>>> > > >> > > something is wrong or if I misunderstood something. My
> > >>>>>>> perspective
> > >>>>>>> > > >> would
> > >>>>>>> > > >> > be:
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > Top A: Yes
> > >>>>>>> > > >> > > Top B: Yes (but disable watermark trigger for split
> > >>>>>>> alignment)
> > >>>>>>> > > >> > > Top C: No
> > >>>>>>> > > >> > > Top D: b)
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > Best,
> > >>>>>>> > > >> > > Sebastian
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <
> > >>>>>>> > > >> dwysakowicz@apache.org
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > > wrote:
> > >>>>>>> > > >> > >
> > >>>>>>> > > >> > >> @Arvid:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> While I also like Becket's capability approach, I
> fear
> > >>>>>>> that it
> > >>>>>>> > > >> doesn't
> > >>>>>>> > > >> > work
> > >>>>>>> > > >> > >> for this particular use case: Sources can always be
> > >>>>>>> aligned
> > >>>>>>> > > >> cross-task
> > >>>>>>> > > >> > and
> > >>>>>>> > > >> > >> this is just about intra-task alignment. So it's
> > >>>>>>> plausible to put
> > >>>>>>> > > >> > sources
> > >>>>>>> > > >> > >> into an alignment group even though they do not use
> any
> > >>>>>>> of the
> > >>>>>>> > > >> presented
> > >>>>>>> > > >> > >> API of FLIP-217. They should just issue a warning, if
> > >>>>>>> they handle
> > >>>>>>> > > >> > multiple
> > >>>>>>> > > >> > >> splits (see motivation section).
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Yes, but the "supportXXX" method would be for telling
> > if
> > >>>>>>> it
> > >>>>>>> > > supports
> > >>>>>>> > > >> > that
> > >>>>>>> > > >> > >> intra-task alignment. Cross-task alignment would
> always
> > >>>>>>> be
> > >>>>>>> > > supported.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> I updated interfaces to what I believe to be closest
> > to a
> > >>>>>>> > consensus
> > >>>>>>> > > >> > >> between all participants. Do you mind taking a look?
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> @Sebastian Do you mind addressing the nits?
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Best,
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Dawid
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Thanks for pushing this effort.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> I'd actually be in favor of 1b). I fully agree that
> > >>>>>>> decorator
> > >>>>>>> > > >> interfaces
> > >>>>>>> > > >> > >> should be avoided but I'm also not a big fan of
> > >>>>>>> overloading the
> > >>>>>>> > > base
> > >>>>>>> > > >> > >> interfaces (they are hard to implement as is). The
> > usual
> > >>>>>>> feedback
> > >>>>>>> > > to
> > >>>>>>> > > >> > >> Source-related interfaces is always that they are
> > >>>>>>> overwhelming
> > >>>>>>> > and
> > >>>>>>> > > >> too
> > >>>>>>> > > >> > >> hard to implement. However, I'd also not oppose 1c)
> as
> > >>>>>>> scattered
> > >>>>>>> > > >> > interfaces
> > >>>>>>> > > >> > >> also have drawbacks. I'd just dislike 1a) and 1d).
> > >>>>>>> > > >> > >> While I also like Becket's capability approach, I
> fear
> > >>>>>>> that it
> > >>>>>>> > > >> doesn't
> > >>>>>>> > > >> > work
> > >>>>>>> > > >> > >> for this particular use case: Sources can always be
> > >>>>>>> aligned
> > >>>>>>> > > >> cross-task
> > >>>>>>> > > >> > and
> > >>>>>>> > > >> > >> this is just about intra-task alignment. So it's
> > >>>>>>> plausible to put
> > >>>>>>> > > >> > sources
> > >>>>>>> > > >> > >> into an alignment group even though they do not use
> any
> > >>>>>>> of the
> > >>>>>>> > > >> presented
> > >>>>>>> > > >> > >> API of FLIP-217. They should just issue a warning, if
> > >>>>>>> they handle
> > >>>>>>> > > >> > multiple
> > >>>>>>> > > >> > >> splits (see motivation section).
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> I think renaming alignSplits to facilitate future use
> > >>>>>>> cases makes
> > >>>>>>> > > >> sense
> > >>>>>>> > > >> > but
> > >>>>>>> > > >> > >> then all interfaces (if 1c) is chosen) should be
> > adjusted
> > >>>>>>> > > >> accordingly.
> > >>>>>>> > > >> > >> AlignedSourceReader could be PausingSourceReader and
> > I'd
> > >>>>>>> go for
> > >>>>>>> > > >> > >> pauseOrResumeSplits (Becket's proposal afaik). We
> could
> > >>>>>>> also
> > >>>>>>> > split
> > >>>>>>> > > it
> > >>>>>>> > > >> > into
> > >>>>>>> > > >> > >> pauseSplit and resumeSplit. While pauseOrResumeSplits
> > >>>>>>> may allow
> > >>>>>>> > > >> Sources
> > >>>>>>> > > >> > to
> > >>>>>>> > > >> > >> just use 1 instead of 2 library calls (as written in
> > the
> > >>>>>>> > Javadoc),
> > >>>>>>> > > >> both
> > >>>>>>> > > >> > >> Kafka and Pulsar can't use it and I'm not sure if
> there
> > >>>>>>> is a
> > >>>>>>> > system
> > >>>>>>> > > >> that
> > >>>>>>> > > >> > >> can.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Some nits for the FLIP:
> > >>>>>>> > > >> > >> - Please replace "stop" with "pause".
> > >>>>>>> > > >> > >> - Not sure if it's worth it in the capability
> section:
> > >>>>>>> Sources
> > >>>>>>> > that
> > >>>>>>> > > >> > adopt
> > >>>>>>> > > >> > >> this interface cannot be used in earlier versions. So
> > it
> > >>>>>>> feels
> > >>>>>>> > like
> > >>>>>>> > > >> we
> > >>>>>>> > > >> > are
> > >>>>>>> > > >> > >> only forward compatible (old sources can be used
> after
> > >>>>>>> the
> > >>>>>>> > change);
> > >>>>>>> > > >> but
> > >>>>>>> > > >> > I
> > >>>>>>> > > >> > >> guess this holds for any API addition.
> > >>>>>>> > > >> > >> - You might want to add what happens when all splits
> > are
> > >>>>>>> paused.
> > >>>>>>> > > >> > >> - You may want to describe how the 3 flavors of
> > >>>>>>> SourceReaderBase
> > >>>>>>> > > >> > interact
> > >>>>>>> > > >> > >> with the interface.
> > >>>>>>> > > >> > >> - I'm not sure if it makes sense to include Kafka and
> > >>>>>>> Pulsar in
> > >>>>>>> > the
> > >>>>>>> > > >> > FLIP.
> > >>>>>>> > > >> > >> For me, this is rather immediate follow-up work.
> (could
> > >>>>>>> be in the
> > >>>>>>> > > >> same
> > >>>>>>> > > >> > >> umbrella ticket)
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Best,
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Arvid
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz
> > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
> > >>>>>>> WithSplitsAlignment",
> > >>>>>>> > > >> along
> > >>>>>>> > > >> > >> with "MySplitReader implements SplitReader,
> > >>>>>>> WithSplitsAlignment",
> > >>>>>>> > > or
> > >>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader"
> and
> > >>>>>>> > > "MySplitReader
> > >>>>>>> > > >> > >> implements AlignedSplitReader", or
> > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
> > >>>>>>> "MySplitReader
> > >>>>>>> > > >> > implements
> > >>>>>>> > > >> > >> SplitReader".
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> I think the latest proposal according to Dawid would
> > be:
> > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
> > >>>>>>> "MySplitReader
> > >>>>>>> > > >> > implements
> > >>>>>>> > > >> > >> AlignedSplitReader".
> > >>>>>>> > > >> > >> I am fine with this API, although personally
> speaking I
> > >>>>>>> think it
> > >>>>>>> > is
> > >>>>>>> > > >> > simpler
> > >>>>>>> > > >> > >> to just add a new method to the split reader with
> > >>>>>>> default impl.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> I think that is a good idea to have it aligned as
> much
> > as
> > >>>>>>> > possible.
> > >>>>>>> > > >> I'd
> > >>>>>>> > > >> > be
> > >>>>>>> > > >> > >> +1 for your option c). We can merge
> AlignedSplitReader
> > >>>>>>> with
> > >>>>>>> > > >> > SplitReader. We
> > >>>>>>> > > >> > >> will update the FLIP shortly.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Best,
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Dawid
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Thanks for the comment, Jark.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Can the interface be used to align other things in
> the
> > >>>>>>> future?
> > >>>>>>> > For
> > >>>>>>> > > >> > example,
> > >>>>>>> > > >> > >> align read speed, I have
> > >>>>>>> > > >> > >> seen users requesting global rate limits. This
> feature
> > >>>>>>> may also
> > >>>>>>> > > need
> > >>>>>>> > > >> an
> > >>>>>>> > > >> > >> interface like this.
> > >>>>>>> > > >> > >> If we don't plan to extend this interface to support
> > >>>>>>> aligning other
> > >>>>>>> > > >> > things, I
> > >>>>>>> > > >> > >> suggest explicitly declaring
> > >>>>>>> > > >> > >> the purpose of the methods, such as
> > >>>>>>> `alignWatermarksForSplits`
> > >>>>>>> > > >> instead
> > >>>>>>> > > >> > of
> > >>>>>>> > > >> > >> `alignSplits`.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> This is a good point. Naming wise, it would usually
> be
> > >>>>>>> more
> > >>>>>>> > > >> extensible
> > >>>>>>> > > >> > to
> > >>>>>>> > > >> > >> just describe what the method actually does, instead
> of
> > >>>>>>> assuming
> > >>>>>>> > > the
> > >>>>>>> > > >> > >> purpose of doing this. For example, in this case,
> > >>>>>>> > > >> pauseOrResumeSplits()
> > >>>>>>> > > >> > >> would be more extensible because this can be used for
> > >>>>>>> any kind of
> > >>>>>>> > > >> flow
> > >>>>>>> > > >> > >> control, be it watermark alignment or simple rate
> > >>>>>>> limiting.
> > >>>>>>> > > >> > >>
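To illustrate the naming argument above, here is a small sketch in which one pause/resume method, named for what it does, serves two different flow-control policies. All names are hypothetical and not Flink API. The assumptions are that a split is paused when its watermark exceeds `maxDesiredWatermark`, and that a rate limiter pauses splits that exceeded a per-interval record budget.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch, not Flink API. */
final class FlowControlSketch {

    /** The method name says what happens, not why. */
    interface PausableReader {
        void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
    }

    /** Test double that simply tracks which splits are currently paused. */
    static final class RecordingReader implements PausableReader {
        final Set<String> paused = new TreeSet<>();

        @Override
        public void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume) {
            paused.removeAll(splitsToResume);
            paused.addAll(splitsToPause);
        }
    }

    /** Policy 1 (assumed semantics): pause splits whose watermark ran ahead. */
    static void alignByWatermark(
            PausableReader reader, Map<String, Long> watermarkPerSplit, long maxDesiredWatermark) {
        pauseAbove(reader, watermarkPerSplit, maxDesiredWatermark);
    }

    /** Policy 2 (assumed semantics): pause splits that exceeded their record budget. */
    static void limitByRate(
            PausableReader reader, Map<String, Long> recordsPerSplit, long budgetPerInterval) {
        pauseAbove(reader, recordsPerSplit, budgetPerInterval);
    }

    /** Shared mechanics: pause every split above the threshold, resume the rest. */
    private static void pauseAbove(
            PausableReader reader, Map<String, Long> valuePerSplit, long threshold) {
        List<String> toPause = new ArrayList<>();
        List<String> toResume = new ArrayList<>();
        for (Map.Entry<String, Long> entry : valuePerSplit.entrySet()) {
            if (entry.getValue() > threshold) {
                toPause.add(entry.getKey());
            } else {
                toResume.add(entry.getKey());
            }
        }
        reader.pauseOrResumeSplits(toPause, toResume);
    }

    public static void main(String[] args) {
        RecordingReader reader = new RecordingReader();

        Map<String, Long> watermarks = new LinkedHashMap<>();
        watermarks.put("split-0", 1_000L);
        watermarks.put("split-1", 5_000L);
        alignByWatermark(reader, watermarks, 2_000L);
        System.out.println(reader.paused); // prints [split-1]

        Map<String, Long> recordCounts = new LinkedHashMap<>();
        recordCounts.put("split-0", 900L);
        recordCounts.put("split-1", 100L);
        limitByRate(reader, recordCounts, 500L);
        System.out.println(reader.paused); // prints [split-0]
    }
}
```

With a name like `alignWatermarksForSplits`, the second caller would be using a method whose name no longer matches its purpose.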
> > >>>>>>> > > >> > >> 4. Interface or Method.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think they
> > have
> > >>>>>>> their
> > >>>>>>> > own
> > >>>>>>> > > >> > >> advantages.
> > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending
> > >>>>>>> abilities
> > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
> > >>>>>>> > > >> > >> and I prefer Interfaces rather than methods in this
> > >>>>>>> case. When
> > >>>>>>> > you
> > >>>>>>> > > >> have
> > >>>>>>> > > >> > a
> > >>>>>>> > > >> > >> bunch of abilities and each ability
> > >>>>>>> > > >> > >> has more than one method, Interfaces can help to
> > >>>>>>> organize them
> > >>>>>>> > and
> > >>>>>>> > > >> make
> > >>>>>>> > > >> > >> it clear to users which methods
> > >>>>>>> > > >> > >> they need to implement when they want to have an ability.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> I am OK with decorative interfaces if this is a
> general
> > >>>>>>> design
> > >>>>>>> > > >> pattern
> > >>>>>>> > > >> > in
> > >>>>>>> > > >> > >> the other components in Flink. But it looks like the
> > >>>>>>> current API
> > >>>>>>> > > >> > proposal
> > >>>>>>> > > >> > >> is not symmetric.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> The current proposal is essentially "MySourceReader
> > >>>>>>> implements
> > >>>>>>> > > >> > >> SourceReader, WithSplitsAlignment", along with
> > >>>>>>> "MySplitReader
> > >>>>>>> > > >> implements
> > >>>>>>> > > >> > >> AlignedSplitsReader".
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Should we make the API symmetric? I'd consider any
> one
> > >>>>>>> of the
> > >>>>>>> > > >> following
> > >>>>>>> > > >> > as
> > >>>>>>> > > >> > >> symmetric.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
> > >>>>>>> WithSplitsAlignment",
> > >>>>>>> > > >> along
> > >>>>>>> > > >> > >> with "MySplitReader implements SplitReader,
> > >>>>>>> WithSplitsAlignment",
> > >>>>>>> > > or
> > >>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader"
> and
> > >>>>>>> > > "MySplitReader
> > >>>>>>> > > >> > >> implements AlignedSplitReader", or
> > >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
> > >>>>>>> "MySplitReader
> > >>>>>>> > > >> > implements
> > >>>>>>> > > >> > >> SplitReader".
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> I think the latest proposal according to Dawid would
> > be:
> > >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
> > >>>>>>> "MySplitReader
> > >>>>>>> > > >> > implements
> > >>>>>>> > > >> > >> AlignedSplitReader".
> > >>>>>>> > > >> > >> I am fine with this API, although personally
> speaking I
> > >>>>>>> think it
> > >>>>>>> > is
> > >>>>>>> > > >> > simpler
> > >>>>>>> > > >> > >> to just add a new method to the split reader with
> > >>>>>>> default impl.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> @Dawid Wysakowicz <dwysakowicz@apache.org>, thanks for the reply.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Having said that, as I don't have a preference and I
> > >>>>>>> agree most
> > >>>>>>> > of
> > >>>>>>> > > >> the
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> sources will support the alignment I am fine
> following
> > >>>>>>> your
> > >>>>>>> > > >> suggestion
> > >>>>>>> > > >> > to
> > >>>>>>> > > >> > >> have the SourceReader extending from
> > >>>>>>> > WithWatermarksSplitsAlignment,
> > >>>>>>> > > >> but
> > >>>>>>> > > >> > >> would put the "supportsXXX" there, not in the Source
> to
> > >>>>>>> keep the
> > >>>>>>> > > two
> > >>>>>>> > > >> > >> methods together.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> One benefit of having the "supportsXXX" in Source is
> > >>>>>>> that this
> > >>>>>>> > > allows
> > >>>>>>> > > >> > some
> > >>>>>>> > > >> > >> compile time check. For example, if a user enabled
> > >>>>>>> watermark
> > >>>>>>> > > >> alignment
> > >>>>>>> > > >> > >> while it is not supported by the Source, an exception
> > >>>>>>> can be
> > >>>>>>> > thrown
> > >>>>>>> > > >> at
> > >>>>>>> > > >> > >> compile time. It seems in general useful. That said,
> I
> > >>>>>>> agree that
> > >>>>>>> > > API
> > >>>>>>> > > >> > >> cleanliness wise it is better to put the two methods
> > >>>>>>> together.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Thanks,
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <imjark@gmail.com> wrote:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Thanks, Dawid, for the reminder on FLIP-182. Sorry, I did
> > >>>>>>> miss it.
> > >>>>>>> > > >> > >> I don't have other concerns then.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Best,
> > >>>>>>> > > >> > >> Jark
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz
> > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> @Jark:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> 1. Will the framework always align with watermarks
> when
> > >>>>>>> the
> > >>>>>>> > source
> > >>>>>>> > > >> > >> implements the interface?
> > >>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment
> > even
> > >>>>>>> if Kafka
> > >>>>>>> > > >> > >> implements the interface,
> > >>>>>>> > > >> > >> and this will affect the throughput somehow. I agree
> > >>>>>>> with Becket
> > >>>>>>> > > >> > >> we may need a
> > >>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to
> > configure
> > >>>>>>> the
> > >>>>>>> > source
> > >>>>>>> > > >> to
> > >>>>>>> > > >> > >> enable/disable the alignment.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> 2. How does the framework calculate
> > maxDesiredWatermark?
> > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will
> > >>>>>>> greatly affect
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> throughput
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> if the reader is constantly
> > >>>>>>> > > >> > >>  switching between pause and resume. Can users
> > configure
> > >>>>>>> the
> > >>>>>>> > > >> alignment
> > >>>>>>> > > >> > >> offset?
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> This is covered in the previous FLIP[1] which has
> been
> > >>>>>>> already
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> implemented
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> in 1.15. In short, it must be enabled with the
> > watermark
> > >>>>>>> strategy
> > >>>>>>> > > >> which
> > >>>>>>> > > >> > >> also configures drift and update interval.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> If we don't plan to extend this interface to support
> > >>>>>>> aligning other
> > >>>>>>> > > >> things,
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> I
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> suggest explicitly declaring
> > >>>>>>> > > >> > >> the purpose of the methods, such as
> > >>>>>>> `alignWatermarksForSplits`
> > >>>>>>> > > >> instead
> > >>>>>>> > > >> > of
> > >>>>>>> > > >> > >> `alignSplits`.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Sure let's rename it.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> @Becket:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> I understand your point. On the other hand putting
> all
> > >>>>>>> methods,
> > >>>>>>> > > even
> > >>>>>>> > > >> > with
> > >>>>>>> > > >> > >> "supportsXXX" methods for enabling certain features,
> > >>>>>>> makes the
> > >>>>>>> > > entry
> > >>>>>>> > > >> > >> threshold for writing a new source higher. Instead of
> > >>>>>>> focusing on
> > >>>>>>> > > the
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> basic
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> and required properties of the Source, the person
> > >>>>>>> implementing a
> > >>>>>>> > > >> source
> > >>>>>>> > > >> > >> must bother with and need to figure out what all of
> the
> > >>>>>>> extra
> > >>>>>>> > > >> features
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> are
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> about and how to deal with them. It makes it also
> > harder
> > >>>>>>> to
> > >>>>>>> > > organize
> > >>>>>>> > > >> > >> methods in coupled groups as Jark said.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Having said that, as I don't have a preference and I
> > >>>>>>> agree most
> > >>>>>>> > of
> > >>>>>>> > > >> the
> > >>>>>>> > > >> > >> sources will support the alignment I am fine
> following
> > >>>>>>> your
> > >>>>>>> > > >> suggestion
> > >>>>>>> > > >> > to
> > >>>>>>> > > >> > >> have the SourceReader extending from
> > >>>>>>> > WithWatermarksSplitsAlignment,
> > >>>>>>> > > >> but
> > >>>>>>> > > >> > >> would put the "supportsXXX" there, not in the Source
> to
> > >>>>>>> keep the
> > >>>>>>> > > two
> > >>>>>>> > > >> > >> methods together.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Lastly, I agree it is really unfortunate the
> > >>>>>>> "alignSplits"
> > >>>>>>> > methods
> > >>>>>>> > > >> > differ
> > >>>>>>> > > >> > >> slightly for SourceReader and SplitReader. The reason
> > for
> > >>>>>>> that is
> > >>>>>>> > > >> > >> SourceReaderBase deals only with SplitIds, whereas
> > >>>>>>> SplitReader
> > >>>>>>> > > needs
> > >>>>>>> > > >> the
> > >>>>>>> > > >> > >> actual splits to pause them. I found the discrepancy
> > >>>>>>> acceptable
> > >>>>>>> > for
> > >>>>>>> > > >> the
> > >>>>>>> > > >> > >> sake of simplifying changes significantly, especially
> > as
> > >>>>>>> they
> > >>>>>>> > would
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> highly
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> likely impact performance as we would have to perform
> > >>>>>>> additional
> > >>>>>>> > > >> > lookups.
> > >>>>>>> > > >> > >> Moreover the SplitReader is a secondary interface.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Best,
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Dawid
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Thanks for the effort, Dawid and Sebastian!
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> I just have some minor questions (maybe I missed something).
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> 1. Will the framework always align with watermarks when the source
> > >>>>>>> > > >> > >> implements the interface?
> > >>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment even if Kafka
> > >>>>>>> > > >> > >> implements the interface, and this will affect the throughput somehow.
> > >>>>>>> > > >> > >> I agree with Becket we may need a `supportSplitsAlignment()` method for
> > >>>>>>> > > >> > >> users to configure the source to enable/disable the alignment.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
> > >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly affect
> > >>>>>>> > > >> > >> throughput if the reader is constantly switching between pause and
> > >>>>>>> > > >> > >> resume. Can users configure the alignment offset?
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> 3. Interface/Method Name.
> > >>>>>>> > > >> > >> Can the interface be used to align other things in the future? For
> > >>>>>>> > > >> > >> example, align read speed; I have seen users requesting global rate
> > >>>>>>> > > >> > >> limits. This feature may also need an interface like this.
> > >>>>>>> > > >> > >> If we don't plan to extend this interface to support aligning other
> > >>>>>>> > > >> > >> things, I suggest explicitly declaring the purpose of the methods, such
> > >>>>>>> > > >> > >> as `alignWatermarksForSplits` instead of `alignSplits`.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> 4. Interface or Method.
> > >>>>>>> > > >> > >> I don't have a strong opinion on this. I think they have their own
> > >>>>>>> > > >> > >> advantages.
> > >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending abilities
> > >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink, and I prefer Interfaces rather
> > >>>>>>> > > >> > >> than methods in this case. When you have a bunch of abilities and each
> > >>>>>>> > > >> > >> ability has more than one method, Interfaces can help to organize them
> > >>>>>>> > > >> > >> and make it clear to users which methods they need to implement when
> > >>>>>>> > > >> > >> they want to have an ability.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Best,
> > >>>>>>> > > >> > >> Jark
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <becket.qin@gmail.com> wrote:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Hi Dawid,
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Thanks for the explanation. Apologies that I somehow misread a bunch of
> > >>>>>>> > > >> > >> "align" and thought they were "assign".
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Regarding 1, by default implementation, I was thinking of the default no-op
> > >>>>>>> > > >> > >> implementation. I am a little worried about the proliferation of decorative
> > >>>>>>> > > >> > >> interfaces. I think the most important thing about interfaces is that they
> > >>>>>>> > > >> > >> are easy to understand. In this case, I prefer adding a new method to the
> > >>>>>>> > > >> > >> existing interface for the following reasons:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> a) I feel the biggest drawback of decorative interfaces is which interface
> > >>>>>>> > > >> > >> they can decorate and which combinations of multiple decorative interfaces
> > >>>>>>> > > >> > >> are valid. In the current FLIP, the withSplitsAlignment interface is only
> > >>>>>>> > > >> > >> applicable to the SourceReader which means it can't decorate any other
> > >>>>>>> > > >> > >> interface. From an interface design perspective, a natural question is why
> > >>>>>>> > > >> > >> not let "AlignedSplitReader" extend "withSplitsAlignment"? And it is also
> > >>>>>>> > > >> > >> natural to assume that a split reader implementing both SplitReader and
> > >>>>>>> > > >> > >> WithSplitAlignment would work, because a source reader implementing
> > >>>>>>> > > >> > >> SourceReader and withSplitsAlignment works. So why isn't there an interface
> > >>>>>>> > > >> > >> of AlignedSourceReader? In the future, if there is a new feature added
> > >>>>>>> > > >> > >> (e.g. sorted or pre-partitioned data aware), are we going to create another
> > >>>>>>> > > >> > >> interface of SplitReader such as SortedSplitReader or PrePartitionedAware?
> > >>>>>>> > > >> > >> Can they be combined? So I think the additional decorative interface like
> > >>>>>>> > > >> > >> withSplitsAlignment actually increases the understanding cost of users
> > >>>>>>> > > >> > >> because they have to know what decorative interfaces are there, which
> > >>>>>>> > > >> > >> interface they can decorate and which combinations of the decorative
> > >>>>>>> > > >> > >> interfaces are valid and which are not. Ideally we want to avoid that. To
> > >>>>>>> > > >> > >> be clear, I am not opposing having an interface of withSplitsAlignment, it
> > >>>>>>> > > >> > >> is completely OK to have it as an internal interface and let SourceReader
> > >>>>>>> > > >> > >> and SplitReader both extend it.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> b) Adding a new method to the SourceReader with a default implementation of
> > >>>>>>> > > >> > >> no-op would help avoid logic branching in the source logic, especially
> > >>>>>>> > > >> > >> given that we agree that the vast majority of the SourceReader
> > >>>>>>> > > >> > >> implementations, if not all, would just extend from the SourceReaderBase.
> > >>>>>>> > > >> > >> That means adding a new method to the interface would effectively give the
> > >>>>>>> > > >> > >> same user experience, but simpler.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> c) A related design principle that may be worth discussing is how do we let
> > >>>>>>> > > >> > >> the Source implementations tell Flink what capability is supported and what
> > >>>>>>> > > >> > >> is not. Personally speaking I feel the most intuitive place to me is in the
> > >>>>>>> > > >> > >> Source itself, because that is the entrance of the entire Source connector
> > >>>>>>> > > >> > >> logic.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Based on the above thoughts, I am wondering if the following interface
> > >>>>>>> > > >> > >> would be easier to understand by the users.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> - Change "withSplitsAlignment" to internal interface, let both SourceReader
> > >>>>>>> > > >> > >> and SplitReader extend from it, with a default no-op implementation.
> > >>>>>>> > > >> > >> - Add a new method "boolean supportSplitsAlignment()" to the Source
> > >>>>>>> > > >> > >> interface, with a default implementation returning false. Sources that have
> > >>>>>>> > > >> > >> implemented the alignment logic can change this to return true, and
> > >>>>>>> > > >> > >> override the alignSplits() methods in the SourceReader / SplitReader if
> > >>>>>>> > > >> > >> needed.
> > >>>>>>> > > >> > >> - In the future, if a new optional feature is going to be added to the
> > >>>>>>> > > >> > >> Source, and that feature requires the awareness from Flink, we can add more
> > >>>>>>> > > >> > >> such methods to the Source.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> What do you think?
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Thanks,
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz
> > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> @Konstantin:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> > >>>>>>> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Correct, as far as I know though, those are the only sources which consume
> > >>>>>>> > > >> > >> concurrently from multiple splits and thus alignment applies.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> @Thomas:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase and then doing
> > >>>>>>> > > >> > >> nothing if the split reader does not implement AlignedSplitReader could be
> > >>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead be added to the
> > >>>>>>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to make it explicit that
> > >>>>>>> > > >> > >> the source actually supports it.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> I understand your concern. Hmm, I think we could actually do that. Given
> > >>>>>>> > > >> > >> the actual implementation of the SourceReaderBase#alignSplits is rather
> > >>>>>>> > > >> > >> short (just a forward to the corresponding method of SplitFetcher), we
> > >>>>>>> > > >> > >> could reimplement it in the actual source implementations. This solution
> > >>>>>>> > > >> > >> has a downside though. Authors of new sources would have to do two
> > >>>>>>> > > >> > >> things: extend from AlignedSplitReader and implement WithSplitsAlignment,
> > >>>>>>> > > >> > >> instead of just extending AlignedSplitReader. I would be fine with such a
> > >>>>>>> > > >> > >> tradeoff though. What do others think?
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> @Steven:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> For this part from the motivation section, is it accurate? Let's assume one
> > >>>>>>> > > >> > >> source task consumes from 3 partitions and one of the partitions is
> > >>>>>>> > > >> > >> significantly slower. In this situation, watermark for this source task
> > >>>>>>> > > >> > >> won't hold back as it is reading recent data from other two Kafka
> > >>>>>>> > > >> > >> partitions. As a result, it won't hold back the overall watermark. I
> > >>>>>>> > > >> > >> thought the problem is that we may have late data for this slow partition.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> It will hold back the watermark. Watermark of an operator is the minimum
> > >>>>>>> > > >> > >> of watermarks of all splits[1]
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> I have another question about the restart. Say split alignment is
> > >>>>>>> > > >> > >> triggered. checkpoint is completed. job failed and restored from the last
> > >>>>>>> > > >> > >> checkpoint. because alignment decision is not checkpointed, initially
> > >>>>>>> > > >> > >> alignment won't be enforced until we get a cycle of watermark aggregation
> > >>>>>>> > > >> > >> and propagation, right? Not saying this corner is a problem. Just want to
> > >>>>>>> > > >> > >> understand it more.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Your understanding is correct.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> @Becket:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use case, so should we
> > >>>>>>> > > >> > >> just add the related methods to SourceReader directly instead of
> > >>>>>>> > > >> > >> introducing the new interface of WithSplitAssignment? We can provide
> > >>>>>>> > > >> > >> default implementations, so backwards compatibility won't be an issue.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> I don't think we can provide a default implementation. How would we do
> > >>>>>>> > > >> > >> that? Would it be just a no-op? Is it better than having an opt-in
> > >>>>>>> > > >> > >> interface? The default implementation would have to be added exclusively in
> > >>>>>>> > > >> > >> a *Public* SourceReader interface. By the way notice SourceReaderBase
> > >>>>>>> > > >> > >> does extend from WithSplitsAlignment, so effectively all implementations do
> > >>>>>>> > > >> > >> handle the alignment case. To be honest I think it is impossible to
> > >>>>>>> > > >> > >> implement the SourceReader interface directly by end users.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably also needs some
> > >>>>>>> > > >> > >> change to support throttling at the split granularity. Can you add that
> > >>>>>>> > > >> > >> interface change into the public interface section as well?
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> It has been added from the beginning. See *AlignedSplitReader.*
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits here, given that it
> > >>>>>>> > > >> > >> is not actually changing the split assignments? It seems something like
> > >>>>>>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> The method's called *alignSplits*, not assign. Do you still prefer a
> > >>>>>>> > > >> > >> different name for that? Personally, I am open for suggestions here.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Best,
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Dawid
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Thanks for driving the effort, Sebastian. I think the motivation makes a
> > >>>>>>> > > >> > >> lot of sense. Just a few suggestions / questions.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use case, so should we
> > >>>>>>> > > >> > >> just add the related methods to SourceReader directly instead of
> > >>>>>>> > > >> > >> introducing the new interface of WithSplitAssignment? We can provide
> > >>>>>>> > > >> > >> default implementations, so backwards compatibility won't be an issue.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably also needs some
> > >>>>>>> > > >> > >> change to support throttling at the split granularity. Can you add that
> > >>>>>>> > > >> > >> interface change into the public interface section as well?
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits here, given that it
> > >>>>>>> > > >> > >> is not actually changing the split assignments? It seems something like
> > >>>>>>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Thanks,
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <stevenz3wu@gmail.com> wrote:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> However, a single source operator may read data from multiple
> > >>>>>>> > > >> > >> splits/partitions, e.g., multiple Kafka partitions, such that even with
> > >>>>>>> > > >> > >> watermark alignment the source operator may need to buffer excessive amount
> > >>>>>>> > > >> > >> of data if one split emits data faster than another.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> For this part from the motivation section, is it accurate? Let's assume one
> > >>>>>>> > > >> > >> source task consumes from 3 partitions and one of the partitions is
> > >>>>>>> > > >> > >> significantly slower. In this situation, watermark for this source task
> > >>>>>>> > > >> > >> won't hold back as it is reading recent data from other two Kafka
> > >>>>>>> > > >> > >> partitions. As a result, it won't hold back the overall watermark. I
> > >>>>>>> > > >> > >> thought the problem is that we may have late data for this slow partition.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> I have another question about the restart. Say split alignment is
> > >>>>>>> > > >> > >> triggered. checkpoint is completed. job failed and restored from the last
> > >>>>>>> > > >> > >> checkpoint. because alignment decision is not checkpointed, initially
> > >>>>>>> > > >> > >> alignment won't be enforced until we get a cycle of watermark aggregation
> > >>>>>>> > > >> > >> and propagation, right? Not saying this corner is a problem. Just want to
> > >>>>>>> > > >> > >> understand it more.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <thw@apache.org> wrote:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Thanks for working on this!
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase and then doing
> > >>>>>>> > > >> > >> nothing if the split reader does not implement AlignedSplitReader could be
> > >>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead be added to the
> > >>>>>>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to make it explicit that
> > >>>>>>> > > >> > >> the source actually supports it.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Thanks,
> > >>>>>>> > > >> > >> Thomas
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf
> > >>>>>>> > > >> > >> <knaufk@apache.org> wrote:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> > >>>>>>> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> +1 in general. I believe it is valuable to complete the watermark aligned
> > >>>>>>> > > >> > >> story with this FLIP.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Cheers,
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Konstantin
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz
> > >>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> To be explicit, having worked on it, I support it ;) I think we can
> > >>>>>>> > > >> > >> start a vote thread soonish, as there are no concerns so far.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Best,
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Dawid
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Dear Flink developers,
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> I would like to open a discussion on FLIP 217 [1] for an extension of
> > >>>>>>> > > >> > >> Watermark Alignment to perform alignment also in SplitReaders. To do so,
> > >>>>>>> > > >> > >> SplitReaders must be able to suspend and resume reading from split sources
> > >>>>>>> > > >> > >> where the SourceOperator coordinates and controls suspend and resume. To
> > >>>>>>> > > >> > >> gather information about current watermarks of the SplitReaders, we extend
> > >>>>>>> > > >> > >> the internal WatermarkOutputMultiplexer and report watermarks to the
> > >>>>>>> > > >> > >> SourceOperator.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised and
> > >>>>>>> > > >> > >> reworked by Dawid Wysakowicz (He did most of the work.) and me. The changes
> > >>>>>>> > > >> > >> are backwards compatible in a way that if affected components do not
> > >>>>>>> > > >> > >> support split alignment the behavior is as before.
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Best,
> > >>>>>>> > > >> > >> Sebastian
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> --
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >> Konstantin Knauf
> > >>>>>>> > > >> > >> https://twitter.com/snntrable
> > >>>>>>> > > >> > >> https://github.com/knaufk
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> > >>
> > >>>>>>> > > >> >
> > >>>>>>> > > >>
> > >>>>>>> > > >
> > >>>>>>> > >
> > >>>>>>> >
> > >>>>>>>
> > >>>>>>
> >
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Becket Qin <be...@gmail.com>.
Thanks for the clarification, Piotr and Sebastian.

It looks like the key problem is still whether the implementation of
pausable splits in the Sources should be optional or not.

I think it might be helpful to agree on the definition of optional in our
case. To me:
Optional = "You CAN leave the method unimplemented, and that is fine."
Non-Optional = "You CAN leave the method unimplemented, but you SHOULD NOT,
because people assume this works."

I think one sufficient condition for a feature to be Non-Optional is that it
is exposed through the framework API: in that case Flink should expect the
pluggables to support the feature by default. Otherwise the availability
of that feature becomes undefined.

Please note that so far we do not assume whether the feature is in
the original API or it is added later. A newly added feature can also be
non-optional, although it might take some time for all the pluggable
developers to catch up, and their existing implementations should still work
as long as the new feature is not used. In contrast, we may never expect an
optional feature to catch up, because leaving it unimplemented is also blessed.

Let's take the checkpointing as an example. Imagine Flink did not support
checkpointing before release 1.16. And now we are trying to add
checkpointing to Flink. So we exposed the checkpoint configuration to the
end users. In the meantime, will we tell the pluggable (e.g. operators,
connectors) developers that methods like "snapshotState()" are optional? If
we do that, the availability of checkpointing in Flink would be severely
weakened. But apparently we should still allow the existing implementations
to work without checkpointing. It looks to me that adding the method to the
pluggable interfaces with a default implementation throwing
"UnsupportedOperationException" would be the solution here. Please note
that in this case, having the default implementation does not mean this is
optional. It is just the technique to support backwards compatibility in
the feature evolution. The fact that this method is in the base interface
suggests it is not optional, so the developers SHOULD implement it.
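
To make that evolution path concrete, here is a rough sketch. It is only an
illustration under assumptions: `ToyReader`, `LegacyReader`, `PausableReader`
and `runOnce` are made-up names and not the actual Flink interfaces, and the
method name is borrowed from the naming suggestions earlier in this thread.
The default implementation keeps old pluggables compiling and working while
the feature is unused, and fails loudly once the feature is turned on.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class DefaultMethodEvolutionSketch {

    /** Hypothetical stand-in for a pluggable base interface; not Flink's SourceReader. */
    interface ToyReader {
        String poll();

        // Added in a later release. The default body exists only for backwards
        // compatibility; implementers are still expected to override it.
        default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
            throw new UnsupportedOperationException(
                    getClass().getSimpleName() + " does not support pausing splits");
        }
    }

    /** Written before the new method existed; it never overrides it. */
    static class LegacyReader implements ToyReader {
        @Override
        public String poll() {
            return "record";
        }
    }

    /** An implementation that has caught up with the new method. */
    static class PausableReader implements ToyReader {
        final List<String> paused = new ArrayList<>();

        @Override
        public String poll() {
            return "record";
        }

        @Override
        public void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
            paused.removeAll(toResume);
            paused.addAll(toPause);
        }
    }

    /** Toy framework step: the new method is only invoked when the user enabled the feature. */
    static String runOnce(ToyReader reader, boolean alignmentEnabled) {
        if (alignmentEnabled) {
            reader.pauseOrResumeSplits(List.of("split-1"), List.of());
        }
        return reader.poll();
    }

    public static void main(String[] args) {
        System.out.println(runOnce(new LegacyReader(), false));  // feature unused: old code still works
        System.out.println(runOnce(new PausableReader(), true)); // feature used and implemented
        try {
            runOnce(new LegacyReader(), true);                   // feature used but not implemented
        } catch (UnsupportedOperationException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

The point of the sketch is that the exception is raised only on the path where
the end user has asked for the feature, so existing jobs are unaffected.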

When it comes to this FLIP, I think it meets the criteria of non-optional
features, so we should just use the evolution path of non-optional features.

Thanks,

Jiangjie (Becket) Qin



On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski <pn...@apache.org> wrote:

> Hi,
>
> Actually previously I thought about having a decorative interface and
> whenever watermark alignment is enabled, checking that the source
> implements the decorative interface. If not, throwing an exception.
>
> The option with default methods in the source interfaces throwing
> `UnsupportedOperationException` I think still suffers from the same
> problems I mentioned before. It's still an optional implementation and at
> the same time it's clogging the base interface. I think I would still vote
> soft -1 on this option, but I wouldn't block it in case I am out-voted.
>
> Best,
> Piotrek
>
> On Wed, May 11, 2022 at 14:22 Sebastian Mattheis <se...@ververica.com>
> wrote:
>
> > Hi Becket,
> >
> > Thanks a lot for your fast and detailed response. To me, the discussion
> > is converging, and dropping the supportsX method sounds very reasonable.
> > (Side note: I think we misunderstood each other regarding "pausable
> > splits" being enabled by "default". As you describe it now, I understand
> > "default" to mean the new recommended way of implementation, and I think
> > that is fully valid. Before, I understood "default" here as the default
> > implementation, i.e., throwing UnsupportedOperationException, which is
> > the exact opposite. :) )
> >
> > Nevertheless: As mentioned, an open question for me is if watermark
> > alignment should enforce pausable splits. For clarification, the current
> > documentation [1] says:
> >
> >> *Note:* As of 1.15, Flink supports aligning across tasks of the same
> >> source and/or different sources. It does not support aligning
> >> splits/partitions/shards in the same task.
> >>
> >> In a case where there are e.g. two Kafka partitions that produce
> >> watermarks at different pace, that get assigned to the same task
> >> watermark might not behave as expected. Fortunately, worst case it
> >> should not perform worse than without alignment.
> >>
> >> Given the limitation above, we suggest applying watermark alignment in
> >> two situations:
> >>
> >>    1. You have two different sources (e.g. Kafka and File) that produce
> >>    watermarks at different speeds
> >>    2. You run your source with parallelism equal to the number of
> >>    splits/shards/partitions, which results in every subtask being
> >>    assigned a single unit of work.
> >
> > I personally see no issue with, and no reason against, making watermark
> > alignment depend on pausable splits. (I think this would even be a good
> > path towards shaping watermark alignment in 1.16.) However, "I don't
> > see" means that I would be happy to hear Dawid's and Piotrek's opinions,
> > as they implemented watermark alignment based on FLIP-182 [2], and I
> > don't want to miss relevant rationale/background info from their side.
> >
> > *@Piotrek* *@Dawid *What do you think?
> >
> > Regards,
> > Sebastian
> >
> > [1]
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
> > [2]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
> >
> > On Wed, May 11, 2022 at 1:30 PM Becket Qin <be...@gmail.com> wrote:
> >
> >> +dev
> >>
> >> Hi Sebastian,
> >>
> >> Thank you for the summary. Please see the detailed replies inline. As a
> >> recap, my suggestions are:
> >>
> >> 1. Pausable splits API.
> >>     a) Add default implementations to methods "pauseOrResumeSplits" in
> >> both SourceReader and SplitReader where both default implementations
> >> throw UnsupportedOperationException.
> >>
> >> 2. User story.
> >>     a) We tell users to enable the watermark alignment as they like.
> >> This is exactly what the current Flink API is.
> >>     b) We tell the source developers, please implement pausable splits,
> >> otherwise bad things may happen. Think of it like you are expected to
> >> implement SourceReader#snapshotState() properly, otherwise exceptions
> >> will be thrown when users enable checkpointing.
> >>
> >> Thanks,
> >>
> >> Jiangjie (Becket) Qin
> >>
> >> On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis <
> >> sebastian@ververica.com> wrote:
> >>
> >>> Hi Becket, Hi everybody,
> >>>
> >>> I'm sorry if I misread the messages but I could not derive an agreement
> >>> from the mailing list. Nevertheless, if I understand you right the
> >>> suggestion is:
> >>>
> >>> * Add default implementations to methods "pauseOrResumeSplits" in both
> >>> SourceReader and SplitReader where both default implementations throw
> >>> UnsupportedOperationException.
> >>>
> >> Yes.
> >>
> >> * Add "supportsPauseOrResumeSplits" to the Source interface. (In the
> >>> following, I refer to supporting this as "pausable splits".)
> >>>
> >> We may no longer need this if pausable splits are expected to be
> >> implemented by the source developers, i.e. non-optional. Having this
> method
> >> would then be somewhat misleading as it looks like the sources that do
> not
> >> support pausable splits are also acceptable in the long term. So API
> wise,
> >> I'd say maybe we should remove this for this FLIP, although I believe
> this
> >> supportXXX pattern itself is still attractive for optional features.
> >>
> >>
> >>>
> >>> To make the conclusions explicit:
> >>>
> >>> 1. The implementation of pauseOrResumeSplits in both interfaces
> >>> SourceReader and SplitReader is optional, where the default is that
> >>> pausing is not supported. (--> This means that the implementation is
> >>> still optional for the source developer.)
> >>>
> >> It is optional for backwards compatibility with existing sources, as
> >> they may still compile without code change. But starting from this FLIP,
> >> Flink will always optimistically assume that all the sources support
> >> pausable splits. If a source does not support pausable splits, it goes
> >> to an error handling path when watermark alignment is enabled on it.
> >> This is different from a usual optional feature, where no error is
> >> expected.
> >>
> >>
> >>> 2. If watermark alignment is enabled in the application code by adding
> >>> withWatermarkAlignment to the WatermarkStrategy while SourceReader or
> >>> SplitReader do not support pausableSplits, we throw an
> >>> UnsupportedOperationException.
> >>>
> >> Yes.
> >>
> >>
> >>> 3. With regard to your statement:
> >>>
> >>>> [...] basically means watermark alignment is a non-optional feature
> >>>> to the end users.
> >>>
> >>> You actually mean that "pausable splits" are non-optional for the app
> >>> developer if watermark alignment is enabled. However, watermark
> alignment
> >>> is optional and can be enabled/disabled.
> >>>
> >> Yes, watermark alignment can be enabled/disabled in individual sources
> >> in Flink jobs, which basically means the code supporting watermark
> >> alignment has to already be there. That again means the Source
> >> developers are also expected to support pausable splits by default. So
> >> this way we essentially tell the end users that you may enable / disable
> >> this feature as you wish, and tell the source developers that you SHOULD
> >> implement this because the end users may turn it on/off at will. And if
> >> the source does not support pausable splits, that goes to an error
> >> handling path when watermark alignment is enabled on it. So users know
> >> they have to explicitly exclude this source.
> >>
> >>
> >>>
> >>> So far it's totally clear to me and I hope this is what you mean. I
> also
> >>> agree with both statements:
> >>>
> >>> So making that expectation aligned with the source developers seems
> >>>> reasonable.
> >>>>
> >>>
> >>> I think this is a simple and clean solution from both the end user and
> >>>> source developers' standpoint.
> >>>>
> >>>
> >>> However, a last conclusion derives from 3. and is an open question for
> >>> me:
> >>>
> >>> 4. The feature of "pausable splits" is now tightly bound to watermark
> >>> alignment, i.e., if sources do not support "pausable splits" one
> >>> cannot enable watermark alignment for these sources. This dependency is
> >>> not the current status of watermark alignment implementation because it
> >>> is/was implemented without pausable splits. Do we want to introduce
> >>> this dependency? (This is an open question. I cannot judge that.)
> >>>
> >> The watermark alignment basically relies on the pausable splits, right?
> >> So personally I find it quite reasonable that if the source does not
> >> support pausable splits, end users cannot enable watermark alignment on
> >> it.
> >>
> >>
> >>> If something is wrong, please correct me.
> >>>
> >>> Regards,
> >>> Sebastian
> >>>
> >>> On Wed, May 11, 2022 at 9:05 AM Becket Qin <be...@gmail.com>
> wrote:
> >>>
> >>>> Hi Sebastian,
> >>>>
> >>>> Thanks for the reply and patient discussion. I agree this is a tricky
> >>>> decision.
> >>>>
> >>>>
> >>>>> Nevertheless, Piotr has valid concerns about Option c) which I see as
> >>>>> follows:
> >>>>> (1) An interface with default NOOP implementation makes the
> >>>>> implementation optional. And in my opinion, a default implementation
> is and
> >>>>> will remain a way of making implementation optional because even in
> future
> >>>>> a developer can decide to implement the "old flavor" without support
> for
> >>>>> pausable splits.
> >>>>> (2) It may not be too critical but I also find it suboptimal that
> with
> >>>>> a NOOP default implementation there is no way to check at runtime if
> >>>>> SourceReader or SplitReader actually support pausing. (To do so, one
> would
> >>>>> need a supportsX method which makes it again more complicated.)
> >>>>
> >>>>
> >>>> Based on the last few messages in the mailing list, Piotr and I
> >>>> agreed that the default implementation should just throw an
> >>>> UnsupportedOperationException if the source is unpausable. So this
> >>>> basically tells the Source developers that this feature is expected to
> >>>> be supported. Because we cannot prevent end users from putting an
> >>>> unpausable source into the watermark alignment group, that basically
> >>>> means watermark alignment is a non-optional feature to the end users.
> >>>> So making that expectation aligned with the source developers seems
> >>>> reasonable. And if a source does not support this feature, the end
> >>>> users should explicitly remove that source from the watermark
> >>>> alignment group.
> >>>>
> >>>> Personally speaking I think this is a simple and clean solution from
> >>>> both the end user and source developers' standpoint.
> >>>>
> >>>> Does this address your concerns?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Jiangjie (Becket) Qin
> >>>>
> >>>> On Wed, May 11, 2022 at 2:52 PM Sebastian Mattheis <
> >>>> sebastian@ververica.com> wrote:
> >>>>
> >>>>> Hi Piotr, Hi Becket, Hi everybody,
> >>>>>
> >>>>> we, Dawid and I, discussed the various suggestions/options and we
> >>>>> would be okay either way because we find neither solution is perfect,
> >>>>> simply because of the already present complexity.
> >>>>>
> >>>>> Option c) Adding methods to the interfaces of SourceReader and
> >>>>> SplitReader
> >>>>> Option a) Adding decorative interfaces to be used by SourceReader and
> >>>>> SplitReader
> >>>>>
> >>>>> As of the current status (v. 12) of the FLIP [1], it is based on
> >>>>> Option c) which we find acceptable because the complexity added is
> only a
> >>>>> single method.
> >>>>>
> >>>>> Nevertheless, Piotr has valid concerns about Option c) which I see as
> >>>>> follows:
> >>>>> (1) An interface with default NOOP implementation makes the
> >>>>> implementation optional. And in my opinion, a default implementation
> is and
> >>>>> will remain a way of making implementation optional because even in
> future
> >>>>> a developer can decide to implement the "old flavor" without support
> for
> >>>>> pausable splits.
> >>>>> (2) It may not be too critical but I also find it suboptimal that
> with
> >>>>> a NOOP default implementation there is no way to check at runtime if
> >>>>> SourceReader or SplitReader actually support pausing. (To do so, one
> would
> >>>>> need a supportsX method which makes it again more complicated.)
> >>>>>
> >>>>> However, we haven't changed it because Option a) is also not optimal
> >>>>> or straight-forward:
> >>>>> (1) We need to add two distinct yet similar decorative interfaces
> >>>>> since, as mentioned, the signatures of the methods are different. For
> >>>>> example, we would need decorative interfaces like
> >>>>> `SplitReaderWithPausableSplits` and `SourceReaderWithPausableSplits`.
> >>>>> (2) As a consequence, we would need to somehow document how/where to
> >>>>> implement both interfaces and how this relates to each other. This
> we could
> >>>>> solve by adding a note in the interface of SourceReader and
> SplitReader and
> >>>>> reference to the decorative interfaces but it still increases
> complexity
> >>>>> too.
> >>>>>
> >>>>> In summary, we see both as acceptable and preferred over other
> >>>>> options. The question is if we can find a solution or compromise
> that is
> >>>>> acceptable for everybody to reach consensus.
> >>>>>
> >>>>> Please let us know what you think because we would be happy if we can
> >>>>> conclude the discussion to avoid dropping the initiative on this
> FLIP.
> >>>>>
> >>>>> Regards,
> >>>>> Sebastian
> >>>>>
> >>>>> [1]
> >>>>>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
> >>>>> (v. 12)
> >>>>>
> >>>>> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski <pnowojski@apache.org>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Guowei,
> >>>>>>
> >>>>>> as Dawid wrote a couple of messages back:
> >>>>>>
> >>>>>> > This is covered in the previous FLIP[1] which has been already
> >>>>>> implemented in 1.15. In short, it must be enabled with the watermark
> >>>>>> strategy which also configures drift and update interval
> >>>>>>
> >>>>>> So by default watermark alignment is disabled, regardless if a
> source
> >>>>>> supports it or not.
> >>>>>>
> >>>>>> Best,
> >>>>>> Piotrek
> >>>>>>
> >>>>>> On Thu, May 5, 2022 at 09:56 Guowei Ma <gu...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> We know that in the case of Bounded input Flink supports the Batch
> >>>>>>> execution mode. Currently in Batch execution mode, Flink is
> >>>>>>> executed on a stage-by-stage basis. In this way, perhaps watermark
> >>>>>>> alignment might not gain much.
> >>>>>>>
> >>>>>>> So my question is: Is watermark alignment the default behavior (for
> >>>>>>> sources that implement it only)? If so, have you considered
> >>>>>>> evaluating the impact of this behavior on the Batch execution mode?
> >>>>>>> Or do you think it is not necessary?
> >>>>>>>
> >>>>>>> Correct me if I miss something.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Guowei
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <
> >>>>>>> piotr.nowojski@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> > Hi Becket and Dawid,
> >>>>>>> >
> >>>>>>> > > I feel that no matter which option we choose this can not be
> >>>>>>> solved
> >>>>>>> > entirely in either of the options, because of the point above and
> >>>>>>> because
> >>>>>>> > the signature of SplitReader#pauseOrResumeSplits and
> >>>>>>> > SourceReader#pauseOrResumeSplits are slightly different (one
> >>>>>>> identifies
> >>>>>>> > splits with splitId the other one passes the splits directly).
> >>>>>>> >
> >>>>>>> > Yes, that's a good point in this case and for features that need
> >>>>>>> to be
> >>>>>>> > implemented in more than one place.
> >>>>>>> >
> >>>>>>> > > Is there any reason for pausing reading from a split to be an
> >>>>>>> > > optional feature, other than that this was not included in the
> >>>>>>> > > original interface?
> >>>>>>> >
> >>>>>>> > An additional argument in favor of making it optional is to
> >>>>>>> simplify source
> >>>>>>> > implementation. But on its own I'm not sure if that would be
> >>>>>>> enough to
> >>>>>>> > justify making this feature optional. Maybe.
> >>>>>>> >
> >>>>>>> > > I think it would be way simpler and clearer to just let end
> >>>>>>> users and
> >>>>>>> > Flink
> >>>>>>> > > assume all the connectors will implement this feature.
> >>>>>>> >
> >>>>>>> > As I wrote above that would be an interesting choice to make
> (ease
> >>>>>>> of
> >>>>>>> > implementation for new users, vs system consistency). Regardless
> >>>>>>> of that,
> >>>>>>> > yes, for me the main argument is the API backward compatibility.
> >>>>>>> But let's
> >>>>>>> > clear a couple of points:
> >>>>>>> > - The current proposal adding methods to the base interface with
> >>>>>>> default
> >>>>>>> > implementations is an OPTIONAL feature. Same as the decorative
> >>>>>>> version
> >>>>>>> > would be.
> >>>>>>> > - Decorative version could implement "throw
> >>>>>>> UnsupportedOperationException"
> >>>>>>> > if user enabled watermark alignment just as well and I agree
> >>>>>>> that's a
> >>>>>>> > better option compared to logging a warning.
> >>>>>>> >
> >>>>>>> > Best,
> >>>>>>> > Piotrek
> >>>>>>> >
> >>>>>>> >
> >>>>>>> > On Wed, May 4, 2022 at 15:40 Becket Qin <be...@gmail.com> wrote:
> >>>>>>> >
> >>>>>>> > > Thanks for the reply and patient discussion, Piotr and Dawid.
> >>>>>>> > >
> >>>>>>> > > Is there any reason for pausing reading from a split to be an
> >>>>>>> > > optional feature, other than that this was not included in the
> >>>>>>> > > original interface?
> >>>>>>> > >
> >>>>>>> > > To be honest I am really worried about the complexity of the
> >>>>>>> > > user story here. Optional features like this have a high
> >>>>>>> > > overhead. Imagine this feature is optional, now a user enabled
> >>>>>>> > > watermark alignment and defined a few watermark groups. Would it
> >>>>>>> > > work? Hmm, that depends on whether the involved Source has
> >>>>>>> > > implemented this feature. If the Sources are well documented,
> >>>>>>> > > good luck. Otherwise end users may have to look into the code of
> >>>>>>> > > the Source to see whether the feature is supported. Which is
> >>>>>>> > > something they shouldn't have to do.
> >>>>>>> > >
> >>>>>>> > > I think it would be way simpler and clearer to just let end
> >>>>>>> > > users and Flink assume all the connectors will implement this
> >>>>>>> > > feature. After all the watermark group is not optional to the
> >>>>>>> > > end users. If in some rare cases, the feature cannot be
> >>>>>>> > > supported, a clear UnsupportedOperationException will be thrown
> >>>>>>> > > to tell users to explicitly remove this Source from the
> >>>>>>> > > watermark group. I don't think we should have a warning message
> >>>>>>> > > here, as they tend to be ignored in many cases. If we do this,
> >>>>>>> > > we don't even need the supportXXX method in the Source for this
> >>>>>>> > > feature. In fact this is exactly how many interfaces work
> >>>>>>> > > today. For example, SplitEnumerator#addSplitsBack() is not
> >>>>>>> > > supported by Pravega source because it does not support partial
> >>>>>>> > > failover. In that case, it simply throws an exception to trigger
> >>>>>>> > > a global recovery.
> >>>>>>> > >
> >>>>>>> > > The reason we add a default implementation in this case would
> >>>>>>> > > be just for the sake of backwards compatibility so the old
> >>>>>>> > > source can still compile. Sure, in short term, this feature
> >>>>>>> > > might not be supported by many existing sources. That is OK, and
> >>>>>>> > > it is quite visible to the source developers that they did not
> >>>>>>> > > override the default impl which throws an
> >>>>>>> > > UnsupportedOperationException.
> >>>>>>> > >
> >>>>>>> > > @Dawid,
> >>>>>>> > >
> >>>>>>> > > the Java doc of the SupportXXX() method in the Source would be
> >>>>>>> the single
> >>>>>>> > > >> source of truth regarding how to implement this feature.
> >>>>>>> > > >
> >>>>>>> > > >
> >>>>>>> > >
> >>>>>>> > > I also don't find it entirely true. Half of the classes are
> >>>>>>> theoretically
> >>>>>>> > > > optional and are utility classes from the point of view how
> the
> >>>>>>> > > interfaces
> >>>>>>> > > > are organized. Theoretically users do not need to use any of
> >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to list their
> >>>>>>> methods in
> >>>>>>> > > the
> >>>>>>> > > > Source interface.
> >>>>>>> > >
> >>>>>>> > > I think the ultimate goal of java docs is to guide users to
> >>>>>>> > > implement the Source. If SourceReaderBase is the preferred way
> >>>>>>> > > to implement a SourceReader, it seems worth mentioning that.
> >>>>>>> > > Even the Java language documentation of interfaces lists the
> >>>>>>> > > known implementations [1] so people can leverage them. But for
> >>>>>>> > > this particular case, if we make the feature non-optional, we
> >>>>>>> > > don't even need the supportXXX() method for now.
> >>>>>>> > >
> >>>>>>> > > Thanks,
> >>>>>>> > >
> >>>>>>> > > Jiangjie (Becket) Qin
> >>>>>>> > >
> >>>>>>> > >
> >>>>>>> > >
> >>>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <
> >>>>>>> dwysakowicz@apache.org>
> >>>>>>> > > wrote:
> >>>>>>> > >
> >>>>>>> > > > Hey Piotr and Becket,
> >>>>>>> > > >
> >>>>>>> > > > First of all, let me say I am happy with whichever option is
> >>>>>>> agreed in
> >>>>>>> > > the
> >>>>>>> > > > discussion.
> >>>>>>> > > >
> >>>>>>> > > > I wanted to clarify a few points from the discussion though:
> >>>>>>> > > >
> >>>>>>> > > > @Becket:
> >>>>>>> > > >
> >>>>>>> > > > The main argument for adding the methods to the SourceReader
> >>>>>>> is that
> >>>>>>> > > these
> >>>>>>> > > > methods are effectively NON-OPTIONAL to SourceReader impl,
> i.e.
> >>>>>>> > starting
> >>>>>>> > > > from this FLIP, all the SourceReaders impl are expected to
> >>>>>>> support this
> >>>>>>> > > > method, although some old implementations may not have
> >>>>>>> implemented this
> >>>>>>> > > > feature. I think we should distinguish the new features from
> >>>>>>> the
> >>>>>>> > optional
> >>>>>>> > > > features. While the public decorative interface is a solution
> >>>>>>> to the
> >>>>>>> > > > optional features. We should not use it for the features that
> >>>>>>> are
> >>>>>>> > > > non-optional.
> >>>>>>> > > >
> >>>>>>> > > > I don't think that this feature is NON-OPTIONAL. Even though
> >>>>>>> > preferred, I
> >>>>>>> > > > still think it can be simply optional.
> >>>>>>> > > >
> >>>>>>> > > > the Java doc of the SupportXXX() method in the Source would
> be
> >>>>>>> the
> >>>>>>> > single
> >>>>>>> > > > source of truth regarding how to implement this feature.
> >>>>>>> > > >
> >>>>>>> > > > I also don't find it entirely true. Half of the classes are
> >>>>>>> > theoretically
> >>>>>>> > > > optional and are utility classes from the point of view how
> the
> >>>>>>> > > interfaces
> >>>>>>> > > > are organized. Theoretically users do not need to use any of
> >>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to list their
> >>>>>>> methods in
> >>>>>>> > > the
> >>>>>>> > > > Source interface.
> >>>>>>> > > >
> >>>>>>> > > > @Piotr
> >>>>>>> > > >
> >>>>>>> > > > If we have all of the methods with default implementation in
> >>>>>>> the base
> >>>>>>> > > > interface, the API doesn't give any clue to the user which
> set
> >>>>>>> of
> >>>>>>> > methods
> >>>>>>> > > > are required to be implemented at the same time.
> >>>>>>> > > >
> >>>>>>> > > > I feel that no matter which option we choose this can not be
> >>>>>>> solved
> >>>>>>> > > > entirely in either of the options, because of the point above
> >>>>>>> and
> >>>>>>> > because
> >>>>>>> > > > the signature of SplitReader#pauseOrResumeSplits and
> >>>>>>> > > > SourceReader#pauseOrResumeSplits are slightly different (one
> >>>>>>> identifies
> >>>>>>> > > > splits with splitId the other one passes the splits
> directly).
> >>>>>>> > > >
> >>>>>>> > > > Best,
> >>>>>>> > > >
> >>>>>>> > > > Dawid
> >>>>>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
> >>>>>>> > > >
> >>>>>>> > > > Hi Piotr,
> >>>>>>> > > >
> >>>>>>> > > > Thanks for the comment.
> >>>>>>> > > >
> >>>>>>> > > > Just to clarify, I am not against the decorative interfaces,
> >>>>>>> > > > but I do think we should use them with caution. The main
> >>>>>>> > > > argument for adding the methods to the SourceReader is that
> >>>>>>> > > > these methods are effectively NON-OPTIONAL to SourceReader
> >>>>>>> > > > impl, i.e. starting from this FLIP, all the SourceReaders impl
> >>>>>>> > > > are expected to support this method, although some old
> >>>>>>> > > > implementations may not have implemented this feature. I think
> >>>>>>> > > > we should distinguish the new features from the optional
> >>>>>>> > > > features. While the public decorative interface is a solution
> >>>>>>> > > > to the optional features, we should not use it for the
> >>>>>>> > > > features that are non-optional.
> >>>>>>> > > >
> >>>>>>> > > > That said, this feature is optional for SplitReaders.
> Arguably
> >>>>>>> we can
> >>>>>>> > > have
> >>>>>>> > > > a decorative interface for that, but for simplicity and
> >>>>>>> symmetry of the
> >>>>>>> > > > interface, personally I prefer just adding a new method.
> >>>>>>> > > >
> >>>>>>> > > > Regarding the advantages you mentioned about the decorative
> >>>>>>> interfaces,
> >>>>>>> > > > they would make sense if:
> >>>>>>> > > > 1. The feature is optional.
> >>>>>>> > > > 2. There is only one decorative interface involved for a
> >>>>>>> feature.
> >>>>>>> > > > Otherwise the argument that all the methods are grouped
> >>>>>>> together will
> >>>>>>> > not
> >>>>>>> > > > stand.
> >>>>>>> > > >
> >>>>>>> > > > Compared with that, I think the current solution works fine
> in
> >>>>>>> all
> >>>>>>> > cases,
> >>>>>>> > > > i.e. "having supportXXX() method in Source, and default
> >>>>>>> methods /
> >>>>>>> > > > decorative interfaces in base interfaces.".
> >>>>>>> > > >
> >>>>>>> > > > The advantages are:
> >>>>>>> > > >> - clean and easy to implement base interface
> >>>>>>> > > >
> >>>>>>> > > > In the current approach, the Java doc of the SupportXXX()
> >>>>>>> method in the
> >>>>>>> > > > Source would be the single source of truth regarding how to
> >>>>>>> implement
> >>>>>>> > > this
> >>>>>>> > > > feature. It lists the method that has to be implemented to
> >>>>>>> support this
> >>>>>>> > > > feature, regardless of how many classes / interfaces are
> >>>>>>> involved.
> >>>>>>> > > >
> >>>>>>> > > > When implementing the base interface, users do not need to
> >>>>>>> > > > implement a method with default implementation. If they are
> >>>>>>> > > > curious what the method is for, the java doc of that method
> >>>>>>> > > > simply points users to the SupportXXX() method in the Source.
> >>>>>>> > > > This does not seem to add work for the users compared with
> >>>>>>> > > > decorative interfaces, but gives much better discoverability.
> >>>>>>> > > >
> >>>>>>> > > > - all of the methods from a single feature are grouped in a
> >>>>>>> single
> >>>>>>> > > >> decorator interface, together with their dedicated java doc.
> >>>>>>> It's also
> >>>>>>> > > >> easier to google search for help using the decorator name
> >>>>>>> > > >
> >>>>>>> > > > - if an optional feature requires two methods to be
> >>>>>>> implemented at
> >>>>>>> > once,
> >>>>>>> > > >> decorator can guarantee that
> >>>>>>> > > >
> >>>>>>> > > > These two points are not true when multiple components and
> >>>>>>> classes are
> >>>>>>> > > > involved collaboratively to provide a feature. In our case,
> we
> >>>>>>> have
> >>>>>>> > both
> >>>>>>> > > > SourceReader and SplitReader involved. And there might be
> other
> >>>>>>> > > interfaces
> >>>>>>> > > > on the JM side involved for some future features. So the
> >>>>>>> relevant
> >>>>>>> > methods
> >>>>>>> > > > can actually be scattered over the places. That said, we may
> >>>>>>> still use
> >>>>>>> > > > decorative interfaces for each component, if the feature is
> >>>>>>> optional,
> >>>>>>> > > given
> >>>>>>> > > > there is a single source of truth for the feature.
> >>>>>>> > > >
> >>>>>>> > > > Here I would strongly lean towards making life easier for new
> >>>>>>> users,
> >>>>>>> > > >> lowering the entry barrier, at the (imo) slight expense for
> >>>>>>> the power
> >>>>>>> > > >> users.
> >>>>>>> > > >
> >>>>>>> > > > I actually think the current approach is simpler, more
> >>>>>>> > > > extensible and more general for all the users. Can you
> >>>>>>> > > > elaborate a bit more on which part you think is harder for
> >>>>>>> > > > users to understand?
> >>>>>>> > > >
> >>>>>>> > > >
> >>>>>>> > > > There is another benefit of the decorative interfaces which
> is
> >>>>>>> not
> >>>>>>> > > > mentioned, but might be worth considering here. Usually the
> >>>>>>> decorative
> >>>>>>> > > > interfaces give slightly better backwards compatibility than
> >>>>>>> the new
> >>>>>>> > > > default method in the interfaces. That is when users are
> using
> >>>>>>> a jar
> >>>>>>> > that
> >>>>>>> > > > was compiled with an older version of Flink which does not
> >>>>>>> have the
> >>>>>>> > > default
> >>>>>>> > > > method in the interfaces in question. A decorative interface
> >>>>>>> may still
> >>>>>>> > > > provide backwards compatibility in that case, while default
> >>>>>>> method impl
> >>>>>>> > > > cannot.
> >>>>>>> > > >
> >>>>>>> > > > I think in Flink we in general do not guarantee custom
> >>>>>>> components
> >>>>>>> > > compiled
> >>>>>>> > > > with an older version can run with a newer version of Flink.
> A
> >>>>>>> > recompile
> >>>>>>> > > > with a newer version would be required. That said, if we do
> >>>>>>> care about
> >>>>>>> > > > this, we can just change the "supportXXX()" method in the
> >>>>>>> Source
> >>>>>>> > > interface
> >>>>>>> > > > to use decorative interfaces, and leave the other parts
> >>>>>>> unchanged.
> >>>>>>> > > >
> >>>>>>> > > > Thanks,
> >>>>>>> > > >
> >>>>>>> > > > Jiangjie (Becket) Qin
> >>>>>>> > > >
> >>>>>>> > > >
> >>>>>>> > > >
> >>>>>>> > > >
> >>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <
> >>>>>>> pnowojski@apache.org>
> >>>>>>> > > > wrote:
> >>>>>>> > > >
> >>>>>>> > > >> Hi,
> >>>>>>> > > >>
> >>>>>>> > > >> Sorry for chipping in so late, but I was OoO for the last
> two
> >>>>>>> weeks.
> >>>>>>> > > >> Regarding the interfaces, I would be actually against adding
> >>>>>>> those
> >>>>>>> > > methods
> >>>>>>> > > >> to the base interfaces for the reasons mentioned above.
> >>>>>>> Clogging the
> >>>>>>> > > base
> >>>>>>> > > >> interface for new users with tons of methods that they do
> not
> >>>>>>> need, do
> >>>>>>> > > not
> >>>>>>> > > >> understand and do not know what to do with them. Moreover,
> >>>>>>> such
> >>>>>>> > > decorative
> >>>>>>> > > >> interfaces are solving a problem if a feature requires two
> or
> >>>>>>> more
> >>>>>>> > > methods
> >>>>>>> > > >> to be implemented at the same time. If we have all of the
> >>>>>>> methods with
> >>>>>>> > > >> default implementation in the base interface, the API
> doesn't
> >>>>>>> give any
> >>>>>>> > > >> clue
> >>>>>>> > > >> to the user which set of methods are required to be
> >>>>>>> implemented at the
> >>>>>>> > > >> same
> >>>>>>> > > >> time.
> >>>>>>> > > >>
> >>>>>>> > > >> > a) I feel the biggest drawback of decorative interfaces is
> >>>>>>> which
> >>>>>>> > > >> interface
> >>>>>>> > > >> > they can decorate and which combinations of multiple
> >>>>>>> decorative
> >>>>>>> > > >> interfaces
> >>>>>>> > > >> > are valid. (...)
> >>>>>>> > > >> > In the future, if there is a new feature added
> >>>>>>> > > >> > (e.g. sorted or pre-partitioned data aware), are we going
> >>>>>>> to create
> >>>>>>> > > >> another
> >>>>>>> > > >> > interface of SplitReader such as SortedSplitReader or
> >>>>>>> > > >> PrePartitionedAware?
> >>>>>>> > > >> > Can they be combined? So I think the additional decorative
> >>>>>>> interface
> >>>>>>> > > >> like
> >>>>>>> > > >> > withSplitsAlignment actually increases the understanding
> >>>>>>> cost of
> >>>>>>> > users
> >>>>>>> > > >> > because they have to know what decorative interfaces are
> >>>>>>> there,
> >>>>>>> > which
> >>>>>>> > > >> > interface they can decorate and which combinations of the
> >>>>>>> decorative
> >>>>>>> > > >> > interfaces are valid and which are not. Ideally we want to
> >>>>>>> avoid
> >>>>>>> > that.
> >>>>>>> > > >>
> >>>>>>> > > >> I'm not sure if I understand how embedding default methods
> in
> >>>>>>> the base
> >>>>>>> > > >> interface is solving the problem: what can be combined or
> >>>>>>> not? If
> >>>>>>> > there
> >>>>>>> > > >> are
> >>>>>>> > > >> two conflicting features, having decorative interfaces that
> >>>>>>> can not be
> >>>>>>> > > >> mixed together actually makes much more sense to me rather
> >>>>>>> than having
> >>>>>>> > > >> them
> >>>>>>> > > >> all in one base class. How would you allow users to
> implement
> >>>>>>> only one
> >>>>>>> > > of
> >>>>>>> > > >> those two features?
> >>>>>>> > > >>
> >>>>>>> > > >> To reiterate on the issue. Yes, there are drawbacks:
> >>>>>>> > > >> - how can a user discover what decorators are there?
> >>>>>>> > > >> - how can a user know where the decorator can be applied to?
> >>>>>>> > > >>
> >>>>>>> > > >> However those are drawbacks for more power users, that can
> be
> >>>>>>> > mitigated
> >>>>>>> > > by
> >>>>>>> > > >> the documentation. For example listing all of the decorators
> >>>>>>> with
> >>>>>>> > > >> detailed explanation both in the docs and in the java docs.
> >>>>>>> More
> >>>>>>> > > >> experienced users will be able to deal with those issues
> >>>>>>> easier, as
> >>>>>>> > they
> >>>>>>> > > >> will already have some basic understanding of Flink. Also if
> >>>>>>> user has
> >>>>>>> > a
> >>>>>>> > > >> problem that he wants to solve, he will google search a
> >>>>>>> potential
> >>>>>>> > > solution
> >>>>>>> > > >> to his problem anyway, and while doing that he is very
> likely
> >>>>>>> to
> >>>>>>> > > discover
> >>>>>>> > > >> the decorator that he needs anyway in the docs.
> >>>>>>> > > >>
> >>>>>>> > > >> The advantages are:
> >>>>>>> > > >> - clean and easy to implement base interface
> >>>>>>> > > >> - all of the methods from a single feature are grouped in a
> >>>>>>> single
> >>>>>>> > > >> decorator interface, together with their dedicated java doc.
> >>>>>>> It's also
> >>>>>>> > > >> easier to google search for help using the decorator name
> >>>>>>> > > >> - if an optional feature requires two methods to be
> >>>>>>> implemented at
> >>>>>>> > once,
> >>>>>>> > > >> decorator can guarantee that
> >>>>>>> > > >>
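A minimal Java sketch of the grouping argument above, under stated assumptions: BasicReader, SupportsPausing, PlainReader and PausingReader are hypothetical, simplified names used only for illustration and are not the real Flink interfaces. It shows how a decorative interface keeps the base interface small while forcing both methods of the optional feature to be implemented together.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-ins; these are NOT the real Flink interfaces.
interface BasicReader {
    String pollNext();
}

// Decorative interface: both methods of the optional feature are declared together,
// so a class that opts in must implement both or it will not compile.
interface SupportsPausing {
    void pauseSplits(Collection<String> splitIds);

    void resumeSplits(Collection<String> splitIds);
}

// A new user only has to implement the small base interface.
final class PlainReader implements BasicReader {
    @Override
    public String pollNext() {
        return "record";
    }
}

// A power user opts into the feature by adding the decorator.
final class PausingReader implements BasicReader, SupportsPausing {
    private final Set<String> paused = new HashSet<>();

    @Override
    public String pollNext() {
        return "record";
    }

    @Override
    public void pauseSplits(Collection<String> splitIds) {
        paused.addAll(splitIds);
    }

    @Override
    public void resumeSplits(Collection<String> splitIds) {
        paused.removeAll(splitIds);
    }

    Set<String> pausedSplits() {
        return paused;
    }
}

final class DecoratorDemo {
    // The framework side discovers the capability with a type check.
    static boolean canPause(BasicReader reader) {
        return reader instanceof SupportsPausing;
    }

    public static void main(String[] args) {
        System.out.println(canPause(new PlainReader()));
        System.out.println(canPause(new PausingReader()));
    }
}
```

The trade-off discussed in the thread is visible here: the capability is only discoverable through the type check, so a user has to know that SupportsPausing exists and that it applies to readers.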
> >>>>>>> > > >> Here I would strongly lean towards making life easier for
> new
> >>>>>>> users,
> >>>>>>> > > >> lowering the entry barrier, at the (imo) slight expense for
> >>>>>>> the power
> >>>>>>> > > >> users.
> >>>>>>> > > >>
> >>>>>>> > > >> Best,
> >>>>>>> > > >> Piotrek
> >>>>>>> > > >>
> >>>>>>> > > >>
> >>>>>>> > > >> On Tue, 26 Apr 2022 at 15:32, Becket Qin <be...@gmail.com>
> >>>>>>> > wrote:
> >>>>>>> > > >>
> >>>>>>> > > >> > Thanks for the reply Sebastian and Dawid,
> >>>>>>> > > >> >
> >>>>>>> > > >> > I think Sebastian has a good summary. This is a really
> >>>>>>> helpful
> >>>>>>> > > >> discussion.
> >>>>>>> > > >> >
> >>>>>>> > > >> > Thinking a bit more, I feel that it might still be better
> >>>>>>> to add the
> >>>>>>> > > >> > supportsXXX() method in the Source rather than
> SourceReader.
> >>>>>>> > > >> >
> >>>>>>> > > >> > Generally speaking, what we are trying to do here is to
> let
> >>>>>>> the
> >>>>>>> > Flink
> >>>>>>> > > >> > framework know what the Source is capable of. In this
> FLIP,
> >>>>>>> it
> >>>>>>> > happens
> >>>>>>> > > >> to
> >>>>>>> > > >> > be the capability that only involves SourceReader. But in
> >>>>>>> the
> >>>>>>> > future,
> >>>>>>> > > >> it is
> >>>>>>> > > >> > possible that another functionality involves both the
> >>>>>>> > SplitEnumerator
> >>>>>>> > > >> and
> >>>>>>> > > >> > SourceReader. In that case, following the current
> approach,
> >>>>>>> we
> >>>>>>> > should
> >>>>>>> > > >> put
> >>>>>>> > > >> > the "supportsXXX()" method in both SplitEnumerator and
> >>>>>>> SourceReader.
> >>>>>>> > > >> > Because if we only put this in the SourceReader, then the
> >>>>>>> JM would
> >>>>>>> > > have
> >>>>>>> > > >> to
> >>>>>>> > > >> > create a SourceReader in order to know whether this
> feature
> >>>>>>> is
> >>>>>>> > > >> supported,
> >>>>>>> > > >> > which is a little ugly. But if we put the "supportsXXX()"
> >>>>>>> method in
> >>>>>>> > > the
> >>>>>>> > > >> > Source, we will break the "symmetric" design because this
> >>>>>>> FLIP
> >>>>>>> > chose a
> >>>>>>> > > >> > different way.
> >>>>>>> > > >> >
> >>>>>>> > > >> > This is also why I think supportsXXX() method seems a good
> >>>>>>> thing to
> >>>>>>> > > >> have,
> >>>>>>> > > >> > because when there are a few interfaces / methods that are
> >>>>>>> expected
> >>>>>>> > to
> >>>>>>> > > >> be
> >>>>>>> > > >> > implemented at the same time in order to deliver a
> feature,
> >>>>>>> it is
> >>>>>>> > > always
> >>>>>>> > > >> > good to have a single source of truth to tell the
> framework
> >>>>>>> what to
> >>>>>>> > > do,
> >>>>>>> > > >> so
> >>>>>>> > > >> > the framework can do consistent things in different parts.
> >>>>>>> > > >> >
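A minimal Java sketch of the "single source of truth" argument above, assuming hypothetical names (DemoSource, DemoReader, CountingSource, JobValidation) that are not the real Flink API. With the capability flag on the Source, the framework can reject an unsupported configuration at job setup without instantiating a reader.

```java
// Hypothetical, simplified stand-ins; these are NOT the real Flink interfaces.
interface DemoReader {
    String pollNext();
}

interface DemoSource {
    DemoReader createReader();

    // Capability flag on the Source: the single place the framework asks.
    default boolean supportsPausingSplits() {
        return false;
    }
}

// Counts reader creations so the demo can show that validation creates none.
final class CountingSource implements DemoSource {
    private final boolean pausing;
    int readersCreated;

    CountingSource(boolean pausing) {
        this.pausing = pausing;
    }

    @Override
    public DemoReader createReader() {
        readersCreated++;
        return () -> "record";
    }

    @Override
    public boolean supportsPausingSplits() {
        return pausing;
    }
}

final class JobValidation {
    // Runs at job setup: rejects an unsupported configuration without creating a reader.
    static void validate(DemoSource source, boolean alignmentEnabled) {
        if (alignmentEnabled && !source.supportsPausingSplits()) {
            throw new IllegalStateException(
                    "Split alignment is enabled, but the source cannot pause splits.");
        }
    }

    public static void main(String[] args) {
        CountingSource source = new CountingSource(false);
        validate(source, false); // alignment disabled: accepted
        try {
            validate(source, true); // alignment enabled but unsupported: rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("readers created: " + source.readersCreated);
    }
}
```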
> >>>>>>> > > >> > @Sebastian Mattheis <se...@ververica.com>
> >>>>>>> > > >> >
> >>>>>>> > > >> > Regarding interface flavor b), i.e. AlignedSourceReader +
> >>>>>>> > > >> > AlignedSplitReader, what I feel awkward about is that we
> are
> >>>>>>> > > essentially
> >>>>>>> > > >> > expecting almost all the SourceReader implementations to
> >>>>>>> extend
> >>>>>>> > > >> > SourceReaderBase, which effectively makes the SourceReader
> >>>>>>> interface
> >>>>>>> > > >> > without the pausing support useless. So this indicates
> that
> >>>>>>> public
> >>>>>>> > > >> > decorative interfaces (or sub-interfaces for the same
> >>>>>>> purpose) only
> >>>>>>> > > >> > make sense if the original interface is also expected to
> be
> >>>>>>> used.
> >>>>>>> > > >> > Otherwise, it seems to make more sense to add the method to
> >>>>>>> the
> >>>>>>> > original
> >>>>>>> > > >> > interface itself.
> >>>>>>> > > >> >
> >>>>>>> > > >> > Cheers,
> >>>>>>> > > >> >
> >>>>>>> > > >> > Jiangjie (Becket) Qin
> >>>>>>> > > >> >
> >>>>>>> > > >> >
> >>>>>>> > > >> >
> >>>>>>> > > >> >
> >>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <
> >>>>>>> > > >> dwysakowicz@apache.org>
> >>>>>>> > > >> > wrote:
> >>>>>>> > > >> >
> >>>>>>> > > >> > > Thanks @Sebastian for the nice summary.
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > I think most of your points aligned with the suggestions
> >>>>>>> I made to
> >>>>>>> > > the
> >>>>>>> > > >> > > FLIP, while you were writing your reply (I believe we
> hit
> >>>>>>> enter
> >>>>>>> > > >> nearly at
> >>>>>>> > > >> > > the same time ;) )
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > Two points after we synced offline
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > 1. I changed also the supportsWatermarksSplitAlignment
> to
> >>>>>>> > > >> > > supportsPausingSplits to express the general capability
> of
> >>>>>>> > pausing.
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > 2. As for whether we should add
> >>>>>>> PausingSourceReader/PausingSplitReader
> >>>>>>> > > (option
> >>>>>>> > > >> b)
> >>>>>>> > > >> > > or if we should just add the methods (option c), I
> >>>>>>> suggest to
> >>>>>>> > simply
> >>>>>>> > > >> add
> >>>>>>> > > >> > > the two methods as I felt this is the approach much preferred
> by
> >>>>>>> Becket,
> >>>>>>> > > >> to which
> >>>>>>> > > >> > > others do not object. Unless there is opposition,
> let's
> >>>>>>> go with
> >>>>>>> > > this
> >>>>>>> > > >> > > option c.
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > Best,
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > Dawid
> >>>>>>> > > >> > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > Hi folks,
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > Sorry for being a bit silent. Many thanks for all the
> >>>>>>> input and
> >>>>>>> > > >> > > suggestions. As I'm a bit new, I needed some time to
> >>>>>>> catch up and
> >>>>>>> > > >> > structure
> >>>>>>> > > >> > > (for myself) the discussion and I wanted to find a way
> to
> >>>>>>> > structure
> >>>>>>> > > >> the
> >>>>>>> > > >> > > conclusions. (Also because I had the feeling that some
> >>>>>>> concerns
> >>>>>>> > got
> >>>>>>> > > >> lost
> >>>>>>> > > >> > in
> >>>>>>> > > >> > > the discussion.) This is my attempt and please correct
> me
> >>>>>>> if
> >>>>>>> > > >> something is
> >>>>>>> > > >> > > wrong or misunderstood. I tried to collect and assemble
> >>>>>>> the
> >>>>>>> > > opinions,
> >>>>>>> > > >> > > suggestions, and conclusions (to the best of my
> >>>>>>> knowledge):
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > # Top A: Should split alignment (pause/resume behavior)
> >>>>>>> be a
> >>>>>>> > general
> >>>>>>> > > >> > > capability?
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > I personally don't see any reason not to have it a
> general
> >>>>>>> > capability
> >>>>>>> > > >> > > because for the alignSplit method it is actually
> >>>>>>> independent of
> >>>>>>> > the
> >>>>>>> > > >> > > watermarks. If we agree here to have it a general
> >>>>>>> capability, we
> >>>>>>> > > >> should
> >>>>>>> > > >> > > also agree on the right wording. Does
> >>>>>>> "alignSplits(splitsToResume,
> >>>>>>> > > >> > > splitsToPause)" refer to what is then actually meant? (I
> >>>>>>> see it as
> >>>>>>> > > >> okay.
> >>>>>>> > > >> > I
> >>>>>>> > > >> > > don't have any better idea whilst Arvid suggested
> >>>>>>> > > >> "pauseOrResumeSplits".)
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > # Top B: Should it be possible to enable/disable split
> >>>>>>> alignment?
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > I would personally not disable the split alignment on
> the
> >>>>>>> source
> >>>>>>> > > >> reader
> >>>>>>> > > >> > > side because if split alignment is used for some other
> >>>>>>> use case
> >>>>>>> > (see
> >>>>>>> > > >> A)
> >>>>>>> > > >> > it
> >>>>>>> > > >> > > could have nasty side effects on other/future use cases.
> >>>>>>> Instead,
> >>>>>>> > I
> >>>>>>> > > >> would
> >>>>>>> > > >> > > disable "watermark split alignment" where I think it
> >>>>>>> should
> >>>>>>> > disable
> >>>>>>> > > >> the
> >>>>>>> > > >> > > watermark-dependent trigger for split alignment.
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > # Top C: Should we add a supportsX method?
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > I find it difficult to define the scope of a supportsX
> >>>>>>> method
> >>>>>>> > w.r.t.
> >>>>>>> > > >> > > the following questions: a) Where is it used? and b)
> What
> >>>>>>> is the
> >>>>>>> > > >> expected
> >>>>>>> > > >> > > output? To b), it's not straight-forward to provide a
> >>>>>>> meaningful
> >>>>>>> > > >> output,
> >>>>>>> > > >> > > e.g., if SourceReader supports split alignment but
> >>>>>>> SplitReader
> >>>>>>> > not.
> >>>>>>> > > >> This
> >>>>>>> > > >> > is
> >>>>>>> > > >> > > because with the current implementation, we can
> determine
> >>>>>>> whether
> >>>>>>> > > >> split
> >>>>>>> > > >> > > alignment is fully supported only during runtime and
> >>>>>>> specifically
> >>>>>>> > > >> > actually
> >>>>>>> > > >> > > only when calling alignSplits down the call hierarchy up
> >>>>>>> to the
> >>>>>>> > > actual
> >>>>>>> > > >> > > SplitReaders.
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > Therefore, I would suggest to either raise an error or
> >>>>>>> warning if
> >>>>>>> > > the
> >>>>>>> > > >> > > alignment is called but not supported at some point. I
> >>>>>>> know we
> >>>>>>> > > should
> >>>>>>> > > >> > > carefully think about when this could be the case
> because
> >>>>>>> we don't
> >>>>>>> > > >> want
> >>>>>>> > > >> > to
> >>>>>>> > > >> > > flood anybody with such warnings. However, warnings
> could
> >>>>>>> be an
> >>>>>>> > > >> indicator
> >>>>>>> > > >> > > for the user that for watermark split alignment use case
> >>>>>>> split
> >>>>>>> > > >> reading is
> >>>>>>> > > >> > > imbalanced with the conclusion to either disable the
> >>>>>>> trigger for
> >>>>>>> > > >> > watermark
> >>>>>>> > > >> > > split alignment (see Top B) or to use/implement a source
> >>>>>>> and
> >>>>>>> > reader
> >>>>>>> > > >> that
> >>>>>>> > > >> > > fully supports split alignment.
> >>>>>>> > > >> > >
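A minimal Java sketch of the "raise an error or warning" suggestion above. All names (DemoSplitReader, LegacySplitReader, AlignmentCaller) are hypothetical and simplified, and the warn-once policy is an assumption added here to illustrate one way of not flooding users with warnings; it is not taken from the FLIP.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified split reader; this is NOT the real Flink SplitReader.
interface DemoSplitReader {
    String fetch();

    // Default for readers that never opted in: fail loudly instead of silently doing nothing.
    default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        throw new UnsupportedOperationException(
                "This reader does not support pausing or resuming splits.");
    }
}

// A reader written before the feature existed; it inherits the throwing default.
final class LegacySplitReader implements DemoSplitReader {
    @Override
    public String fetch() {
        return "record";
    }
}

final class AlignmentCaller {
    final List<String> warnings = new ArrayList<>();
    private boolean warned; // assumption: warn only once to avoid flooding the log

    // Returns true if the reader honored the request, false if it is unsupported.
    boolean align(DemoSplitReader reader, Collection<String> toPause, Collection<String> toResume) {
        try {
            reader.pauseOrResumeSplits(toPause, toResume);
            return true;
        } catch (UnsupportedOperationException e) {
            if (!warned) {
                warnings.add("Split alignment requested but not supported: " + e.getMessage());
                warned = true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        AlignmentCaller caller = new AlignmentCaller();
        DemoSplitReader reader = new LegacySplitReader();
        caller.align(reader, List.of("split-1"), List.of());
        caller.align(reader, List.of("split-2"), List.of());
        System.out.println(caller.warnings);
    }
}
```

As the paragraph above notes, support is only detected at runtime here, when the call actually reaches a reader that does not implement the method.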
> >>>>>>> > > >> > > # Top D: How to design interfaces?
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > Thanks for structuring the discussion with the
> various
> >>>>>>> > > >> possibilities
> >>>>>>> > > >> > > (a-d). From the discussion and emails, I would like to
> >>>>>>> summarize
> >>>>>>> > the
> >>>>>>> > > >> > > following requirements:
> >>>>>>> > > >> > > - Interfaces should be consistent ("symmetric"), i.e.,
> >>>>>>> similar
> >>>>>>> > > >> semantics
> >>>>>>> > > >> > > should have similar interfaces with similar usage.
> >>>>>>> > > >> > > - Make explicit which implementations implement
> >>>>>>> interfaces/support
> >>>>>>> > > >> > > behavior.
> >>>>>>> > > >> > > - Make clear what are default implementations and how to
> >>>>>>> implement
> >>>>>>> > > >> > > interfaces with desired behavior.
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > This is a simplified view of the relations between
> >>>>>>> relevant
> >>>>>>> > classes
> >>>>>>> > > of
> >>>>>>> > > >> > the
> >>>>>>> > > >> > > PoC implementation:
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > SourceReader (Public) <|-- SourceReaderBase (Internal)
> >>>>>>> <|-- ..
> >>>>>>> > <|--
> >>>>>>> > > >> > > MySourceReader
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > MySourceReader <>-- SplitFetcherManager (Internal) <>--
> >>>>>>> > SplitFetcher
> >>>>>>> > > >> > > (Internal) <>-- SplitReader (Public) <|-- MySplitReader
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > (A <|-- B: B inherits from A; A <>-- B: A "has a" B)
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > Note that SourceReaderBase and SplitFetcherManager
> >>>>>>> implement most
> >>>>>>> > of
> >>>>>>> > > >> the
> >>>>>>> > > >> > > "logic" for split alignment just because we wanted to
> >>>>>>> implement
> >>>>>>> > > split
> >>>>>>> > > >> > > alignment and wanted it to be available as kind of a
> >>>>>>> default. As a
> >>>>>>> > > >> > > consequence, we have a "default implementation" for
> >>>>>>> SourceReader
> >>>>>>> > > that
> >>>>>>> > > >> > > implements the actual logic for split alignment. For
> that
> >>>>>>> reason,
> >>>>>>> > I
> >>>>>>> > > >> find
> >>>>>>> > > >> > it
> >>>>>>> > > >> > > very confusing to have a NOOP default implementation in
> >>>>>>> the
> >>>>>>> > > interface
> >>>>>>> > > >> for
> >>>>>>> > > >> > > the SourceReader. As a consequence, interface strategy
> c)
> >>>>>>> is
> >>>>>>> > > difficult
> >>>>>>> > > >> > > because this would require NOOP default implementations
> >>>>>>> in the
> >>>>>>> > > public
> >>>>>>> > > >> > > interfaces of SourceReader and SplitReader. This is the
> >>>>>>> same for
> >>>>>>> > > >> strategy
> >>>>>>> > > >> > > d) because it would require NOOP default implementation
> >>>>>>> in the
> >>>>>>> > > >> > > SourceReader. Further, as Dawid described method
> >>>>>>> signatures of
> >>>>>>> > > >> alignSplit
> >>>>>>> > > >> > > for SourceReader and SplitReader differ and it would be
> >>>>>>> extremely
> >>>>>>> > > >> > difficult
> >>>>>>> > > >> > > to make the signatures the same (with even potential
> >>>>>>> performance
> >>>>>>> > > >> impact
> >>>>>>> > > >> > > because of additional look-ups of split ids). Therefore,
> >>>>>>> having a
> >>>>>>> > > >> > symmetric
> >>>>>>> > > >> > > decorative interface as of strategy a) is actually not
> >>>>>>> possible
> >>>>>>> > and
> >>>>>>> > > >> > having
> >>>>>>> > > >> > > two decorative interfaces with different method
> >>>>>>> signatures is
> >>>>>>> > > >> confusing.
> >>>>>>> > > >> > My
> >>>>>>> > > >> > > conclusion is that we are best with strategy b) which
> >>>>>>> means to
> >>>>>>> > have
> >>>>>>> > > >> > > specializing sub-interfaces that inherit from the parent
> >>>>>>> > interface:
> >>>>>>> > > >> > > SourceReader <|-- AlignedSourceReader, SplitReader <|--
> >>>>>>> > > >> > AlignedSplitReader
> >>>>>>> > > >> > > With this option, I'm not 100% sure what the
> implications
> >>>>>>> are and
> >>>>>>> > if
> >>>>>>> > > >> this
> >>>>>>> > > >> > > could get nasty. I would suggest that Dawid and I just
> >>>>>>> try to
> >>>>>>> > > >> implement
> >>>>>>> > > >> > and
> >>>>>>> > > >> > > see if we like it. :)
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > # Summary
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > In conclusion, please let me know your perspectives.
> >>>>>>> Please
> >>>>>>> > correct
> >>>>>>> > > >> me,
> >>>>>>> > > >> > if
> >>>>>>> > > >> > > something is wrong or if I misunderstood something. My
> >>>>>>> perspective
> >>>>>>> > > >> would
> >>>>>>> > > >> > be:
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > Top A: Yes
> >>>>>>> > > >> > > Top B: Yes (but disable watermark trigger for split
> >>>>>>> alignment)
> >>>>>>> > > >> > > Top C: No
> >>>>>>> > > >> > > Top D: b)
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > Best,
> >>>>>>> > > >> > > Sebastian
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <
> >>>>>>> > > >> dwysakowicz@apache.org
> >>>>>>> > > >> > >
> >>>>>>> > > >> > > wrote:
> >>>>>>> > > >> > >
> >>>>>>> > > >> > >> @Arvid:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> While I also like Becket's capability approach, I fear
> >>>>>>> that it
> >>>>>>> > > >> doesn't
> >>>>>>> > > >> > work
> >>>>>>> > > >> > >> for this particular use case: Sources can always be
> >>>>>>> aligned
> >>>>>>> > > >> cross-task
> >>>>>>> > > >> > and
> >>>>>>> > > >> > >> this is just about intra-task alignment. So it's
> >>>>>>> plausible to put
> >>>>>>> > > >> > sources
> >>>>>>> > > >> > >> into an alignment group even though they do not use any
> >>>>>>> of the
> >>>>>>> > > >> presented
> >>>>>>> > > >> > >> API of FLIP-217. They should just issue a warning, if
> >>>>>>> they handle
> >>>>>>> > > >> > multiple
> >>>>>>> > > >> > >> splits (see motivation section).
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Yes, but the "supportXXX" method would be for telling
> if
> >>>>>>> it
> >>>>>>> > > supports
> >>>>>>> > > >> > that
> >>>>>>> > > >> > >> intra-task alignment. Cross-task alignment would always
> >>>>>>> be
> >>>>>>> > > supported.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I updated interfaces to what I believe to be closest
> to a
> >>>>>>> > consensus
> >>>>>>> > > >> > >> between all participants. Do you mind taking a look?
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> @Sebastian Do you mind addressing the nits?
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Best,
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Dawid
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Thanks for pushing this effort.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I'd actually be in favor of 1b). I fully agree that
> >>>>>>> decorator
> >>>>>>> > > >> interfaces
> >>>>>>> > > >> > >> should be avoided but I'm also not a big fan of
> >>>>>>> overloading the
> >>>>>>> > > base
> >>>>>>> > > >> > >> interfaces (they are hard to implement as is). The
> usual
> >>>>>>> feedback
> >>>>>>> > > to
> >>>>>>> > > >> > >> Source-related interfaces is always that they are
> >>>>>>> overwhelming
> >>>>>>> > and
> >>>>>>> > > >> too
> >>>>>>> > > >> > >> hard to implement. However, I'd also not oppose 1c) as
> >>>>>>> scattered
> >>>>>>> > > >> > interfaces
> >>>>>>> > > >> > >> also have drawbacks. I'd just dislike 1a) and 1d).
> >>>>>>> > > >> > >> While I also like Becket's capability approach, I fear
> >>>>>>> that it
> >>>>>>> > > >> doesn't
> >>>>>>> > > >> > work
> >>>>>>> > > >> > >> for this particular use case: Sources can always be
> >>>>>>> aligned
> >>>>>>> > > >> cross-task
> >>>>>>> > > >> > and
> >>>>>>> > > >> > >> this is just about intra-task alignment. So it's
> >>>>>>> plausible to put
> >>>>>>> > > >> > sources
> >>>>>>> > > >> > >> into an alignment group even though they do not use any
> >>>>>>> of the
> >>>>>>> > > >> presented
> >>>>>>> > > >> > >> API of FLIP-217. They should just issue a warning, if
> >>>>>>> they handle
> >>>>>>> > > >> > multiple
> >>>>>>> > > >> > >> splits (see motivation section).
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I think renaming alignSplits to facilitate future use
> >>>>>>> cases makes
> >>>>>>> > > >> sense
> >>>>>>> > > >> > but
> >>>>>>> > > >> > >> then all interfaces (if 1c) is chosen) should be
> adjusted
> >>>>>>> > > >> accordingly.
> >>>>>>> > > >> > >> AlignedSourceReader could be PausingSourceReader and
> I'd
> >>>>>>> go for
> >>>>>>> > > >> > >> pauseOrResumeSplits (Becket's proposal afaik). We could
> >>>>>>> also
> >>>>>>> > split
> >>>>>>> > > it
> >>>>>>> > > >> > into
> >>>>>>> > > >> > >> pauseSplit and resumeSplit. While pauseOrResumeSplits
> >>>>>>> may allow
> >>>>>>> > > >> Sources
> >>>>>>> > > >> > to
> >>>>>>> > > >> > >> just use 1 instead of 2 library calls (as written in
> the
> >>>>>>> > Javadoc),
> >>>>>>> > > >> both
> >>>>>>> > > >> > >> Kafka and Pulsar can't use it and I'm not sure if there
> >>>>>>> is a
> >>>>>>> > system
> >>>>>>> > > >> that
> >>>>>>> > > >> > >> can.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Some nits for the FLIP:
> >>>>>>> > > >> > >> - Please replace "stop" with "pause".
> >>>>>>> > > >> > >> - Not sure if it's worth it in the capability section:
> >>>>>>> Sources
> >>>>>>> > that
> >>>>>>> > > >> > adopt
> >>>>>>> > > >> > >> this interface cannot be used in earlier versions. So
> it
> >>>>>>> feels
> >>>>>>> > like
> >>>>>>> > > >> we
> >>>>>>> > > >> > are
> >>>>>>> > > >> > >> only forward compatible (old sources can be used after
> >>>>>>> the
> >>>>>>> > change);
> >>>>>>> > > >> but
> >>>>>>> > > >> > I
> >>>>>>> > > >> > >> guess this holds for any API addition.
> >>>>>>> > > >> > >> - You might want to add what happens when all splits
> are
> >>>>>>> paused.
> >>>>>>> > > >> > >> - You may want to describe how the 3 flavors of
> >>>>>>> SourceReaderBase
> >>>>>>> > > >> > interact
> >>>>>>> > > >> > >> with the interface.
> >>>>>>> > > >> > >> - I'm not sure if it makes sense to include Kafka and
> >>>>>>> Pulsar in
> >>>>>>> > the
> >>>>>>> > > >> > FLIP.
> >>>>>>> > > >> > >> For me, this is rather immediate follow-up work. (could
> >>>>>>> be in the
> >>>>>>> > > >> same
> >>>>>>> > > >> > >> umbrella ticket)
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Best,
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Arvid
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz <
> >>>>>>> > > >> > dwysakowicz@apache.org>
> >>>>>>> > > >> > >> wrote:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
> >>>>>>> WithSplitsAlignment",
> >>>>>>> > > >> along
> >>>>>>> > > >> > >> with "MySplitReader implements SplitReader,
> >>>>>>> WithSplitsAlignment",
> >>>>>>> > > or
> >>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
> >>>>>>> > > "MySplitReader
> >>>>>>> > > >> > >> implements AlignedSplitReader", or
> >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
> >>>>>>> "MySplitReader
> >>>>>>> > > >> > implements
> >>>>>>> > > >> > >> SplitReader".
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I think the latest proposal according to Dawid would
> be:
> >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
> >>>>>>> "MySplitReader
> >>>>>>> > > >> > implements
> >>>>>>> > > >> > >> AlignedSplitReader".
> >>>>>>> > > >> > >> I am fine with this API, although personally speaking I
> >>>>>>> think it
> >>>>>>> > is
> >>>>>>> > > >> > simpler
> >>>>>>> > > >> > >> to just add a new method to the split reader with
> >>>>>>> default impl.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I think that is a good idea to have it aligned as much
> as
> >>>>>>> > possible.
> >>>>>>> > > >> I'd
> >>>>>>> > > >> > be
> >>>>>>> > > >> > >> +1 for your option c). We can merge AlignedSplitReader
> >>>>>>> with
> >>>>>>> > > >> > SplitReader. We
> >>>>>>> > > >> > >> will update the FLIP shortly.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Best,
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Dawid
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Thanks for the comment, Jark.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> 3. Interface/Method Name.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Can the interface be used to align other things in the
> >>>>>>> future?
> >>>>>>> > For
> >>>>>>> > > >> > example,
> >>>>>>> > > >> > >> align read speed, I have
> >>>>>>> > > >> > >> seen users requesting global rate limits. This feature
> >>>>>>> may also
> >>>>>>> > > need
> >>>>>>> > > >> an
> >>>>>>> > > >> > >> interface like this.
> >>>>>>> > > >> > >> If we don't plan to extend this interface to support
> >>>>>>> aligning other
> >>>>>>> > > >> > things, I
> >>>>>>> > > >> > >> suggest explicitly declaring
> >>>>>>> > > >> > >> the purpose of the methods, such as
> >>>>>>> `alignWatermarksForSplits`
> >>>>>>> > > >> instead
> >>>>>>> > > >> > of
> >>>>>>> > > >> > >> `alignSplits`.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> This is a good point. Naming wise, it would usually be
> >>>>>>> more
> >>>>>>> > > >> extensible
> >>>>>>> > > >> > to
> >>>>>>> > > >> > >> just describe what the method actually does, instead of
> >>>>>>> assuming
> >>>>>>> > > the
> >>>>>>> > > >> > >> purpose of doing this. For example, in this case,
> >>>>>>> > > >> pauseOrResumeSplits()
> >>>>>>> > > >> > >> would be more extensible because this can be used for
> >>>>>>> any kind of
> >>>>>>> > > >> flow
> >>>>>>> > > >> > >> control, be it watermark alignment or simple rate
> >>>>>>> limiting.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> 4. Interface or Method.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I don't have a strong opinion on this. I think they
> have
> >>>>>>> their
> >>>>>>> > own
> >>>>>>> > > >> > >> advantages.
> >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending
> >>>>>>> abilities
> >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
> >>>>>>> > > >> > >> and I prefer Interfaces rather than methods in this
> >>>>>>> case. When
> >>>>>>> > you
> >>>>>>> > > >> have
> >>>>>>> > > >> > a
> >>>>>>> > > >> > >> bunch of abilities and each ability
> >>>>>>> > > >> > >> has more than one method, Interfaces can help to
> >>>>>>> organize them
> >>>>>>> > and
> >>>>>>> > > >> make it
> >>>>>>> > > >> > >> clear to users which methods
> >>>>>>> > > >> > >> they need to implement when they want to have an ability.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I am OK with decorative interfaces if this is a general
> >>>>>>> design
> >>>>>>> > > >> pattern
> >>>>>>> > > >> > in
> >>>>>>> > > >> > >> the other components in Flink. But it looks like the
> >>>>>>> current API
> >>>>>>> > > >> > proposal
> >>>>>>> > > >> > >> is not symmetric.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> The current proposal is essentially "MySourceReader
> >>>>>>> implements
> >>>>>>> > > >> > >> SourceReader, WithSplitsAlignment", along with
> >>>>>>> "MySplitReader
> >>>>>>> > > >> implements
> >>>>>>> > > >> > >> AlignedSplitsReader".
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Should we make the API symmetric? I'd consider any one
> >>>>>>> of the
> >>>>>>> > > >> following
> >>>>>>> > > >> > as
> >>>>>>> > > >> > >> symmetric.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
> >>>>>>> WithSplitsAlignment",
> >>>>>>> > > >> along
> >>>>>>> > > >> > >> with "MySplitReader implements SplitReader,
> >>>>>>> WithSplitsAlignment",
> >>>>>>> > > or
> >>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
> >>>>>>> > > "MySplitReader
> >>>>>>> > > >> > >> implements AlignedSplitReader", or
> >>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
> >>>>>>> "MySplitReader
> >>>>>>> > > >> > implements
> >>>>>>> > > >> > >> SplitReader".
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I think the latest proposal according to Dawid would
> be:
> >>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
> >>>>>>> "MySplitReader
> >>>>>>> > > >> > implements
> >>>>>>> > > >> > >> AlignedSplitReader".
> >>>>>>> > > >> > >> I am fine with this API, although personally speaking I
> >>>>>>> think it
> >>>>>>> > is
> >>>>>>> > > >> > simpler
> >>>>>>> > > >> > >> to just add a new method to the split reader with
> >>>>>>> default impl.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> @Dawid Wysakowicz <dw...@apache.org>, thanks
> >>>>>>> for the
> >>>>>>> > > reply.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Having said that, as I don't have a preference and I
> >>>>>>> agree most
> >>>>>>> > of
> >>>>>>> > > >> the
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> sources will support the alignment I am fine following
> >>>>>>> your
> >>>>>>> > > >> suggestion
> >>>>>>> > > >> > to
> >>>>>>> > > >> > >> have the SourceReader extending from
> >>>>>>> > WithWatermarksSplitsAlignment,
> >>>>>>> > > >> but
> >>>>>>> > > >> > >> would put the "supportsXXX" there, not in the Source to
> >>>>>>> keep the
> >>>>>>> > > two
> >>>>>>> > > >> > >> methods together.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> One benefit of having the "supportsXXX" in Source is
> >>>>>>> that this
> >>>>>>> > > allows
> >>>>>>> > > >> > some
> >>>>>>> > > >> > >> compile time check. For example, if a user enabled
> >>>>>>> watermark
> >>>>>>> > > >> alignment
> >>>>>>> > > >> > >> while it is not supported by the Source, an exception
> >>>>>>> can be
> >>>>>>> > thrown
> >>>>>>> > > >> at
> >>>>>>> > > >> > >> compile time. It seems in general useful. That said, I
> >>>>>>> agree that
> >>>>>>> > > API
> >>>>>>> > > >> > >> cleanliness wise it is better to put the two methods
> >>>>>>> together.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Thanks,
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <imjark@gmail.com>
> >>>>>>> wrote:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Thanks Dawid for the reminder on FLIP-182. Sorry I did
> >>>>>>> miss it.
> >>>>>>> > > >> > >> I don't have other concerns then.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Best,
> >>>>>>> > > >> > >> Jark
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz <dwysakowicz@apache.org>
> >>>>>>> > > >> > >> wrote:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> @Jark:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> 1. Will the framework always align with watermarks when
> >>>>>>> the
> >>>>>>> > source
> >>>>>>> > > >> > >> implements the interface?
> >>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment
> even
> >>>>>>> if Kafka
> >>>>>>> > > >> > >> implements the interface,
> >>>>>>> > > >> > >> and this will affect the throughput somehow. I agree
> >>>>>>> with Becket
> >>>>>>> > > >> > >> we may need a
> >>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to
> configure
> >>>>>>> the
> >>>>>>> > source
> >>>>>>> > > >> to
> >>>>>>> > > >> > >> enable/disable the alignment.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> 2. How does the framework calculate
> maxDesiredWatermark?
> >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will
> >>>>>>> greatly affect
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> throughput
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> if the reader is constantly
> >>>>>>> > > >> > >>  switching between pause and resume. Can users
> configure
> >>>>>>> the
> >>>>>>> > > >> alignment
> >>>>>>> > > >> > >> offset?
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> This is covered in the previous FLIP[1] which has been
> >>>>>>> already
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> implemented
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> in 1.15. In short, it must be enabled with the
> watermark
> >>>>>>> strategy
> >>>>>>> > > >> which
> >>>>>>> > > >> > >> also configures drift and update interval.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> If we don't plan to extend this interface to support
> >>>>>>> align other
> >>>>>>> > > >> things,
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> suggest explicitly declaring
> >>>>>>> > > >> > >> the purpose of the methods, such as
> >>>>>>> `alignWatermarksForSplits`
> >>>>>>> > > >> instead
> >>>>>>> > > >> > of
> >>>>>>> > > >> > >> `alignSplits`.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Sure let's rename it.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> @Becket:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I understand your point. On the other hand putting all
> >>>>>>> methods,
> >>>>>>> > > even
> >>>>>>> > > >> > with
> >>>>>>> > > >> > >> "supportsXXX" methods for enabling certain features,
> >>>>>>> makes the
> >>>>>>> > > entry
> >>>>>>> > > >> > >> threshold for writing a new source higher. Instead of
> >>>>>>> focusing on
> >>>>>>> > > the
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> basic
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> and required properties of the Source, the person
> >>>>>>> implementing a
> >>>>>>> > > >> source
> >>>>>>> > > >> > >> must bother with and need to figure out what all of the
> >>>>>>> extra
> >>>>>>> > > >> features
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> are
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> about and how to deal with them. It makes it also
> harder
> >>>>>>> to
> >>>>>>> > > organize
> >>>>>>> > > >> > >> methods in coupled groups as Jark said.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Having said that, as I don't have a preference and I
> >>>>>>> agree most
> >>>>>>> > of
> >>>>>>> > > >> the
> >>>>>>> > > >> > >> sources will support the alignment I am fine following
> >>>>>>> your
> >>>>>>> > > >> suggestion
> >>>>>>> > > >> > to
> >>>>>>> > > >> > >> have the SourceReader extending from
> >>>>>>> > WithWatermarksSplitsAlignment,
> >>>>>>> > > >> but
> >>>>>>> > > >> > >> would put the "supportsXXX" there, not in the Source to
> >>>>>>> keep the
> >>>>>>> > > two
> >>>>>>> > > >> > >> methods together.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Lastly, I agree it is really unfortunate the
> >>>>>>> "alignSplits"
> >>>>>>> > methods
> >>>>>>> > > >> > differ
> >>>>>>> > > >> > >> slightly for SourceReader and SplitReader. The reason
> for
> >>>>>>> that is
> >>>>>>> > > >> > >> SourceReaderBase deals only with SplitIds, whereas
> >>>>>>> SplitReader
> >>>>>>> > > needs
> >>>>>>> > > >> the
> >>>>>>> > > >> > >> actual splits to pause them. I found the discrepancy
> >>>>>>> acceptable
> >>>>>>> > for
> >>>>>>> > > >> the
> >>>>>>> > > >> > >> sake of simplifying changes significantly, especially
> as
> >>>>>>> they
> >>>>>>> > would
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> highly
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> likely impact performance as we would have to perform
> >>>>>>> additional
> >>>>>>> > > >> > lookups.
> >>>>>>> > > >> > >> Moreover the SplitReader is a secondary interface.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Best,
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Dawid
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Thanks for the effort, Dawid and Sebastian!
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I just have some minor questions (maybe I missed
> >>>>>>> something).
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> 1. Will the framework always align with watermarks when
> >>>>>>> the
> >>>>>>> > source
> >>>>>>> > > >> > >> implements the interface?
> >>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment
> even
> >>>>>>> if Kafka
> >>>>>>> > > >> > >> implements the interface,
> >>>>>>> > > >> > >> and this will affect the throughput somehow. I agree
> >>>>>>> with Becket
> >>>>>>> > > >> > >> we may need a
> >>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to
> configure
> >>>>>>> the
> >>>>>>> > source
> >>>>>>> > > >> to
> >>>>>>> > > >> > >> enable/disable the alignment.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> 2. How does the framework calculate
> maxDesiredWatermark?
> >>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will
> >>>>>>> greatly affect
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> throughput
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> if the reader is constantly
> >>>>>>> > > >> > >>  switching between pause and resume. Can users
> configure
> >>>>>>> the
> >>>>>>> > > >> alignment
> >>>>>>> > > >> > >> offset?
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> 3. Interface/Method Name.
> >>>>>>> > > >> > >> Can the interface be used to align other things in the
> >>>>>>> future?
> >>>>>>> > For
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> example,
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> align read speed, I have
> >>>>>>> > > >> > >> seen users requesting global rate limits. This feature
> >>>>>>> may also
> >>>>>>> > > need
> >>>>>>> > > >> an
> >>>>>>> > > >> > >> interface like this.
> >>>>>>> > > >> > >> If we don't plan to extend this interface to support
> >>>>>>> align other
> >>>>>>> > > >> things,
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> suggest explicitly declaring
> >>>>>>> > > >> > >> the purpose of the methods, such as
> >>>>>>> `alignWatermarksForSplits`
> >>>>>>> > > >> instead
> >>>>>>> > > >> > of
> >>>>>>> > > >> > >> `alignSplits`.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> 4. Interface or Method.
> >>>>>>> > > >> > >> I don't have a strong opinion on this. I think they
> have
> >>>>>>> their
> >>>>>>> > own
> >>>>>>> > > >> > >> advantages.
> >>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending
> >>>>>>> abilities
> >>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
> >>>>>>> > > >> > >> and I prefer Interfaces rather than methods in this
> >>>>>>> case. When
> >>>>>>> > you
> >>>>>>> > > >> have
> >>>>>>> > > >> > a
> >>>>>>> > > >> > >> bunch of abilities and each ability
> >>>>>>> > > >> > >> has more than one method, Interfaces can help to
> >>>>>>> organize them
> >>>>>>> > and
> >>>>>>> > > >> make
> >>>>>>> > > >> > >> users clear which methods
> >>>>>>> > > >> > >> need to implement when you want to have an ability.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Best,
> >>>>>>> > > >> > >> Jark
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <be...@gmail.com> wrote:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Hi Dawid,
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Thanks for the explanation. Apologies that I somehow
> >>>>>>> misread a
> >>>>>>> > > bunch
> >>>>>>> > > >> of
> >>>>>>> > > >> > >> "align" and thought they were "assign".
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Regarding 1, by default implementation, I was thinking
> >>>>>>> of the
> >>>>>>> > > default
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> no-op
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> implementation. I am a little worried about the
> >>>>>>> proliferation of
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> decorative
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> interfaces. I think the most important thing about
> >>>>>>> interfaces is
> >>>>>>> > > that
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> they
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> are easy to understand. In this case, I prefer adding
> >>>>>>> new method
> >>>>>>> > to
> >>>>>>> > > >> the
> >>>>>>> > > >> > >> existing interface for the following reasons:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> a) I feel the biggest drawback of decorative interfaces
> >>>>>>> is which
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> interface
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> they can decorate and which combinations of multiple
> >>>>>>> decorative
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> interfaces
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> are valid. In the current FLIP, the withSplitsAlignment
> >>>>>>> interface
> >>>>>>> > > is
> >>>>>>> > > >> > only
> >>>>>>> > > >> > >> applicable to the SourceReader which means it can't
> >>>>>>> decorate any
> >>>>>>> > > >> other
> >>>>>>> > > >> > >> interface. From an interface design perspective, a
> >>>>>>> natural
> >>>>>>> > question
> >>>>>>> > > >> is
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> why
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> not let "AlignedSplitReader" extend
> >>>>>>> "withSplitsAlignment"? And it
> >>>>>>> > > is
> >>>>>>> > > >> > also
> >>>>>>> > > >> > >> natural to assume that a split reader implementing both
> >>>>>>> > SplitReader
> >>>>>>> > > >> and
> >>>>>>> > > >> > >> WithSplitAlignment would work, because a source reader
> >>>>>>> > implementing
> >>>>>>> > > >> > >> SourceReader and withSplitsAlignment works. So why
> isn't
> >>>>>>> there an
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> interface
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> of AlignedSourceReader? In the future, if there is a
> new
> >>>>>>> feature
> >>>>>>> > > >> added
> >>>>>>> > > >> > >> (e.g. sorted or pre-partitioned data aware), are we
> >>>>>>> going to
> >>>>>>> > create
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> another
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> interface of SplitReader such as SortedSplitReader or
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> PrePartitionedAware?
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Can they be combined? So I think the additional
> >>>>>>> decorative
> >>>>>>> > > interface
> >>>>>>> > > >> > like
> >>>>>>> > > >> > >> withSplitsAlignment actually increases the
> understanding
> >>>>>>> cost of
> >>>>>>> > > >> users
> >>>>>>> > > >> > >> because they have to know what decorative interfaces
> are
> >>>>>>> there,
> >>>>>>> > > which
> >>>>>>> > > >> > >> interface they can decorate and which combinations of
> the
> >>>>>>> > > decorative
> >>>>>>> > > >> > >> interfaces are valid and which are not. Ideally we want
> >>>>>>> to avoid
> >>>>>>> > > >> that.
> >>>>>>> > > >> > To
> >>>>>>> > > >> > >> be clear, I am not opposing having an interface of
> >>>>>>> > > >> withSplitsAlignment,
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> it
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> is completely OK to have it as an internal interface
> and
> >>>>>>> let
> >>>>>>> > > >> > SourceReader
> >>>>>>> > > >> > >> and SplitReader both extend it.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> b) Adding a new method to the SourceReader with a
> default
> >>>>>>> > > >> implementation
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> of
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> no-op would help avoid logic branching in the source
> >>>>>>> logic,
> >>>>>>> > > >> especially
> >>>>>>> > > >> > >> given that we agree that the vast majority of the
> >>>>>>> SourceReader
> >>>>>>> > > >> > >> implementations, if not all, would just extend from the
> >>>>>>> > > >> > SourceReaderBase.
> >>>>>>> > > >> > >> That means adding a new method to the interface would
> >>>>>>> effectively
> >>>>>>> > > >> give
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> the
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> same user experience, but simpler.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> c) A related design principle that may be worth
> >>>>>>> discussing is how
> >>>>>>> > > do
> >>>>>>> > > >> we
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> let
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> the Source implementations tell Flink what capability
> is
> >>>>>>> > supported
> >>>>>>> > > >> and
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> what
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> is not. Personally speaking I feel the most intuitive
> >>>>>>> place to me
> >>>>>>> > > is
> >>>>>>> > > >> in
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> the
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Source itself, because that is the entrance of the
> >>>>>>> entire Source
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> connector
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> logic.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Based on the above thoughts, I am wondering if the
> >>>>>>> following
> >>>>>>> > > >> interface
> >>>>>>> > > >> > >> would be easier to understand by the users.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> - Change "withSplitsAlignment" to internal interface,
> >>>>>>> let both
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> SourceReader
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> and SplitReader extend from it, with a default no-op
> >>>>>>> > > implementation.
> >>>>>>> > > >> > >> - Add a new method "boolean supportSplitsAlignment()"
> to
> >>>>>>> the
> >>>>>>> > Source
> >>>>>>> > > >> > >> interface, with a default implementation returning
> >>>>>>> false. Sources
> >>>>>>> > > >> that
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> have
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> implemented the alignment logic can change this to
> >>>>>>> return true,
> >>>>>>> > and
> >>>>>>> > > >> > >> override the alignSplits() methods in the SourceReader
> /
> >>>>>>> > > SplitReader
> >>>>>>> > > >> if
> >>>>>>> > > >> > >> needed.
> >>>>>>> > > >> > >> - In the future, if a new optional feature is going to
> >>>>>>> be added
> >>>>>>> > to
> >>>>>>> > > >> the
> >>>>>>> > > >> > >> Source, and that feature requires the awareness from
> >>>>>>> Flink, we
> >>>>>>> > can
> >>>>>>> > > >> add
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> more
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> such methods to the Source.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> What do you think?
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Thanks,
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dw...@apache.org> wrote:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> @Konstantin:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader`
> interface
> >>>>>>> (aka the
> >>>>>>> > > >> stop &
> >>>>>>> > > >> > >> resume behavior) will be implemented for Kafka and
> >>>>>>> Pulsar only,
> >>>>>>> > > >> correct?
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Correct, as far as I know though, those are the only
> >>>>>>> sources
> >>>>>>> > which
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> consume
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> concurrently from multiple splits and thus alignment
> >>>>>>> applies.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> @Thomas:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I wonder if "supporting" split alignment in
> >>>>>>> SourceReaderBase and
> >>>>>>> > > then
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> doing
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> nothing if the split reader does not implement
> >>>>>>> AlignedSplitReader
> >>>>>>> > > >> could
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> be
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead be
> >>>>>>> added to
> >>>>>>> > the
> >>>>>>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to make
> >>>>>>> it
> >>>>>>> > explicit
> >>>>>>> > > >> that
> >>>>>>> > > >> > >> the source actually supports it.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I understand your concern. Hmm, I think we could
> >>>>>>> actually do
> >>>>>>> > that.
> >>>>>>> > > >> Given
> >>>>>>> > > >> > >> the actual implementation of the
> >>>>>>> SourceReaderBase#alignSplits is
> >>>>>>> > > >> rather
> >>>>>>> > > >> > >> short (just a forward to the corresponding method of
> >>>>>>> > SplitFetcher),
> >>>>>>> > > >> we
> >>>>>>> > > >> > >> could reimplement it in the actual source
> >>>>>>> implementations. This
> >>>>>>> > > >> solution
> >>>>>>> > > >> > >> has the downside though. Authors of new sources would
> >>>>>>> have to do
> >>>>>>> > > two
> >>>>>>> > > >> > >> things: extend from AlignedSplitReader and implement
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> WithSplitsAlignment,
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> instead of just extending AlignedSplitReader. I would
> be
> >>>>>>> fine
> >>>>>>> > with
> >>>>>>> > > >> such
> >>>>>>> > > >> > a
> >>>>>>> > > >> > >> tradeoff though. What do others think?
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> @Steven:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> For this part from the motivation section, is it
> >>>>>>> accurate? Let's
> >>>>>>> > > >> assume
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> one
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> source task consumes from 3 partitions and one of the
> >>>>>>> partition
> >>>>>>> > is
> >>>>>>> > > >> > >> significantly slower. In this situation, watermark for
> >>>>>>> this
> >>>>>>> > source
> >>>>>>> > > >> task
> >>>>>>> > > >> > >> won't hold back as it is reading recent data from other
> >>>>>>> two Kafka
> >>>>>>> > > >> > >> partitions. As a result, it won't hold back the overall
> >>>>>>> > watermark.
> >>>>>>> > > I
> >>>>>>> > > >> > >> thought the problem is that we may have late data for
> >>>>>>> this slow
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> partition.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> It will hold back the watermark. Watermark of an
> >>>>>>> operator is the
> >>>>>>> > > >> minimum
> >>>>>>> > > >> > >> of watermarks of all splits[1]
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I have another question about the restart. Say split
> >>>>>>> alignment is
> >>>>>>> > > >> > >> triggered. checkpoint is completed. job failed and
> >>>>>>> restored from
> >>>>>>> > > the
> >>>>>>> > > >> > last
> >>>>>>> > > >> > >> checkpoint. because alignment decision is not
> >>>>>>> checkpointed,
> >>>>>>> > > initially
> >>>>>>> > > >> > >> alignment won't be enforced until we get a cycle of
> >>>>>>> watermark
> >>>>>>> > > >> > aggregation
> >>>>>>> > > >> > >> and propagation, right? Not saying this corner is a
> >>>>>>> problem. Just
> >>>>>>> > > >> want
> >>>>>>> > > >> > to
> >>>>>>> > > >> > >> understand it more.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Your understanding is correct.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> @Becket:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use
> >>>>>>> case, so
> >>>>>>> > > >> should
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> we
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> just add the related methods to SourceReader directly
> >>>>>>> instead of
> >>>>>>> > > >> > >> introducing the new interface of WithSplitAssignment?
> We
> >>>>>>> can
> >>>>>>> > > provide
> >>>>>>> > > >> > >> default implementations, so backwards compatibility
> >>>>>>> won't be an
> >>>>>>> > > >> issue.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I don't think we can provide a default implementation.
> >>>>>>> How would
> >>>>>>> > we
> >>>>>>> > > >> do
> >>>>>>> > > >> > >> that? Would it be just a no-op? Is it better than
> having
> >>>>>>> an
> >>>>>>> > opt-in
> >>>>>>> > > >> > >> interface? The default implementation would have to be
> >>>>>>> added
> >>>>>>> > > >> exclusively
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> in
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> a *Public* SourceReader interface. By the way notice
> >>>>>>> > > SourceReaderBase
> >>>>>>> > > >> > >> does extend from WithSplitsAlignment, so effectively
> all
> >>>>>>> > > >> implementations
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> do
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> handle the alignment case. To be honest I think it is
> >>>>>>> impossible
> >>>>>>> > to
> >>>>>>> > > >> > >> implement the SourceReader interface directly by end
> >>>>>>> users.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably
> >>>>>>> also
> >>>>>>> > needs
> >>>>>>> > > >> some
> >>>>>>> > > >> > >> change to support throttling at the split granularity.
> >>>>>>> Can you
> >>>>>>> > add
> >>>>>>> > > >> that
> >>>>>>> > > >> > >> interface change into the public interface section as
> >>>>>>> well?
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> It has been added from the beginning. See
> >>>>>>> *AlignedSplitReader.*
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits
> >>>>>>> here,
> >>>>>>> > given
> >>>>>>> > > >> that
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> it
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> is not actually changing the split assignments? It
> seems
> >>>>>>> > something
> >>>>>>> > > >> like
> >>>>>>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is
> >>>>>>> more
> >>>>>>> > > accurate.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> The method's called *alignSplits*, not assign. Do you
> >>>>>>> still
> >>>>>>> > prefer
> >>>>>>> > > a
> >>>>>>> > > >> > >> different name for that? Personally, I am open for
> >>>>>>> suggestions
> >>>>>>> > > here.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Best,
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Dawid
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Thanks for driving the effort, Sebastian. I think the
> >>>>>>> motivation
> >>>>>>> > > >> makes a
> >>>>>>> > > >> > >> lot of sense. Just a few suggestions / questions.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use
> >>>>>>> case, so
> >>>>>>> > > >> should
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> we
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> just add the related methods to SourceReader directly
> >>>>>>> instead of
> >>>>>>> > > >> > >> introducing the new interface of WithSplitAssignment?
> We
> >>>>>>> can
> >>>>>>> > > provide
> >>>>>>> > > >> > >> default implementations, so backwards compatibility
> >>>>>>> won't be an
> >>>>>>> > > >> issue.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably
> >>>>>>> also
> >>>>>>> > needs
> >>>>>>> > > >> some
> >>>>>>> > > >> > >> change to support throttling at the split granularity.
> >>>>>>> Can you
> >>>>>>> > add
> >>>>>>> > > >> that
> >>>>>>> > > >> > >> interface change into the public interface section as
> >>>>>>> well?
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits
> >>>>>>> here,
> >>>>>>> > given
> >>>>>>> > > >> that
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> it
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> is not actually changing the split assignments? It
> seems
> >>>>>>> > something
> >>>>>>> > > >> like
> >>>>>>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is
> >>>>>>> more
> >>>>>>> > > accurate.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Thanks,
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Jiangjie (Becket) Qin
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <st...@gmail.com> wrote:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> However, a single source operator may read data from
> >>>>>>> multiple
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> splits/partitions, e.g., multiple Kafka partitions,
> such
> >>>>>>> that
> >>>>>>> > even
> >>>>>>> > > >> with
> >>>>>>> > > >> > >> watermark alignment the source operator may need to
> >>>>>>> buffer
> >>>>>>> > > excessive
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> amount
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> of data if one split emits data faster than another.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> For this part from the motivation section, is it
> >>>>>>> accurate? Let's
> >>>>>>> > > >> assume
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> one
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> source task consumes from 3 partitions and one of the
> >>>>>>> partition
> >>>>>>> > is
> >>>>>>> > > >> > >> significantly slower. In this situation, watermark for
> >>>>>>> this
> >>>>>>> > source
> >>>>>>> > > >> task
> >>>>>>> > > >> > >> won't hold back as it is reading recent data from other
> >>>>>>> two Kafka
> >>>>>>> > > >> > >> partitions. As a result, it won't hold back the overall
> >>>>>>> > watermark.
> >>>>>>> > > I
> >>>>>>> > > >> > >> thought the problem is that we may have late data for
> >>>>>>> this slow
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> partition.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I have another question about the restart. Say split
> >>>>>>> alignment is
> >>>>>>> > > >> > >> triggered. checkpoint is completed. job failed and
> >>>>>>> restored from
> >>>>>>> > > the
> >>>>>>> > > >> > last
> >>>>>>> > > >> > >> checkpoint. because alignment decision is not
> >>>>>>> checkpointed,
> >>>>>>> > > initially
> >>>>>>> > > >> > >> alignment won't be enforced until we get a cycle of
> >>>>>>> watermark
> >>>>>>> > > >> > aggregation
> >>>>>>> > > >> > >> and propagation, right? Not saying this corner is a
> >>>>>>> problem. Just
> >>>>>>> > > >> want
> >>>>>>> > > >> > to
> >>>>>>> > > >> > >> understand it more.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <th...@apache.org> wrote:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Thanks for working on this!
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase and then doing
> >>>>>>> > > >> > >> nothing if the split reader does not implement AlignedSplitReader could be
> >>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead be added to the
> >>>>>>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to make it explicit that
> >>>>>>> > > >> > >> the source actually supports it.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Thanks,
> >>>>>>> > > >> > >> Thomas
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <kn...@apache.org> wrote:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> >>>>>>> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> +1 in general. I believe it is valuable to complete the watermark aligned
> >>>>>>> > > >> > >> story with this FLIP.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Cheers,
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Konstantin
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dw...@apache.org> wrote:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> To be explicit, having worked on it, I support it ;) I
> >>>>>>> think we
> >>>>>>> > can
> >>>>>>> > > >> > >> start a vote thread soonish, as there are no concerns
> so
> >>>>>>> far.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Best,
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Dawid
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Dear Flink developers,
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> I would like to open a discussion on FLIP 217 [1] for an extension of
> >>>>>>> > > >> > >> Watermark Alignment to perform alignment also in SplitReaders. To do so,
> >>>>>>> > > >> > >> SplitReaders must be able to suspend and resume reading from split sources,
> >>>>>>> > > >> > >> where the SourceOperator coordinates and controls suspend and resume. To
> >>>>>>> > > >> > >> gather information about current watermarks of the SplitReaders, we extend
> >>>>>>> > > >> > >> the internal WatermarkOutputMultiplexer and report watermarks to the
> >>>>>>> > > >> > >> SourceOperator.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised and
> >>>>>>> > > >> > >> reworked by Dawid Wysakowicz (he did most of the work) and me. The changes
> >>>>>>> > > >> > >> are backwards compatible in the sense that if affected components do not
> >>>>>>> > > >> > >> support split alignment, the behavior is as before.
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Best,
> >>>>>>> > > >> > >> Sebastian
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> [1]
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> >
> >>>>>>> > > >>
> >>>>>>> > >
> >>>>>>> >
> >>>>>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> [2]
> >>>>>>> https://github.com/dawidwys/flink/tree/aligned-splits
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> --
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >> Konstantin Knauf
> >>>>>>> > > >> > >> https://twitter.com/snntrable
> >>>>>>> > > >> > >> https://github.com/knaufk
> >>>>>>> > > >> > >>
> >>>>>>> > > >> > >>
> >>>>>>> > > >> >
> >>>>>>> > > >>
> >>>>>>> > > >
> >>>>>>> > >
> >>>>>>> >
> >>>>>>>
> >>>>>>
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Piotr Nowojski <pn...@apache.org>.
Hi,

Actually previously I thought about having a decorative interface and
whenever watermark alignment is enabled, checking that the source
implements the decorative interface. If not, throwing an exception.
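
A minimal sketch of that check, to make the idea concrete. All names here
(`SimpleSourceReader`, `PausableSplitsReader`, `AlignmentCheck`) are
hypothetical stand-ins and not part of the FLIP or the Flink API; splits are
reduced to plain string ids.

```java
import java.util.Collection;

// Hypothetical, simplified stand-in for a source reader; not Flink's real interface.
interface SimpleSourceReader {
    String name();
}

// Hypothetical decorative interface: a reader opts in to pausable splits by implementing it.
interface PausableSplitsReader {
    void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

final class AlignmentCheck {
    private AlignmentCheck() {}

    // Fails fast if alignment is requested for a reader that lacks the decorative interface.
    static void validate(SimpleSourceReader reader, boolean watermarkAlignmentEnabled) {
        if (watermarkAlignmentEnabled && !(reader instanceof PausableSplitsReader)) {
            throw new UnsupportedOperationException(
                    "Source reader '" + reader.name() + "' does not support pausable splits; "
                            + "disable watermark alignment for this source.");
        }
    }
}
```

With this shape the capability is visible in the type of the reader, so the
check can run once when the job is set up rather than on the first pause call.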

The option with default methods in the source interfaces throwing
`UnsupportedOperationException` I think still suffers from the same
problems I mentioned before. It's still an optional implementation and at
the same time it's clogging the base interface. I think I would still vote
soft -1 on this option, but I wouldn't block it in case I am out-voted.
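
For comparison, a rough sketch of the default-method option. Again the names
(`BaseReader`, `LegacyReader`, `PausableReader`) are made up for illustration
and the real `SourceReader` / `SplitReader` signatures differ.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified base interface; the real Flink interfaces have more methods.
interface BaseReader {
    String poll();

    // Existing readers compile unchanged and silently inherit this throwing default.
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException("This reader does not support pausing splits.");
    }
}

// An "old flavor" reader that never overrides the default.
final class LegacyReader implements BaseReader {
    @Override
    public String poll() {
        return "record";
    }
}

// A reader that supports pausing by overriding the default.
final class PausableReader implements BaseReader {
    final List<String> paused = new ArrayList<>();

    @Override
    public String poll() {
        return "record";
    }

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.removeAll(splitsToResume);
        paused.addAll(splitsToPause);
    }
}
```

Both readers have the same static type, so nothing in the API tells the
framework or the user which one supports pausing; the difference only shows
up when `pauseOrResumeSplits` is actually called.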

Best,
Piotrek

śr., 11 maj 2022 o 14:22 Sebastian Mattheis <se...@ververica.com>
napisał(a):

> Hi Becket,
>
> Thanks a lot for your fast and detailed response. For me, the discussion is
> converging, and dropping the supportsX method sounds very reasonable. (Side note:
> With "pausable splits" enabled as "default" I think we misunderstood. As
> you described now "default" I understand as that it should be the new
> recommended way of implementation, and I think that is fully valid. Before,
> I understood "default" here as the default implementation, i.e., throwing
> UnsupportedOperationException, which is the exact opposite. :) )
>
> Nevertheless: As mentioned, an open question for me is if watermark
> alignment should enforce pausable splits. For clarification, the current
> documentation [1] says:
>
>> *Note:* As of 1.15, Flink supports aligning across tasks of the same
>> source and/or different sources. It does not support aligning
>> splits/partitions/shards in the same task.
>>
>> In a case where there are e.g. two Kafka partitions that produce
>> watermarks at different pace, that get assigned to the same task watermark
>> might not behave as expected. Fortunately, worst case it should not perform
>> worse than without alignment.
>>
>> Given the limitation above, we suggest applying watermark alignment in
>> two situations:
>>
>>    1. You have two different sources (e.g. Kafka and File) that produce
>>    watermarks at different speeds
>>    2. You run your source with parallelism equal to the number of
>>    splits/shards/partitions, which results in every subtask being assigned a
>>    single unit of work.
>>
> I personally see no issue in implementing and I see no reason against
> implementing this dependency of watermark alignment and pausable splits. (I
> think this would even be a good path towards shaping watermark alignment in
> 1.16.) However, "I don't see" means that I would be happy to hear Dawid's
> and Piotrek's opinions as they implemented watermark alignment based on
> FLIP-182 [2] and I don't want to miss relevant rationale/background info
> from their side.
>
> *@Piotrek* *@Dawid *What do you think?
>
> Regards,
> Sebastian
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
>
> On Wed, May 11, 2022 at 1:30 PM Becket Qin <be...@gmail.com> wrote:
>
>> +dev
>>
>> Hi Sebastian,
>>
>> Thank you for the summary. Please see the detailed replies inline. As a
>> recap of my suggestions.
>>
>> 1. Pausable splits API.
>>   a) Add default implementations to methods "pauseOrResumeSplits" in both
>> SourceReader and SplitReader where both default implementations throw
>>  UnsupportedOperationException.
>>
>> 2. User story.
>>     a) We tell users to enable the watermark alignment as they like. This
>> is exactly what the current Flink API is.
>>     b) We tell the source developers, please implement pausable splits,
>> otherwise bad things may happen. Think of it like you are expected to
>> implement SourceReader#snapshotState() properly, otherwise exceptions will
>> be thrown when users enable checkpointing.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis <
>> sebastian@ververica.com> wrote:
>>
>>> Hi Becket, Hi everybody,
>>>
>>> I'm sorry if I misread the messages but I could not derive an agreement
>>> from the mailing list. Nevertheless, if I understand you right the
>>> suggestion is:
>>>
>>> * Add default implementations to methods "pauseOrResumeSplits" in both
>>> SourceReader and SplitReader where both default implementations throw
>>> UnsupportedOperationException.
>>>
>> Yes.
>>
>> * Add "supportsPauseOrResumeSplits" to the Source interface. (In the
>>> following, I refer to supporting this as "pausable splits".)
>>>
>> We may no longer need this if pausable splits are expected to be
>> implemented by the source developers, i.e. non-optional. Having this method
>> would then be somewhat misleading as it looks like the sources that do not
>> support pausable splits are also acceptable in the long term. So API wise,
>> I'd say maybe we should remove this for this FLIP, although I believe this
>> supportXXX pattern itself is still attractive for optional features.
>>
>>
>>>
>>> To make the conclusions explicit:
>>>
>>> 1. The implementation of pauseOrResumeSplits in both interfaces
>>> SourceReader and SplitReader are optional where the default is that it
>>> doesn't support it. (--> This means that the implementation is still
>>> optional for the source developer.)
>>>
>> It is optional for backwards compatibility with existing sources, as they
>> may still compile without code change. But starting from this FLIP, Flink
>> will always optimistically assume that all the sources support pausable
>> splits. If a source does not support pausable splits, it goes to an error
>> handling path when watermark alignment is enabled on it. This is different
>> from a usual optional feature, where no error is expected.
>>
>>
>>> 2. If watermark alignment is enabled in the application code by adding
>>> withWatermarkAlignment to the WatermarkStrategy while SourceReader or
>>> SplitReader do not support pausableSplits, we throw an
>>> UnsupportedOperationException.
>>>
>> Yes.
>>
>>
>>> 3. With regard to your statement:
>>>
>>>> [...] basically means watermark alignment is a non-optional feature to
>>>> the end users.
>>>
>>> You actually mean that "pausable splits" are non-optional for the app
>>> developer if watermark alignment is enabled. However, watermark alignment
>>> is optional and can be enabled/disabled.
>>>
>> Yes, watermark alignment can be enabled/disabled in individual sources in
>> Flink jobs, which basically means the code supporting watermark alignment
>> has to already be there. That again means the Source developers are also
>> expected to support pausable splits by default. So this way we essentially
>> tell the end users that you may enable / disable this feature as you wish,
>> and tell the source developers that you SHOULD implement this because the
>> end users may turn it on/off at will. And if the source does not support
>> pausable splits, that goes to an error handling path when watermark
>> alignment is enabled on it. So users know they have to explicitly exclude
>> this source.
>>
>>
>>>
>>> So far it's totally clear to me and I hope this is what you mean. I also
>>> agree with both statements:
>>>
>>> So making that expectation aligned with the source developers seems
>>>> reasonable.
>>>>
>>>
>>> I think this is a simple and clean solution from both the end user and
>>>> source developers' standpoint.
>>>>
>>>
>>> However, a last conclusion derives from 3. and is an open question for
>>> me:
>>>
>>> 4. The feature of "pausable splits" is now tightly bound to watermark
>>> alignment, i.e., if sources do not support "pausable splits" one can not
>>> enable watermark alignment for these sources. This dependency is not the
>>> current status of watermark alignment implementation because it is/was
>>> implemented without pausable splits. Do we want to introduce this
>>> dependency? (This is an open question. I cannot judge that.)
>>>
>> The watermark alignment basically relies on the pausable splits, right?
>> So personally I found it quite reasonable that if the source does not
>> support pausable splits, end users cannot enable watermark alignment on it.
>>
>>
>>> If something is wrong, please correct me.
>>>
>>> Regards,
>>> Sebastian
>>>
>>> On Wed, May 11, 2022 at 9:05 AM Becket Qin <be...@gmail.com> wrote:
>>>
>>>> Hi Sebastian,
>>>>
>>>> Thanks for the reply and patient discussion. I agree this is a tricky
>>>> decision.
>>>>
>>>>
>>>>> Nevertheless, Piotr has valid concerns about Option c) which I see as
>>>>> follows:
>>>>> (1) An interface with default NOOP implementation makes the
>>>>> implementation optional. And in my opinion, a default implementation is and
>>>>> will remain a way of making implementation optional because even in future
>>>>> a developer can decide to implement the "old flavor" without support for
>>>>> pausable splits.
>>>>> (2) It may not be too critical but I also find it suboptimal that with
>>>>> a NOOP default implementation there is no way to check at runtime if
>>>>> SourceReader or SplitReader actually support pausing. (To do so, one would
>>>>> need a supportsX method which makes it again more complicated.)
>>>>
>>>>
>>>> Based on the last few messages in the mailing list, Piotr and I agreed
>>>> that the default implementation should just throw an
>>>> UnsupportedOperationException if the source is unpausable. So this
>>>> basically tells the Source developers that this feature is expected to be
>>>> supported. Because we cannot prevent end users from putting an unpausable
>>>> source into the watermark alignment group, that basically means watermark
>>>> alignment is a non-optional feature to the end users. So making that
>>>> expectation aligned with the source developers seems reasonable.  And if a
>>>> source does not support this feature, the end users should explicitly
>>>> remove that source from the watermark alignment group.
>>>>
>>>> Personally speaking I think this is a simple and clean solution from
>>>> both the end user and source developers' standpoint.
>>>>
>>>> Does this address your concerns?
>>>>
>>>> Thanks,
>>>>
>>>> Jiangjie (Becket) Qin
>>>>
>>>> On Wed, May 11, 2022 at 2:52 PM Sebastian Mattheis <
>>>> sebastian@ververica.com> wrote:
>>>>
>>>>> Hi Piotr, Hi Becket, Hi everybody,
>>>>>
>>>>> we, Dawid and I, discussed the various suggestions/options and we
>>>>> would be okay either way because we find neither solution is perfect just
>>>>> because of the already present complexity.
>>>>>
>>>>> Option c) Adding methods to the interfaces of SourceReader and
>>>>> SplitReader
>>>>> Option a) Adding decorative interfaces to be used by SourceReader and
>>>>> SplitReader
>>>>>
>>>>> As of the current status (v. 12) of the FLIP [1], it is based on
>>>>> Option c) which we find acceptable because the complexity added is only a
>>>>> single method.
>>>>>
>>>>> Nevertheless, Piotr has valid concerns about Option c) which I see as
>>>>> follows:
>>>>> (1) An interface with default NOOP implementation makes the
>>>>> implementation optional. And in my opinion, a default implementation is and
>>>>> will remain a way of making implementation optional because even in future
>>>>> a developer can decide to implement the "old flavor" without support for
>>>>> pausable splits.
>>>>> (2) It may not be too critical but I also find it suboptimal that with
>>>>> a NOOP default implementation there is no way to check at runtime if
>>>>> SourceReader or SplitReader actually support pausing. (To do so, one would
>>>>> need a supportsX method which makes it again more complicated.)
>>>>>
>>>>> However, we haven't changed it because Option a) is also not optimal
>>>>> or straight-forward:
>>>>> (1) We need to add two distinct yet similar decorative interfaces
>>>>> since, as mentioned, the signatures of the methods are different. For
>>>>> example, we would need decorative interfaces like
>>>>> `SplitReaderWithPausableSplits` and `SourceReaderWithPausableSplits`.
>>>>> (2) As a consequence, we would need to somehow document how/where to
>>>>> implement both interfaces and how this relates to each other. This we could
>>>>> solve by adding a note in the interface of SourceReader and SplitReader and
>>>>> reference to the decorative interfaces but it still increases complexity
>>>>> too.
>>>>>
>>>>> In summary, we see both as acceptable and preferred over other
>>>>> options. The question is if we can find a solution or compromise that is
>>>>> acceptable for everybody to reach consensus.
>>>>>
>>>>> Please let us know what you think because we would be happy if we can
>>>>> conclude the discussion to avoid dropping the initiative on this FLIP.
>>>>>
>>>>> Regards,
>>>>> Sebastian
>>>>>
>>>>> [1]
>>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
>>>>> (v. 12)
>>>>>
>>>>> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski <pn...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hi Guowei,
>>>>>>
>>>>>> as Dawid wrote a couple of messages back:
>>>>>>
>>>>>> > This is covered in the previous FLIP[1] which has been already
>>>>>> implemented in 1.15. In short, it must be enabled with the watermark
>>>>>> strategy which also configures drift and update interval
>>>>>>
>>>>>> So by default watermark alignment is disabled, regardless of whether a source
>>>>>> supports it or not.
>>>>>>
>>>>>> Best,
>>>>>> Piotrek
>>>>>>
>>>>>> czw., 5 maj 2022 o 09:56 Guowei Ma <gu...@gmail.com> napisał(a):
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We know that in the case of Bounded input Flink supports the Batch
>>>>>>> execution mode. Currently in Batch execution mode, Flink is executed
>>>>>>> on a
>>>>>>> stage-by-stage basis. In this way, perhaps watermark alignment might
>>>>>>> not
>>>>>>> gain much.
>>>>>>>
>>>>>>> So my question is: Is watermark alignment the default behavior (for
>>>>>>> implemented source only)? If so, have you considered evaluating the
>>>>>>> impact
>>>>>>> of this behavior on the Batch execution mode? Or do you think it is not
>>>>>>> necessary?
>>>>>>>
>>>>>>> Correct me if I miss something.
>>>>>>>
>>>>>>> Best,
>>>>>>> Guowei
>>>>>>>
>>>>>>>
>>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <
>>>>>>> piotr.nowojski@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> > Hi Becket and Dawid,
>>>>>>> >
>>>>>>> > > I feel that no matter which option we choose this can not be
>>>>>>> solved
>>>>>>> > entirely in either of the options, because of the point above and
>>>>>>> because
>>>>>>> > the signature of SplitReader#pauseOrResumeSplits and
>>>>>>> > SourceReader#pauseOrResumeSplits are slightly different (one
>>>>>>> identifies
>>>>>>> > splits with splitId the other one passes the splits directly).
>>>>>>> >
>>>>>>> > Yes, that's a good point in this case and for features that need
>>>>>>> to be
>>>>>>> > implemented in more than one place.
>>>>>>> >
>>>>>>> > > Is there any reason for pausing reading from a split an optional
>>>>>>> feature,
>>>>>>> > > other than that this was not included in the original interface?
>>>>>>> >
>>>>>>> > An additional argument in favor of making it optional is to
>>>>>>> simplify source
>>>>>>> > implementation. But on its own I'm not sure if that would be
>>>>>>> enough to
>>>>>>> > justify making this feature optional. Maybe.
>>>>>>> >
>>>>>>> > > I think it would be way simpler and clearer to just let end
>>>>>>> users and
>>>>>>> > Flink
>>>>>>> > > assume all the connectors will implement this feature.
>>>>>>> >
>>>>>>> > As I wrote above that would be an interesting choice to make (ease
>>>>>>> of
>>>>>>> > implementation for new users, vs system consistency). Regardless
>>>>>>> of that,
>>>>>>> > yes, for me the main argument is the API backward compatibility.
>>>>>>> But let's
>>>>>>> > clear a couple of points:
>>>>>>> > - The current proposal adding methods to the base interface with
>>>>>>> default
>>>>>>> > implementations is an OPTIONAL feature. Same as the decorative
>>>>>>> version
>>>>>>> > would be.
>>>>>>> > - Decorative version could implement "throw
>>>>>>> UnsupportedOperationException"
>>>>>>> > if user enabled watermark alignment just as well and I agree
>>>>>>> that's a
>>>>>>> > better option compared to logging a warning.
>>>>>>> >
>>>>>>> > Best,
>>>>>>> > Piotrek
>>>>>>> >
>>>>>>> >
>>>>>>> > śr., 4 maj 2022 o 15:40 Becket Qin <be...@gmail.com>
>>>>>>> napisał(a):
>>>>>>> >
>>>>>>> > > Thanks for the reply and patient discussion, Piotr and Dawid.
>>>>>>> > >
>>>>>>> > > Is there any reason for pausing reading from a split an optional
>>>>>>> feature,
>>>>>>> > > other than that this was not included in the original interface?
>>>>>>> > >
>>>>>>> > > To be honest I am really worried about the complexity of the
>>>>>>> user story
>>>>>>> > > here. Optional features like this have a high overhead. Imagine
>>>>>>> this
>>>>>>> > > feature is optional, now a user enabled watermark alignment and
>>>>>>> defined a
>>>>>>> > > few watermark groups. Would it work? Hmm, that depends on
>>>>>>> whether the
> >>>>>>> > > involved Source has implemented this feature. If the Sources
>>>>>>> are well
>>>>>>> > > documented, good luck. Otherwise end users may have to look into
>>>>>>> the code
>>>>>>> > > of the Source to see whether the feature is supported. Which is
>>>>>>> something
>>>>>>> > > they shouldn't have to do.
>>>>>>> > >
>>>>>>> > > I think it would be way simpler and clearer to just let end
>>>>>>> users and
>>>>>>> > Flink
>>>>>>> > > assume all the connectors will implement this feature. After all
>>>>>>> the
> >>>>>>> > > watermark group is not optional to the end users. If in some
>>>>>>> rare cases,
>>>>>>> > > the feature cannot be supported, a clear
>>>>>>> UnsupportedOperationException
>>>>>>> > will
>>>>>>> > > be thrown to tell users to explicitly remove this Source from the
>>>>>>> > watermark
>>>>>>> > > group. I don't think we should have a warning message here, as
>>>>>>> they tend
>>>>>>> > to
>>>>>>> > > be ignored in many cases. If we do this, we don't even need the
>>>>>>> > supportXXX
>>>>>>> > > method in the Source for this feature. In fact this is exactly
>>>>>>> how many
> >>>>>>> > > interfaces work today. For example,
>>>>>>> SplitEnumerator#addSplitsBack() is
>>>>>>> > not
>>>>>>> > > supported by Pravega source because it does not support partial
>>>>>>> failover.
>>>>>>> > > In that case, it simply throws an exception to trigger a global
>>>>>>> recovery.
>>>>>>> > >
> >>>>>>> > > The reason we add a default implementation in this case would be
>>>>>>> just for
>>>>>>> > the
>>>>>>> > > sake of backwards compatibility so the old source can still
>>>>>>> compile.
>>>>>>> > Sure,
>>>>>>> > > in short term, this feature might not be supported by many
>>>>>>> existing
>>>>>>> > > sources. That is OK, and it is quite visible to the source
>>>>>>> developers
>>>>>>> > that
>>>>>>> > > they did not override the default impl which throws an
>>>>>>> > > UnsupportedOperationException.
>>>>>>> > >
>>>>>>> > > @Dawid,
>>>>>>> > >
>>>>>>> > > the Java doc of the SupportXXX() method in the Source would be
>>>>>>> the single
>>>>>>> > > >> source of truth regarding how to implement this feature.
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > >
>>>>>>> > > I also don't find it entirely true. Half of the classes are
>>>>>>> theoretically
>>>>>>> > > > optional and are utility classes from the point of view how the
>>>>>>> > > interfaces
>>>>>>> > > > are organized. Theoretically users do not need to use any of
>>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to list their
>>>>>>> methods in
>>>>>>> > > the
>>>>>>> > > > Source interface.
>>>>>>> > >
>>>>>>> > > I think the ultimate goal of java docs is to guide users to
>>>>>>> implement the
>>>>>>> > > Source. If SourceReaderBase is the preferred way to implement a
>>>>>>> > > SourceReader, it seems worth mentioning that. Even the Java
>>>>>>> language
> >>>>>>> > > documentation for interfaces lists the known implementations [1] so
>>>>>>> people
>>>>>>> > can
>>>>>>> > > leverage them. But for this particular case, if we make the
>>>>>>> feature
>>>>>>> > > non-optional, we don't even need the supportXXX() method for now.
>>>>>>> > >
>>>>>>> > > Thanks,
>>>>>>> > >
>>>>>>> > > Jiangjie (Becket) Qin
>>>>>>> > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <
>>>>>>> dwysakowicz@apache.org>
>>>>>>> > > wrote:
>>>>>>> > >
>>>>>>> > > > Hey Piotr and Becket,
>>>>>>> > > >
>>>>>>> > > > First of all, let me say I am happy with whichever option is
>>>>>>> agreed in
>>>>>>> > > the
>>>>>>> > > > discussion.
>>>>>>> > > >
>>>>>>> > > > I wanted to clarify a few points from the discussion though:
>>>>>>> > > >
>>>>>>> > > > @Becket:
>>>>>>> > > >
>>>>>>> > > > The main argument for adding the methods to the SourceReader
>>>>>>> is that
>>>>>>> > > these
>>>>>>> > > > methods are effectively NON-OPTIONAL to SourceReader impl, i.e.
>>>>>>> > starting
>>>>>>> > > > from this FLIP, all the SourceReaders impl are expected to
>>>>>>> support this
>>>>>>> > > > method, although some old implementations may not have
>>>>>>> implemented this
>>>>>>> > > > feature. I think we should distinguish the new features from
>>>>>>> the
>>>>>>> > optional
>>>>>>> > > > features. While the public decorative interface is a solution
>>>>>>> to the
>>>>>>> > > > optional features. We should not use it for the features that
>>>>>>> are
>>>>>>> > > > non-optional.
>>>>>>> > > >
>>>>>>> > > > I don't think that this feature is NON-OPTIONAL. Even though
>>>>>>> > preferred, I
>>>>>>> > > > still think it can be simply optional.
>>>>>>> > > >
>>>>>>> > > > the Java doc of the SupportXXX() method in the Source would be
>>>>>>> the
>>>>>>> > single
>>>>>>> > > > source of truth regarding how to implement this feature.
>>>>>>> > > >
>>>>>>> > > > I also don't find it entirely true. Half of the classes are
>>>>>>> > theoretically
>>>>>>> > > > optional and are utility classes from the point of view how the
>>>>>>> > > interfaces
>>>>>>> > > > are organized. Theoretically users do not need to use any of
>>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to list their
>>>>>>> methods in
>>>>>>> > > the
>>>>>>> > > > Source interface.
>>>>>>> > > >
>>>>>>> > > > @Piotr
>>>>>>> > > >
>>>>>>> > > > If we have all of the methods with default implementation in
>>>>>>> the base
>>>>>>> > > > interface, the API doesn't give any clue to the user which set
>>>>>>> of
>>>>>>> > methods
>>>>>>> > > > are required to be implemented at the same time.
>>>>>>> > > >
>>>>>>> > > > I feel that no matter which option we choose this can not be
>>>>>>> solved
>>>>>>> > > > entirely in either of the options, because of the point above
>>>>>>> and
>>>>>>> > because
>>>>>>> > > > the signature of SplitReader#pauseOrResumeSplits and
>>>>>>> > > > SourceReader#pauseOrResumeSplits are slightly different (one
>>>>>>> identifies
>>>>>>> > > > splits with splitId the other one passes the splits directly).
>>>>>>> > > >
>>>>>>> > > > Best,
>>>>>>> > > >
>>>>>>> > > > Dawid
>>>>>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
>>>>>>> > > >
>>>>>>> > > > Hi Piotr,
>>>>>>> > > >
>>>>>>> > > > Thanks for the comment.
>>>>>>> > > >
>>>>>>> > > > Just to clarify, I am not against the decorative interfaces,
>>>>>>> but I do
>>>>>>> > > > think we should use them with caution. The main argument for
>>>>>>> adding the
>>>>>>> > > > methods to the SourceReader is that these methods are
>>>>>>> > > > effectively NON-OPTIONAL to SourceReader impl, i.e. starting
>>>>>>> from this
>>>>>>> > > > FLIP, all the SourceReaders impl are expected to support this
>>>>>>> > > > method, although some old implementations may not have
>>>>>>> implemented this
>>>>>>> > > > feature. I think we should distinguish the new features from
>>>>>>> the
>>>>>>> > optional
>>>>>>> > > > features. While the public decorative interface is a solution
>>>>>>> to the
>>>>>>> > > > optional features. We should not use it for the features that
>>>>>>> are
>>>>>>> > > > non-optional.
>>>>>>> > > >
>>>>>>> > > > That said, this feature is optional for SplitReaders. Arguably
>>>>>>> we can
>>>>>>> > > have
>>>>>>> > > > a decorative interface for that, but for simplicity and
>>>>>>> symmetry of the
>>>>>>> > > > interface, personally I prefer just adding a new method.
>>>>>>> > > >
>>>>>>> > > > Regarding the advantages you mentioned about the decorative
>>>>>>> interfaces,
>>>>>>> > > > they would make sense if:
>>>>>>> > > > 1. The feature is optional.
>>>>>>> > > > 2. There is only one decorative interface involved for a
>>>>>>> feature.
>>>>>>> > > > Otherwise the argument that all the methods are grouped
>>>>>>> together will
>>>>>>> > not
>>>>>>> > > > stand.
>>>>>>> > > >
>>>>>>> > > > Compared with that, I think the current solution works fine in
>>>>>>> all
>>>>>>> > cases,
>>>>>>> > > > i.e. "having supportXXX() method in Source, and default
>>>>>>> methods /
>>>>>>> > > > decorative interfaces in base interfaces.".
>>>>>>> > > >
>>>>>>> > > > The advantages are:
>>>>>>> > > >> - clean and easy to implement base interface
>>>>>>> > > >
>>>>>>> > > > In the current approach, the Java doc of the SupportXXX()
>>>>>>> method in the
>>>>>>> > > > Source would be the single source of truth regarding how to
>>>>>>> implement
>>>>>>> > > this
>>>>>>> > > > feature. It lists the method that has to be implemented to
>>>>>>> support this
>>>>>>> > > > feature, regardless of how many classes / interfaces are
>>>>>>> involved.
>>>>>>> > > >
>>>>>>> > > > When implementing the base interface, users do not need to
>>>>>>> implement a
>>>>>>> > > > method with default implementation. If they are curious what
>>>>>>> the method
>>>>>>> > > is
>>>>>>> > > > for, the java doc of that method simply points users to the
>>>>>>> > SupportXXX()
>>>>>>> > > > method in the Source. It seems not adding work to the users
>>>>>>> compared
>>>>>>> > with
>>>>>>> > > > decorative interfaces, but gives much better discoverability.
>>>>>>> > > >
>>>>>>> > > > - all of the methods from a single feature are grouped in a
>>>>>>> single
>>>>>>> > > >> decorator interface, together with their dedicated java doc.
>>>>>>> It's also
>>>>>>> > > >> easier to google search for help using the decorator name
>>>>>>> > > >
>>>>>>> > > > - if an optional feature requires two methods to be
>>>>>>> implemented at
>>>>>>> > once,
>>>>>>> > > >> decorator can guarantee that
>>>>>>> > > >
>>>>>>> > > > These two points are not true when multiple components and
>>>>>>> classes are
>>>>>>> > > > involved collaboratively to provide a feature. In our case, we
>>>>>>> have
>>>>>>> > both
>>>>>>> > > > SourceReader and SplitReader involved. And there might be other
>>>>>>> > > interfaces
>>>>>>> > > > on the JM side involved for some future features. So the
>>>>>>> relevant
>>>>>>> > methods
>>>>>>> > > > can actually be scattered over the places. That said, we may
>>>>>>> still use
>>>>>>> > > > decorative interfaces for each component, if the feature is
>>>>>>> optional,
>>>>>>> > > given
>>>>>>> > > > there is a single source of truth for the feature.
>>>>>>> > > >
>>>>>>> > > > Here I would strongly lean towards making life easier for new
>>>>>>> users,
>>>>>>> > > >> lowering the entry barrier, at the (imo) slight expense for
>>>>>>> the power
>>>>>>> > > >> users.
>>>>>>> > > >
>>>>>>> > > > I actually think the current approach is simpler, more
>>>>>>> extensible and
>>>>>>> > > more
>>>>>>> > > > general for all the users. Can you articulate a bit more on
>>>>>>> which part
>>>>>>> > > you
>>>>>>> > > > think makes users harder to understand?
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > > There is another benefit of the decorative interfaces which is
>>>>>>> not
>>>>>>> > > > mentioned, but might be worth considering here. Usually the
>>>>>>> decorative
>>>>>>> > > > interfaces give slightly better backwards compatibility than
>>>>>>> the new
>>>>>>> > > > default method in the interfaces. That is when users are using
>>>>>>> a jar
>>>>>>> > that
>>>>>>> > > > was compiled with an older version of Flink which does not
>>>>>>> have the
>>>>>>> > > default
>>>>>>> > > > method in the interfaces in question. A decorative interface
>>>>>>> may still
>>>>>>> > > > provide backwards compatibility in that case, while default
>>>>>>> method impl
>>>>>>> > > > cannot.
>>>>>>> > > >
>>>>>>> > > > I think in Flink we in general do not guarantee custom
>>>>>>> components
>>>>>>> > > compiled
>>>>>>> > > > with an older version can run with a newer version of Flink. A
>>>>>>> > recompile
>>>>>>> > > > with a newer version would be required. That said, if we do
>>>>>>> care about
>>>>>>> > > > this, we can just change the "supportXXX()" method in the
>>>>>>> Source
>>>>>>> > > interface
>>>>>>> > > > to use decorative interfaces, and leave the other parts
>>>>>>> unchanged.
>>>>>>> > > >
>>>>>>> > > > Thanks,
>>>>>>> > > >
>>>>>>> > > > Jiangjie (Becket) Qin
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > >
>>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <
>>>>>>> pnowojski@apache.org>
>>>>>>> > > > wrote:
>>>>>>> > > >
>>>>>>> > > >> Hi,
>>>>>>> > > >>
>>>>>>> > > >> Sorry for chipping in so late, but I was OoO for the last two
>>>>>>> weeks.
>>>>>>> > > >> Regarding the interfaces, I would be actually against adding
>>>>>>> those
>>>>>>> > > methods
>>>>>>> > > >> to the base interfaces for the reasons mentioned above.
>>>>>>> Clogging the
>>>>>>> > > base
>>>>>>> > > >> interface for new users with tons of methods that they do not
>>>>>>> need, do
>>>>>>> > > not
>>>>>>> > > >> understand and do not know what to do with them. Moreover,
>>>>>>> such
>>>>>>> > > decorative
>>>>>>> > > >> interfaces are solving a problem if a feature requires two or
>>>>>>> more
>>>>>>> > > methods
>>>>>>> > > >> to be implemented at the same time. If we have all of the
>>>>>>> methods with
>>>>>>> > > >> default implementation in the base interface, the API doesn't
>>>>>>> give any
>>>>>>> > > >> clue
>>>>>>> > > >> to the user which set of methods are required to be
>>>>>>> implemented at the
>>>>>>> > > >> same
>>>>>>> > > >> time.
>>>>>>> > > >>
>>>>>>> > > >> > a) I feel the biggest drawback of decorative interfaces is
>>>>>>> which
>>>>>>> > > >> interface
>>>>>>> > > >> > they can decorate and which combinations of multiple
>>>>>>> decorative
>>>>>>> > > >> interfaces
>>>>>>> > > >> > are valid. (...)
>>>>>>> > > >> > In the future, if there is a new feature added
>>>>>>> > > >> > (e.g. sorted or pre-partitioned data aware), are we going
>>>>>>> to create
>>>>>>> > > >> another
>>>>>>> > > >> > interface of SplitReader such as SortedSplitReader or
>>>>>>> > > >> PrePartitionedAware?
>>>>>>> > > >> > Can they be combined? So I think the additional decorative
>>>>>>> interface
>>>>>>> > > >> like
>>>>>>> > > >> > withSplitsAlignment actually increases the understanding
>>>>>>> cost of
>>>>>>> > users
>>>>>>> > > >> > because they have to know what decorative interfaces are
>>>>>>> there,
>>>>>>> > which
>>>>>>> > > >> > interface they can decorate and which combinations of the
>>>>>>> decorative
>>>>>>> > > >> > interfaces are valid and which are not. Ideally we want to
>>>>>>> avoid
>>>>>>> > that.
>>>>>>> > > >>
>>>>>>> > > >> I'm not sure if I understand how embedding default methods in
>>>>>>> the base
>>>>>>> > > >> interface is solving the problem: what can be combined or
>>>>>>> not? If
>>>>>>> > there
>>>>>>> > > >> are
>>>>>>> > > >> two conflicting features, having decorative interfaces that
>>>>>>> can not be
>>>>>>> > > >> mixed together actually makes much more sense to me rather
>>>>>>> than having
>>>>>>> > > >> them
>>>>>>> > > >> all in one base class. How would you allow users to implement
>>>>>>> only one
>>>>>>> > > of
>>>>>>> > > >> those two features?
>>>>>>> > > >>
>>>>>>> > > >> To reiterate on the issue. Yes, there are drawbacks:
>>>>>>> > > >> - how can a user discover what decorators are there?
>>>>>>> > > >> - how can a user know where the decorator can be applied to?
>>>>>>> > > >>
>>>>>>> > > >> However those are drawbacks for more power users, that can be
>>>>>>> > mitigated
>>>>>>> > > by
>>>>>>> > > >> the documentation. For example listing all of the decorators
>>>>>>> with
>>>>>>> > > >> detailed explanation both in the docs and in the java docs.
>>>>>>> More
>>>>>>> > > >> experienced users will be able to deal with those issues
>>>>>>> easier, as
>>>>>>> > they
>>>>>>> > > >> will already have some basic understanding of Flink. Also if
>>>>>>> user has
>>>>>>> > a
>>>>>>> > > >> problem that he wants to solve, he will google search a
>>>>>>> potential
>>>>>>> > > solution
>>>>>>> > > >> to his problem anyway, and while doing that he is very likely
>>>>>>> to
>>>>>>> > > discover
>>>>>>> > > >> the decorator that he needs anyway in the docs.
>>>>>>> > > >>
>>>>>>> > > >> The advantages are:
>>>>>>> > > >> - clean and easy to implement base interface
>>>>>>> > > >> - all of the methods from a single feature are grouped in a
>>>>>>> single
>>>>>>> > > >> decorator interface, together with their dedicated java doc.
>>>>>>> It's also
>>>>>>> > > >> easier to google search for help using the decorator name
>>>>>>> > > >> - if an optional feature requires two methods to be
>>>>>>> implemented at
>>>>>>> > once,
>>>>>>> > > >> decorator can guarantee that
>>>>>>> > > >>
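To make the last advantage concrete, here is a minimal Java sketch of the decorator idea. It is only an illustration under stated assumptions: SketchReader, SketchPausable and the other names are hypothetical stand-ins and not Flink's actual interfaces. The point is that a class opting into the feature must implement both methods, and the framework can detect the feature with a plain instanceof check.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified base interface (not Flink's SourceReader).
interface SketchReader {
    String pollNext();
}

// Hypothetical decorator: both methods of the optional feature live together,
// so the compiler rejects an implementation that provides only one of them.
interface SketchPausable {
    void pauseSplits(Collection<String> splitIds);

    void resumeSplits(Collection<String> splitIds);
}

// A basic reader only has to deal with the base interface.
final class SimpleSketchReader implements SketchReader {
    @Override
    public String pollNext() {
        return "record";
    }
}

// A reader that opts into pausing implements the decorator as well.
final class PausableSketchReader implements SketchReader, SketchPausable {
    private final List<String> paused = new ArrayList<>();

    @Override
    public String pollNext() {
        return "record";
    }

    @Override
    public void pauseSplits(Collection<String> splitIds) {
        paused.addAll(splitIds);
    }

    @Override
    public void resumeSplits(Collection<String> splitIds) {
        paused.removeAll(splitIds);
    }

    List<String> pausedSplits() {
        return paused;
    }
}

// The framework side discovers the feature from the type alone.
final class SketchFramework {
    static boolean supportsPausing(SketchReader reader) {
        return reader instanceof SketchPausable;
    }
}
```

With default methods in the base interface instead, SimpleSketchReader would compile even if it overrode only one of the two methods, which is the gap described above.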
>>>>>>> > > >> Here I would strongly lean towards making life easier for new
>>>>>>> users,
>>>>>>> > > >> lowering the entry barrier, at the (imo) slight expense for
>>>>>>> the power
>>>>>>> > > >> users.
>>>>>>> > > >>
>>>>>>> > > >> Best,
>>>>>>> > > >> Piotrek
>>>>>>> > > >>
>>>>>>> > > >>
>>>>>>> > > >> On Tue, 26 Apr 2022 at 15:32, Becket Qin <be...@gmail.com>
>>>>>>> > > >> wrote:
>>>>>>> > > >>
>>>>>>> > > >> > Thanks for the reply Sebastian and Dawid,
>>>>>>> > > >> >
>>>>>>> > > >> > I think Sebastian has a good summary. This is a really
>>>>>>> helpful
>>>>>>> > > >> discussion.
>>>>>>> > > >> >
>>>>>>> > > >> > Thinking a bit more, I feel that it might still be better
>>>>>>> to add the
>>>>>>> > > >> > supportsXXX() method in the Source rather than SourceReader.
>>>>>>> > > >> >
>>>>>>> > > >> > Generally speaking, what we are trying to do here is to let
>>>>>>> the
>>>>>>> > Flink
>>>>>>> > > >> > framework know what the Source is capable of. In this FLIP,
>>>>>>> it
>>>>>>> > happens
>>>>>>> > > >> to
>>>>>>> > > >> > be the capability that only involves SourceReader. But in
>>>>>>> the
>>>>>>> > future,
>>>>>>> > > >> it is
>>>>>>> > > >> > possible that another functionality involves both the
>>>>>>> > SplitEnumerator
>>>>>>> > > >> and
>>>>>>> > > >> > SourceReader. In that case, following the current approach,
>>>>>>> we
>>>>>>> > should
>>>>>>> > > >> put
>>>>>>> > > >> > the "supportsXXX()" method in both SplitEnumerator and
>>>>>>> SourceReader.
>>>>>>> > > >> > Because if we only put this in the SourceReader, then the
>>>>>>> JM would
>>>>>>> > > have
>>>>>>> > > >> to
>>>>>>> > > >> > create a SourceReader in order to know whether this feature
>>>>>>> is
>>>>>>> > > >> supported,
>>>>>>> > > >> > which is a little ugly. But if we put the "supportsXXX()"
>>>>>>> method in
>>>>>>> > > the
>>>>>>> > > >> > Source, we will break the "symmetric" design because this
>>>>>>> FLIP
>>>>>>> > chose a
>>>>>>> > > >> > different way.
>>>>>>> > > >> >
>>>>>>> > > >> > This is also why I think supportsXXX() method seems a good
>>>>>>> thing to
>>>>>>> > > >> have,
>>>>>>> > > >> > because when there are a few interfaces / methods that are
>>>>>>> expected
>>>>>>> > to
>>>>>>> > > >> be
>>>>>>> > > >> > implemented at the same time in order to deliver a feature,
>>>>>>> it is
>>>>>>> > > always
>>>>>>> > > >> > good to have a single source of truth to tell the framework
>>>>>>> what to
>>>>>>> > > do,
>>>>>>> > > >> so
>>>>>>> > > >> > the framework can do consistent things in different parts.
>>>>>>> > > >> >
>>>>>>> > > >> > @Sebastian Mattheis <se...@ververica.com>
>>>>>>> > > >> >
>>>>>>> > > >> > Regarding interface flavor b), i.e. AlignedSourceReader +
>>>>>>> > > >> > AlignedSplitReader, what I feel awkward about is that we are
>>>>>>> > > essentially
>>>>>>> > > >> > expecting almost all the SourceReader implementations to
>>>>>>> extend
>>>>>>> > > >> > SourceReaderBase, which effectively makes the SourceReader
>>>>>>> interface
>>>>>>> > > >> > without the pausing support useless. So this indicates that
>>>>>>> public
>>>>>>> > > >> > decorative interfaces (or sub-interfaces for the same
>>>>>>> purpose) only
>>>>>>> > > >> > make sense if the original interface is also expected to be
>>>>>>> used.
>>>>>>> > > >> > Otherwise, it seems to make more sense to add the method to
>>>>>>> the
>>>>>>> > original
>>>>>>> > > >> > interface itself.
>>>>>>> > > >> >
>>>>>>> > > >> > Cheers,
>>>>>>> > > >> >
>>>>>>> > > >> > Jiangjie (Becket) Qin
>>>>>>> > > >> >
>>>>>>> > > >> >
>>>>>>> > > >> >
>>>>>>> > > >> >
>>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <
>>>>>>> > > >> dwysakowicz@apache.org>
>>>>>>> > > >> > wrote:
>>>>>>> > > >> >
>>>>>>> > > >> > > Thanks @Sebastian for the nice summary.
>>>>>>> > > >> > >
>>>>>>> > > >> > > I think most of your points aligned with the suggestions
>>>>>>> I made to
>>>>>>> > > the
>>>>>>> > > >> > > FLIP, while you were writing your reply (I believe we hit
>>>>>>> enter
>>>>>>> > > >> nearly at
>>>>>>> > > >> > > the same time ;) )
>>>>>>> > > >> > >
>>>>>>> > > >> > > Two points after we synced offline
>>>>>>> > > >> > >
>>>>>>> > > >> > > 1. I also changed supportsWatermarksSplitAlignment to
>>>>>>> > > >> > > supportsPausingSplits to express the general capability of
>>>>>>> > > >> > > pausing.
>>>>>>> > > >> > >
>>>>>>> > > >> > > 2. As for whether we should introduce
>>>>>>> > > >> > > PausingSourceReader/PausingSplitReader (option b) or just add
>>>>>>> > > >> > > the methods (option c), I suggest simply adding the two
>>>>>>> > > >> > > methods, as I felt this is the approach Becket much prefers
>>>>>>> > > >> > > and which others do not object to. Unless there is
>>>>>>> > > >> > > opposition, let's go with option c.
>>>>>>> > > >> > >
>>>>>>> > > >> > > Best,
>>>>>>> > > >> > >
>>>>>>> > > >> > > Dawid
>>>>>>> > > >> > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
>>>>>>> > > >> > >
>>>>>>> > > >> > > Hi folks,
>>>>>>> > > >> > >
>>>>>>> > > >> > > Sorry for being a bit silent. Many thanks for all the
>>>>>>> input and
>>>>>>> > > >> > > suggestions. As I'm a bit new, I needed some time to
>>>>>>> catch up and
>>>>>>> > > >> > structure
>>>>>>> > > >> > > (for myself) the discussion and I wanted to find a way to
>>>>>>> > structure
>>>>>>> > > >> the
>>>>>>> > > >> > > conclusions. (Also because I had the feeling that some
>>>>>>> concerns
>>>>>>> > got
>>>>>>> > > >> lost
>>>>>>> > > >> > in
>>>>>>> > > >> > > the discussion.) This is my attempt and please correct me
>>>>>>> if
>>>>>>> > > >> something is
>>>>>>> > > >> > > wrong or misunderstood. I tried to collect and assemble
>>>>>>> the
>>>>>>> > > opinions,
>>>>>>> > > >> > > suggestions, and conclusions (to the best of my
>>>>>>> knowledge):
>>>>>>> > > >> > >
>>>>>>> > > >> > > # Top A: Should split alignment (pause/resume behavior)
>>>>>>> be a
>>>>>>> > general
>>>>>>> > > >> > > capability?
>>>>>>> > > >> > >
>>>>>>> > > >> > > I personally don't see any reason not to make it a general
>>>>>>> > capability
>>>>>>> > > >> > > because for the alignSplit method it is actually
>>>>>>> independent of
>>>>>>> > the
>>>>>>> > > >> > > watermarks. If we agree here to have it a general
>>>>>>> capability, we
>>>>>>> > > >> should
>>>>>>> > > >> > > also agree on the right wording. Does
>>>>>>> "alignSplits(splitsToResume,
>>>>>>> > > >> > > splitsToPause)" refer to what is then actually meant? (I
>>>>>>> see it as
>>>>>>> > > >> okay.
>>>>>>> > > >> > I
>>>>>>> > > >> > > don't have any better idea whilst Arvid suggested
>>>>>>> > > >> "pauseOrResumeSplits".)
>>>>>>> > > >> > >
>>>>>>> > > >> > > # Top B: Should it be possible to enable/disable split
>>>>>>> alignment?
>>>>>>> > > >> > >
>>>>>>> > > >> > > I would personally not disable the split alignment on the
>>>>>>> source
>>>>>>> > > >> reader
>>>>>>> > > >> > > side because if split alignment is used for some other
>>>>>>> use case
>>>>>>> > (see
>>>>>>> > > >> A)
>>>>>>> > > >> > it
>>>>>>> > > >> > > could have nasty side effects on other/future use cases.
>>>>>>> Instead,
>>>>>>> > I
>>>>>>> > > >> would
>>>>>>> > > >> > > disable "watermark split alignment" where I think it
>>>>>>> should
>>>>>>> > disable
>>>>>>> > > >> the
>>>>>>> > > >> > > watermark-dependent trigger for split alignment.
>>>>>>> > > >> > >
>>>>>>> > > >> > > # Top C: Should we add a supportsX method?
>>>>>>> > > >> > >
>>>>>>> > > >> > > I find it difficult to define the scope of a supportsX method
>>>>>>> > > >> > > w.r.t. the following questions: a) Where is it used? and b) What
>>>>>>> is the
>>>>>>> > > >> expected
>>>>>>> > > >> > > output? To b), it's not straight-forward to provide a
>>>>>>> meaningful
>>>>>>> > > >> output,
>>>>>>> > > >> > > e.g., if SourceReader supports split alignment but SplitReader
>>>>>>> > > >> > > does not. This is
>>>>>>> > > >> > > because with the current implementation, we can determine
>>>>>>> whether
>>>>>>> > > >> split
>>>>>>> > > >> > > alignment is fully supported only during runtime and
>>>>>>> specifically
>>>>>>> > > >> > actually
>>>>>>> > > >> > > only when calling alignSplits down the call hierarchy up
>>>>>>> to the
>>>>>>> > > actual
>>>>>>> > > >> > > SplitReaders.
>>>>>>> > > >> > >
>>>>>>> > > >> > > Therefore, I would suggest to either raise an error or
>>>>>>> warning if
>>>>>>> > > the
>>>>>>> > > >> > > alignment is called but not supported at some point. I
>>>>>>> know we
>>>>>>> > > should
>>>>>>> > > >> > > carefully think about when this could be the case because
>>>>>>> we don't
>>>>>>> > > >> want
>>>>>>> > > >> > to
>>>>>>> > > >> > > flood anybody with such warnings. However, warnings could
>>>>>>> be an
>>>>>>> > > >> indicator
>>>>>>> > > >> > > for the user that for watermark split alignment use case
>>>>>>> split
>>>>>>> > > >> reading is
>>>>>>> > > >> > > imbalanced with the conclusion to either disable the
>>>>>>> trigger for
>>>>>>> > > >> > watermark
>>>>>>> > > >> > > split alignment (see Top B) or to use/implement a source
>>>>>>> and
>>>>>>> > reader
>>>>>>> > > >> that
>>>>>>> > > >> > > fully supports split alignment.
>>>>>>> > > >> > >
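As an illustration of the "raise an error or warning" suggestion above, the following Java sketch shows one possible shape. All names (SketchSplitReader, SketchAligner, NoPauseReader) are hypothetical and chosen for this example only; this is not the FLIP's actual API. The default method signals that pausing is unsupported, and the caller logs a single warning rather than one per alignment round, so nobody gets flooded.

```java
import java.util.Collection;

// Hypothetical split-reader interface: the default signals "not supported".
interface SketchSplitReader {
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException(
                getClass().getSimpleName() + " does not support pausing splits");
    }
}

// A reader that does not override the default.
final class NoPauseReader implements SketchSplitReader {}

// Hypothetical caller: tries to align and warns only once on failure.
final class SketchAligner {
    private int warnings;

    boolean tryAlign(
            SketchSplitReader reader,
            Collection<String> splitsToPause,
            Collection<String> splitsToResume) {
        try {
            reader.pauseOrResumeSplits(splitsToPause, splitsToResume);
            return true;
        } catch (UnsupportedOperationException e) {
            if (warnings == 0) {
                // Warn once; later rounds stay silent.
                warnings++;
                System.err.println("WARN: split alignment skipped: " + e.getMessage());
            }
            return false;
        }
    }

    int warningCount() {
        return warnings;
    }
}
```

Whether the unsupported case should be a warning or a hard error is exactly the open question in this part of the thread; the sketch only shows that either choice can be made in one place.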
>>>>>>> > > >> > > # Top D: How to design interfaces?
>>>>>>> > > >> > >
>>>>>>> > > >> > > Thanks for structuring the discussion with the various
>>>>>>> > > >> possibilities
>>>>>>> > > >> > > (a-d). From the discussion and emails, I would like to
>>>>>>> summarize
>>>>>>> > the
>>>>>>> > > >> > > following requirements:
>>>>>>> > > >> > > - Interfaces should be consistent ("symmetric"), i.e.,
>>>>>>> similar
>>>>>>> > > >> semantics
>>>>>>> > > >> > > should have similar interfaces with similar usage.
>>>>>>> > > >> > > - Make explicit which implementations implement
>>>>>>> interfaces/support
>>>>>>> > > >> > > behavior.
>>>>>>> > > >> > > - Make clear what are default implementations and how to
>>>>>>> implement
>>>>>>> > > >> > > interfaces with desired behavior.
>>>>>>> > > >> > >
>>>>>>> > > >> > > This is a simplified view of the relations between
>>>>>>> relevant
>>>>>>> > classes
>>>>>>> > > of
>>>>>>> > > >> > the
>>>>>>> > > >> > > PoC implementation:
>>>>>>> > > >> > >
>>>>>>> > > >> > > SourceReader (Public) <|-- SourceReaderBase (Internal)
>>>>>>> <|-- ..
>>>>>>> > <|--
>>>>>>> > > >> > > MySourceReader
>>>>>>> > > >> > >
>>>>>>> > > >> > > MySourceReader <>-- SplitFetcherManager (Internal) <>--
>>>>>>> > SplitFetcher
>>>>>>> > > >> > > (Internal) <>-- SplitReader (Public) <|-- MySplitReader
>>>>>>> > > >> > >
>>>>>>> > > >> > > (A <|-- B: B inherits from A; A <>-- B: A "has a" B)
>>>>>>> > > >> > >
>>>>>>> > > >> > > Note that SourceReaderBase and SplitFetcherManager
>>>>>>> implement most
>>>>>>> > of
>>>>>>> > > >> the
>>>>>>> > > >> > > "logic" for split alignment just because we wanted to
>>>>>>> implement
>>>>>>> > > split
>>>>>>> > > >> > > alignment and wanted it to be available as kind of a
>>>>>>> default. As a
>>>>>>> > > >> > > consequence, we have a "default implementation" for
>>>>>>> SourceReader
>>>>>>> > > that
>>>>>>> > > >> > > implements the actual logic for split alignment. For that
>>>>>>> reason,
>>>>>>> > I
>>>>>>> > > >> find
>>>>>>> > > >> > it
>>>>>>> > > >> > > very confusing to have a NOOP default implementation in
>>>>>>> the
>>>>>>> > > interface
>>>>>>> > > >> for
>>>>>>> > > >> > > the SourceReader. As a consequence, interface strategy c)
>>>>>>> is
>>>>>>> > > difficult
>>>>>>> > > >> > > because this would require NOOP default implementations
>>>>>>> in the
>>>>>>> > > public
>>>>>>> > > >> > > interfaces of SourceReader and SplitReader. This is the
>>>>>>> same for
>>>>>>> > > >> strategy
>>>>>>> > > >> > > d) because it would require NOOP default implementation
>>>>>>> in the
>>>>>>> > > >> > > SourceReader. Further, as Dawid described method
>>>>>>> signatures of
>>>>>>> > > >> alignSplit
>>>>>>> > > >> > > for SourceReader and SplitReader differ and it would be
>>>>>>> extremely
>>>>>>> > > >> > difficult
>>>>>>> > > >> > > to make the signatures the same (with even potential
>>>>>>> performance
>>>>>>> > > >> impact
>>>>>>> > > >> > > because of additional lookups of split ids). Therefore,
>>>>>>> having a
>>>>>>> > > >> > symmetric
>>>>>>> > > >> > > decorative interface as of strategy a) is actually not
>>>>>>> possible
>>>>>>> > and
>>>>>>> > > >> > having
>>>>>>> > > >> > > two decorative interfaces with different method
>>>>>>> signatures is
>>>>>>> > > >> confusing.
>>>>>>> > > >> > My
>>>>>>> > > >> > > conclusion is that we are best with strategy b) which
>>>>>>> means to
>>>>>>> > have
>>>>>>> > > >> > > specializing sub-interfaces that inherit from the parent
>>>>>>> > interface:
>>>>>>> > > >> > > SourceReader <|-- AlignedSourceReader, SplitReader <|--
>>>>>>> > > >> > AlignedSplitReader
>>>>>>> > > >> > > With this option, I'm not 100% sure what the implications
>>>>>>> are and
>>>>>>> > if
>>>>>>> > > >> this
>>>>>>> > > >> > > could get nasty. I would suggest that Dawid and I just
>>>>>>> try to
>>>>>>> > > >> implement
>>>>>>> > > >> > and
>>>>>>> > > >> > > see if we like it. :)
>>>>>>> > > >> > >
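The signature mismatch described above can be sketched in Java as follows. This is a rough illustration under assumptions: every type here (SketchSplit, SketchAlignedSourceReader, SketchAlignedSplitReader, SketchFetcher) is hypothetical and much simpler than the PoC classes. The source-reader side only knows split IDs, the split-reader side needs the split objects, and a fetcher-like component in between resolves IDs to splits.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical split with a flag standing in for "reading is paused".
final class SketchSplit {
    final String id;
    boolean paused;

    SketchSplit(String id) {
        this.id = id;
    }
}

// Source-reader side: only split IDs are known here.
interface SketchAlignedSourceReader {
    void alignSplits(Collection<String> idsToPause, Collection<String> idsToResume);
}

// Split-reader side: needs the split objects themselves.
interface SketchAlignedSplitReader {
    void alignSplits(Collection<SketchSplit> toPause, Collection<SketchSplit> toResume);
}

final class FlaggingSplitReader implements SketchAlignedSplitReader {
    @Override
    public void alignSplits(Collection<SketchSplit> toPause, Collection<SketchSplit> toResume) {
        for (SketchSplit s : toPause) {
            s.paused = true;
        }
        for (SketchSplit s : toResume) {
            s.paused = false;
        }
    }
}

// Bridges the two signatures by looking up assigned splits by ID.
final class SketchFetcher implements SketchAlignedSourceReader {
    private final Map<String, SketchSplit> assigned = new HashMap<>();
    private final SketchAlignedSplitReader splitReader;

    SketchFetcher(SketchAlignedSplitReader splitReader) {
        this.splitReader = splitReader;
    }

    void addSplit(SketchSplit split) {
        assigned.put(split.id, split);
    }

    @Override
    public void alignSplits(Collection<String> idsToPause, Collection<String> idsToResume) {
        splitReader.alignSplits(resolve(idsToPause), resolve(idsToResume));
    }

    // Unknown IDs are skipped silently in this sketch.
    private List<SketchSplit> resolve(Collection<String> ids) {
        List<SketchSplit> out = new ArrayList<>();
        for (String id : ids) {
            SketchSplit s = assigned.get(id);
            if (s != null) {
                out.add(s);
            }
        }
        return out;
    }
}
```

Forcing both sides onto one signature would move the lookup in resolve() to a place that currently does not need it, which is the extra cost mentioned above.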
>>>>>>> > > >> > > # Summary
>>>>>>> > > >> > >
>>>>>>> > > >> > > In conclusion, please let me know your perspectives.
>>>>>>> Please
>>>>>>> > correct
>>>>>>> > > >> me,
>>>>>>> > > >> > if
>>>>>>> > > >> > > something is wrong or if I misunderstood something. My
>>>>>>> perspective
>>>>>>> > > >> would
>>>>>>> > > >> > be:
>>>>>>> > > >> > >
>>>>>>> > > >> > > Top A: Yes
>>>>>>> > > >> > > Top B: Yes (but disable watermark trigger for split
>>>>>>> alignment)
>>>>>>> > > >> > > Top C: No
>>>>>>> > > >> > > Top D: b)
>>>>>>> > > >> > >
>>>>>>> > > >> > > Best,
>>>>>>> > > >> > > Sebastian
>>>>>>> > > >> > >
>>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <
>>>>>>> > > >> dwysakowicz@apache.org
>>>>>>> > > >> > >
>>>>>>> > > >> > > wrote:
>>>>>>> > > >> > >
>>>>>>> > > >> > >> @Arvid:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> While I also like Becket's capability approach, I fear
>>>>>>> that it
>>>>>>> > > >> doesn't
>>>>>>> > > >> > work
>>>>>>> > > >> > >> for this particular use case: Sources can always be
>>>>>>> aligned
>>>>>>> > > >> cross-task
>>>>>>> > > >> > and
>>>>>>> > > >> > >> this is just about intra-task alignment. So it's
>>>>>>> plausible to put
>>>>>>> > > >> > sources
>>>>>>> > > >> > >> into an alignment group even though they do not use any
>>>>>>> of the
>>>>>>> > > >> presented
>>>>>>> > > >> > >> API of FLIP-217. They should just issue a warning, if
>>>>>>> they handle
>>>>>>> > > >> > multiple
>>>>>>> > > >> > >> splits (see motivation section).
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Yes, but the "supportXXX" method would be for telling if
>>>>>>> it
>>>>>>> > > supports
>>>>>>> > > >> > that
>>>>>>> > > >> > >> intra-task alignment. Cross-task alignment would always
>>>>>>> be
>>>>>>> > > supported.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I updated interfaces to what I believe to be closest to a
>>>>>>> > consensus
>>>>>>> > > >> > >> between all participants. Do you mind taking a look?
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> @Sebastian Do you mind addressing the nits?
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Best,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Dawid
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Thanks for pushing this effort.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I'd actually be in favor of 1b). I fully agree that
>>>>>>> decorator
>>>>>>> > > >> interfaces
>>>>>>> > > >> > >> should be avoided but I'm also not a big fan of
>>>>>>> overloading the
>>>>>>> > > base
>>>>>>> > > >> > >> interfaces (they are hard to implement as is). The usual feedback
>>>>>>> > > >> > >> on Source-related interfaces is that they are overwhelming
>>>>>>> > and
>>>>>>> > > >> too
>>>>>>> > > >> > >> hard to implement. However, I'd also not oppose 1c) as
>>>>>>> scattered
>>>>>>> > > >> > interfaces
>>>>>>> > > >> > >> also have drawbacks. I'd just dislike 1a) and 1d).
>>>>>>> > > >> > >> While I also like Becket's capability approach, I fear
>>>>>>> that it
>>>>>>> > > >> doesn't
>>>>>>> > > >> > work
>>>>>>> > > >> > >> for this particular use case: Sources can always be
>>>>>>> aligned
>>>>>>> > > >> cross-task
>>>>>>> > > >> > and
>>>>>>> > > >> > >> this is just about intra-task alignment. So it's
>>>>>>> plausible to put
>>>>>>> > > >> > sources
>>>>>>> > > >> > >> into an alignment group even though they do not use any
>>>>>>> of the
>>>>>>> > > >> presented
>>>>>>> > > >> > >> API of FLIP-217. They should just issue a warning, if
>>>>>>> they handle
>>>>>>> > > >> > multiple
>>>>>>> > > >> > >> splits (see motivation section).
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I think renaming alignSplits to facilitate future use
>>>>>>> cases makes
>>>>>>> > > >> sense
>>>>>>> > > >> > but
>>>>>>> > > >> > >> then all interfaces (if 1c) is chosen) should be adjusted
>>>>>>> > > >> accordingly.
>>>>>>> > > >> > >> AlignedSourceReader could be PausingSourceReader and I'd
>>>>>>> go for
>>>>>>> > > >> > >> pauseOrResumeSplits (Becket's proposal afaik). We could
>>>>>>> also
>>>>>>> > split
>>>>>>> > > it
>>>>>>> > > >> > into
>>>>>>> > > >> > >> pauseSplit and resumeSplit. While pauseOrResumeSplits
>>>>>>> may allow
>>>>>>> > > >> Sources
>>>>>>> > > >> > to
>>>>>>> > > >> > >> just use 1 instead of 2 library calls (as written in the
>>>>>>> > Javadoc),
>>>>>>> > > >> both
>>>>>>> > > >> > >> Kafka and Pulsar can't use it and I'm not sure if there
>>>>>>> is a
>>>>>>> > system
>>>>>>> > > >> that
>>>>>>> > > >> > >> can.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Some nits for the FLIP:
>>>>>>> > > >> > >> - Please replace "stop" with "pause".
>>>>>>> > > >> > >> - Not sure if it's worth it in the capability section:
>>>>>>> Sources
>>>>>>> > that
>>>>>>> > > >> > adopt
>>>>>>> > > >> > >> this interface cannot be used in earlier versions. So it
>>>>>>> feels
>>>>>>> > like
>>>>>>> > > >> we
>>>>>>> > > >> > are
>>>>>>> > > >> > >> only forward compatible (old sources can be used after
>>>>>>> the
>>>>>>> > change);
>>>>>>> > > >> but
>>>>>>> > > >> > I
>>>>>>> > > >> > >> guess this holds for any API addition.
>>>>>>> > > >> > >> - You might want to add what happens when all splits are
>>>>>>> paused.
>>>>>>> > > >> > >> - You may want to describe how the 3 flavors of
>>>>>>> SourceReaderBase
>>>>>>> > > >> > interact
>>>>>>> > > >> > >> with the interface.
>>>>>>> > > >> > >> - I'm not sure if it makes sense to include Kafka and
>>>>>>> Pulsar in
>>>>>>> > the
>>>>>>> > > >> > FLIP.
>>>>>>> > > >> > >> For me, this is rather immediate follow-up work. (could
>>>>>>> be in the
>>>>>>> > > >> same
>>>>>>> > > >> > >> umbrella ticket)
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Best,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Arvid
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz <
>>>>>>> > > >> > >> dwysakowicz@apache.org>
>>>>>>> > > >> > >> wrote:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
>>>>>>> WithSplitsAlignment",
>>>>>>> > > >> along
>>>>>>> > > >> > >> with "MySplitReader implements SplitReader,
>>>>>>> WithSplitsAlignment",
>>>>>>> > > or
>>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
>>>>>>> > > "MySplitReader
>>>>>>> > > >> > >> implements AlignedSplitReader", or
>>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
>>>>>>> "MySplitReader
>>>>>>> > > >> > implements
>>>>>>> > > >> > >> SplitReader".
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I think the latest proposal according to Dawid would be:
>>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
>>>>>>> "MySplitReader
>>>>>>> > > >> > implements
>>>>>>> > > >> > >> AlignedSplitReader".
>>>>>>> > > >> > >> I am fine with this API, although personally speaking I
>>>>>>> think it
>>>>>>> > is
>>>>>>> > > >> > simpler
>>>>>>> > > >> > >> to just add a new method to the split reader with
>>>>>>> default impl.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I think that is a good idea to have it aligned as much as
>>>>>>> > possible.
>>>>>>> > > >> I'd
>>>>>>> > > >> > be
>>>>>>> > > >> > >> +1 for your option c). We can merge AlignedSplitReader
>>>>>>> with
>>>>>>> > > >> > SplitReader. We
>>>>>>> > > >> > >> will update the FLIP shortly.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Best,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Dawid
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Thanks for the comment, Jark.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> 3. Interface/Method Name.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Can the interface be used to align other things in the
>>>>>>> future?
>>>>>>> > For
>>>>>>> > > >> > example,
>>>>>>> > > >> > >> align read speed, I have
>>>>>>> > > >> > >> seen users requesting global rate limits. This feature
>>>>>>> may also
>>>>>>> > > need
>>>>>>> > > >> an
>>>>>>> > > >> > >> interface like this.
>>>>>>> > > >> > >> If we don't plan to extend this interface to support
>>>>>>> aligning other
>>>>>>> > > >> > things, I
>>>>>>> > > >> > >> suggest explicitly declaring
>>>>>>> > > >> > >> the purpose of the methods, such as
>>>>>>> `alignWatermarksForSplits`
>>>>>>> > > >> instead
>>>>>>> > > >> > of
>>>>>>> > > >> > >> `alignSplits`.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> This is a good point. Naming wise, it would usually be
>>>>>>> more
>>>>>>> > > >> extensible
>>>>>>> > > >> > to
>>>>>>> > > >> > >> just describe what the method actually does, instead of
>>>>>>> assuming
>>>>>>> > > the
>>>>>>> > > >> > >> purpose of doing this. For example, in this case,
>>>>>>> > > >> pauseOrResumeSplits()
>>>>>>> > > >> > >> would be more extensible because this can be used for
>>>>>>> any kind of
>>>>>>> > > >> flow
>>>>>>> > > >> > >> control, be it watermark alignment or simple rate
>>>>>>> limiting.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> 4. Interface or Method.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I don't have a strong opinion on this. I think they have
>>>>>>> their
>>>>>>> > own
>>>>>>> > > >> > >> advantages.
>>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending
>>>>>>> abilities
>>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
>>>>>>> > > >> > >> and I prefer Interfaces rather than methods in this
>>>>>>> case. When
>>>>>>> > you
>>>>>>> > > >> have
>>>>>>> > > >> > a
>>>>>>> > > >> > >> bunch of abilities and each ability
>>>>>>> > > >> > >> has more than one method, Interfaces can help to
>>>>>>> organize them
>>>>>>> > and
>>>>>>> > > >> > >> make it clear to users which methods
>>>>>>> > > >> > >> they need to implement when they want to have an ability.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I am OK with decorative interfaces if this is a general
>>>>>>> design
>>>>>>> > > >> pattern
>>>>>>> > > >> > in
>>>>>>> > > >> > >> the other components in Flink. But it looks like the
>>>>>>> current API
>>>>>>> > > >> > proposal
>>>>>>> > > >> > >> is not symmetric.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> The current proposal is essentially "MySourceReader
>>>>>>> implements
>>>>>>> > > >> > >> SourceReader, WithSplitsAlignment", along with
>>>>>>> "MySplitReader
>>>>>>> > > >> implements
>>>>>>> > > >> > >> AlignedSplitsReader".
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Should we make the API symmetric? I'd consider any one
>>>>>>> of the
>>>>>>> > > >> following
>>>>>>> > > >> > as
>>>>>>> > > >> > >> symmetric.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
>>>>>>> WithSplitsAlignment",
>>>>>>> > > >> along
>>>>>>> > > >> > >> with "MySplitReader implements SplitReader,
>>>>>>> WithSplitsAlignment",
>>>>>>> > > or
>>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
>>>>>>> > > "MySplitReader
>>>>>>> > > >> > >> implements AlignedSplitReader", or
>>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
>>>>>>> "MySplitReader
>>>>>>> > > >> > implements
>>>>>>> > > >> > >> SplitReader".
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I think the latest proposal according to Dawid would be:
>>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
>>>>>>> "MySplitReader
>>>>>>> > > >> > implements
>>>>>>> > > >> > >> AlignedSplitReader".
>>>>>>> > > >> > >> I am fine with this API, although personally speaking I
>>>>>>> think it
>>>>>>> > is
>>>>>>> > > >> > simpler
>>>>>>> > > >> > >> to just add a new method to the split reader with
>>>>>>> default impl.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> @Dawid Wysakowicz <dwysakowicz@apache.org>, thanks for the
>>>>>>> > > >> > >> reply.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Having said that, as I don't have a preference and I
>>>>>>> agree most
>>>>>>> > of
>>>>>>> > > >> the
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> sources will support the alignment I am fine following
>>>>>>> your
>>>>>>> > > >> suggestion
>>>>>>> > > >> > to
>>>>>>> > > >> > >> have the SourceReader extending from
>>>>>>> > WithWatermarksSplitsAlignment,
>>>>>>> > > >> but
>>>>>>> > > >> > >> would put the "supportsXXX" there, not in the Source to
>>>>>>> keep the
>>>>>>> > > two
>>>>>>> > > >> > >> methods together.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> One benefit of having the "supportsXXX" in Source is
>>>>>>> that this
>>>>>>> > > allows
>>>>>>> > > >> > some
>>>>>>> > > >> > >> compile time check. For example, if a user enabled
>>>>>>> watermark
>>>>>>> > > >> alignment
>>>>>>> > > >> > >> while it is not supported by the Source, an exception
>>>>>>> can be
>>>>>>> > thrown
>>>>>>> > > >> at
>>>>>>> > > >> > >> compile time. It seems in general useful. That said, I
>>>>>>> agree that
>>>>>>> > > API
>>>>>>> > > >> > >> cleanliness wise it is better to put the two methods
>>>>>>> together.
>>>>>>> > > >> > >>
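The early check described above could look roughly like the following Java sketch. It assumes a hypothetical capability flag on the source object and a hypothetical validation step that runs while the job is being assembled; SketchSource, SketchJobBuilder and the method names are illustrative only and not the interfaces under discussion.

```java
// Hypothetical capability flag on the source itself.
interface SketchSource {
    default boolean supportsPausingSplits() {
        return false;
    }
}

final class PlainSketchSource implements SketchSource {}

final class PausingSketchSource implements SketchSource {
    @Override
    public boolean supportsPausingSplits() {
        return true;
    }
}

final class SketchJobBuilder {
    // Runs while the job is being assembled, before any reader is created,
    // so a misconfiguration is reported before the job is submitted.
    static void validate(SketchSource source, boolean watermarkAlignmentEnabled) {
        if (watermarkAlignmentEnabled && !source.supportsPausingSplits()) {
            throw new IllegalStateException(
                    "Watermark alignment is enabled but the source cannot pause splits");
        }
    }
}
```

If the flag lived only on the reader, the same check would need a reader instance, which is the drawback raised elsewhere in this thread.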
>>>>>>> > > >> > >> Thanks,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Jiangjie (Becket) Qin
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <imjark@gmail.com> wrote:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Thanks, Dawid, for the reminder on FLIP-182. Sorry I did
>>>>>>> miss it.
>>>>>>> > > >> > >> I don't have other concerns then.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Best,
>>>>>>> > > >> > >> Jark
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz
>>>>>>> > > >> > >> <dwysakowicz@apache.org>
>>>>>>> > > >> > >> wrote:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> @Jark:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> 1. Will the framework always align with watermarks when
>>>>>>> the
>>>>>>> > source
>>>>>>> > > >> > >> implements the interface?
>>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment even
>>>>>>> if Kafka
>>>>>>> > > >> > >> implements the interface,
>>>>>>> > > >> > >> and this will affect the throughput somehow. I agree
>>>>>>> with Becket
>>>>>>> > > >> > >> we may need a
>>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to configure
>>>>>>> the
>>>>>>> > source
>>>>>>> > > >> to
>>>>>>> > > >> > >> enable/disable the alignment.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
>>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will
>>>>>>> greatly affect
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> throughput
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> if the reader is constantly
>>>>>>> > > >> > >>  switching between pause and resume. Can users configure
>>>>>>> the
>>>>>>> > > >> alignment
>>>>>>> > > >> > >> offset?
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> This is covered in the previous FLIP[1] which has been
>>>>>>> already
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> implemented
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> in 1.15. In short, it must be enabled with the watermark
>>>>>>> strategy
>>>>>>> > > >> which
>>>>>>> > > >> > >> also configures drift and update interval.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> If we don't plan to extend this interface to support
>>>>>>> aligning other
>>>>>>> > > >> things,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> suggest explicitly declaring
>>>>>>> > > >> > >> the purpose of the methods, such as
>>>>>>> `alignWatermarksForSplits`
>>>>>>> > > >> instead
>>>>>>> > > >> > of
>>>>>>> > > >> > >> `alignSplits`.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Sure let's rename it.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> @Becket:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I understand your point. On the other hand putting all
>>>>>>> methods,
>>>>>>> > > even
>>>>>>> > > >> > with
>>>>>>> > > >> > >> "supportsXXX" methods for enabling certain features,
>>>>>>> makes the
>>>>>>> > > entry
>>>>>>> > > >> > >> threshold for writing a new source higher. Instead of
>>>>>>> focusing on
>>>>>>> > > the
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> basic
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> and required properties of the Source, the person
>>>>>>> implementing a
>>>>>>> > > >> source
>>>>>>> > > >> > >> must bother with and need to figure out what all of the
>>>>>>> extra
>>>>>>> > > >> features
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> are
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> about and how to deal with them. It makes it also harder
>>>>>>> to
>>>>>>> > > organize
>>>>>>> > > >> > >> methods in coupled groups as Jark said.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Having said that, as I don't have a preference and I
>>>>>>> agree most
>>>>>>> > of
>>>>>>> > > >> the
>>>>>>> > > >> > >> sources will support the alignment I am fine following
>>>>>>> your
>>>>>>> > > >> suggestion
>>>>>>> > > >> > to
>>>>>>> > > >> > >> have the SourceReader extending from
>>>>>>> > WithWatermarksSplitsAlignment,
>>>>>>> > > >> but
>>>>>>> > > >> > >> would put the "supportsXXX" there, not in the Source to
>>>>>>> keep the
>>>>>>> > > two
>>>>>>> > > >> > >> methods together.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Lastly, I agree it is really unfortunate the
>>>>>>> "alignSplits"
>>>>>>> > methods
>>>>>>> > > >> > differ
>>>>>>> > > >> > >> slightly for SourceReader and SplitReader. The reason for
>>>>>>> that is
>>>>>>> > > >> > >> SourceReaderBase deals only with SplitIds, whereas
>>>>>>> SplitReader
>>>>>>> > > needs
>>>>>>> > > >> the
>>>>>>> > > >> > >> actual splits to pause them. I found the discrepancy
>>>>>>> acceptable
>>>>>>> > for
>>>>>>> > > >> the
>>>>>>> > > >> > >> sake of simplifying changes significantly, especially as
>>>>>>> they
>>>>>>> > would
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> highly
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> likely impact performance as we would have to perform
>>>>>>> additional
>>>>>>> > > >> > lookups.
>>>>>>> > > >> > >> Moreover the SplitReader is a secondary interface.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Best,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Dawid
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Thanks for the effort, Dawid and Sebastian!
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I just have some minor questions (maybe I missed
>>>>>>> something).
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> 1. Will the framework always align with watermarks when
>>>>>>> the
>>>>>>> > source
>>>>>>> > > >> > >> implements the interface?
>>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment even
>>>>>>> if Kafka
>>>>>>> > > >> > >> implements the interface,
>>>>>>> > > >> > >> and this will affect the throughput somehow. I agree
>>>>>>> with Becket
>>>>>>> > > >> > >> we may need a
>>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to configure
>>>>>>> the
>>>>>>> > source
>>>>>>> > > >> to
>>>>>>> > > >> > >> enable/disable the alignment.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
>>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will
>>>>>>> greatly affect
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> throughput
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> if the reader is constantly
>>>>>>> > > >> > >>  switching between pause and resume. Can users configure
>>>>>>> the
>>>>>>> > > >> alignment
>>>>>>> > > >> > >> offset?
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> 3. Interface/Method Name.
>>>>>>> > > >> > >> Can the interface be used to align other things in the
>>>>>>> future?
>>>>>>> > For
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> example,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> align read speed, I have
>>>>>>> > > >> > >> seen users requesting global rate limits. This feature
>>>>>>> may also
>>>>>>> > > need
>>>>>>> > > >> an
>>>>>>> > > >> > >> interface like this.
>>>>>>> > > >> > >> If we don't plan to extend this interface to support
>>>>>>> align other
>>>>>>> > > >> things,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> suggest explicitly declaring
>>>>>>> > > >> > >> the purpose of the methods, such as
>>>>>>> `alignWatermarksForSplits`
>>>>>>> > > >> instead
>>>>>>> > > >> > of
>>>>>>> > > >> > >> `alignSplits`.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> 4. Interface or Method.
>>>>>>> > > >> > >> I don't have a strong opinion on this. I think they have
>>>>>>> their
>>>>>>> > own
>>>>>>> > > >> > >> advantages.
>>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending
>>>>>>> abilities
>>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
>>>>>>> > > >> > >> and I prefer Interfaces rather than methods in this
>>>>>>> case. When
>>>>>>> > you
>>>>>>> > > >> have
>>>>>>> > > >> > a
>>>>>>> > > >> > >> bunch of abilities and each ability
>>>>>>> > > >> > >> has more than one method, Interfaces can help to
>>>>>>> organize them
>>>>>>> > and
>>>>>>> > > >> make
>>>>>>> > > >> > >> users clear which methods
>>>>>>> > > >> > >> need to implement when you want to have an ability.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Best,
>>>>>>> > > >> > >> Jark
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <becket.qin@gmail.com> wrote:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Hi Dawid,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Thanks for the explanation. Apologies that I somehow
>>>>>>> misread a
>>>>>>> > > bunch
>>>>>>> > > >> of
>>>>>>> > > >> > >> "align" and thought they were "assign".
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Regarding 1, by default implementation, I was thinking
>>>>>>> of the
>>>>>>> > > default
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> no-op
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> implementation. I am a little worried about the
>>>>>>> proliferation of
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> decorative
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> interfaces. I think the most important thing about
>>>>>>> interfaces is
>>>>>>> > > that
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> they
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> are easy to understand. In this case, I prefer adding
>>>>>>> new method
>>>>>>> > to
>>>>>>> > > >> the
>>>>>>> > > >> > >> existing interface for the following reasons:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> a) I feel the biggest drawback of decorative interfaces
>>>>>>> is which
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> interface
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> they can decorate and which combinations of multiple
>>>>>>> decorative
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> interfaces
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> are valid. In the current FLIP, the withSplitsAlignment
>>>>>>> interface
>>>>>>> > > is
>>>>>>> > > >> > only
>>>>>>> > > >> > >> applicable to the SourceReader which means it can't
>>>>>>> decorate any
>>>>>>> > > >> other
>>>>>>> > > >> > >> interface. From an interface design perspective, a
>>>>>>> natural
>>>>>>> > question
>>>>>>> > > >> is
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> why
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> not let "AlignedSplitReader" extend
>>>>>>> "withSplitsAlignment"? And it
>>>>>>> > > is
>>>>>>> > > >> > also
>>>>>>> > > >> > >> natural to assume that a split reader implementing both
>>>>>>> > SplitReader
>>>>>>> > > >> and
>>>>>>> > > >> > >> WithSplitAlignment would work, because a source reader
>>>>>>> > implementing
>>>>>>> > > >> > >> SourceReader and withSplitsAlignment works. So why isn't
>>>>>>> there an
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> interface
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> of AlignedSourceReader? In the future, if there is a new
>>>>>>> feature
>>>>>>> > > >> added
>>>>>>> > > >> > >> (e.g. sorted or pre-partitioned data aware), are we
>>>>>>> going to
>>>>>>> > create
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> another
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> interface of SplitReader such as SortedSplitReader or
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> PrePartitionedAware?
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Can they be combined? So I think the additional
>>>>>>> decorative
>>>>>>> > > interface
>>>>>>> > > >> > like
>>>>>>> > > >> > >> withSplitsAlignment actually increases the understanding
>>>>>>> cost of
>>>>>>> > > >> users
>>>>>>> > > >> > >> because they have to know what decorative interfaces are
>>>>>>> there,
>>>>>>> > > which
>>>>>>> > > >> > >> interface they can decorate and which combinations of the
>>>>>>> > > decorative
>>>>>>> > > >> > >> interfaces are valid and which are not. Ideally we want
>>>>>>> to avoid
>>>>>>> > > >> that.
>>>>>>> > > >> > To
>>>>>>> > > >> > >> be clear, I am not opposing having an interface of
>>>>>>> > > >> withSplitsAlignment,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> it
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> is completely OK to have it as an internal interface and
>>>>>>> let
>>>>>>> > > >> > SourceReader
>>>>>>> > > >> > >> and SplitReader both extend it.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> b) Adding a new method to the SourceReader with a default
>>>>>>> > > >> implementation
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> of
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> no-op would help avoid logic branching in the source
>>>>>>> logic,
>>>>>>> > > >> especially
>>>>>>> > > >> > >> given that we agree that the vast majority of the
>>>>>>> SourceReader
>>>>>>> > > >> > >> implementations, if not all, would just extend from the
>>>>>>> > > >> > SourceReaderBase.
>>>>>>> > > >> > >> That means adding a new method to the interface would
>>>>>>> effectively
>>>>>>> > > >> give
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> the
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> same user experience, but simpler.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> c) A related design principle that may be worth
>>>>>>> discussing is how
>>>>>>> > > do
>>>>>>> > > >> we
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> let
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> the Source implementations tell Flink what capability is
>>>>>>> > supported
>>>>>>> > > >> and
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> what
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> is not. Personally speaking I feel the most intuitive
>>>>>>> place to me
>>>>>>> > > is
>>>>>>> > > >> in
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> the
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Source itself, because that is the entrance of the
>>>>>>> entire Source
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> connector
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> logic.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Based on the above thoughts, I am wondering if the
>>>>>>> following
>>>>>>> > > >> interface
>>>>>>> > > >> > >> would be easier to understand by the users.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> - Change "withSplitsAlignment" to internal interface,
>>>>>>> let both
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> SourceReader
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> and SplitReader extend from it, with a default no-op
>>>>>>> > > implementation.
>>>>>>> > > >> > >> - Add a new method "boolean supportSplitsAlignment()" to
>>>>>>> the
>>>>>>> > Source
>>>>>>> > > >> > >> interface, with a default implementation returning
>>>>>>> false. Sources
>>>>>>> > > >> that
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> have
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> implemented the alignment logic can change this to
>>>>>>> return true,
>>>>>>> > and
>>>>>>> > > >> > >> override the alignSplits() methods in the SourceReader /
>>>>>>> > > SplitReader
>>>>>>> > > >> if
>>>>>>> > > >> > >> needed.
>>>>>>> > > >> > >> - In the future, if a new optional feature is going to
>>>>>>> be added
>>>>>>> > to
>>>>>>> > > >> the
>>>>>>> > > >> > >> Source, and that feature requires the awareness from
>>>>>>> Flink, we
>>>>>>> > can
>>>>>>> > > >> add
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> more
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> such methods to the Source.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> What do you think?
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Thanks,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Jiangjie (Becket) Qin
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> @Konstantin:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface
>>>>>>> (aka the
>>>>>>> > > >> stop &
>>>>>>> > > >> > >> resume behavior) will be implemented for Kafka and
>>>>>>> Pulsar only,
>>>>>>> > > >> correct?
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Correct, as far as I know though, those are the only
>>>>>>> sources
>>>>>>> > which
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> consume
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> concurrently from multiple splits and thus alignment
>>>>>>> applies.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> @Thomas:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I wonder if "supporting" split alignment in
>>>>>>> SourceReaderBase and
>>>>>>> > > then
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> doing
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> nothing if the split reader does not implement
>>>>>>> AlignedSplitReader
>>>>>>> > > >> could
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> be
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead be
>>>>>>> added to
>>>>>>> > the
>>>>>>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to make
>>>>>>> it
>>>>>>> > explicit
>>>>>>> > > >> that
>>>>>>> > > >> > >> the source actually supports it.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I understand your concern. Hmm, I think we could
>>>>>>> actually do
>>>>>>> > that.
>>>>>>> > > >> Given
>>>>>>> > > >> > >> the actual implementation of the
>>>>>>> SourceReaderBase#alignSplits is
>>>>>>> > > >> rather
>>>>>>> > > >> > >> short (just a forward to the corresponding method of
>>>>>>> > SplitFetcher),
>>>>>>> > > >> we
>>>>>>> > > >> > >> could reimplement it in the actual source
>>>>>>> implementations. This
>>>>>>> > > >> solution
>>>>>>> > > >> > >> has a downside though. Authors of new sources would
>>>>>>> have to do
>>>>>>> > > two
>>>>>>> > > >> > >> things: extend from AlignedSplitReader and implement
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> WithSplitsAlignment,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> instead of just extending AlignedSplitReader. I would be
>>>>>>> fine
>>>>>>> > with
>>>>>>> > > >> such
>>>>>>> > > >> > a
>>>>>>> > > >> > >> tradeoff though. What do others think?
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> @Steven:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> For this part from the motivation section, is it
>>>>>>> accurate? Let's
>>>>>>> > > >> assume
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> one
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> source task consumes from 3 partitions and one of the
>>>>>>> partition
>>>>>>> > is
>>>>>>> > > >> > >> significantly slower. In this situation, watermark for
>>>>>>> this
>>>>>>> > source
>>>>>>> > > >> task
>>>>>>> > > >> > >> won't hold back as it is reading recent data from other
>>>>>>> two Kafka
>>>>>>> > > >> > >> partitions. As a result, it won't hold back the overall
>>>>>>> > watermark.
>>>>>>> > > I
>>>>>>> > > >> > >> thought the problem is that we may have late data for
>>>>>>> this slow
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> partition.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> It will hold back the watermark. Watermark of an
>>>>>>> operator is the
>>>>>>> > > >> minimum
>>>>>>> > > >> > >> of watermarks of all splits[1]
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I have another question about the restart. Say split
>>>>>>> alignment is
>>>>>>> > > >> > >> triggered. checkpoint is completed. job failed and
>>>>>>> restored from
>>>>>>> > > the
>>>>>>> > > >> > last
>>>>>>> > > >> > >> checkpoint. because alignment decision is not
>>>>>>> checkpointed,
>>>>>>> > > initially
>>>>>>> > > >> > >> alignment won't be enforced until we get a cycle of
>>>>>>> watermark
>>>>>>> > > >> > aggregation
>>>>>>> > > >> > >> and propagation, right? Not saying this corner is a
>>>>>>> problem. Just
>>>>>>> > > >> want
>>>>>>> > > >> > to
>>>>>>> > > >> > >> understand it more.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Your understanding is correct.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> @Becket:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use
>>>>>>> case, so
>>>>>>> > > >> should
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> we
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> just add the related methods to SourceReader directly
>>>>>>> instead of
>>>>>>> > > >> > >> introducing the new interface of WithSplitAssignment? We
>>>>>>> can
>>>>>>> > > provide
>>>>>>> > > >> > >> default implementations, so backwards compatibility
>>>>>>> won't be an
>>>>>>> > > >> issue.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I don't think we can provide a default implementation.
>>>>>>> How would
>>>>>>> > we
>>>>>>> > > >> do
>>>>>>> > > >> > >> that? Would it be just a no-op? Is it better than having
>>>>>>> an
>>>>>>> > opt-in
>>>>>>> > > >> > >> interface? The default implementation would have to be
>>>>>>> added
>>>>>>> > > >> exclusively
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> in
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> a *Public* SourceReader interface. By the way notice
>>>>>>> > > SourceReaderBase
>>>>>>> > > >> > >> does extend from WithSplitsAlignment, so effectively all
>>>>>>> > > >> implementations
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> do
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> handle the alignment case. To be honest I think it is
>>>>>>> impossible
>>>>>>> > to
>>>>>>> > > >> > >> implement the SourceReader interface directly by end
>>>>>>> users.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably
>>>>>>> also
>>>>>>> > needs
>>>>>>> > > >> some
>>>>>>> > > >> > >> change to support throttling at the split granularity.
>>>>>>> Can you
>>>>>>> > add
>>>>>>> > > >> that
>>>>>>> > > >> > >> interface change into the public interface section as
>>>>>>> well?
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> It has been added from the beginning. See
>>>>>>> *AlignedSplitReader.*
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits
>>>>>>> here,
>>>>>>> > given
>>>>>>> > > >> that
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> it
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> is not actually changing the split assignments? It seems
>>>>>>> > something
>>>>>>> > > >> like
>>>>>>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is
>>>>>>> more
>>>>>>> > > accurate.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> The method's called *alignSplits*, not assign. Do you
>>>>>>> still
>>>>>>> > prefer
>>>>>>> > > a
>>>>>>> > > >> > >> different name for that? Personally, I am open for
>>>>>>> suggestions
>>>>>>> > > here.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Best,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Dawid
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Thanks for driving the effort, Sebastian. I think the
>>>>>>> motivation
>>>>>>> > > >> makes a
>>>>>>> > > >> > >> lot of sense. Just a few suggestions / questions.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use
>>>>>>> case, so
>>>>>>> > > >> should
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> we
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> just add the related methods to SourceReader directly
>>>>>>> instead of
>>>>>>> > > >> > >> introducing the new interface of WithSplitAssignment? We
>>>>>>> can
>>>>>>> > > provide
>>>>>>> > > >> > >> default implementations, so backwards compatibility
>>>>>>> won't be an
>>>>>>> > > >> issue.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably
>>>>>>> also
>>>>>>> > needs
>>>>>>> > > >> some
>>>>>>> > > >> > >> change to support throttling at the split granularity.
>>>>>>> Can you
>>>>>>> > add
>>>>>>> > > >> that
>>>>>>> > > >> > >> interface change into the public interface section as
>>>>>>> well?
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits
>>>>>>> here,
>>>>>>> > given
>>>>>>> > > >> that
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> it
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> is not actually changing the split assignments? It seems
>>>>>>> > something
>>>>>>> > > >> like
>>>>>>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is
>>>>>>> more
>>>>>>> > > accurate.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Thanks,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Jiangjie (Becket) Qin
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <stevenz3wu@gmail.com> wrote:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> However, a single source operator may read data from
>>>>>>> multiple
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> splits/partitions, e.g., multiple Kafka partitions, such
>>>>>>> that
>>>>>>> > even
>>>>>>> > > >> with
>>>>>>> > > >> > >> watermark alignment the source operator may need to
>>>>>>> buffer
>>>>>>> > > excessive
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> amount
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> of data if one split emits data faster than another.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> For this part from the motivation section, is it
>>>>>>> accurate? Let's
>>>>>>> > > >> assume
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> one
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> source task consumes from 3 partitions and one of the
>>>>>>> partition
>>>>>>> > is
>>>>>>> > > >> > >> significantly slower. In this situation, watermark for
>>>>>>> this
>>>>>>> > source
>>>>>>> > > >> task
>>>>>>> > > >> > >> won't hold back as it is reading recent data from other
>>>>>>> two Kafka
>>>>>>> > > >> > >> partitions. As a result, it won't hold back the overall
>>>>>>> > watermark.
>>>>>>> > > I
>>>>>>> > > >> > >> thought the problem is that we may have late data for
>>>>>>> this slow
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> partition.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I have another question about the restart. Say split
>>>>>>> alignment is
>>>>>>> > > >> > >> triggered. checkpoint is completed. job failed and
>>>>>>> restored from
>>>>>>> > > the
>>>>>>> > > >> > last
>>>>>>> > > >> > >> checkpoint. because alignment decision is not
>>>>>>> checkpointed,
>>>>>>> > > initially
>>>>>>> > > >> > >> alignment won't be enforced until we get a cycle of
>>>>>>> watermark
>>>>>>> > > >> > aggregation
>>>>>>> > > >> > >> and propagation, right? Not saying this corner is a
>>>>>>> problem. Just
>>>>>>> > > >> want
>>>>>>> > > >> > to
>>>>>>> > > >> > >> understand it more.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <thw@apache.org> wrote:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Thanks for working on this!
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I wonder if "supporting" split alignment in
>>>>>>> SourceReaderBase and
>>>>>>> > > then
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> doing
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> nothing if the split reader does not implement
>>>>>>> AlignedSplitReader
>>>>>>> > > >> could
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> be
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead be
>>>>>>> added to
>>>>>>> > the
>>>>>>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to make
>>>>>>> it
>>>>>>> > explicit
>>>>>>> > > >> that
>>>>>>> > > >> > >> the source actually supports it.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Thanks,
>>>>>>> > > >> > >> Thomas
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <knaufk@apache.org> wrote:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface
>>>>>>> (aka the
>>>>>>> > > >> stop
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> &
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> resume behavior) will be implemented for Kafka and
>>>>>>> Pulsar only,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> correct?
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> +1 in general. I believe it is valuable to complete the
>>>>>>> watermark
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> aligned
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> story with this FLIP.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Cheers,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Konstantin
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> To be explicit, having worked on it, I support it ;) I
>>>>>>> think we
>>>>>>> > can
>>>>>>> > > >> > >> start a vote thread soonish, as there are no concerns so
>>>>>>> far.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Best,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Dawid
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Dear Flink developers,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> I would like to open a discussion on FLIP 217 [1] for an
>>>>>>> > extension
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> of
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Watermark Alignment to perform alignment also in
>>>>>>> SplitReaders. To
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> do
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> so,
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> SplitReaders must be able to suspend and resume reading
>>>>>>> from
>>>>>>> > split
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> sources
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> where the SourceOperator coordinates and controls
>>>>>>> suspend and
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> resume.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> To
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> gather information about current watermarks of the
>>>>>>> SplitReaders,
>>>>>>> > we
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> extend
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> the internal WatermarkOutputMultiplexer and report
>>>>>>> watermarks to
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> the
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> SourceOperator.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> There is a PoC for this FLIP [2], prototyped by Arvid
>>>>>>> Heise and
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> revised
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> and
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> reworked by Dawid Wysakowicz (He did most of the work.)
>>>>>>> and me.
>>>>>>> > The
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> changes
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> are backwards compatible in a way that if affected
>>>>>>> components do
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> not
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> support split alignment the behavior is as before.
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Best,
>>>>>>> > > >> > >> Sebastian
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> --
>>>>>>> > > >> > >>
>>>>>>> > > >> > >> Konstantin Knauf
>>>>>>> > > >> > >> https://twitter.com/snntrable
>>>>>>> > > >> > >> https://github.com/knaufk
>>>>>>> > > >> > >>
>>>>>>> > > >> > >>
>>>>>>> > > >> >
>>>>>>> > > >>
>>>>>>> > > >
>>>>>>> > >
>>>>>>> >
>>>>>>>
>>>>>>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Sebastian Mattheis <se...@ververica.com>.
Hi Becket,

Thanks a lot for your fast and detailed response. I think we are converging,
and dropping the supportsX method sounds very reasonable to me. (Side note:
I think we misunderstood each other regarding "pausable splits" being
enabled by "default". As you describe it now, I understand "default" to mean
the new recommended way of implementing a source, and I think that is fully
valid. Before, I understood "default" as the default implementation, i.e.,
throwing UnsupportedOperationException, which is the exact opposite. :) )
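
To make the two readings of "default" concrete, here is a minimal Java
sketch. It uses simplified, hypothetical stand-ins (SketchSourceReader,
LegacyReader, PausableReader) rather than the real Flink interfaces. Only
the method name pauseOrResumeSplits and the UnsupportedOperationException
default are taken from this thread; the parameter types (collections of
split ids) are an assumption.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Small driver showing both kinds of readers side by side.
class PausableSplitsSketch {
    public static void main(String[] args) {
        PausableReader pausable = new PausableReader();
        pausable.pauseOrResumeSplits(List.of("split-0", "split-1"), List.of());
        pausable.pauseOrResumeSplits(List.of(), List.of("split-0"));
        System.out.println("paused: " + pausable.pausedSplits);

        try {
            new LegacyReader().pauseOrResumeSplits(List.of("split-0"), List.of());
        } catch (UnsupportedOperationException e) {
            System.out.println("legacy reader: " + e.getMessage());
        }
    }
}

// Hypothetical stand-in for a source reader interface (NOT the real Flink type).
interface SketchSourceReader {
    // "Default implementation": existing readers still compile, but a call fails loudly.
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException(
                "This source reader does not support pausing or resuming splits.");
    }
}

// An existing reader that was written before the method existed; no code change needed.
class LegacyReader implements SketchSourceReader {}

// The "recommended way": the reader overrides the method and tracks paused splits.
class PausableReader implements SketchSourceReader {
    final Set<String> pausedSplits = new TreeSet<>();

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        pausedSplits.addAll(splitsToPause);
        pausedSplits.removeAll(splitsToResume);
    }
}
```

In this sketch the legacy reader keeps compiling unchanged, and the failure
only shows up at runtime once something actually asks it to pause a split.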

Nevertheless: As mentioned, an open question for me is if watermark
alignment should enforce pausable splits. For clarification, the current
documentation [1] says:

> *Note:* As of 1.15, Flink supports aligning across tasks of the same source
> and/or different sources. It does not support aligning
> splits/partitions/shards in the same task.
>
> In a case where there are e.g. two Kafka partitions that produce
> watermarks at different pace, that get assigned to the same task watermark
> might not behave as expected. Fortunately, worst case it should not perform
> worse than without alignment.
>
> Given the limitation above, we suggest applying watermark alignment in two
> situations:
>
>    1. You have two different sources (e.g. Kafka and File) that produce
>    watermarks at different speeds
>    2. You run your source with parallelism equal to the number of
>    splits/shards/partitions, which results in every subtask being assigned a
>    single unit of work.
>
I personally see no issue with, and no reason against, implementing this
dependency of watermark alignment on pausable splits. (I think this would
even be a good path towards shaping watermark alignment in 1.16.) However,
"I don't see" means that I would be happy to hear Dawid's and Piotrek's
opinions, as they implemented watermark alignment based on FLIP-182 [2] and
I don't want to miss relevant rationale/background info from their side.
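
To illustrate why alignment within a task needs pausable splits, here is a
small self-contained Java sketch of the two-partitions-in-one-task case from
the documentation quote. The class and method names are hypothetical, and
the rule "maximum allowed watermark = lowest watermark + allowed drift" is a
simplifying assumption for illustration, not a description of the actual
Flink implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Flink.
class SplitAlignmentSketch {

    // The watermark of a task is the minimum over the watermarks of its splits.
    // Throws NoSuchElementException if the map is empty.
    static long taskWatermark(Map<String, Long> splitWatermarks) {
        return Collections.min(splitWatermarks.values());
    }

    // Splits that are ahead of the maximum allowed watermark are candidates for pausing.
    static List<String> splitsToPause(Map<String, Long> splitWatermarks, long maxAllowedWatermark) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : splitWatermarks.entrySet()) {
            if (entry.getValue() > maxAllowedWatermark) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three partitions assigned to the same task; partition-2 runs far ahead.
        Map<String, Long> watermarks = new LinkedHashMap<>();
        watermarks.put("partition-0", 1_000L);
        watermarks.put("partition-1", 1_500L);
        watermarks.put("partition-2", 9_000L);

        long allowedDrift = 2_000L; // assumed drift setting, in the same unit as the watermarks
        long maxAllowed = taskWatermark(watermarks) + allowedDrift;

        System.out.println("task watermark: " + taskWatermark(watermarks));
        System.out.println("splits to pause: " + splitsToPause(watermarks, maxAllowed));
    }
}
```

Without a way to pause an individual split, the task in this sketch could
only stop reading entirely or keep buffering data from the fast partition.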

*@Piotrek* *@Dawid* What do you think?

Regards,
Sebastian

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode

On Wed, May 11, 2022 at 1:30 PM Becket Qin <be...@gmail.com> wrote:

> +dev
>
> Hi Sebastian,
>
> Thank you for the summary. Please see the detailed replies inline. As a
> recap of my suggestions.
>
> 1. Pausable splits API.
>   a) Add default implementations to methods "pauseOrResumeSplits" in both
> SourceReader and SplitReader where both default implementations throw
>  UnsupportedOperationException.
>
> 2. User story.
>     a) We tell users to enable the watermark alignment as they like. This
> is exactly what the current Flink API is.
>     b) We tell the source developers, please implement pausable splits,
> otherwise bad things may happen. Think of it like you are expected to
> implement SourceReader#snapshotState() properly, otherwise exceptions will
> be thrown when users enable checkpointing.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis <
> sebastian@ververica.com> wrote:
>
>> Hi Becket, Hi everybody,
>>
>> I'm sorry if I misread the messages but I could not derive an agreement
>> from the mailing list. Nevertheless, if I understand you right the
>> suggestion is:
>>
>> * Add default implementations to methods "pauseOrResumeSplits" in both
>> SourceReader and SplitReader where both default implementations throw
>> UnsupportedOperationException.
>>
> Yes.
>
>> * Add "supportsPauseOrResumeSplits" to the Source interface. (In the
>> following, I refer to supporting this as "pausable splits".)
>>
> We may no longer need this if pausable splits are expected to be
> implemented by the source developers, i.e. non-optional. Having this method
> would then be somewhat misleading as it looks like the sources that do not
> support pausable splits are also acceptable in the long term. So API wise,
> I'd say maybe we should remove this for this FLIP, although I believe this
> supportXXX pattern itself is still attractive for optional features.
>
>
>>
>> To make the conclusions explicit:
>>
>> 1. The implementation of pauseOrResumeSplits in both interfaces
>> SourceReader and SplitReader is optional, where the default is that it
>> doesn't support it. (--> This means that the implementation is still
>> optional for the source developer.)
>>
> It is optional for backwards compatibility with existing sources, as they
> may still compile without code change. But starting from this FLIP, Flink
> will always optimistically assume that all the sources support pausable
> splits. If a source does not support pausable splits, it goes to an error
> handling path when watermark alignment is enabled on it. This is different
> from a usual optional feature, where no error is expected.
>
>
>> 2. If watermark alignment is enabled in the application code by adding
>> withWatermarkAlignment to the WatermarkStrategy while SourceReader or
>> SplitReader do not support pausableSplits, we throw an
>> UnsupportedOperationException.
>>
> Yes.
>
>
>> 3. With regard to your statement:
>>
>>> [...] basically means watermark alignment is a non-optional feature to
>>> the end users.
>>
>> You actually mean that "pausable splits" are non-optional for the app
>> developer if watermark alignment is enabled. However, watermark alignment
>> is optional and can be enabled/disabled.
>>
> Yes, watermark alignment can be enabled/disabled in individual sources in
> Flink jobs, which basically means the code supporting watermark alignment
> has to already be there. That again means the Source developers are also
> expected to support pausable splits by default. So this way we essentially
> tell the end users that you may enable / disable this feature as you wish,
> and tell the source developers that you SHOULD implement this because the
> end users may turn it on/off at will. And if the source does not support
> pausable splits, that goes to an error handling path when watermark
> alignment is enabled on it. So users know they have to explicitly exclude
> this source.
>
>
>>
>> So far it's totally clear to me and I hope this is what you mean. I also
>> agree with both statements:
>>
>> So making that expectation aligned with the source developers seems
>>> reasonable.
>>>
>>
>> I think this is a simple and clean solution from both the end user and
>>> source developers' standpoint.
>>>
>>
>> However, a last conclusion derives from 3. and is an open question for me:
>>
>> 4. The feature of "pausable splits" is now tightly bound to watermark
>> alignment, i.e., if sources do not support "pausable splits" one can not
>> enable watermark alignment for these sources. This dependency is not the
>> current status of watermark alignment implementation because it is/was
>> implemented without pausable splits. Do we want to introduce this
>> dependency? (This is an open question. I cannot judge that.)
>>
> The watermark alignment basically relies on the pausable splits, right? So
> personally I found it quite reasonable that if the source does not support
> pausable splits, end users cannot enable watermark alignment on it.
>
>
>> If something is wrong, please correct me.
>>
>> Regards,
>> Sebastian
>>
>> On Wed, May 11, 2022 at 9:05 AM Becket Qin <be...@gmail.com> wrote:
>>
>>> Hi Sebastian,
>>>
>>> Thanks for the reply and patient discussion. I agree this is a tricky
>>> decision.
>>>
>>>
>>>> Nevertheless, Piotr has valid concerns about Option c) which I see as
>>>> follows:
>>>> (1) An interface with default NOOP implementation makes the
>>>> implementation optional. And in my opinion, a default implementation is and
>>>> will remain a way of making implementation optional because even in future
>>>> a developer can decide to implement the "old flavor" without support for
>>>> pausable splits.
>>>> (2) It may not be too critical but I also find it suboptimal that with
>>>> a NOOP default implementation there is no way to check at runtime if
>>>> SourceReader or SplitReader actually support pausing. (To do so, one would
>>>> need a supportsX method which makes it again more complicated.)
>>>
>>>
>>> Based on the last few messages in the mailing list, Piotr and I agreed
>>> that the default implementation should just throw an
>>> UnsupportedOperationException if the source is unpausable. So this
>>> basically tells the Source developers that this feature is expected to be
>>> supported. Because we cannot prevent end users from putting an unpausable
>>> source into the watermark alignment group, that basically means watermark
>>> alignment is a non-optional feature to the end users. So making that
>>> expectation aligned with the source developers seems reasonable.  And if a
>>> source does not support this feature, the end users should explicitly
>>> remove that source from the watermark alignment group.
>>>
>>> Personally speaking I think this is a simple and clean solution from
>>> both the end user and source developers' standpoint.
>>>
>>> Does this address your concerns?
>>>
>>> Thanks,
>>>
>>> Jiangjie (Becket) Qin
>>>
>>> On Wed, May 11, 2022 at 2:52 PM Sebastian Mattheis <
>>> sebastian@ververica.com> wrote:
>>>
>>>> Hi Piotr, Hi Becket, Hi everybody,
>>>>
>>>> we, Dawid and I, discussed the various suggestions/options and we would
>>>> be okay either way because we find neither solution is perfect just because
>>>> of the already present complexity.
>>>>
>>>> Option c) Adding methods to the interfaces of SourceReader and
>>>> SplitReader
>>>> Option a) Adding decorative interfaces to be used by SourceReader and
>>>> SplitReader
>>>>
>>>> As of the current status (v. 12) of the FLIP [1], it is based on Option
>>>> c) which we find acceptable because the complexity added is only a single
>>>> method.
>>>>
>>>> Nevertheless, Piotr has valid concerns about Option c) which I see as
>>>> follows:
>>>> (1) An interface with default NOOP implementation makes the
>>>> implementation optional. And in my opinion, a default implementation is and
>>>> will remain a way of making implementation optional because even in future
>>>> a developer can decide to implement the "old flavor" without support for
>>>> pausable splits.
>>>> (2) It may not be too critical but I also find it suboptimal that with
>>>> a NOOP default implementation there is no way to check at runtime if
>>>> SourceReader or SplitReader actually support pausing. (To do so, one would
>>>> need a supportsX method which makes it again more complicated.)
>>>>
>>>> However, we haven't changed it because Option a) is also not optimal or
>>>> straight-forward:
>>>> (1) We need to add two distinct yet similar decorative interfaces
>>>> since, as mentioned, the signatures of the methods are different. For
>>>> example, we would need decorative interfaces like
>>>> `SplitReaderWithPausableSplits` and `SourceReaderWithPausableSplits`.
>>>> (2) As a consequence, we would need to somehow document how/where to
>>>> implement both interfaces and how this relates to each other. This we could
>>>> solve by adding a note in the interface of SourceReader and SplitReader and
>>>> reference to the decorative interfaces but it still increases complexity
>>>> too.
>>>>
>>>> In summary, we see both as acceptable and preferred over other options.
>>>> The question is if we can find a solution or compromise that is acceptable
>>>> for everybody to reach consensus.
>>>>
>>>> Please let us know what you think because we would be happy if we can
>>>> conclude the discussion to avoid dropping the initiative on this FLIP.
>>>>
>>>> Regards,
>>>> Sebastian
>>>>
>>>> [1]
>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
>>>> (v. 12)
>>>>
>>>> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski <pn...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Guowei,
>>>>>
>>>>> as Dawid wrote a couple of messages back:
>>>>>
>>>>> > This is covered in the previous FLIP[1] which has been already
>>>>> implemented in 1.15. In short, it must be enabled with the watermark
>>>>> strategy which also configures drift and update interval
>>>>>
>>>>> So by default watermark alignment is disabled, regardless if a source
>>>>> supports it or not.
>>>>>
>>>>> Best,
>>>>> Piotrek
>>>>>
>>>>> czw., 5 maj 2022 o 09:56 Guowei Ma <gu...@gmail.com> napisał(a):
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We know that in the case of Bounded input Flink supports the Batch
>>>>>> execution mode. Currently in Batch execution mode, Flink is executed
>>>>>> on a stage-by-stage basis. In this way, perhaps watermark alignment
>>>>>> might not gain much.
>>>>>>
>>>>>> So my question is: Is watermark alignment the default behavior (for
>>>>>> implemented sources only)? If so, have you considered evaluating the
>>>>>> impact of this behavior on the Batch execution mode? Or do you think
>>>>>> it is not necessary?
>>>>>>
>>>>>> Correct me if I miss something.
>>>>>>
>>>>>> Best,
>>>>>> Guowei
>>>>>>
>>>>>>
>>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <
>>>>>> piotr.nowojski@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> > Hi Becket and Dawid,
>>>>>> >
>>>>>> > > I feel that no matter which option we choose this can not be
>>>>>> solved
>>>>>> > entirely in either of the options, because of the point above and
>>>>>> because
>>>>>> > the signature of SplitReader#pauseOrResumeSplits and
>>>>>> > SourceReader#pauseOrResumeSplits are slightly different (one
>>>>>> identifies
>>>>>> > splits with splitId the other one passes the splits directly).
>>>>>> >
>>>>>> > Yes, that's a good point in this case and for features that need to
>>>>>> be
>>>>>> > implemented in more than one place.
>>>>>> >
>>>>>> > > Is there any reason for pausing reading from a split to be an optional
>>>>>> > > feature,
>>>>>> > > other than that this was not included in the original interface?
>>>>>> >
>>>>>> > An additional argument in favor of making it optional is to
>>>>>> simplify source
>>>>>> > implementation. But on its own I'm not sure if that would be enough
>>>>>> to
>>>>>> > justify making this feature optional. Maybe.
>>>>>> >
>>>>>> > > I think it would be way simpler and clearer to just let end users
>>>>>> and
>>>>>> > Flink
>>>>>> > > assume all the connectors will implement this feature.
>>>>>> >
>>>>>> > As I wrote above that would be an interesting choice to make (ease
>>>>>> of
>>>>>> > implementation for new users, vs system consistency). Regardless of
>>>>>> that,
>>>>>> > yes, for me the main argument is the API backward compatibility.
>>>>>> But let's
>>>>>> > clear a couple of points:
>>>>>> > - The current proposal adding methods to the base interface with
>>>>>> default
>>>>>> > implementations is an OPTIONAL feature. Same as the decorative
>>>>>> version
>>>>>> > would be.
>>>>>> > - Decorative version could implement "throw
>>>>>> UnsupportedOperationException"
>>>>>> > if user enabled watermark alignment just as well and I agree that's
>>>>>> a
>>>>>> > better option compared to logging a warning.
>>>>>> >
>>>>>> > Best,
>>>>>> > Piotrek
>>>>>> >
>>>>>> >
>>>>>> > śr., 4 maj 2022 o 15:40 Becket Qin <be...@gmail.com>
>>>>>> napisał(a):
>>>>>> >
>>>>>> > > Thanks for the reply and patient discussion, Piotr and Dawid.
>>>>>> > >
>>>>>> > > Is there any reason for pausing reading from a split to be an optional
>>>>>> > > feature,
>>>>>> > > other than that this was not included in the original interface?
>>>>>> > >
>>>>>> > > To be honest I am really worried about the complexity of the user
>>>>>> story
>>>>>> > > here. Optional features like this have a high overhead. Imagine
>>>>>> this
>>>>>> > > feature is optional, now a user enabled watermark alignment and
>>>>>> defined a
>>>>>> > > few watermark groups. Would it work? Hmm, that depends on whether
>>>>>> the
>>>>>> > > involved Source has implemented this feature. If the Sources are
>>>>>> well
>>>>>> > > documented, good luck. Otherwise end users may have to look into
>>>>>> the code
>>>>>> > > of the Source to see whether the feature is supported. Which is
>>>>>> something
>>>>>> > > they shouldn't have to do.
>>>>>> > >
>>>>>> > > I think it would be way simpler and clearer to just let end users
>>>>>> and
>>>>>> > Flink
>>>>>> > > assume all the connectors will implement this feature. After all
>>>>>> the
>>>>>> > > watermark group is not optional to the end users. If in some rare
>>>>>> cases,
>>>>>> > > the feature cannot be supported, a clear
>>>>>> UnsupportedOperationException
>>>>>> > will
>>>>>> > > be thrown to tell users to explicitly remove this Source from the
>>>>>> > watermark
>>>>>> > > group. I don't think we should have a warning message here, as
>>>>>> they tend
>>>>>> > to
>>>>>> > > be ignored in many cases. If we do this, we don't even need the
>>>>>> > supportXXX
>>>>>> > > method in the Source for this feature. In fact this is exactly
>>>>>> how many
>>>>>> > > interfaces work today. For example,
>>>>>> SplitEnumerator#addSplitsBack() is
>>>>>> > not
>>>>>> > > supported by Pravega source because it does not support partial
>>>>>> failover.
>>>>>> > > In that case, it simply throws an exception to trigger a global
>>>>>> recovery.
>>>>>> > >
>>>>>> > > The reason we add a default implementation in this case would
>>>>>> just be for
>>>>>> > the
>>>>>> > > sake of backwards compatibility so the old source can still
>>>>>> compile.
>>>>>> > Sure,
>>>>>> > > in short term, this feature might not be supported by many
>>>>>> existing
>>>>>> > > sources. That is OK, and it is quite visible to the source
>>>>>> developers
>>>>>> > that
>>>>>> > > they did not override the default impl which throws an
>>>>>> > > UnsupportedOperationException.
>>>>>> > >
>>>>>> > > @Dawid,
>>>>>> > >
>>>>>> > > the Java doc of the SupportXXX() method in the Source would be
>>>>>> the single
>>>>>> > > >> source of truth regarding how to implement this feature.
>>>>>> > > >
>>>>>> > > >
>>>>>> > >
>>>>>> > > I also don't find it entirely true. Half of the classes are
>>>>>> theoretically
>>>>>> > > > optional and are utility classes from the point of view how the
>>>>>> > > interfaces
>>>>>> > > > are organized. Theoretically users do not need to use any of
>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to list their
>>>>>> methods in
>>>>>> > > the
>>>>>> > > > Source interface.
>>>>>> > >
>>>>>> > > I think the ultimate goal of java docs is to guide users to
>>>>>> implement the
>>>>>> > > Source. If SourceReaderBase is the preferred way to implement a
>>>>>> > > SourceReader, it seems worth mentioning that. Even the Java
>>>>>> language
>>>>>> > > documentation of interfaces lists the known implementations [1] so
>>>>>> people
>>>>>> > can
>>>>>> > > leverage them. But for this particular case, if we make the
>>>>>> feature
>>>>>> > > non-optional, we don't even need the supportXXX() method for now.
>>>>>> > >
>>>>>> > > Thanks,
>>>>>> > >
>>>>>> > > Jiangjie (Becket) Qin
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <
>>>>>> dwysakowicz@apache.org>
>>>>>> > > wrote:
>>>>>> > >
>>>>>> > > > Hey Piotr and Becket,
>>>>>> > > >
>>>>>> > > > First of all, let me say I am happy with whichever option is
>>>>>> agreed in
>>>>>> > > the
>>>>>> > > > discussion.
>>>>>> > > >
>>>>>> > > > I wanted to clarify a few points from the discussion though:
>>>>>> > > >
>>>>>> > > > @Becket:
>>>>>> > > >
>>>>>> > > > The main argument for adding the methods to the SourceReader is
>>>>>> that
>>>>>> > > these
>>>>>> > > > methods are effectively NON-OPTIONAL to SourceReader impl, i.e.
>>>>>> > starting
>>>>>> > > > from this FLIP, all the SourceReaders impl are expected to
>>>>>> support this
>>>>>> > > > method, although some old implementations may not have
>>>>>> implemented this
>>>>>> > > > feature. I think we should distinguish the new features from the
>>>>>> > optional
>>>>>> > > > features. While the public decorative interface is a solution
>>>>>> to the
>>>>>> > > > optional features. We should not use it for the features that
>>>>>> are
>>>>>> > > > non-optional.
>>>>>> > > >
>>>>>> > > > I don't think that this feature is NON-OPTIONAL. Even though
>>>>>> > preferred, I
>>>>>> > > > still think it can be simply optional.
>>>>>> > > >
>>>>>> > > > the Java doc of the SupportXXX() method in the Source would be
>>>>>> the
>>>>>> > single
>>>>>> > > > source of truth regarding how to implement this feature.
>>>>>> > > >
>>>>>> > > > I also don't find it entirely true. Half of the classes are
>>>>>> > theoretically
>>>>>> > > > optional and are utility classes from the point of view how the
>>>>>> > > interfaces
>>>>>> > > > are organized. Theoretically users do not need to use any of
>>>>>> > > > SourceReaderBase & SplitReader. Would be weird to list their
>>>>>> methods in
>>>>>> > > the
>>>>>> > > > Source interface.
>>>>>> > > >
>>>>>> > > > @Piotr
>>>>>> > > >
>>>>>> > > > If we have all of the methods with default implementation in
>>>>>> the base
>>>>>> > > > interface, the API doesn't give any clue to the user which set
>>>>>> of
>>>>>> > methods
>>>>>> > > > are required to be implemented at the same time.
>>>>>> > > >
>>>>>> > > > I feel that no matter which option we choose this can not be
>>>>>> solved
>>>>>> > > > entirely in either of the options, because of the point above
>>>>>> and
>>>>>> > because
>>>>>> > > > the signature of SplitReader#pauseOrResumeSplits and
>>>>>> > > > SourceReader#pauseOrResumeSplits are slightly different (one
>>>>>> identifies
>>>>>> > > > splits with splitId the other one passes the splits directly).
>>>>>> > > >
>>>>>> > > > Best,
>>>>>> > > >
>>>>>> > > > Dawid
>>>>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
>>>>>> > > >
>>>>>> > > > Hi Piotr,
>>>>>> > > >
>>>>>> > > > Thanks for the comment.
>>>>>> > > >
>>>>>> > > > Just to clarify, I am not against the decorative interfaces,
>>>>>> but I do
>>>>>> > > > think we should use them with caution. The main argument for
>>>>>> adding the
>>>>>> > > > methods to the SourceReader is that these methods are
>>>>>> > > > effectively NON-OPTIONAL to SourceReader impl, i.e. starting
>>>>>> from this
>>>>>> > > > FLIP, all the SourceReaders impl are expected to support this
>>>>>> > > > method, although some old implementations may not have
>>>>>> implemented this
>>>>>> > > > feature. I think we should distinguish the new features from the
>>>>>> > optional
>>>>>> > > > features. While the public decorative interface is a solution
>>>>>> to the
>>>>>> > > > optional features. We should not use it for the features that
>>>>>> are
>>>>>> > > > non-optional.
>>>>>> > > >
>>>>>> > > > That said, this feature is optional for SplitReaders. Arguably
>>>>>> we can
>>>>>> > > have
>>>>>> > > > a decorative interface for that, but for simplicity and
>>>>>> symmetry of the
>>>>>> > > > interface, personally I prefer just adding a new method.
>>>>>> > > >
>>>>>> > > > Regarding the advantages you mentioned about the decorative
>>>>>> interfaces,
>>>>>> > > > they would make sense if:
>>>>>> > > > 1. The feature is optional.
>>>>>> > > > 2. There is only one decorative interface involved for a
>>>>>> feature.
>>>>>> > > > Otherwise the argument that all the methods are grouped
>>>>>> together will
>>>>>> > not
>>>>>> > > > stand.
>>>>>> > > >
>>>>>> > > > Compared with that, I think the current solution works fine in
>>>>>> all
>>>>>> > cases,
>>>>>> > > > i.e. "having supportXXX() method in Source, and default methods
>>>>>> /
>>>>>> > > > decorative interfaces in base interfaces.".
>>>>>> > > >
>>>>>> > > > The advantages are:
>>>>>> > > >> - clean and easy to implement base interface
>>>>>> > > >
>>>>>> > > > In the current approach, the Java doc of the SupportXXX()
>>>>>> method in the
>>>>>> > > > Source would be the single source of truth regarding how to
>>>>>> implement
>>>>>> > > this
>>>>>> > > > feature. It lists the method that has to be implemented to
>>>>>> support this
>>>>>> > > > feature, regardless of how many classes / interfaces are
>>>>>> involved.
>>>>>> > > >
>>>>>> > > > When implementing the base interface, users do not need to
>>>>>> implement a
>>>>>> > > > method with default implementation. If they are curious what
>>>>>> the method
>>>>>> > > is
>>>>>> > > > for, the java doc of that method simply points users to the
>>>>>> > SupportXXX()
>>>>>> > > > method in the Source. This does not seem to add work for the users
>>>>>> compared
>>>>>> > with
>>>>>> > > > decorative interfaces, but gives much better discoverability.
>>>>>> > > >
>>>>>> > > > - all of the methods from a single feature are grouped in a
>>>>>> single
>>>>>> > > >> decorator interface, together with their dedicated java doc.
>>>>>> It's also
>>>>>> > > >> easier to google search for help using the decorator name
>>>>>> > > >
>>>>>> > > > - if an optional feature requires two methods to be implemented
>>>>>> at
>>>>>> > once,
>>>>>> > > >> decorator can guarantee that
>>>>>> > > >
>>>>>> > > > These two points are not true when multiple components and
>>>>>> classes are
>>>>>> > > > involved collaboratively to provide a feature. In our case, we
>>>>>> have
>>>>>> > both
>>>>>> > > > SourceReader and SplitReader involved. And there might be other
>>>>>> > > interfaces
>>>>>> > > > on the JM side involved for some future features. So the
>>>>>> relevant
>>>>>> > methods
>>>>>> > > > can actually be scattered over the places. That said, we may
>>>>>> still use
>>>>>> > > > decorative interfaces for each component, if the feature is
>>>>>> optional,
>>>>>> > > given
>>>>>> > > > there is a single source of truth for the feature.
>>>>>> > > >
>>>>>> > > > Here I would strongly lean towards making life easier for new
>>>>>> users,
>>>>>> > > >> lowering the entry barrier, at the (imo) slight expense for
>>>>>> the power
>>>>>> > > >> users.
>>>>>> > > >
>>>>>> > > > I actually think the current approach is simpler, more
>>>>>> extensible and
>>>>>> > > more
>>>>>> > > > general for all the users. Can you articulate a bit more on
>>>>>> which part
>>>>>> > > you
>>>>>> > > > think makes it harder for users to understand?
>>>>>> > > >
>>>>>> > > >
>>>>>> > > > There is another benefit of the decorative interfaces which is
>>>>>> not
>>>>>> > > > mentioned, but might be worth considering here. Usually the
>>>>>> decorative
>>>>>> > > > interfaces give slightly better backwards compatibility than
>>>>>> the new
>>>>>> > > > default method in the interfaces. That is when users are using
>>>>>> a jar
>>>>>> > that
>>>>>> > > > was compiled with an older version of Flink which does not have
>>>>>> the
>>>>>> > > default
>>>>>> > > > method in the interfaces in question. A decorative interface
>>>>>> may still
>>>>>> > > > provide backwards compatibility in that case, while default
>>>>>> method impl
>>>>>> > > > cannot.
>>>>>> > > >
>>>>>> > > > I think in Flink we in general do not guarantee custom
>>>>>> components
>>>>>> > > compiled
>>>>>> > > > with an older version can run with a newer version of Flink. A
>>>>>> > recompile
>>>>>> > > > with a newer version would be required. That said, if we do
>>>>>> care about
>>>>>> > > > this, we can just change the "supportXXX()" method in the Source
>>>>>> > > interface
>>>>>> > > > to use decorative interfaces, and leave the other parts
>>>>>> unchanged.
>>>>>> > > >
>>>>>> > > > Thanks,
>>>>>> > > >
>>>>>> > > > Jiangjie (Becket) Qin
>>>>>> > > >
>>>>>> > > >
>>>>>> > > >
>>>>>> > > >
>>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <
>>>>>> pnowojski@apache.org>
>>>>>> > > > wrote:
>>>>>> > > >
>>>>>> > > >> Hi,
>>>>>> > > >>
>>>>>> > > >> Sorry for chipping in so late, but I was OoO for the last two
>>>>>> weeks.
>>>>>> > > >> Regarding the interfaces, I would be actually against adding
>>>>>> those
>>>>>> > > methods
>>>>>> > > >> to the base interfaces for the reasons mentioned above.
>>>>>> Clogging the
>>>>>> > > base
>>>>>> > > >> interface for new users with tons of methods that they do not
>>>>>> need, do
>>>>>> > > not
>>>>>> > > >> understand and do not know what to do with them. Moreover, such
>>>>>> > > decorative
>>>>>> > > >> interfaces are solving a problem if a feature requires two or
>>>>>> more
>>>>>> > > methods
>>>>>> > > >> to be implemented at the same time. If we have all of the
>>>>>> methods with
>>>>>> > > >> default implementation in the base interface, the API doesn't
>>>>>> give any
>>>>>> > > >> clue
>>>>>> > > >> to the user which set of methods are required to be
>>>>>> implemented at the
>>>>>> > > >> same
>>>>>> > > >> time.
>>>>>> > > >>
>>>>>> > > >> > a) I feel the biggest drawback of decorative interfaces is
>>>>>> which
>>>>>> > > >> interface
>>>>>> > > >> > they can decorate and which combinations of multiple
>>>>>> decorative
>>>>>> > > >> interfaces
>>>>>> > > >> > are valid. (...)
>>>>>> > > >> > In the future, if there is a new feature added
>>>>>> > > >> > (e.g. sorted or pre-partitioned data aware), are we going to
>>>>>> create
>>>>>> > > >> another
>>>>>> > > >> > interface of SplitReader such as SortedSplitReader or
>>>>>> > > >> PrePartitionedAware?
>>>>>> > > >> > Can they be combined? So I think the additional decorative
>>>>>> interface
>>>>>> > > >> like
>>>>>> > > >> > withSplitsAlignment actually increases the understanding
>>>>>> cost of
>>>>>> > users
>>>>>> > > >> > because they have to know what decorative interfaces are
>>>>>> there,
>>>>>> > which
>>>>>> > > >> > interface they can decorate and which combinations of the
>>>>>> decorative
>>>>>> > > >> > interfaces are valid and which are not. Ideally we want to
>>>>>> avoid
>>>>>> > that.
>>>>>> > > >>
>>>>>> > > >> I'm not sure if I understand how embedding default methods in
>>>>>> the base
>>>>>> > > >> interface is solving the problem: what can be combined or not?
>>>>>> If
>>>>>> > there
>>>>>> > > >> are
>>>>>> > > >> two conflicting features, having decorative interfaces that
>>>>>> can not be
>>>>>> > > >> mixed together actually makes much more sense to me rather
>>>>>> than having
>>>>>> > > >> them
>>>>>> > > >> all in one base class. How would you allow users to implement
>>>>>> only one
>>>>>> > > of
>>>>>> > > >> those two features?
>>>>>> > > >>
>>>>>> > > >> To reiterate on the issue. Yes, there are drawbacks:
>>>>>> > > >> - how can a user discover what decorators are there?
>>>>>> > > >> - how can a user know where the decorator can be applied to?
>>>>>> > > >>
>>>>>> > > >> However those are drawbacks for more power users, that can be
>>>>>> > mitigated
>>>>>> > > by
>>>>>> > > >> the documentation. For example listing all of the decorators
>>>>>> with
>>>>>> > > >> detailed explanation both in the docs and in the java docs.
>>>>>> More
>>>>>> > > >> experienced users will be able to deal with those issues
>>>>>> easier, as
>>>>>> > they
>>>>>> > > >> will already have some basic understanding of Flink. Also if
>>>>>> user has
>>>>>> > a
>>>>>> > > >> problem that he wants to solve, he will google search a
>>>>>> potential
>>>>>> > > solution
>>>>>> > > >> to his problem anyway, and while doing that he is very likely
>>>>>> to
>>>>>> > > discover
>>>>>> > > >> the decorator that he needs anyway in the docs.
>>>>>> > > >>
>>>>>> > > >> The advantages are:
>>>>>> > > >> - clean and easy to implement base interface
>>>>>> > > >> - all of the methods from a single feature are grouped in a
>>>>>> single
>>>>>> > > >> decorator interface, together with their dedicated java doc.
>>>>>> It's also
>>>>>> > > >> easier to google search for help using the decorator name
>>>>>> > > >> - if an optional feature requires two methods to be
>>>>>> implemented at
>>>>>> > once,
>>>>>> > > >> decorator can guarantee that
>>>>>> > > >>
>>>>>> > > >> Here I would strongly lean towards making life easier for new
>>>>>> users,
>>>>>> > > >> lowering the entry barrier, at the (imo) slight expense for
>>>>>> the power
>>>>>> > > >> users.
>>>>>> > > >>
>>>>>> > > >> Best,
>>>>>> > > >> Piotrek
>>>>>> > > >>
>>>>>> > > >>
>>>>>> > > >> wt., 26 kwi 2022 o 15:32 Becket Qin <be...@gmail.com>
>>>>>> > napisał(a):
>>>>>> > > >>
>>>>>> > > >> > Thanks for the reply Sebastian and Dawid,
>>>>>> > > >> >
>>>>>> > > >> > I think Sebastian has a good summary. This is a really
>>>>>> helpful
>>>>>> > > >> discussion.
>>>>>> > > >> >
>>>>>> > > >> > Thinking a bit more, I feel that it might still be better to
>>>>>> add the
>>>>>> > > >> > supportsXXX() method in the Source rather than SourceReader.
>>>>>> > > >> >
>>>>>> > > >> > Generally speaking, what we are trying to do here is to let
>>>>>> the
>>>>>> > Flink
>>>>>> > > >> > framework know what the Source is capable of. In this FLIP,
>>>>>> it
>>>>>> > happens
>>>>>> > > >> to
>>>>>> > > >> > be the capability that only involves SourceReader. But in the
>>>>>> > future,
>>>>>> > > >> it is
>>>>>> > > >> > possible that another functionality involves both the
>>>>>> > SplitEnumerator
>>>>>> > > >> and
>>>>>> > > >> > SourceReader. In that case, following the current approach,
>>>>>> we
>>>>>> > should
>>>>>> > > >> put
>>>>>> > > >> > the "supportsXXX()" method in both SplitEnumerator and
>>>>>> SourceReader.
>>>>>> > > >> > Because if we only put this in the SourceReader, then the JM
>>>>>> would
>>>>>> > > have
>>>>>> > > >> to
>>>>>> > > >> > create a SourceReader in order to know whether this feature
>>>>>> is
>>>>>> > > >> supported,
>>>>>> > > >> > which is a little ugly. But if we put the "supportsXXX()"
>>>>>> method in
>>>>>> > > the
>>>>>> > > >> > Source, we will break the "symmetric" design because this
>>>>>> FLIP
>>>>>> > chose a
>>>>>> > > >> > different way.
>>>>>> > > >> >
>>>>>> > > >> > This is also why I think supportsXXX() method seems a good
>>>>>> thing to
>>>>>> > > >> have,
>>>>>> > > >> > because when there are a few interfaces / methods that are
>>>>>> expected
>>>>>> > to
>>>>>> > > >> be
>>>>>> > > >> > implemented at the same time in order to deliver a feature,
>>>>>> it is
>>>>>> > > always
>>>>>> > > >> > good to have a single source of truth to tell the framework
>>>>>> what to
>>>>>> > > do,
>>>>>> > > >> so
>>>>>> > > >> > the framework can do consistent things in different parts.
>>>>>> > > >> >
>>>>>> > > >> > @Sebastian Mattheis <se...@ververica.com>
>>>>>> > > >> >
>>>>>> > > >> > Regarding interface flavor b), i.e. AlignedSourceReader +
>>>>>> > > >> > AlignedSplitReader, what I feel awkward about is that we are
>>>>>> > > essentially
>>>>>> > > >> > expecting almost all the SourceReader implementations to
>>>>>> extend
>>>>>> > > >> > SourceReaderBase, which effectively makes the SourceReader
>>>>>> interface
>>>>>> > > >> > without the pausing support useless. So this indicates that
>>>>>> public
>>>>>> > > >> > decorative interfaces (or sub-interfaces for the same
>>>>>> purpose) only
>>>>>> > > >> > make sense if the original interface is also expected to be
>>>>>> used.
>>>>>> > > >> > Otherwise, it seems to make more sense to add the method to the
>>>>>> > original
>>>>>> > > >> > interface itself.
>>>>>> > > >> >
>>>>>> > > >> > Cheers,
>>>>>> > > >> >
>>>>>> > > >> > Jiangjie (Becket) Qin
>>>>>> > > >> >
>>>>>> > > >> >
>>>>>> > > >> >
>>>>>> > > >> >
>>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <
>>>>>> > > >> dwysakowicz@apache.org>
>>>>>> > > >> > wrote:
>>>>>> > > >> >
>>>>>> > > >> > > Thanks @Sebastian for the nice summary.
>>>>>> > > >> > >
>>>>>> > > >> > > I think most of your points aligned with the suggestions I
>>>>>> made to
>>>>>> > > the
>>>>>> > > >> > > FLIP, while you were writing your reply (I believe we hit
>>>>>> enter
>>>>>> > > >> nearly at
>>>>>> > > >> > > the same time ;) )
>>>>>> > > >> > >
>>>>>> > > >> > > Two points after we synced offline
>>>>>> > > >> > >
>>>>>> > > >> > > 1. I also changed supportsWatermarksSplitAlignment to
>>>>>> > > >> > > supportsPausingSplits to express the general capability of
>>>>>> > pausing.
>>>>>> > > >> > >
>>>>>> > > >> > > 2. As for whether we should add
>>>>>> PausingSourceReader/PausingSplitReader
>>>>>> > > (option
>>>>>> > > >> b)
>>>>>> > > >> > > or if we should just add the methods (option c), I suggest
>>>>>> to
>>>>>> > simply
>>>>>> > > >> add
>>>>>> > > >> > > the two methods as I felt this is the approach much preferred by
>>>>>> Becket,
>>>>>> > > >> which
>>>>>> > > >> > > others do not object to. Unless there is opposition, let's
>>>>>> go with
>>>>>> > > this
>>>>>> > > >> > > option c.
>>>>>> > > >> > >
>>>>>> > > >> > > Best,
>>>>>> > > >> > >
>>>>>> > > >> > > Dawid
>>>>>> > > >> > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
>>>>>> > > >> > >
>>>>>> > > >> > > Hi folks,
>>>>>> > > >> > >
>>>>>> > > >> > > Sorry for being a bit silent. Many thanks for all the
>>>>>> input and
>>>>>> > > >> > > suggestions. As I'm a bit new, I needed some time to catch
>>>>>> up and
>>>>>> > > >> > structure
>>>>>> > > >> > > (for myself) the discussion and I wanted to find a way to
>>>>>> > structure
>>>>>> > > >> the
>>>>>> > > >> > > conclusions. (Also because I had the feeling that some
>>>>>> concerns
>>>>>> > got
>>>>>> > > >> lost
>>>>>> > > >> > in
>>>>>> > > >> > > the discussion.) This is my attempt and please correct me
>>>>>> if
>>>>>> > > >> something is
>>>>>> > > >> > > wrong or misunderstood. I tried to collect and assemble the
>>>>>> > > opinions,
>>>>>> > > >> > > suggestions, and conclusions (to the best of my knowledge):
>>>>>> > > >> > >
>>>>>> > > >> > > # Top A: Should split alignment (pause/resume behavior) be
>>>>>> a
>>>>>> > general
>>>>>> > > >> > > capability?
>>>>>> > > >> > >
>>>>>> > > >> > > I personally don't see any reason not to have it as a general
>>>>>> > capability
>>>>>> > > >> > > because for the alignSplit method it is actually
>>>>>> independent of
>>>>>> > the
>>>>>> > > >> > > watermarks. If we agree here to have it a general
>>>>>> capability, we
>>>>>> > > >> should
>>>>>> > > >> > > also agree on the right wording. Does
>>>>>> "alignSplits(splitsToResume,
>>>>>> > > >> > > splitsToPause)" refer to what is then actually meant? (I
>>>>>> see it as
>>>>>> > > >> okay.
>>>>>> > > >> > I
>>>>>> > > >> > > don't have any better idea whilst Arvid suggested
>>>>>> > > >> "pauseOrResumeSplits".)
>>>>>> > > >> > >
>>>>>> > > >> > > # Top B: Should it be possible to enable/disable split
>>>>>> alignment?
>>>>>> > > >> > >
>>>>>> > > >> > > I would personally not disable the split alignment on the
>>>>>> source
>>>>>> > > >> reader
>>>>>> > > >> > > side because if split alignment is used for some other use
>>>>>> case
>>>>>> > (see
>>>>>> > > >> A)
>>>>>> > > >> > it
>>>>>> > > >> > > could have nasty side effects on other/future use cases.
>>>>>> Instead,
>>>>>> > I
>>>>>> > > >> would
>>>>>> > > >> > > disable "watermark split alignment" where I think it should
>>>>>> > disable
>>>>>> > > >> the
>>>>>> > > >> > > watermark-dependent trigger for split alignment.
>>>>>> > > >> > >
>>>>>> > > >> > > # Top C: Should we add a supportsX method?
>>>>>> > > >> > >
>>>>>> > > >> > > I find it difficult to define the scope of a supportsX
>>>>>> method
>>>>>> > w.r.t.
>>>>>> > > >> > > the following questions: a) Where is it used? and b) What
>>>>>> is the
>>>>>> > > >> expected
>>>>>> > > >> > > output? To b), it's not straight-forward to provide a
>>>>>> meaningful
>>>>>> > > >> output,
>>>>>> > > >> > > e.g., if SourceReader supports split alignment but
>>>>>> SplitReader
>>>>>> > does not.
>>>>>> > > >> This
>>>>>> > > >> > is
>>>>>> > > >> > > because with the current implementation, we can determine
>>>>>> whether
>>>>>> > > >> split
>>>>>> > > >> > > alignment is fully supported only during runtime and
>>>>>> specifically
>>>>>> > > >> > actually
>>>>>> > > >> > > only when calling alignSplits down the call hierarchy up
>>>>>> to the
>>>>>> > > actual
>>>>>> > > >> > > SplitReaders.
>>>>>> > > >> > >
>>>>>> > > >> > > Therefore, I would suggest to either raise an error or
>>>>>> warning if
>>>>>> > > the
>>>>>> > > >> > > alignment is called but not supported at some point. I
>>>>>> know we
>>>>>> > > should
>>>>>> > > >> > > carefully think about when this could be the case because
>>>>>> we don't
>>>>>> > > >> want
>>>>>> > > >> > to
>>>>>> > > >> > > flood anybody with such warnings. However, warnings could
>>>>>> be an
>>>>>> > > >> indicator
>>>>>> > > >> > > for the user that for watermark split alignment use case
>>>>>> split
>>>>>> > > >> reading is
>>>>>> > > >> > > imbalanced with the conclusion to either disable the
>>>>>> trigger for
>>>>>> > > >> > watermark
>>>>>> > > >> > > split alignment (see Top B) or to use/implement a source
>>>>>> and
>>>>>> > reader
>>>>>> > > >> that
>>>>>> > > >> > > fully supports split alignment.
>>>>>> > > >> > >
>>>>>> > > >> > > # Top D: How to design interfaces?
>>>>>> > > >> > >
>>>>>> > > >> > > Thanks for structuring the discussion with the various
>>>>>> > > >> possibilities
>>>>>> > > >> > > (a-d). From the discussion and emails, I would like to
>>>>>> summarize
>>>>>> > the
>>>>>> > > >> > > following requirements:
>>>>>> > > >> > > - Interfaces should be consistent ("symmetric"), i.e.,
>>>>>> similar
>>>>>> > > >> semantics
>>>>>> > > >> > > should have similar interfaces with similar usage.
>>>>>> > > >> > > - Make explicit which implementations implement
>>>>>> interfaces/support
>>>>>> > > >> > > behavior.
>>>>>> > > >> > > - Make clear what are default implementations and how to
>>>>>> implement
>>>>>> > > >> > > interfaces with desired behavior.
>>>>>> > > >> > >
>>>>>> > > >> > > This is a simplified view of the relations between relevant
>>>>>> > classes
>>>>>> > > of
>>>>>> > > >> > the
>>>>>> > > >> > > PoC implementation:
>>>>>> > > >> > >
>>>>>> > > >> > > SourceReader (Public) <|-- SourceReaderBase (Internal)
>>>>>> <|-- ..
>>>>>> > <|--
>>>>>> > > >> > > MySourceReader
>>>>>> > > >> > >
>>>>>> > > >> > > MySourceReader <>-- SplitFetcherManager (Internal) <>--
>>>>>> > SplitFetcher
>>>>>> > > >> > > (Internal) <>-- SplitReader (Public) <|-- MySplitReader
>>>>>> > > >> > >
>>>>>> > > >> > > (A <|-- B: B inherits from A; A <>-- B: A "has a" B)
>>>>>> > > >> > >
>>>>>> > > >> > > Note that SourceReaderBase and SplitFetcherManager
>>>>>> implement most
>>>>>> > of
>>>>>> > > >> the
>>>>>> > > >> > > "logic" for split alignment just because we wanted to
>>>>>> implement
>>>>>> > > split
>>>>>> > > >> > > alignment and wanted it to be available as kind of a
>>>>>> default. As a
>>>>>> > > >> > > consequence, we have a "default implementation" for
>>>>>> SourceReader
>>>>>> > > that
>>>>>> > > >> > > implements the actual logic for split alignment. For that
>>>>>> reason,
>>>>>> > I
>>>>>> > > >> find
>>>>>> > > >> > it
>>>>>> > > >> > > very confusing to have a NOOP default implementation in the
>>>>>> > > interface
>>>>>> > > >> for
>>>>>> > > >> > > the SourceReader. As a consequence, interface strategy c)
>>>>>> is
>>>>>> > > difficult
>>>>>> > > >> > > because this would require NOOP default implementations in
>>>>>> the
>>>>>> > > public
>>>>>> > > >> > > interfaces of SourceReader and SplitReader. This is the
>>>>>> same for
>>>>>> > > >> strategy
>>>>>> > > >> > > d) because it would require NOOP default implementation in
>>>>>> the
>>>>>> > > >> > > SourceReader. Further, as Dawid described, the method
>>>>>> signatures of
>>>>>> > > >> alignSplit
>>>>>> > > >> > > for SourceReader and SplitReader differ and it would be
>>>>>> extremely
>>>>>> > > >> > difficult
>>>>>> > > >> > > to make the signatures the same (with even potential
>>>>>> performance
>>>>>> > > >> impact
>>>>>> > > >> > > because of additional lookups of split ids). Therefore,
>>>>>> having a
>>>>>> > > >> > symmetric
>>>>>> > > >> > > decorative interface as of strategy a) is actually not
>>>>>> possible
>>>>>> > and
>>>>>> > > >> > having
>>>>>> > > >> > > two decorative interfaces with different method signatures
>>>>>> is
>>>>>> > > >> confusing.
>>>>>> > > >> > My
>>>>>> > > >> > > conclusion is that we are best with strategy b) which
>>>>>> means to
>>>>>> > have
>>>>>> > > >> > > specializing sub-interfaces that inherit from the parent
>>>>>> > interface:
>>>>>> > > >> > > SourceReader <|-- AlignedSourceReader, SplitReader <|--
>>>>>> > > >> > AlignedSplitReader
>>>>>> > > >> > > With this option, I'm not 100% sure what the implications
>>>>>> are and
>>>>>> > if
>>>>>> > > >> this
>>>>>> > > >> > > could get nasty. I would suggest that Dawid and I just try
>>>>>> to
>>>>>> > > >> implement
>>>>>> > > >> > and
>>>>>> > > >> > > see if we like it. :)
>>>>>> > > >> > >
>>>>>> > > >> > > # Summary
>>>>>> > > >> > >
>>>>>> > > >> > > In conclusion, please let me know your perspectives. Please
>>>>>> > correct
>>>>>> > > >> me,
>>>>>> > > >> > if
>>>>>> > > >> > > something is wrong or if I misunderstood something. My
>>>>>> perspective
>>>>>> > > >> would
>>>>>> > > >> > be:
>>>>>> > > >> > >
>>>>>> > > >> > > Top A: Yes
>>>>>> > > >> > > Top B: Yes (but disable watermark trigger for split
>>>>>> alignment)
>>>>>> > > >> > > Top C: No
>>>>>> > > >> > > Top D: b)
>>>>>> > > >> > >
>>>>>> > > >> > > Best,
>>>>>> > > >> > > Sebastian
>>>>>> > > >> > >
>>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <
>>>>>> > > >> dwysakowicz@apache.org
>>>>>> > > >> > >
>>>>>> > > >> > > wrote:
>>>>>> > > >> > >
>>>>>> > > >> > >> @Arvid:
>>>>>> > > >> > >>
>>>>>> > > >> > >> While I also like Becket's capability approach, I fear
>>>>>> that it
>>>>>> > > >> doesn't
>>>>>> > > >> > work
>>>>>> > > >> > >> for this particular use case: Sources can always be
>>>>>> aligned
>>>>>> > > >> cross-task
>>>>>> > > >> > and
>>>>>> > > >> > >> this is just about intra-task alignment. So it's
>>>>>> plausible to put
>>>>>> > > >> > sources
>>>>>> > > >> > >> into an alignment group even though they do not use any
>>>>>> of the
>>>>>> > > >> presented
>>>>>> > > >> > >> API of FLIP-217. They should just issue a warning, if
>>>>>> they handle
>>>>>> > > >> > multiple
>>>>>> > > >> > >> splits (see motivation section).
>>>>>> > > >> > >>
>>>>>> > > >> > >> Yes, but the "supportXXX" method would be for telling if
>>>>>> it
>>>>>> > > supports
>>>>>> > > >> > that
>>>>>> > > >> > >> intra-task alignment. Cross-task alignment would always be
>>>>>> > > supported.
>>>>>> > > >> > >>
>>>>>> > > >> > >> I updated interfaces to what I believe to be closest to a
>>>>>> > consensus
>>>>>> > > >> > >> between all participants. Do you mind taking a look?
>>>>>> > > >> > >>
>>>>>> > > >> > >> @Sebastian Do you mind addressing the nits?
>>>>>> > > >> > >>
>>>>>> > > >> > >> Best,
>>>>>> > > >> > >>
>>>>>> > > >> > >> Dawid
>>>>>> > > >> > >>
>>>>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
>>>>>> > > >> > >>
>>>>>> > > >> > >> Thanks for pushing this effort.
>>>>>> > > >> > >>
>>>>>> > > >> > >> I'd actually be in favor of 1b). I fully agree that
>>>>>> decorator
>>>>>> > > >> interfaces
>>>>>> > > >> > >> should be avoided but I'm also not a big fan of
>>>>>> overloading the
>>>>>> > > base
>>>>>> > > >> > >> interfaces (they are hard to implement as is). The usual
>>>>>> feedback
>>>>>> > > to
>>>>>> > > >> > >> Source-related interfaces is always that they are
>>>>>> overwhelming
>>>>>> > and
>>>>>> > > >> too
>>>>>> > > >> > >> hard to implement. However, I'd also not oppose 1c) as
>>>>>> scattered
>>>>>> > > >> > interfaces
>>>>>> > > >> > >> also have drawbacks. I'd just dislike 1a) and 1d).
>>>>>> > > >> > >> While I also like Becket's capability approach, I fear
>>>>>> that it
>>>>>> > > >> doesn't
>>>>>> > > >> > work
>>>>>> > > >> > >> for this particular use case: Sources can always be
>>>>>> aligned
>>>>>> > > >> cross-task
>>>>>> > > >> > and
>>>>>> > > >> > >> this is just about intra-task alignment. So it's
>>>>>> plausible to put
>>>>>> > > >> > sources
>>>>>> > > >> > >> into an alignment group even though they do not use any
>>>>>> of the
>>>>>> > > >> presented
>>>>>> > > >> > >> API of FLIP-217. They should just issue a warning, if
>>>>>> they handle
>>>>>> > > >> > multiple
>>>>>> > > >> > >> splits (see motivation section).
>>>>>> > > >> > >>
>>>>>> > > >> > >> I think renaming alignSplits to facilitate future use
>>>>>> cases makes
>>>>>> > > >> sense
>>>>>> > > >> > but
>>>>>> > > >> > >> then all interfaces (if 1c) is chosen) should be adjusted
>>>>>> > > >> accordingly.
>>>>>> > > >> > >> AlignedSourceReader could be PausingSourceReader and I'd
>>>>>> go for
>>>>>> > > >> > >> pauseOrResumeSplits (Becket's proposal afaik). We could
>>>>>> also
>>>>>> > split
>>>>>> > > it
>>>>>> > > >> > into
>>>>>> > > >> > >> pauseSplit and resumeSplit. While pauseOrResumeSplits may
>>>>>> allow
>>>>>> > > >> Sources
>>>>>> > > >> > to
>>>>>> > > >> > >> just use 1 instead of 2 library calls (as written in the
>>>>>> > Javadoc),
>>>>>> > > >> both
>>>>>> > > >> > >> Kafka and Pulsar can't use it and I'm not sure if there
>>>>>> is a
>>>>>> > system
>>>>>> > > >> that
>>>>>> > > >> > >> can.
>>>>>> > > >> > >>
>>>>>> > > >> > >> Some nits for the FLIP:
>>>>>> > > >> > >> - Please replace "stop" with "pause".
>>>>>> > > >> > >> - Not sure if it's worth it in the capability section:
>>>>>> Sources
>>>>>> > that
>>>>>> > > >> > adopt
>>>>>> > > >> > >> this interface cannot be used in earlier versions. So it
>>>>>> feels
>>>>>> > like
>>>>>> > > >> we
>>>>>> > > >> > are
>>>>>> > > >> > >> only forward compatible (old sources can be used after the
>>>>>> > change);
>>>>>> > > >> but
>>>>>> > > >> > I
>>>>>> > > >> > >> guess this holds for any API addition.
>>>>>> > > >> > >> - You might want to add what happens when all splits are
>>>>>> paused.
>>>>>> > > >> > >> - You may want to describe how the 3 flavors of
>>>>>> SourceReaderBase
>>>>>> > > >> > interact
>>>>>> > > >> > >> with the interface.
>>>>>> > > >> > >> - I'm not sure if it makes sense to include Kafka and
>>>>>> Pulsar in
>>>>>> > the
>>>>>> > > >> > FLIP.
>>>>>> > > >> > >> For me, this is rather immediate follow-up work. (could
>>>>>> be in the
>>>>>> > > >> same
>>>>>> > > >> > >> umbrella ticket)
>>>>>> > > >> > >>
>>>>>> > > >> > >> Best,
>>>>>> > > >> > >>
>>>>>> > > >> > >> Arvid
>>>>>> > > >> > >>
>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz
>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
>>>>>> WithSplitsAlignment",
>>>>>> > > >> along
>>>>>> > > >> > >> with "MySplitReader implements SplitReader,
>>>>>> WithSplitsAlignment",
>>>>>> > > or
>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
>>>>>> > > "MySplitReader
>>>>>> > > >> > >> implements AlignedSplitReader", or
>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
>>>>>> "MySplitReader
>>>>>> > > >> > implements
>>>>>> > > >> > >> SplitReader".
>>>>>> > > >> > >>
>>>>>> > > >> > >> I think the latest proposal according to Dawid would be:
>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
>>>>>> "MySplitReader
>>>>>> > > >> > implements
>>>>>> > > >> > >> AlignedSplitReader".
>>>>>> > > >> > >> I am fine with this API, although personally speaking I
>>>>>> think it
>>>>>> > is
>>>>>> > > >> > simpler
>>>>>> > > >> > >> to just add a new method to the split reader with default
>>>>>> impl.
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >> I think that is a good idea to have it aligned as much as
>>>>>> > possible.
>>>>>> > > >> I'd
>>>>>> > > >> > be
>>>>>> > > >> > >> +1 for your option c). We can merge AlignedSplitReader
>>>>>> with
>>>>>> > > >> > SplitReader. We
>>>>>> > > >> > >> will update the FLIP shortly.
>>>>>> > > >> > >>
>>>>>> > > >> > >> Best,
>>>>>> > > >> > >>
>>>>>> > > >> > >> Dawid
>>>>>> > > >> > >>
>>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
>>>>>> > > >> > >>
>>>>>> > > >> > >> Thanks for the comment, Jark.
>>>>>> > > >> > >>
>>>>>> > > >> > >> 3. Interface/Method Name.
>>>>>> > > >> > >>
>>>>>> > > >> > >> Can the interface be used to align other things in the
>>>>>> future?
>>>>>> > For
>>>>>> > > >> > example,
>>>>>> > > >> > >> align read speed, I have
>>>>>> > > >> > >> seen users requesting global rate limits. This feature
>>>>>> may also
>>>>>> > > need
>>>>>> > > >> an
>>>>>> > > >> > >> interface like this.
>>>>>> > > >> > >> If we don't plan to extend this interface to support
>>>>>> aligning other
>>>>>> > > >> > things, I
>>>>>> > > >> > >> suggest explicitly declaring
>>>>>> > > >> > >> the purpose of the methods, such as
>>>>>> `alignWatermarksForSplits`
>>>>>> > > >> instead
>>>>>> > > >> > of
>>>>>> > > >> > >> `alignSplits`.
>>>>>> > > >> > >>
>>>>>> > > >> > >> This is a good point. Naming wise, it would usually be
>>>>>> more
>>>>>> > > >> extensible
>>>>>> > > >> > to
>>>>>> > > >> > >> just describe what the method actually does, instead of
>>>>>> assuming
>>>>>> > > the
>>>>>> > > >> > >> purpose of doing this. For example, in this case,
>>>>>> > > >> pauseOrResumeSplits()
>>>>>> > > >> > >> would be more extensible because this can be used for any
>>>>>> kind of
>>>>>> > > >> flow
>>>>>> > > >> > >> control, be it watermark alignment or simple rate
>>>>>> limiting.
>>>>>> > > >> > >>
>>>>>> > > >> > >> 4. Interface or Method.
>>>>>> > > >> > >>
>>>>>> > > >> > >> I don't have a strong opinion on this. I think they have
>>>>>> their
>>>>>> > own
>>>>>> > > >> > >> advantages.
>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending
>>>>>> abilities
>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
>>>>>> > > >> > >> and I prefer Interfaces rather than methods in this case.
>>>>>> When
>>>>>> > you
>>>>>> > > >> have
>>>>>> > > >> > a
>>>>>> > > >> > >> bunch of abilities and each ability
>>>>>> > > >> > >> has more than one method, Interfaces can help to organize
>>>>>> them
>>>>>> > and
>>>>>> > > >> make it
>>>>>> > > >> > >> clear to users which methods
>>>>>> > > >> > >> they need to implement when they want to have an ability.
>>>>>> > > >> > >>
>>>>>> > > >> > >> I am OK with decorative interfaces if this is a general
>>>>>> design
>>>>>> > > >> pattern
>>>>>> > > >> > in
>>>>>> > > >> > >> the other components in Flink. But it looks like the
>>>>>> current API
>>>>>> > > >> > proposal
>>>>>> > > >> > >> is not symmetric.
>>>>>> > > >> > >>
>>>>>> > > >> > >> The current proposal is essentially "MySourceReader
>>>>>> implements
>>>>>> > > >> > >> SourceReader, WithSplitsAlignment", along with
>>>>>> "MySplitReader
>>>>>> > > >> implements
>>>>>> > > >> > >> AlignedSplitsReader".
>>>>>> > > >> > >>
>>>>>> > > >> > >> Should we make the API symmetric? I'd consider any one of
>>>>>> the
>>>>>> > > >> following
>>>>>> > > >> > as
>>>>>> > > >> > >> symmetric.
>>>>>> > > >> > >>
>>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
>>>>>> WithSplitsAlignment",
>>>>>> > > >> along
>>>>>> > > >> > >> with "MySplitReader implements SplitReader,
>>>>>> WithSplitsAlignment",
>>>>>> > > or
>>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
>>>>>> > > "MySplitReader
>>>>>> > > >> > >> implements AlignedSplitReader", or
>>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
>>>>>> "MySplitReader
>>>>>> > > >> > implements
>>>>>> > > >> > >> SplitReader".
>>>>>> > > >> > >>
>>>>>> > > >> > >> I think the latest proposal according to Dawid would be:
>>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
>>>>>> "MySplitReader
>>>>>> > > >> > implements
>>>>>> > > >> > >> AlignedSplitReader".
>>>>>> > > >> > >> I am fine with this API, although personally speaking I
>>>>>> think it
>>>>>> > is
>>>>>> > > >> > simpler
>>>>>> > > >> > >> to just add a new method to the split reader with default
>>>>>> impl.
>>>>>> > > >> > >>
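To make the difference between flavors b) and c) above concrete, here is a minimal Java sketch. All type names (SimpleSplitReader, PausingSplitReader, LegacyReader, KafkaLikeReader, OptionsDemo) are hypothetical stand-ins, not the real Flink interfaces, and the method name and parameter order of pauseOrResumeSplits are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified stand-ins; NOT the real Flink interfaces.

// Flavor c): the base interface itself carries the method, with a no-op default.
interface SimpleSplitReader {
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        // no-op: a reader that does not override this silently ignores the request
    }
}

// Flavor b): a specializing sub-interface re-declares the method as abstract,
// so every implementer is forced to provide real pausing logic.
interface PausingSplitReader extends SimpleSplitReader {
    @Override
    void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume);
}

// Compiles under flavor c) without knowing anything about pausing.
final class LegacyReader implements SimpleSplitReader {}

// Under flavor b) the compiler requires the method to be implemented.
final class KafkaLikeReader implements PausingSplitReader {
    final List<String> paused = new ArrayList<>();

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.removeAll(splitsToResume);
        paused.addAll(splitsToPause);
    }
}

final class OptionsDemo {
    // With flavor b) the capability is visible in the type; with flavor c) alone
    // the framework cannot tell a real implementation from the no-op default.
    static boolean supportsPausing(SimpleSplitReader reader) {
        return reader instanceof PausingSplitReader;
    }
}
```

In this sketch, flavor c) keeps existing readers compiling unchanged but hides whether pausing actually happens, while flavor b) exposes the capability through the type at the cost of one more public interface.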
>>>>>> > > >> > >> @Dawid Wysakowicz <dw...@apache.org>, thanks for the reply.
>>>>>> > > >> > >>
>>>>>> > > >> > >> Having said that, as I don't have a preference and I
>>>>>> agree most
>>>>>> > of
>>>>>> > > >> the
>>>>>> > > >> > >>
>>>>>> > > >> > >> sources will support the alignment I am fine following
>>>>>> your
>>>>>> > > >> suggestion
>>>>>> > > >> > to
>>>>>> > > >> > >> have the SourceReader extending from
>>>>>> > WithWatermarksSplitsAlignment,
>>>>>> > > >> but
>>>>>> > > >> > >> would put the "supportsXXX" there, not in the Source to
>>>>>> keep the
>>>>>> > > two
>>>>>> > > >> > >> methods together.
>>>>>> > > >> > >>
>>>>>> > > >> > >> One benefit of having the "supportsXXX" in Source is that
>>>>>> this
>>>>>> > > allows
>>>>>> > > >> > some
>>>>>> > > >> > >> compile time check. For example, if a user enabled
>>>>>> watermark
>>>>>> > > >> alignment
>>>>>> > > >> > >> while it is not supported by the Source, an exception can
>>>>>> be
>>>>>> > thrown
>>>>>> > > >> at
>>>>>> > > >> > >> compile time. It seems in general useful. That said, I
>>>>>> agree that
>>>>>> > > API
>>>>>> > > >> > >> cleanliness wise it is better to put the two methods
>>>>>> together.
>>>>>> > > >> > >>
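The early check described above could look roughly like the following Java sketch. It assumes a hypothetical supportsPausingSplits() capability method on a simplified Source-like interface; SimpleSource, PausableSource, PlainSource and AlignmentValidator are illustrative names only and do not exist in Flink.

```java
// Hypothetical, simplified stand-ins; NOT the real Flink Source API.
interface SimpleSource {
    // Assumed capability flag; defaults to "not supported".
    default boolean supportsPausingSplits() {
        return false;
    }
}

final class PausableSource implements SimpleSource {
    @Override
    public boolean supportsPausingSplits() {
        return true;
    }
}

final class PlainSource implements SimpleSource {}

final class AlignmentValidator {
    // Called while the job is being assembled, before any reader is created,
    // so a misconfiguration surfaces early instead of at runtime on a task.
    static void validate(SimpleSource source, boolean splitAlignmentEnabled) {
        if (splitAlignmentEnabled && !source.supportsPausingSplits()) {
            throw new IllegalStateException(
                    "Split alignment is enabled, but the source cannot pause splits.");
        }
    }
}
```

Because the flag lives on the source object in this sketch, the check needs no reader instance, which is the benefit argued for above; the trade-off is that the flag and the pausing method end up on different interfaces.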
>>>>>> > > >> > >> Thanks,
>>>>>> > > >> > >>
>>>>>> > > >> > >> Jiangjie (Becket) Qin
>>>>>> > > >> > >>
>>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <imjark@gmail.com> wrote:
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >> Thanks, Dawid, for the reminder on FLIP-182. Sorry, I did
>>>>>> miss it.
>>>>>> > > >> > >> I don't have other concerns then.
>>>>>> > > >> > >>
>>>>>> > > >> > >> Best,
>>>>>> > > >> > >> Jark
>>>>>> > > >> > >>
>>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz
>>>>>> > > >> > >> <dwysakowicz@apache.org> wrote:
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >> @Jark:
>>>>>> > > >> > >>
>>>>>> > > >> > >> 1. Will the framework always align with watermarks when
>>>>>> the
>>>>>> > source
>>>>>> > > >> > >> implements the interface?
>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment even
>>>>>> if Kafka
>>>>>> > > >> > >> implements the interface,
>>>>>> > > >> > >> and this will affect the throughput somehow. I agree with
>>>>>> Becket
>>>>>> > > >> > >> we may need a
>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to configure
>>>>>> the
>>>>>> > source
>>>>>> > > >> to
>>>>>> > > >> > >> enable/disable the alignment.
>>>>>> > > >> > >>
>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly
>>>>>> affect
>>>>>> > > >> > >>
>>>>>> > > >> > >> throughput
>>>>>> > > >> > >>
>>>>>> > > >> > >> if the reader is constantly
>>>>>> > > >> > >>  switching between pause and resume. Can users configure
>>>>>> the
>>>>>> > > >> alignment
>>>>>> > > >> > >> offset?
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >> This is covered in the previous FLIP[1] which has
>>>>>> already been
>>>>>> > > >> > >>
>>>>>> > > >> > >> implemented
>>>>>> > > >> > >>
>>>>>> > > >> > >> in 1.15. In short, it must be enabled with the watermark
>>>>>> strategy
>>>>>> > > >> which
>>>>>> > > >> > >> also configures drift and update interval.
>>>>>> > > >> > >>
>>>>>> > > >> > >> If we don't plan to extend this interface to support
>>>>>> aligning other
>>>>>> > > >> things,
>>>>>> > > >> > >>
>>>>>> > > >> > >> I
>>>>>> > > >> > >>
>>>>>> > > >> > >> suggest explicitly declaring
>>>>>> > > >> > >> the purpose of the methods, such as
>>>>>> `alignWatermarksForSplits`
>>>>>> > > >> instead
>>>>>> > > >> > of
>>>>>> > > >> > >> `alignSplits`.
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >> Sure let's rename it.
>>>>>> > > >> > >>
>>>>>> > > >> > >> @Becket:
>>>>>> > > >> > >>
>>>>>> > > >> > >> I understand your point. On the other hand putting all
>>>>>> methods,
>>>>>> > > even
>>>>>> > > >> > with
>>>>>> > > >> > >> "supportsXXX" methods for enabling certain features,
>>>>>> makes the
>>>>>> > > entry
>>>>>> > > >> > >> threshold for writing a new source higher. Instead of
>>>>>> focusing on
>>>>>> > > the
>>>>>> > > >> > >>
>>>>>> > > >> > >> basic
>>>>>> > > >> > >>
>>>>>> > > >> > >> and required properties of the Source, the person
>>>>>> implementing a
>>>>>> > > >> source
>>>>>> > > >> > >> must bother with and figure out what all of the
>>>>>> extra
>>>>>> > > >> features
>>>>>> > > >> > >>
>>>>>> > > >> > >> are
>>>>>> > > >> > >>
>>>>>> > > >> > >> about and how to deal with them. It also makes it harder
>>>>>> to
>>>>>> > > organize
>>>>>> > > >> > >> methods in coupled groups as Jark said.
>>>>>> > > >> > >>
>>>>>> > > >> > >> Having said that, as I don't have a preference and I
>>>>>> agree most
>>>>>> > of
>>>>>> > > >> the
>>>>>> > > >> > >> sources will support the alignment I am fine following
>>>>>> your
>>>>>> > > >> suggestion
>>>>>> > > >> > to
>>>>>> > > >> > >> have the SourceReader extending from
>>>>>> > WithWatermarksSplitsAlignment,
>>>>>> > > >> but
>>>>>> > > >> > >> would put the "supportsXXX" there, not in the Source to
>>>>>> keep the
>>>>>> > > two
>>>>>> > > >> > >> methods together.
>>>>>> > > >> > >>
>>>>>> > > >> > >> Lastly, I agree it is really unfortunate the "alignSplits"
>>>>>> > methods
>>>>>> > > >> > differ
>>>>>> > > >> > >> slightly for SourceReader and SplitReader. The reason for
>>>>>> that is
>>>>>> > > >> > >> SourceReaderBase deals only with SplitIds, whereas
>>>>>> SplitReader
>>>>>> > > needs
>>>>>> > > >> the
>>>>>> > > >> > >> actual splits to pause them. I found the discrepancy
>>>>>> acceptable
>>>>>> > for
>>>>>> > > >> the
>>>>>> > > >> > >> sake of simplifying changes significantly, especially as
>>>>>> they
>>>>>> > would
>>>>>> > > >> > >>
>>>>>> > > >> > >> highly
>>>>>> > > >> > >>
>>>>>> > > >> > >> likely impact performance as we would have to perform
>>>>>> additional
>>>>>> > > >> > lookups.
>>>>>> > > >> > >> Moreover the SplitReader is a secondary interface.
>>>>>> > > >> > >>
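One way to picture the signature discrepancy described above is the following Java sketch: the upper layer only knows split ids, the split reader needs the split objects, and a layer in between resolves ids to splits with a map lookup. DemoSplit, DemoSplitReader and DemoFetcher are hypothetical names, and where this translation really happens in Flink is not something this sketch claims.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; NOT the real Flink classes.
final class DemoSplit {
    final String id;

    DemoSplit(String id) {
        this.id = id;
    }
}

// Reader-side signature: works on the actual split objects.
interface DemoSplitReader {
    void pauseOrResumeSplits(
            Collection<DemoSplit> splitsToPause, Collection<DemoSplit> splitsToResume);
}

// Intermediate layer: receives split ids and resolves them to splits.
final class DemoFetcher {
    private final Map<String, DemoSplit> assignedSplits = new HashMap<>();
    private final DemoSplitReader reader;

    DemoFetcher(DemoSplitReader reader) {
        this.reader = reader;
    }

    void addSplit(DemoSplit split) {
        assignedSplits.put(split.id, split);
    }

    // Id-based signature, as seen by the layer that only tracks split ids.
    void pauseOrResumeSplits(
            Collection<String> idsToPause, Collection<String> idsToResume) {
        reader.pauseOrResumeSplits(resolve(idsToPause), resolve(idsToResume));
    }

    // One lookup per id; ids that are not assigned to this fetcher are skipped.
    private List<DemoSplit> resolve(Collection<String> ids) {
        List<DemoSplit> splits = new ArrayList<>();
        for (String id : ids) {
            DemoSplit split = assignedSplits.get(id);
            if (split != null) {
                splits.add(split);
            }
        }
        return splits;
    }
}
```

Forcing both layers onto the same signature would mean one of them has to perform these lookups itself on every call, which is the extra cost mentioned above.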
>>>>>> > > >> > >> Best,
>>>>>> > > >> > >>
>>>>>> > > >> > >> Dawid
>>>>>> > > >> > >>
>>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
>>>>>> > > >> > >>
>>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
>>>>>> > > >> > >>
>>>>>> > > >> > >> Thanks for the effort, Dawid and Sebastian!
>>>>>> > > >> > >>
>>>>>> > > >> > >> I just have some minor questions (maybe I missed
>>>>>> something).
>>>>>> > > >> > >>
>>>>>> > > >> > >> 1. Will the framework always align with watermarks when
>>>>>> the
>>>>>> > source
>>>>>> > > >> > >> implements the interface?
>>>>>> > > >> > >> I'm afraid not every case needs watermark alignment even
>>>>>> if Kafka
>>>>>> > > >> > >> implements the interface,
>>>>>> > > >> > >> and this will affect the throughput somehow. I agree with
>>>>>> Becket
>>>>>> > > >> > >> we may need a
>>>>>> > > >> > >> `supportSplitsAlignment()` method for users to configure
>>>>>> the
>>>>>> > source
>>>>>> > > >> to
>>>>>> > > >> > >> enable/disable the alignment.
>>>>>> > > >> > >>
>>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
>>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly
>>>>>> affect
>>>>>> > > >> > >>
>>>>>> > > >> > >> throughput
>>>>>> > > >> > >>
>>>>>> > > >> > >> if the reader is constantly
>>>>>> > > >> > >>  switching between pause and resume. Can users configure
>>>>>> the
>>>>>> > > >> alignment
>>>>>> > > >> > >> offset?
>>>>>> > > >> > >>
>>>>>> > > >> > >> 3. Interface/Method Name.
>>>>>> > > >> > >> Can the interface be used to align other things in the
>>>>>> future?
>>>>>> > For
>>>>>> > > >> > >>
>>>>>> > > >> > >> example,
>>>>>> > > >> > >>
>>>>>> > > >> > >> align read speed, I have
>>>>>> > > >> > >> seen users requesting global rate limits. This feature
>>>>>> may also
>>>>>> > > need
>>>>>> > > >> an
>>>>>> > > >> > >> interface like this.
>>>>>> > > >> > >> If we don't plan to extend this interface to support
>>>>>> aligning other
>>>>>> > > >> things,
>>>>>> > > >> > >>
>>>>>> > > >> > >> I
>>>>>> > > >> > >>
>>>>>> > > >> > >> suggest explicitly declaring
>>>>>> > > >> > >> the purpose of the methods, such as
>>>>>> `alignWatermarksForSplits`
>>>>>> > > >> instead
>>>>>> > > >> > of
>>>>>> > > >> > >> `alignSplits`.
>>>>>> > > >> > >>
>>>>>> > > >> > >> 4. Interface or Method.
>>>>>> > > >> > >> I don't have a strong opinion on this. I think they have
>>>>>> their
>>>>>> > own
>>>>>> > > >> > >> advantages.
>>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending
>>>>>> abilities
>>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
>>>>>> > > >> > >> and I prefer Interfaces rather than methods in this case.
>>>>>> When
>>>>>> > you
>>>>>> > > >> have
>>>>>> > > >> > a
>>>>>> > > >> > >> bunch of abilities and each ability
>>>>>> > > >> > >> has more than one method, Interfaces can help to organize
>>>>>> them
>>>>>> > and
>>>>>> > > >> make it
>>>>>> > > >> > >> clear to users which methods
>>>>>> > > >> > >> they need to implement when they want to have an ability.
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >> Best,
>>>>>> > > >> > >> Jark
>>>>>> > > >> > >>
>>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <becket.qin@gmail.com>
>>>>>> > > >> > >> wrote:
>>>>>> > > >> > >>
>>>>>> > > >> > >> Hi Dawid,
>>>>>> > > >> > >>
>>>>>> > > >> > >> Thanks for the explanation. Apologies that I somehow
>>>>>> misread a
>>>>>> > > bunch
>>>>>> > > >> of
>>>>>> > > >> > >> "align" and thought they were "assign".
>>>>>> > > >> > >>
>>>>>> > > >> > >> Regarding 1, by default implementation, I was thinking of
>>>>>> the
>>>>>> > > default
>>>>>> > > >> > >>
>>>>>> > > >> > >> no-op
>>>>>> > > >> > >>
>>>>>> > > >> > >> implementation. I am a little worried about the
>>>>>> proliferation of
>>>>>> > > >> > >>
>>>>>> > > >> > >> decorative
>>>>>> > > >> > >>
>>>>>> > > >> > >> interfaces. I think the most important thing about
>>>>>> interfaces is
>>>>>> > > that
>>>>>> > > >> > >>
>>>>>> > > >> > >> they
>>>>>> > > >> > >>
>>>>>> > > >> > >> are easy to understand. In this case, I prefer adding a new
>>>>>> method
>>>>>> > to
>>>>>> > > >> the
>>>>>> > > >> > >> existing interface for the following reasons:
>>>>>> > > >> > >>
>>>>>> > > >> > >> a) I feel the biggest drawback of decorative interfaces
>>>>>> is which
>>>>>> > > >> > >>
>>>>>> > > >> > >> interface
>>>>>> > > >> > >>
>>>>>> > > >> > >> they can decorate and which combinations of multiple
>>>>>> decorative
>>>>>> > > >> > >>
>>>>>> > > >> > >> interfaces
>>>>>> > > >> > >>
>>>>>> > > >> > >> are valid. In the current FLIP, the withSplitsAlignment
>>>>>> interface
>>>>>> > > is
>>>>>> > > >> > only
>>>>>> > > >> > >> applicable to the SourceReader which means it can't
>>>>>> decorate any
>>>>>> > > >> other
>>>>>> > > >> > >> interface. From an interface design perspective, a natural
>>>>>> > question
>>>>>> > > >> is
>>>>>> > > >> > >>
>>>>>> > > >> > >> why
>>>>>> > > >> > >>
>>>>>> > > >> > >> not let "AlignedSplitReader" extend
>>>>>> "withSplitsAlignment"? And it
>>>>>> > > is
>>>>>> > > >> > also
>>>>>> > > >> > >> natural to assume that a split reader implementing both
>>>>>> > SplitReader
>>>>>> > > >> and
>>>>>> > > >> > >> WithSplitAlignment would work, because a source reader
>>>>>> > implementing
>>>>>> > > >> > >> SourceReader and withSplitsAlignment works. So why isn't
>>>>>> there an
>>>>>> > > >> > >>
>>>>>> > > >> > >> interface
>>>>>> > > >> > >>
>>>>>> > > >> > >> of AlignedSourceReader? In the future, if there is a new
>>>>>> feature
>>>>>> > > >> added
>>>>>> > > >> > >> (e.g. sorted or pre-partitioned data aware), are we going
>>>>>> to
>>>>>> > create
>>>>>> > > >> > >>
>>>>>> > > >> > >> another
>>>>>> > > >> > >>
>>>>>> > > >> > >> interface of SplitReader such as SortedSplitReader or
>>>>>> > > >> > >>
>>>>>> > > >> > >> PrePartitionedAware?
>>>>>> > > >> > >>
>>>>>> > > >> > >> Can they be combined? So I think the additional decorative
>>>>>> > > interface
>>>>>> > > >> > like
>>>>>> > > >> > >> withSplitsAlignment actually increases the understanding
>>>>>> cost of
>>>>>> > > >> users
>>>>>> > > >> > >> because they have to know what decorative interfaces are
>>>>>> there,
>>>>>> > > which
>>>>>> > > >> > >> interface they can decorate and which combinations of the
>>>>>> > > decorative
>>>>>> > > >> > >> interfaces are valid and which are not. Ideally we want
>>>>>> to avoid
>>>>>> > > >> that.
>>>>>> > > >> > To
>>>>>> > > >> > >> be clear, I am not opposing having an interface of
>>>>>> > > >> withSplitsAlignment,
>>>>>> > > >> > >>
>>>>>> > > >> > >> it
>>>>>> > > >> > >>
>>>>>> > > >> > >> is completely OK to have it as an internal interface and
>>>>>> let
>>>>>> > > >> > SourceReader
>>>>>> > > >> > >> and SplitReader both extend it.
>>>>>> > > >> > >>
>>>>>> > > >> > >> b) Adding a new method to the SourceReader with a default
>>>>>> > > >> implementation
>>>>>> > > >> > >>
>>>>>> > > >> > >> of
>>>>>> > > >> > >>
>>>>>> > > >> > >> no-op would help avoid logic branching in the source
>>>>>> logic,
>>>>>> > > >> especially
>>>>>> > > >> > >> given that we agree that the vast majority of the
>>>>>> SourceReader
>>>>>> > > >> > >> implementations, if not all, would just extend from the
>>>>>> > > >> > SourceReaderBase.
>>>>>> > > >> > >> That means adding a new method to the interface would
>>>>>> effectively
>>>>>> > > >> give
>>>>>> > > >> > >>
>>>>>> > > >> > >> the
>>>>>> > > >> > >>
>>>>>> > > >> > >> same user experience, but simpler.
>>>>>> > > >> > >>
>>>>>> > > >> > >> c) A related design principle that may be worth
>>>>>> discussing is how
>>>>>> > > do
>>>>>> > > >> we
>>>>>> > > >> > >>
>>>>>> > > >> > >> let
>>>>>> > > >> > >>
>>>>>> > > >> > >> the Source implementations tell Flink what capability is
>>>>>> > supported
>>>>>> > > >> and
>>>>>> > > >> > >>
>>>>>> > > >> > >> what
>>>>>> > > >> > >>
>>>>>> > > >> > >> is not. Personally speaking I feel the most intuitive
>>>>>> place to me
>>>>>> > > is
>>>>>> > > >> in
>>>>>> > > >> > >>
>>>>>> > > >> > >> the
>>>>>> > > >> > >>
>>>>>> > > >> > >> Source itself, because that is the entrance of the entire
>>>>>> Source
>>>>>> > > >> > >>
>>>>>> > > >> > >> connector
>>>>>> > > >> > >>
>>>>>> > > >> > >> logic.
>>>>>> > > >> > >>
>>>>>> > > >> > >> Based on the above thoughts, I am wondering if the
>>>>>> following
>>>>>> > > >> interface
>>>>>> > > >> > >> would be easier to understand by the users.
>>>>>> > > >> > >>
>>>>>> > > >> > >> - Change "withSplitsAlignment" to internal interface, let
>>>>>> both
>>>>>> > > >> > >>
>>>>>> > > >> > >> SourceReader
>>>>>> > > >> > >>
>>>>>> > > >> > >> and SplitReader extend from it, with a default no-op
>>>>>> > > implementation.
>>>>>> > > >> > >> - Add a new method "boolean supportSplitsAlignment()" to
>>>>>> the
>>>>>> > Source
>>>>>> > > >> > >> interface, with a default implementation returning false.
>>>>>> Sources
>>>>>> > > >> that
>>>>>> > > >> > >>
>>>>>> > > >> > >> have
>>>>>> > > >> > >>
>>>>>> > > >> > >> implemented the alignment logic can change this to return
>>>>>> true,
>>>>>> > and
>>>>>> > > >> > >> override the alignSplits() methods in the SourceReader /
>>>>>> > > SplitReader
>>>>>> > > >> if
>>>>>> > > >> > >> needed.
>>>>>> > > >> > >> - In the future, if a new optional feature is going to be
>>>>>> added
>>>>>> > to
>>>>>> > > >> the
>>>>>> > > >> > >> Source, and that feature requires the awareness from
>>>>>> Flink, we
>>>>>> > can
>>>>>> > > >> add
>>>>>> > > >> > >>
>>>>>> > > >> > >> more
>>>>>> > > >> > >>
>>>>>> > > >> > >> such methods to the Source.
>>>>>> > > >> > >>
>>>>>> > > >> > >> What do you think?
>>>>>> > > >> > >>
>>>>>> > > >> > >> Thanks,
>>>>>> > > >> > >>
>>>>>> > > >> > >> Jiangjie (Becket) Qin
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dw...@apache.org> wrote:
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >> @Konstantin:
>>>>>> > > >> > >>
>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface
>>>>>> (aka the
>>>>>> > > >> stop &
>>>>>> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar
>>>>>> only,
>>>>>> > > >> correct?
>>>>>> > > >> > >>
>>>>>> > > >> > >> Correct, as far as I know though, those are the only
>>>>>> sources
>>>>>> > which
>>>>>> > > >> > >>
>>>>>> > > >> > >> consume
>>>>>> > > >> > >>
>>>>>> > > >> > >> concurrently from multiple splits and thus alignment
>>>>>> applies.
>>>>>> > > >> > >>
>>>>>> > > >> > >> @Thomas:
>>>>>> > > >> > >>
>>>>>> > > >> > >> I wonder if "supporting" split alignment in
>>>>>> SourceReaderBase and
>>>>>> > > then
>>>>>> > > >> > >>
>>>>>> > > >> > >> doing
>>>>>> > > >> > >>
>>>>>> > > >> > >> nothing if the split reader does not implement
>>>>>> AlignedSplitReader
>>>>>> > > >> could
>>>>>> > > >> > >>
>>>>>> > > >> > >> be
>>>>>> > > >> > >>
>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead be
>>>>>> added to
>>>>>> > the
>>>>>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to make it
>>>>>> > explicit
>>>>>> > > >> that
>>>>>> > > >> > >> the source actually supports it.
>>>>>> > > >> > >>
>>>>>> > > >> > >> I understand your concern. Hmm, I think we could actually
>>>>>> do
>>>>>> > that.
>>>>>> > > >> Given
>>>>>> > > >> > >> the actual implementation of the
>>>>>> SourceReaderBase#alignSplits is
>>>>>> > > >> rather
>>>>>> > > >> > >> short (just a forward to the corresponding method of
>>>>>> > SplitFetcher),
>>>>>> > > >> we
>>>>>> > > >> > >> could reimplement it in the actual source
>>>>>> implementations. This
>>>>>> > > >> solution
>>>>>> > > >> > >> has the downside though. Authors of new sources would
>>>>>> have to do
>>>>>> > > two
>>>>>> > > >> > >> things: extend from AlignedSplitReader and implement
>>>>>> > > >> > >>
>>>>>> > > >> > >> WithSplitsAssignment,
>>>>>> > > >> > >>
>>>>>> > > >> > >> instead of just extending AlignedSplitReader. I would be
>>>>>> fine
>>>>>> > with
>>>>>> > > >> such
>>>>>> > > >> > a
>>>>>> > > >> > >> tradeoff though. What do others think?
>>>>>> > > >> > >>
>>>>>> > > >> > >> @Steven:
>>>>>> > > >> > >>
>>>>>> > > >> > >> For this part from the motivation section, is it
>>>>>> accurate? Let's
>>>>>> > > >> assume
>>>>>> > > >> > >>
>>>>>> > > >> > >> one
>>>>>> > > >> > >>
>>>>>> > > >> > >> source task consumes from 3 partitions and one of the
>>>>>> partition
>>>>>> > is
>>>>>> > > >> > >> significantly slower. In this situation, watermark for
>>>>>> this
>>>>>> > source
>>>>>> > > >> task
>>>>>> > > >> > >> won't hold back as it is reading recent data from other
>>>>>> two Kafka
>>>>>> > > >> > >> partitions. As a result, it won't hold back the overall
>>>>>> > watermark.
>>>>>> > > I
>>>>>> > > >> > >> thought the problem is that we may have late data for
>>>>>> this slow
>>>>>> > > >> > >>
>>>>>> > > >> > >> partition.
>>>>>> > > >> > >>
>>>>>> > > >> > >> It will hold back the watermark. Watermark of an operator
>>>>>> is the
>>>>>> > > >> minimum
>>>>>> > > >> > >> of watermarks of all splits[1]
>>>>>> > > >> > >>
>>>>>> > > >> > >> I have another question about the restart. Say split
>>>>>> alignment is
>>>>>> > > >> > >> triggered. checkpoint is completed. job failed and
>>>>>> restored from
>>>>>> > > the
>>>>>> > > >> > last
>>>>>> > > >> > >> checkpoint. because alignment decision is not
>>>>>> checkpointed,
>>>>>> > > initially
>>>>>> > > >> > >> alignment won't be enforced until we get a cycle of
>>>>>> watermark
>>>>>> > > >> > aggregation
>>>>>> > > >> > >> and propagation, right? Not saying this corner is a
>>>>>> problem. Just
>>>>>> > > >> want
>>>>>> > > >> > to
>>>>>> > > >> > >> understand it more.
>>>>>> > > >> > >>
>>>>>> > > >> > >> Your understanding is correct.
>>>>>> > > >> > >>
>>>>>> > > >> > >> @Becket:
>>>>>> > > >> > >>
>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use
>>>>>> case, so
>>>>>> > > >> should
>>>>>> > > >> > >>
>>>>>> > > >> > >> we
>>>>>> > > >> > >>
>>>>>> > > >> > >> just add the related methods to SourceReader directly
>>>>>> instead of
>>>>>> > > >> > >> introducing the new interface of WithSplitAssignment? We
>>>>>> can
>>>>>> > > provide
>>>>>> > > >> > >> default implementations, so backwards compatibility won't
>>>>>> be an
>>>>>> > > >> issue.
>>>>>> > > >> > >>
>>>>>> > > >> > >> I don't think we can provide a default implementation.
>>>>>> How would
>>>>>> > we
>>>>>> > > >> do
>>>>>> > > >> > >> that? Would it be just a no-op? Is it better than having
>>>>>> an
>>>>>> > opt-in
>>>>>> > > >> > >> interface? The default implementation would have to be
>>>>>> added
>>>>>> > > >> exclusively
>>>>>> > > >> > >>
>>>>>> > > >> > >> in
>>>>>> > > >> > >>
>>>>>> > > >> > >> a *Public* SourceReader interface. By the way notice
>>>>>> > > SourceReaderBase
>>>>>> > > >> > >> does extend from WithSplitsAlignment, so effectively all
>>>>>> > > >> implementations
>>>>>> > > >> > >>
>>>>>> > > >> > >> do
>>>>>> > > >> > >>
>>>>>> > > >> > >> handle the alignment case. To be honest I think it is
>>>>>> impossible
>>>>>> > to
>>>>>> > > >> > >> implement the SourceReader interface directly by end
>>>>>> users.
>>>>>> > > >> > >>
>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably
>>>>>> also
>>>>>> > needs
>>>>>> > > >> some
>>>>>> > > >> > >> change to support throttling at the split granularity.
>>>>>> Can you
>>>>>> > add
>>>>>> > > >> that
>>>>>> > > >> > >> interface change into the public interface section as
>>>>>> well?
>>>>>> > > >> > >>
>>>>>> > > >> > >> It has been added from the beginning. See
>>>>>> *AlignedSplitReader.*
>>>>>> > > >> > >>
>>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits
>>>>>> here,
>>>>>> > given
>>>>>> > > >> that
>>>>>> > > >> > >>
>>>>>> > > >> > >> it
>>>>>> > > >> > >>
>>>>>> > > >> > >> is not actually changing the split assignments? It seems
>>>>>> > something
>>>>>> > > >> like
>>>>>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more
>>>>>> > > accurate.
>>>>>> > > >> > >>
>>>>>> > > >> > >> The method's called *alignSplits*, not assign. Do you
>>>>>> still
>>>>>> > prefer
>>>>>> > > a
>>>>>> > > >> > >> different name for that? Personally, I am open for
>>>>>> suggestions
>>>>>> > > here.
>>>>>> > > >> > >>
>>>>>> > > >> > >> Best,
>>>>>> > > >> > >>
>>>>>> > > >> > >> Dawid
>>>>>> > > >> > >>
>>>>>> > > >> > >> [1]
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> >
>>>>>> > > >>
>>>>>> > >
>>>>>> >
>>>>>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
>>>>>> > > >> > >>
>>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
>>>>>> > > >> > >>
>>>>>> > > >> > >> Thanks for driving the effort, Sebastian. I think the
>>>>>> motivation
>>>>>> > > >> makes a
>>>>>> > > >> > >> lot of sense. Just a few suggestions / questions.
>>>>>> > > >> > >>
>>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use
>>>>>> case, so
>>>>>> > > >> should
>>>>>> > > >> > >>
>>>>>> > > >> > >> we
>>>>>> > > >> > >>
>>>>>> > > >> > >> just add the related methods to SourceReader directly
>>>>>> instead of
>>>>>> > > >> > >> introducing the new interface of WithSplitAssignment? We
>>>>>> can
>>>>>> > > provide
>>>>>> > > >> > >> default implementations, so backwards compatibility won't
>>>>>> be an
>>>>>> > > >> issue.
>>>>>> > > >> > >>
>>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably
>>>>>> also
>>>>>> > needs
>>>>>> > > >> some
>>>>>> > > >> > >> change to support throttling at the split granularity.
>>>>>> Can you
>>>>>> > add
>>>>>> > > >> that
>>>>>> > > >> > >> interface change into the public interface section as
>>>>>> well?
>>>>>> > > >> > >>
>>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits
>>>>>> here,
>>>>>> > given
>>>>>> > > >> that
>>>>>> > > >> > >>
>>>>>> > > >> > >> it
>>>>>> > > >> > >>
>>>>>> > > >> > >> is not actually changing the split assignments? It seems
>>>>>> > something
>>>>>> > > >> like
>>>>>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more
>>>>>> > > accurate.
>>>>>> > > >> > >>
>>>>>> > > >> > >> Thanks,
>>>>>> > > >> > >>
>>>>>> > > >> > >> Jiangjie (Becket) Qin
>>>>>> > > >> > >>
>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <st...@gmail.com> wrote:
>>>>>> > > >> > >>
>>>>>> > > >> > >> However, a single source operator may read data from
>>>>>> multiple
>>>>>> > > >> > >>
>>>>>> > > >> > >> splits/partitions, e.g., multiple Kafka partitions, such
>>>>>> that
>>>>>> > even
>>>>>> > > >> with
>>>>>> > > >> > >> watermark alignment the source operator may need to buffer
>>>>>> > > excessive
>>>>>> > > >> > >>
>>>>>> > > >> > >> amount
>>>>>> > > >> > >>
>>>>>> > > >> > >> of data if one split emits data faster than another.
>>>>>> > > >> > >>
>>>>>> > > >> > >> For this part from the motivation section, is it
>>>>>> accurate? Let's
>>>>>> > > >> assume
>>>>>> > > >> > >>
>>>>>> > > >> > >> one
>>>>>> > > >> > >>
>>>>>> > > >> > >> source task consumes from 3 partitions and one of the
>>>>>> partition
>>>>>> > is
>>>>>> > > >> > >> significantly slower. In this situation, watermark for
>>>>>> this
>>>>>> > source
>>>>>> > > >> task
>>>>>> > > >> > >> won't hold back as it is reading recent data from other
>>>>>> two Kafka
>>>>>> > > >> > >> partitions. As a result, it won't hold back the overall
>>>>>> > watermark.
>>>>>> > > I
>>>>>> > > >> > >> thought the problem is that we may have late data for
>>>>>> this slow
>>>>>> > > >> > >>
>>>>>> > > >> > >> partition.
>>>>>> > > >> > >>
>>>>>> > > >> > >> I have another question about the restart. Say split
>>>>>> alignment is
>>>>>> > > >> > >> triggered. checkpoint is completed. job failed and
>>>>>> restored from
>>>>>> > > the
>>>>>> > > >> > last
>>>>>> > > >> > >> checkpoint. because alignment decision is not
>>>>>> checkpointed,
>>>>>> > > initially
>>>>>> > > >> > >> alignment won't be enforced until we get a cycle of
>>>>>> watermark
>>>>>> > > >> > aggregation
>>>>>> > > >> > >> and propagation, right? Not saying this corner is a
>>>>>> problem. Just
>>>>>> > > >> want
>>>>>> > > >> > to
>>>>>> > > >> > >> understand it more.
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <th...@apache.org> wrote:
>>>>>> > > >> > >>
>>>>>> > > >> > >> Thanks for working on this!
>>>>>> > > >> > >>
>>>>>> > > >> > >> I wonder if "supporting" split alignment in
>>>>>> SourceReaderBase and
>>>>>> > > then
>>>>>> > > >> > >>
>>>>>> > > >> > >> doing
>>>>>> > > >> > >>
>>>>>> > > >> > >> nothing if the split reader does not implement
>>>>>> AlignedSplitReader
>>>>>> > > >> could
>>>>>> > > >> > >>
>>>>>> > > >> > >> be
>>>>>> > > >> > >>
>>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead be
>>>>>> added to
>>>>>> > the
>>>>>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to make it
>>>>>> > explicit
>>>>>> > > >> that
>>>>>> > > >> > >> the source actually supports it.
>>>>>> > > >> > >>
>>>>>> > > >> > >> Thanks,
>>>>>> > > >> > >> Thomas
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <kn...@apache.org> wrote:
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
>>>>>> > > >> > >>
>>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface
>>>>>> (aka the
>>>>>> > > >> stop
>>>>>> > > >> > >>
>>>>>> > > >> > >> &
>>>>>> > > >> > >>
>>>>>> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar
>>>>>> only,
>>>>>> > > >> > >>
>>>>>> > > >> > >> correct?
>>>>>> > > >> > >>
>>>>>> > > >> > >> +1 in general. I believe it is valuable to complete the
>>>>>> watermark
>>>>>> > > >> > >>
>>>>>> > > >> > >> aligned
>>>>>> > > >> > >>
>>>>>> > > >> > >> story with this FLIP.
>>>>>> > > >> > >>
>>>>>> > > >> > >> Cheers,
>>>>>> > > >> > >>
>>>>>> > > >> > >> Konstantin
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dw...@apache.org> wrote:
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >> To be explicit, having worked on it, I support it ;) I
>>>>>> think we
>>>>>> > can
>>>>>> > > >> > >> start a vote thread soonish, as there are no concerns so
>>>>>> far.
>>>>>> > > >> > >>
>>>>>> > > >> > >> Best,
>>>>>> > > >> > >>
>>>>>> > > >> > >> Dawid
>>>>>> > > >> > >>
>>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
>>>>>> > > >> > >>
>>>>>> > > >> > >> Dear Flink developers,
>>>>>> > > >> > >>
>>>>>> > > >> > >> I would like to open a discussion on FLIP 217 [1] for an
>>>>>> > extension
>>>>>> > > >> > >>
>>>>>> > > >> > >> of
>>>>>> > > >> > >>
>>>>>> > > >> > >> Watermark Alignment to perform alignment also in
>>>>>> SplitReaders. To
>>>>>> > > >> > >>
>>>>>> > > >> > >> do
>>>>>> > > >> > >>
>>>>>> > > >> > >> so,
>>>>>> > > >> > >>
>>>>>> > > >> > >> SplitReaders must be able to suspend and resume reading
>>>>>> from
>>>>>> > split
>>>>>> > > >> > >>
>>>>>> > > >> > >> sources
>>>>>> > > >> > >>
>>>>>> > > >> > >> where the SourceOperator coordinates and controls
>>>>>> suspend and
>>>>>> > > >> > >>
>>>>>> > > >> > >> resume.
>>>>>> > > >> > >>
>>>>>> > > >> > >> To
>>>>>> > > >> > >>
>>>>>> > > >> > >> gather information about current watermarks of the
>>>>>> SplitReaders,
>>>>>> > we
>>>>>> > > >> > >>
>>>>>> > > >> > >> extend
>>>>>> > > >> > >>
>>>>>> > > >> > >> the internal WatermarkOutputMultiplexer and report
>>>>>> watermarks to
>>>>>> > > >> > >>
>>>>>> > > >> > >> the
>>>>>> > > >> > >>
>>>>>> > > >> > >> SourceOperator.
>>>>>> > > >> > >>
>>>>>> > > >> > >> There is a PoC for this FLIP [2], prototyped by Arvid
>>>>>> Heise and
>>>>>> > > >> > >>
>>>>>> > > >> > >> revised
>>>>>> > > >> > >>
>>>>>> > > >> > >> and
>>>>>> > > >> > >>
>>>>>> > > >> > >> reworked by Dawid Wysakowicz (He did most of the work.)
>>>>>> and me.
>>>>>> > The
>>>>>> > > >> > >>
>>>>>> > > >> > >> changes
>>>>>> > > >> > >>
>>>>>> > > >> > >> are backwards compatible in a way that if affected
>>>>>> components do
>>>>>> > > >> > >>
>>>>>> > > >> > >> not
>>>>>> > > >> > >>
>>>>>> > > >> > >> support split alignment the behavior is as before.
>>>>>> > > >> > >>
>>>>>> > > >> > >> Best,
>>>>>> > > >> > >> Sebastian
>>>>>> > > >> > >>
>>>>>> > > >> > >> [1]
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> >
>>>>>> > > >>
>>>>>> > >
>>>>>> >
>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
>>>>>> > > >> > >>
>>>>>> > > >> > >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
>>>>>> > > >> > >>
>>>>>> > > >> > >> --
>>>>>> > > >> > >>
>>>>>> > > >> > >> Konstantin Knauf
>>>>>> > > >> > >> https://twitter.com/snntrable
>>>>>> > > >> > >> https://github.com/knaufk
>>>>>> > > >> > >>
>>>>>> > > >> > >>
>>>>>> > > >> >
>>>>>> > > >>
>>>>>> > > >
>>>>>> > >
>>>>>> >
>>>>>>
>>>>>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Becket Qin <be...@gmail.com>.
+dev

Hi Sebastian,

Thank you for the summary. Please see the detailed replies inline. To
recap my suggestions:

1. Pausable splits API.
    a) Add default implementations of "pauseOrResumeSplits" to both
SourceReader and SplitReader, where both default implementations throw
UnsupportedOperationException.

2. User story.
    a) We tell users they may enable watermark alignment as they like. This
is exactly what the current Flink API offers.
    b) We tell source developers to implement pausable splits; otherwise
an exception will be thrown when users enable watermark alignment on the
source. Think of it like SourceReader#snapshotState(): you are expected to
implement it properly, otherwise exceptions will be thrown when users
enable checkpointing.
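
To make suggestion 1 a) concrete, here is a minimal sketch in Java. The
interfaces below are simplified, hypothetical stand-ins for SourceReader and
SplitReader (not the real Flink types), and the parameter names and exception
messages are assumptions. The reader variant identifies splits by ID while the
split reader variant receives the splits themselves, as noted elsewhere in this
thread.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-ins for the Flink interfaces under discussion.
final class PausableSplitsSketch {

    // Stand-in for SourceReader: splits are identified by their IDs.
    interface SourceReader {
        default void pauseOrResumeSplits(
                Collection<String> splitIdsToPause, Collection<String> splitIdsToResume) {
            throw new UnsupportedOperationException(
                    "This source reader does not support pausing or resuming splits.");
        }
    }

    // Stand-in for SplitReader: the splits themselves are passed in.
    interface SplitReader<SplitT> {
        default void pauseOrResumeSplits(
                Collection<SplitT> splitsToPause, Collection<SplitT> splitsToResume) {
            throw new UnsupportedOperationException(
                    "This split reader does not support pausing or resuming splits.");
        }
    }

    // An existing reader keeps compiling without any code change.
    static final class LegacyReader implements SourceReader {}

    // A reader whose author implemented pausable splits.
    static final class PausableReader implements SourceReader {
        final Set<String> paused = new HashSet<>();

        @Override
        public void pauseOrResumeSplits(
                Collection<String> splitIdsToPause, Collection<String> splitIdsToResume) {
            paused.addAll(splitIdsToPause);
            paused.removeAll(splitIdsToResume);
        }
    }

    public static void main(String[] args) {
        PausableReader pausable = new PausableReader();
        pausable.pauseOrResumeSplits(List.of("split-0"), List.of());
        System.out.println("paused: " + pausable.paused);

        try {
            new LegacyReader().pauseOrResumeSplits(List.of("split-0"), List.of());
        } catch (UnsupportedOperationException e) {
            System.out.println("legacy reader: " + e.getMessage());
        }
    }
}
```

The legacy reader compiles unchanged, which is the backwards-compatibility
argument, but it fails loudly as soon as something asks it to pause a split.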

Thanks,

Jiangjie (Becket) Qin

On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis <se...@ververica.com>
wrote:

> Hi Becket, Hi everybody,
>
> I'm sorry if I misread the messages but I could not derive an agreement
> from the mailing list. Nevertheless, if I understand you right the
> suggestion is:
>
> * Add default implementations to methods "pauseOrResumeSplits" in both
> SourceReader and SplitReader where both default implementations throw
> UnsupportedOperationException.
>
Yes.

> * Add "supportsPauseOrResumeSplits" to the Source interface. (In the
> following, I refer to supporting this as "pausable splits".)
>
We may no longer need this if pausable splits are expected to be
implemented by the source developers, i.e. non-optional. Having this method
would then be somewhat misleading as it looks like the sources that do not
support pausable splits are also acceptable in the long term. So API wise,
I'd say maybe we should remove this for this FLIP, although I believe this
supportXXX pattern itself is still attractive for optional features.
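
For reference, the supportXXX pattern for a genuinely optional feature could
look roughly like the sketch below. The Source interface here is a hypothetical
stand-in, and the validate() helper is an assumption made up for illustration;
it is not part of Flink.

```java
// Hypothetical sketch of a capability flag on the Source; not the real Flink API.
final class CapabilityFlagSketch {

    interface Source {
        // Sources opt in by overriding this to return true.
        default boolean supportsPauseOrResumeSplits() {
            return false;
        }
    }

    // Assumed validation step: reject an unsupported combination up front
    // instead of waiting for the reader to throw at runtime.
    static void validate(Source source, boolean alignmentEnabled) {
        if (alignmentEnabled && !source.supportsPauseOrResumeSplits()) {
            throw new IllegalStateException(
                    "Watermark alignment is enabled, but the source does not support pausable splits.");
        }
    }

    public static void main(String[] args) {
        Source pausable = new Source() {
            @Override
            public boolean supportsPauseOrResumeSplits() {
                return true;
            }
        };
        validate(pausable, true); // passes
        validate(new Source() {}, false); // passes, alignment is off
        System.out.println("validation passed");
    }
}
```

The flag lets the framework check the capability before the job runs, at the
cost of one more method that source developers have to know about.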


>
> To make the conclusions explicit:
>
> 1. The implementation of pauseOrResumeSplits in both interfaces
> SourceReader and SplitReader are optional where the default is that it
> doesn't support it. (--> This means that the implementation is still
> optional for the source developer.)
>
It is optional for backwards compatibility with existing sources, as they
may still compile without code change. But starting from this FLIP, Flink
will always optimistically assume that all the sources support pausable
splits. If a source does not support pausable splits, it goes to an error
handling path when watermark alignment is enabled on it. This is different
from a usual optional feature, where no error is expected.
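
A rough sketch of that error handling path, again with hypothetical names: the
Reader interface and the onAlignmentCycle() hook are assumptions for
illustration and do not mirror the actual Flink runtime.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical sketch of the optimistic call and its error path.
final class AlignmentErrorPathSketch {

    interface Reader {
        default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
            throw new UnsupportedOperationException("pausable splits are not supported");
        }
    }

    // Assumed runtime hook, invoked once per alignment cycle.
    static void onAlignmentCycle(Reader reader, boolean alignmentEnabled, List<String> splitsAhead) {
        if (!alignmentEnabled) {
            return; // alignment is opt-in, so the reader is never asked to pause
        }
        // Optimistic: no capability check; an unpausable reader throws here.
        reader.pauseOrResumeSplits(splitsAhead, List.of());
    }

    public static void main(String[] args) {
        Reader unpausable = new Reader() {};
        onAlignmentCycle(unpausable, false, List.of("split-0")); // no error
        try {
            onAlignmentCycle(unpausable, true, List.of("split-0"));
        } catch (UnsupportedOperationException e) {
            System.out.println("alignment enabled on unpausable source: " + e.getMessage());
        }
    }
}
```

With alignment disabled the unpausable source behaves exactly as before; the
exception only surfaces once a user enables alignment on it.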


> 2. If watermark alignment is enabled in the application code by adding
> withWatermarkAlignment to the WatermarkStrategy while SourceReader or
> SplitReader do not support pausableSplits, we throw an
> UnsupportedOperationException.
>
Yes.


> 3. With regard to your statement:
>
>> [...] basically means watermark alignment is a non-optional feature to
>> the end users.
>
> You actually mean that "pausable splits" are non-optional for the app
> developer if watermark alignment is enabled. However, watermark alignment
> is optional and can be enabled/disabled.
>
Yes, watermark alignment can be enabled/disabled in individual sources in
Flink jobs, which basically means the code supporting watermark alignment
has to already be there. That again means the Source developers are also
expected to support pausable splits by default. So this way we essentially
tell the end users that you may enable / disable this feature as you wish,
and tell the source developers that you SHOULD implement this because the
end users may turn it on/off at will. And if the source does not support
pausable splits, that goes to an error handling path when watermark
alignment is enabled on it. So users know they have to explicitly exclude
this source.


>
> So far it's totally clear to me and I hope this is what you mean. I also
> agree with both statements:
>
>> So making that expectation aligned with the source developers seems
>> reasonable.
>>
>
>> I think this is a simple and clean solution from both the end user and
>> source developers' standpoint.
>>
>
> However, a last conclusion derives from 3. and is an open question for me:
>
> 4. The feature of "pausable splits" is now tightly bound to watermark
> alignment, i.e., if sources do not support "pausable splits" one can not
> enable watermark alignment for these sources. This dependency is not the
> current status of watermark alignment implementation because it is/was
> implemented without pausable splits. Do we want to introduce this
> dependency? (This is an open question. I cannot judge that.)
>
The watermark alignment basically relies on the pausable splits, right? So
personally I found it quite reasonable that if the source does not support
pausable splits, end users cannot enable watermark alignment on it.


> If something is wrong, please correct me.
>
> Regards,
> Sebastian
>
> On Wed, May 11, 2022 at 9:05 AM Becket Qin <be...@gmail.com> wrote:
>
>> Hi Sebastian,
>>
>> Thanks for the reply and patient discussion. I agree this is a tricky
>> decision.
>>
>>
>>> Nevertheless, Piotr has valid concerns about Option c) which I see as
>>> follows:
>>> (1) An interface with default NOOP implementation makes the
>>> implementation optional. And in my opinion, a default implementation is and
>>> will remain a way of making implementation optional because even in future
>>> a developer can decide to implement the "old flavor" without support for
>>> pausable splits.
>>> (2) It may not be too critical but I also find it suboptimal that with a
>>> NOOP default implementation there is no way to check at runtime if
>>> SourceReader or SplitReader actually support pausing. (To do so, one would
>>> need a supportsX method which makes it again more complicated.)
>>
>>
>> Based on the last few messages on the mailing list, Piotr and I agreed
>> that the default implementation should just throw an
>> UnsupportedOperationException if the source is unpausable. So this
>> basically tells the Source developers that this feature is expected to be
>> supported. Because we cannot prevent end users from putting an unpausable
>> source into the watermark alignment group, that basically means watermark
>> alignment is a non-optional feature to the end users. So making that
>> expectation aligned with the source developers seems reasonable. And if a
>> source does not support this feature, the end users should explicitly
>> remove that source from the watermark alignment group.
>>
>> Personally speaking I think this is a simple and clean solution from both
>> the end user and source developers' standpoint.
>>
>> Does this address your concerns?
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Wed, May 11, 2022 at 2:52 PM Sebastian Mattheis <
>> sebastian@ververica.com> wrote:
>>
>>> Hi Piotr, Hi Becket, Hi everybody,
>>>
>>> we, Dawid and I, discussed the various suggestions/options and we would
>>> be okay either way because we find neither solution is perfect just because
>>> of the already present complexity.
>>>
>>> Option c) Adding methods to the interfaces of SourceReader and
>>> SplitReader
>>> Option a) Adding decorative interfaces to be used by SourceReader and
>>> SplitReader
>>>
>>> As of the current status (v. 12) of the FLIP [1], it is based on Option
>>> c) which we find acceptable because the complexity added is only a single
>>> method.
>>>
>>> Nevertheless, Piotr has valid concerns about Option c) which I see as
>>> follows:
>>> (1) An interface with default NOOP implementation makes the
>>> implementation optional. And in my opinion, a default implementation is and
>>> will remain a way of making implementation optional because even in future
>>> a developer can decide to implement the "old flavor" without support for
>>> pausable splits.
>>> (2) It may not be too critical but I also find it suboptimal that with a
>>> NOOP default implementation there is no way to check at runtime if
>>> SourceReader or SplitReader actually support pausing. (To do so, one would
>>> need a supportsX method which makes it again more complicated.)
>>>
>>> However, we haven't changed it because Option a) is also not optimal or
>>> straight-forward:
>>> (1) We need to add two distinct yet similar decorative interfaces since,
>>> as mentioned, the signatures of the methods are different. For example, we
>>> would need decorative interfaces like `SplitReaderWithPausableSplits` and
>>> `SourceReaderWithPausableSplits`.
>>> (2) As a consequence, we would need to somehow document how/where to
>>> implement both interfaces and how this relates to each other. This we could
>>> solve by adding a note in the interface of SourceReader and SplitReader and
>>> reference to the decorative interfaces but it still increases complexity
>>> too.
>>>
>>> In summary, we see both as acceptable and preferred over other options.
>>> The question is if we can find a solution or compromise that is acceptable
>>> for everybody to reach consensus.
>>>
>>> Please let us know what you think because we would be happy if we can
>>> conclude the discussion to avoid dropping the initiative on this FLIP.
>>>
>>> Regards,
>>> Sebastian
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
>>> (v. 12)
>>>
>>> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski <pn...@apache.org>
>>> wrote:
>>>
>>>> Hi Guowei,
>>>>
>>>> as Dawid wrote a couple of messages back:
>>>>
>>>> > This is covered in the previous FLIP[1] which has been already
>>>> implemented in 1.15. In short, it must be enabled with the watermark
>>>> strategy which also configures drift and update interval
>>>>
>>>> So by default watermark alignment is disabled, regardless of whether a
>>>> source supports it or not.
>>>>
>>>> Best,
>>>> Piotrek
>>>>
>>>> On Thu, May 5, 2022 at 09:56 Guowei Ma <gu...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We know that in the case of Bounded input Flink supports the Batch
>>>>> execution mode. Currently in Batch execution mode, flink is executed
>>>>> on a
>>>>> stage-by-stage basis. In this way, perhaps watermark alignment might
>>>>> not
>>>>> gain much.
>>>>>
>>>>> So my question is: Is watermark alignment the default behavior (for
>>>>> implemented sources only)? If so, have you considered evaluating the
>>>>> impact
>>>>> of this behavior on the Batch execution mode? Or do you think it is not
>>>>> necessary?
>>>>>
>>>>> Correct me if I miss something.
>>>>>
>>>>> Best,
>>>>> Guowei
>>>>>
>>>>>
>>>>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <
>>>>> piotr.nowojski@gmail.com>
>>>>> wrote:
>>>>>
>>>>> > Hi Becket and Dawid,
>>>>> >
>>>>> > > I feel that no matter which option we choose this can not be solved
>>>>> > entirely in either of the options, because of the point above and
>>>>> because
>>>>> > the signature of SplitReader#pauseOrResumeSplits and
>>>>> > SourceReader#pauseOrResumeSplits are slightly different (one
>>>>> identifies
>>>>> > splits with splitId the other one passes the splits directly).
>>>>> >
>>>>> > Yes, that's a good point in this case and for features that need to
>>>>> be
>>>>> > implemented in more than one place.
>>>>> >
>>>>> > > Is there any reason for pausing reading from a split to be an optional
>>>>> feature,
>>>>> > > other than that this was not included in the original interface?
>>>>> >
>>>>> > An additional argument in favor of making it optional is to simplify
>>>>> source
>>>>> > implementation. But on its own I'm not sure if that would be enough
>>>>> to
>>>>> > justify making this feature optional. Maybe.
>>>>> >
>>>>> > > I think it would be way simpler and clearer to just let end users
>>>>> and
>>>>> > Flink
>>>>> > > assume all the connectors will implement this feature.
>>>>> >
>>>>> > As I wrote above that would be an interesting choice to make (ease of
>>>>> > implementation for new users, vs system consistency). Regardless of
>>>>> that,
>>>>> > yes, for me the main argument is the API backward compatibility. But
>>>>> let's
>>>>> > clear a couple of points:
>>>>> > - The current proposal adding methods to the base interface with
>>>>> default
>>>>> > implementations is an OPTIONAL feature. Same as the decorative
>>>>> version
>>>>> > would be.
>>>>> > - Decorative version could implement "throw
>>>>> UnsupportedOperationException"
>>>>> > if user enabled watermark alignment just as well and I agree that's a
>>>>> > better option compared to logging a warning.
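A rough Java sketch of the second bullet, i.e. how a decorative interface could still lead to an UnsupportedOperationException when watermark alignment is enabled. PausableSourceReader, PlainReader, AlignedReader and the validate helper are hypothetical names made up for this sketch; nothing here is the actual Flink implementation.

```java
import java.util.Collection;

// Simplified stand-in for the base interface (assumption: the real one declares more methods).
interface SourceReader {}

// Optional capability expressed as a decorative interface (hypothetical name).
interface PausableSourceReader extends SourceReader {
    void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

// Hypothetical reader that does not opt in to the capability.
final class PlainReader implements SourceReader {}

// Hypothetical reader that opts in by implementing the decorator.
final class AlignedReader implements PausableSourceReader {
    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        // A real reader would stop or restart fetching for the given splits here.
    }
}

final class DecoratorCheckSketch {
    // Hypothetical framework-side check: fail loudly instead of logging a warning.
    static void validate(SourceReader reader, boolean alignmentEnabled) {
        if (alignmentEnabled && !(reader instanceof PausableSourceReader)) {
            throw new UnsupportedOperationException(
                    reader.getClass().getSimpleName()
                            + " cannot pause splits, but watermark alignment is enabled");
        }
    }

    public static void main(String[] args) {
        validate(new PlainReader(), false); // alignment disabled: nothing to check
        validate(new AlignedReader(), true); // capability present: passes
        try {
            validate(new PlainReader(), true);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this variant the exception comes from the framework's instanceof check rather than from a default method body, so the base interface itself stays unchanged.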
>>>>> >
>>>>> > Best,
>>>>> > Piotrek
>>>>> >
>>>>> >
>>>>> > On Wed, May 4, 2022 at 15:40 Becket Qin <be...@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > > Thanks for the reply and patient discussion, Piotr and Dawid.
>>>>> > >
>>>>> > > Is there any reason for pausing reading from a split to be an optional
>>>>> feature,
>>>>> > > other than that this was not included in the original interface?
>>>>> > >
>>>>> > > To be honest I am really worried about the complexity of the user
>>>>> story
>>>>> > > here. Optional features like this have a high overhead. Imagine
>>>>> this
>>>>> > > feature is optional, now a user enabled watermark alignment and
>>>>> defined a
>>>>> > > few watermark groups. Would it work? Hmm, that depends on whether
>>>>> the
>>>>> > > involved Source has implemented this feature. If the Sources are
>>>>> well
>>>>> > > documented, good luck. Otherwise end users may have to look into
>>>>> the code
>>>>> > > of the Source to see whether the feature is supported. Which is
>>>>> something
>>>>> > > they shouldn't have to do.
>>>>> > >
>>>>> > > I think it would be way simpler and clearer to just let end users
>>>>> and
>>>>> > Flink
>>>>> > > assume all the connectors will implement this feature. After all
>>>>> the
>>>>> > > watermark group is not optional to the end users. If in some rare
>>>>> cases,
>>>>> > > the feature cannot be supported, a clear
>>>>> UnsupportedOperationException
>>>>> > will
>>>>> > > be thrown to tell users to explicitly remove this Source from the
>>>>> > watermark
>>>>> > > group. I don't think we should have a warning message here, as
>>>>> they tend
>>>>> > to
>>>>> > > be ignored in many cases. If we do this, we don't even need the
>>>>> > supportXXX
>>>>> > > method in the Source for this feature. In fact this is exactly how
>>>>> many
>>>>> > > interfaces work today. For example,
>>>>> SplitEnumerator#addSplitsBack() is
>>>>> > not
>>>>> > > supported by Pravega source because it does not support partial
>>>>> failover.
>>>>> > > In that case, it simply throws an exception to trigger a global
>>>>> recovery.
>>>>> > >
>>>>> > > The reason we add a default implementation in this case would just be
>>>>> for
>>>>> > the
>>>>> > > sake of backwards compatibility so the old source can still
>>>>> compile.
>>>>> > Sure,
>>>>> > > in short term, this feature might not be supported by many existing
>>>>> > > sources. That is OK, and it is quite visible to the source
>>>>> developers
>>>>> > that
>>>>> > > they did not override the default impl which throws an
>>>>> > > UnsupportedOperationException.
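A minimal Java sketch of the default-method approach described above. The method name pauseOrResumeSplits is the one used in this thread; the stripped-down SourceReader interface, LegacyReader, PausingReader and the tryPause helper are assumptions for illustration and not the real Flink classes.

```java
import java.util.Collection;
import java.util.Collections;

// Simplified stand-in for the base interface (assumption: the real one declares more methods).
interface SourceReader {
    // New method with a default body, so implementations written earlier still compile.
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException(
                getClass().getSimpleName() + " does not support pausing or resuming splits");
    }
}

// Hypothetical "old" reader written before the method existed: compiles unchanged.
final class LegacyReader implements SourceReader {}

// Hypothetical reader that supports the feature by overriding the default.
final class PausingReader implements SourceReader {
    Collection<String> paused = Collections.emptyList();

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused = splitsToPause;
    }
}

final class DefaultMethodSketch {
    // Demo helper only: reports whether the reader accepted the request.
    // A real framework would let the exception propagate to the user.
    static boolean tryPause(SourceReader reader, Collection<String> splitIds) {
        try {
            reader.pauseOrResumeSplits(splitIds, Collections.<String>emptyList());
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryPause(new LegacyReader(), Collections.singletonList("split-0")));
        System.out.println(tryPause(new PausingReader(), Collections.singletonList("split-0")));
    }
}
```

LegacyReader never mentions the new method, which is the backwards-compatibility argument: it still compiles, and the missing override only surfaces at runtime when pausing is actually requested.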
>>>>> > >
>>>>> > > @Dawid,
>>>>> > >
>>>>> > > the Java doc of the SupportXXX() method in the Source would be the
>>>>> single
>>>>> > > >> source of truth regarding how to implement this feature.
>>>>> > > >
>>>>> > > >
>>>>> > >
>>>>> > > I also don't find it entirely true. Half of the classes are
>>>>> theoretically
>>>>> > > > optional and are utility classes from the point of view how the
>>>>> > > interfaces
>>>>> > > > are organized. Theoretically users do not need to use any of
>>>>> > > > SourceReaderBase & SplitReader. Would be weird to list their
>>>>> methods in
>>>>> > > the
>>>>> > > > Source interface.
>>>>> > >
>>>>> > > I think the ultimate goal of java docs is to guide users to
>>>>> implement the
>>>>> > > Source. If SourceReaderBase is the preferred way to implement a
>>>>> > > SourceReader, it seems worth mentioning that. Even the Java
>>>>> language
>>>>> > > documentation for interfaces lists the known implementations [1] so
>>>>> people
>>>>> > can
>>>>> > > leverage them. But for this particular case, if we make the feature
>>>>> > > non-optional, we don't even need the supportXXX() method for now.
>>>>> > >
>>>>> > > Thanks,
>>>>> > >
>>>>> > > Jiangjie (Becket) Qin
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <
>>>>> dwysakowicz@apache.org>
>>>>> > > wrote:
>>>>> > >
>>>>> > > > Hey Piotr and Becket,
>>>>> > > >
>>>>> > > > First of all, let me say I am happy with whichever option is
>>>>> agreed in
>>>>> > > the
>>>>> > > > discussion.
>>>>> > > >
>>>>> > > > I wanted to clarify a few points from the discussion though:
>>>>> > > >
>>>>> > > > @Becket:
>>>>> > > >
>>>>> > > > The main argument for adding the methods to the SourceReader is
>>>>> that
>>>>> > > these
>>>>> > > > methods are effectively NON-OPTIONAL to SourceReader impl, i.e.
>>>>> > starting
>>>>> > > > from this FLIP, all the SourceReaders impl are expected to
>>>>> support this
>>>>> > > > method, although some old implementations may not have
>>>>> implemented this
>>>>> > > > feature. I think we should distinguish the new features from the
>>>>> > optional
>>>>> > > > features. While the public decorative interface is a solution to
>>>>> the
>>>>> > > > optional features. We should not use it for the features that are
>>>>> > > > non-optional.
>>>>> > > >
>>>>> > > > I don't think that this feature is NON-OPTIONAL. Even though
>>>>> > preferred, I
>>>>> > > > still think it can be simply optional.
>>>>> > > >
>>>>> > > > the Java doc of the SupportXXX() method in the Source would be
>>>>> the
>>>>> > single
>>>>> > > > source of truth regarding how to implement this feature.
>>>>> > > >
>>>>> > > > I also don't find it entirely true. Half of the classes are
>>>>> > theoretically
>>>>> > > > optional and are utility classes from the point of view how the
>>>>> > > interfaces
>>>>> > > > are organized. Theoretically users do not need to use any of
>>>>> > > > SourceReaderBase & SplitReader. Would be weird to list their
>>>>> methods in
>>>>> > > the
>>>>> > > > Source interface.
>>>>> > > >
>>>>> > > > @Piotr
>>>>> > > >
>>>>> > > > If we have all of the methods with default implementation in the
>>>>> base
>>>>> > > > interface, the API doesn't give any clue to the user which set of
>>>>> > methods
>>>>> > > > are required to be implemented at the same time.
>>>>> > > >
>>>>> > > > I feel that no matter which option we choose this can not be
>>>>> solved
>>>>> > > > entirely in either of the options, because of the point above and
>>>>> > because
>>>>> > > > the signature of SplitReader#pauseOrResumeSplits and
>>>>> > > > SourceReader#pauseOrResumeSplits are slightly different (one
>>>>> identifies
>>>>> > > > splits with splitId the other one passes the splits directly).
>>>>> > > >
>>>>> > > > Best,
>>>>> > > >
>>>>> > > > Dawid
>>>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
>>>>> > > >
>>>>> > > > Hi Piotr,
>>>>> > > >
>>>>> > > > Thanks for the comment.
>>>>> > > >
>>>>> > > > Just to clarify, I am not against the decorative interfaces, but
>>>>> I do
>>>>> > > > think we should use them with caution. The main argument for
>>>>> adding the
>>>>> > > > methods to the SourceReader is that these methods are
>>>>> > > > effectively NON-OPTIONAL to SourceReader impl, i.e. starting
>>>>> from this
>>>>> > > > FLIP, all the SourceReaders impl are expected to support this
>>>>> > > > method, although some old implementations may not have
>>>>> implemented this
>>>>> > > > feature. I think we should distinguish the new features from the
>>>>> > optional
>>>>> > > > features. While the public decorative interface is a solution to
>>>>> the
>>>>> > > > optional features. We should not use it for the features that are
>>>>> > > > non-optional.
>>>>> > > >
>>>>> > > > That said, this feature is optional for SplitReaders. Arguably
>>>>> we can
>>>>> > > have
>>>>> > > > a decorative interface for that, but for simplicity and symmetry
>>>>> of the
>>>>> > > > interface, personally I prefer just adding a new method.
>>>>> > > >
>>>>> > > > Regarding the advantages you mentioned about the decorative
>>>>> interfaces,
>>>>> > > > they would make sense if:
>>>>> > > > 1. The feature is optional.
>>>>> > > > 2. There is only one decorative interface involved for a feature.
>>>>> > > > Otherwise the argument that all the methods are grouped together
>>>>> will
>>>>> > not
>>>>> > > > stand.
>>>>> > > >
>>>>> > > > Compared with that, I think the current solution works fine in
>>>>> all
>>>>> > cases,
>>>>> > > > i.e. "having supportXXX() method in Source, and default methods /
>>>>> > > > decorative interfaces in base interfaces.".
>>>>> > > >
>>>>> > > > The advantages are:
>>>>> > > >> - clean and easy to implement base interface
>>>>> > > >
>>>>> > > > In the current approach, the Java doc of the SupportXXX() method
>>>>> in the
>>>>> > > > Source would be the single source of truth regarding how to
>>>>> implement
>>>>> > > this
>>>>> > > > feature. It lists the method that has to be implemented to
>>>>> support this
>>>>> > > > feature, regardless of how many classes / interfaces are
>>>>> involved.
>>>>> > > >
>>>>> > > > When implementing the base interface, users do not need to
>>>>> implement a
>>>>> > > > method with default implementation. If they are curious what the
>>>>> method
>>>>> > > is
>>>>> > > > for, the java doc of that method simply points users to the
>>>>> > SupportXXX()
>>>>> > > > method in the Source. This does not seem to add work for users
>>>>> compared
>>>>> > with
>>>>> > > > decorative interfaces, but gives much better discoverability.
>>>>> > > >
>>>>> > > > - all of the methods from a single feature are grouped in a
>>>>> single
>>>>> > > >> decorator interface, together with their dedicated java doc.
>>>>> It's also
>>>>> > > >> easier to google search for help using the decorator name
>>>>> > > >
>>>>> > > > - if an optional feature requires two methods to be implemented
>>>>> at
>>>>> > once,
>>>>> > > >> decorator can guarantee that
>>>>> > > >
>>>>> > > > These two points are not true when multiple components and
>>>>> classes are
>>>>> > > > involved collaboratively to provide a feature. In our case, we
>>>>> have
>>>>> > both
>>>>> > > > SourceReader and SplitReader involved. And there might be other
>>>>> > > interfaces
>>>>> > > > on the JM side involved for some future features. So the relevant
>>>>> > methods
>>>>> > > > can actually be scattered over the places. That said, we may
>>>>> still use
>>>>> > > > decorative interfaces for each component, if the feature is
>>>>> optional,
>>>>> > > given
>>>>> > > > there is a single source of truth for the feature.
>>>>> > > >
>>>>> > > > Here I would strongly lean towards making life easier for new
>>>>> users,
>>>>> > > >> lowering the entry barrier, at the (imo) slight expense for the
>>>>> power
>>>>> > > >> users.
>>>>> > > >
>>>>> > > > I actually think the current approach is simpler, more
>>>>> extensible and
>>>>> > > more
>>>>> > > > general for all the users. Can you articulate a bit more on
>>>>> which part
>>>>> > > you
>>>>> > > > think is harder for users to understand?
>>>>> > > >
>>>>> > > >
>>>>> > > > There is another benefit of the decorative interfaces which is
>>>>> not
>>>>> > > > mentioned, but might be worth considering here. Usually the
>>>>> decorative
>>>>> > > > interfaces give slightly better backwards compatibility than the
>>>>> new
>>>>> > > > default method in the interfaces. That is when users are using a
>>>>> jar
>>>>> > that
>>>>> > > > was compiled with an older version of Flink which does not have
>>>>> the
>>>>> > > default
>>>>> > > > method in the interfaces in question. A decorative interface may
>>>>> still
>>>>> > > > provide backwards compatibility in that case, while default
>>>>> method impl
>>>>> > > > cannot.
>>>>> > > >
>>>>> > > > I think in Flink we in general do not guarantee custom components
>>>>> > > compiled
>>>>> > > > with an older version can run with a newer version of Flink. A
>>>>> > recompile
>>>>> > > > with a newer version would be required. That said, if we do care
>>>>> about
>>>>> > > > this, we can just change the "supportXXX()" method in the Source
>>>>> > > interface
>>>>> > > > to use decorative interfaces, and leave the other parts
>>>>> unchanged.
>>>>> > > >
>>>>> > > > Thanks,
>>>>> > > >
>>>>> > > > Jiangjie (Becket) Qin
>>>>> > > >
>>>>> > > >
>>>>> > > >
>>>>> > > >
>>>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <
>>>>> pnowojski@apache.org>
>>>>> > > > wrote:
>>>>> > > >
>>>>> > > >> Hi,
>>>>> > > >>
>>>>> > > >> Sorry for chipping in so late, but I was OoO for the last two
>>>>> weeks.
>>>>> > > >> Regarding the interfaces, I would be actually against adding
>>>>> those
>>>>> > > methods
>>>>> > > >> to the base interfaces for the reasons mentioned above.
>>>>> Clogging the
>>>>> > > base
>>>>> > > >> interface for new users with tons of methods that they do not
>>>>> need, do
>>>>> > > not
>>>>> > > >> understand and do not know what to do with them. Moreover, such
>>>>> > > decorative
>>>>> > > >> interfaces are solving a problem if a feature requires two or
>>>>> more
>>>>> > > methods
>>>>> > > >> to be implemented at the same time. If we have all of the
>>>>> methods with
>>>>> > > >> default implementation in the base interface, the API doesn't
>>>>> give any
>>>>> > > >> clue
>>>>> > > >> to the user which set of methods are required to be implemented
>>>>> at the
>>>>> > > >> same
>>>>> > > >> time.
>>>>> > > >>
>>>>> > > >> > a) I feel the biggest drawback of decorative interfaces is
>>>>> which
>>>>> > > >> interface
>>>>> > > >> > they can decorate and which combinations of multiple
>>>>> decorative
>>>>> > > >> interfaces
>>>>> > > >> > are valid. (...)
>>>>> > > >> > In the future, if there is a new feature added
>>>>> > > >> > (e.g. sorted or pre-partitioned data aware), are we going to
>>>>> create
>>>>> > > >> another
>>>>> > > >> > interface of SplitReader such as SortedSplitReader or
>>>>> > > >> PrePartitionedAware?
>>>>> > > >> > Can they be combined? So I think the additional decorative
>>>>> interface
>>>>> > > >> like
>>>>> > > >> > withSplitsAlignment actually increases the understanding cost
>>>>> of
>>>>> > users
>>>>> > > >> > because they have to know what decorative interfaces are
>>>>> there,
>>>>> > which
>>>>> > > >> > interface they can decorate and which combinations of the
>>>>> decorative
>>>>> > > >> > interfaces are valid and which are not. Ideally we want to
>>>>> avoid
>>>>> > that.
>>>>> > > >>
>>>>> > > >> I'm not sure if I understand how embedding default methods in
>>>>> the base
>>>>> > > >> interface is solving the problem: what can be combined or not?
>>>>> If
>>>>> > there
>>>>> > > >> are
>>>>> > > >> two conflicting features, having decorative interfaces that can
>>>>> not be
>>>>> > > >> mixed together actually makes much more sense to me rather than
>>>>> having
>>>>> > > >> them
>>>>> > > >> all in one base class. How would you allow users to implement
>>>>> only one
>>>>> > > of
>>>>> > > >> those two features?
>>>>> > > >>
>>>>> > > >> To reiterate on the issue. Yes, there are drawbacks:
>>>>> > > >> - how can a user discover what decorators are there?
>>>>> > > >> - how can a user know where the decorator can be applied to?
>>>>> > > >>
>>>>> > > >> However those are drawbacks for more power users, that can be
>>>>> > mitigated
>>>>> > > by
>>>>> > > >> the documentation. For example listing all of the decorators
>>>>> with
>>>>> > > >> detailed explanation both in the docs and in the java docs. More
>>>>> > > >> experienced users will be able to deal with those issues
>>>>> more easily, as
>>>>> > they
>>>>> > > >> will already have some basic understanding of Flink. Also if
>>>>> user has
>>>>> > a
>>>>> > > >> problem that he wants to solve, he will google search a
>>>>> potential
>>>>> > > solution
>>>>> > > >> to his problem anyway, and while doing that he is very likely to
>>>>> > > discover
>>>>> > > >> the decorator that he needs anyway in the docs.
>>>>> > > >>
>>>>> > > >> The advantages are:
>>>>> > > >> - clean and easy to implement base interface
>>>>> > > >> - all of the methods from a single feature are grouped in a
>>>>> single
>>>>> > > >> decorator interface, together with their dedicated java doc.
>>>>> It's also
>>>>> > > >> easier to google search for help using the decorator name
>>>>> > > >> - if an optional feature requires two methods to be implemented
>>>>> at
>>>>> > once,
>>>>> > > >> decorator can guarantee that
>>>>> > > >>
>>>>> > > >> Here I would strongly lean towards making life easier for new
>>>>> users,
>>>>> > > >> lowering the entry barrier, at the (imo) slight expense for the
>>>>> power
>>>>> > > >> users.
>>>>> > > >>
>>>>> > > >> Best,
>>>>> > > >> Piotrek
>>>>> > > >>
>>>>> > > >>
>>>>> > > >> On Tue, Apr 26, 2022 at 15:32 Becket Qin <be...@gmail.com>
>>>>> > wrote:
>>>>> > > >>
>>>>> > > >> > Thanks for the reply Sebastian and Dawid,
>>>>> > > >> >
>>>>> > > >> > I think Sebastian has a good summary. This is a really helpful
>>>>> > > >> discussion.
>>>>> > > >> >
>>>>> > > >> > Thinking a bit more, I feel that it might still be better to
>>>>> add the
>>>>> > > >> > supportsXXX() method in the Source rather than SourceReader.
>>>>> > > >> >
>>>>> > > >> > Generally speaking, what we are trying to do here is to let
>>>>> the
>>>>> > Flink
>>>>> > > >> > framework know what the Source is capable of. In this FLIP, it
>>>>> > happens
>>>>> > > >> to
>>>>> > > >> > be the capability that only involves SourceReader. But in the
>>>>> > future,
>>>>> > > >> it is
>>>>> > > >> > possible that another functionality involves both the
>>>>> > SplitEnumerator
>>>>> > > >> and
>>>>> > > >> > SourceReader. In that case, following the current approach, we
>>>>> > should
>>>>> > > >> put
>>>>> > > >> > the "supportsXXX()" method in both SplitEnumerator and
>>>>> SourceReader.
>>>>> > > >> > Because if we only put this in the SourceReader, then the JM
>>>>> would
>>>>> > > have
>>>>> > > >> to
>>>>> > > >> > create a SourceReader in order to know whether this feature is
>>>>> > > >> supported,
>>>>> > > >> > which is a little ugly. But if we put the "supportsXXX()"
>>>>> method in
>>>>> > > the
>>>>> > > >> > Source, we will break the "symmetric" design because this FLIP
>>>>> > chose a
>>>>> > > >> > different way.
>>>>> > > >> >
>>>>> > > >> > This is also why I think supportsXXX() method seems a good
>>>>> thing to
>>>>> > > >> have,
>>>>> > > >> > because when there are a few interfaces / methods that are
>>>>> expected
>>>>> > to
>>>>> > > >> be
>>>>> > > >> > implemented at the same time in order to deliver a feature,
>>>>> it is
>>>>> > > always
>>>>> > > >> > good to have a single source of truth to tell the framework
>>>>> what to
>>>>> > > do,
>>>>> > > >> so
>>>>> > > >> > the framework can do consistent things in different parts.
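The "single source of truth" argument can be sketched in Java as follows. The name supportsPausingSplits is the one Dawid mentions elsewhere in this thread; the trimmed-down Source and SourceReader types, CountingSource and canAlignSplits are hypothetical and only stand in for the real Flink classes.

```java
// Simplified stand-ins for the real interfaces (assumption: the real ones declare more methods).
interface SourceReader {}

interface Source {
    // Capability flag on the Source itself; off unless a source opts in.
    default boolean supportsPausingSplits() {
        return false;
    }

    SourceReader createReader();
}

// Hypothetical source that supports the feature and counts how many readers were created.
final class CountingSource implements Source {
    int readersCreated = 0;

    @Override
    public boolean supportsPausingSplits() {
        return true;
    }

    @Override
    public SourceReader createReader() {
        readersCreated++;
        return new SourceReader() {};
    }
}

final class CapabilitySketch {
    // Hypothetical coordinator-side decision: it needs only the Source, never a reader instance.
    static boolean canAlignSplits(Source source) {
        return source.supportsPausingSplits();
    }

    public static void main(String[] args) {
        CountingSource source = new CountingSource();
        System.out.println(canAlignSplits(source));
        System.out.println(source.readersCreated);
    }
}
```

Because the flag lives on the Source, the check runs without calling createReader(), which is the situation described above as "a little ugly" when the flag lives on the SourceReader instead.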
>>>>> > > >> >
>>>>> > > >> > @Sebastian Mattheis <se...@ververica.com>
>>>>> > > >> >
>>>>> > > >> > Regarding interface flavor b), i.e. AlignedSourceReader +
>>>>> > > >> > AlignedSplitReader, what I feel awkward about is that we are
>>>>> > > essentially
>>>>> > > >> > expecting almost all the SourceReader implementations to
>>>>> extend
>>>>> > > >> > SourceReaderBase, which effectively makes the SourceReader
>>>>> interface
>>>>> > > >> > without the pausing support useless. So this indicates that
>>>>> public
>>>>> > > >> > decorative interfaces (or sub-interfaces for the same
>>>>> purpose) only
>>>>> > > >> > make sense if the original interface is also expected to be
>>>>> used.
>>>>> > > >> > Otherwise, it seems to make more sense to add the method to the
>>>>> > original
>>>>> > > >> > interface itself.
>>>>> > > >> >
>>>>> > > >> > Cheers,
>>>>> > > >> >
>>>>> > > >> > Jiangjie (Becket) Qin
>>>>> > > >> >
>>>>> > > >> >
>>>>> > > >> >
>>>>> > > >> >
>>>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <
>>>>> > > >> dwysakowicz@apache.org>
>>>>> > > >> > wrote:
>>>>> > > >> >
>>>>> > > >> > > Thanks @Sebastian for the nice summary.
>>>>> > > >> > >
>>>>> > > >> > > I think most of your points aligned with the suggestions I
>>>>> made to
>>>>> > > the
>>>>> > > >> > > FLIP, while you were writing your reply (I believe we hit
>>>>> enter
>>>>> > > >> nearly at
>>>>> > > >> > > the same time ;) )
>>>>> > > >> > >
>>>>> > > >> > > Two points after we synced offline
>>>>> > > >> > >
>>>>> > > >> > > 1. I changed also the supportsWatermarksSplitAlignment to
>>>>> > > >> > > supportsPausingSplits to express the general capability of
>>>>> > pausing.
>>>>> > > >> > >
>>>>> > > >> > > 2. As for whether we should add
>>>>> PausingSourceReader/PausingSplitReader
>>>>> > > (option
>>>>> > > >> b)
>>>>> > > >> > > or if we should just add the methods (option c), I suggest
>>>>> to
>>>>> > simply
>>>>> > > >> add
>>>>> > > >> > > the two methods, as I felt this is the approach much preferred by
>>>>> Becket,
>>>>> > > >> to which
>>>>> > > >> > > others do not object. Unless there is opposition, let's
>>>>> go with
>>>>> > > this
>>>>> > > >> > > option c.
>>>>> > > >> > >
>>>>> > > >> > > Best,
>>>>> > > >> > >
>>>>> > > >> > > Dawid
>>>>> > > >> > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
>>>>> > > >> > >
>>>>> > > >> > > Hi folks,
>>>>> > > >> > >
>>>>> > > >> > > Sorry for being a bit silent. Many thanks for all the input
>>>>> and
>>>>> > > >> > > suggestions. As I'm a bit new, I needed some time to catch
>>>>> up and
>>>>> > > >> > structure
>>>>> > > >> > > (for myself) the discussion and I wanted to find a way to
>>>>> > structure
>>>>> > > >> the
>>>>> > > >> > > conclusions. (Also because I had the feeling that some
>>>>> concerns
>>>>> > got
>>>>> > > >> lost
>>>>> > > >> > in
>>>>> > > >> > > the discussion.) This is my attempt and please correct me if
>>>>> > > >> something is
>>>>> > > >> > > wrong or misunderstood. I tried to collect and assemble the
>>>>> > > opinions,
>>>>> > > >> > > suggestions, and conclusions (to the best of my knowledge):
>>>>> > > >> > >
>>>>> > > >> > > # Top A: Should split alignment (pause/resume behavior) be a
>>>>> > general
>>>>> > > >> > > capability?
>>>>> > > >> > >
>>>>> > > >> > > I personally don't see any reason not to have it a general
>>>>> > capability
>>>>> > > >> > > because for the alignSplit method it is actually
>>>>> independent of
>>>>> > the
>>>>> > > >> > > watermarks. If we agree here to have it a general
>>>>> capability, we
>>>>> > > >> should
>>>>> > > >> > > also agree on the right wording. Does
>>>>> "alignSplits(splitsToResume,
>>>>> > > >> > > splitsToPause)" refer to what is then actually meant? (I
>>>>> see it as
>>>>> > > >> okay.
>>>>> > > >> > I
>>>>> > > >> > > don't have any better idea whilst Arvid suggested
>>>>> > > >> "pauseOrResumeSplits".)
>>>>> > > >> > >
>>>>> > > >> > > # Top B: Should it be possible to enable/disable split
>>>>> alignment?
>>>>> > > >> > >
>>>>> > > >> > > I would personally not disable the split alignment on the
>>>>> source
>>>>> > > >> reader
>>>>> > > >> > > side because if split alignment is used for some other use
>>>>> case
>>>>> > (see
>>>>> > > >> A)
>>>>> > > >> > it
>>>>> > > >> > > could have nasty side effects on other/future use cases.
>>>>> Instead,
>>>>> > I
>>>>> > > >> would
>>>>> > > >> > > disable "watermark split alignment" where I think it should
>>>>> > disable
>>>>> > > >> the
>>>>> > > >> > > watermark-dependent trigger for split alignment.
>>>>> > > >> > >
>>>>> > > >> > > # Top C: Should we add a supportsX method?
>>>>> > > >> > >
>>>>> > > >> > > I find it difficult to define the scope of a supportsX
>>>>> method
>>>>> > w.r.t.
>>>>> > > >> > > the following questions: a) Where is it used? and b) What
>>>>> is the
>>>>> > > >> expected
>>>>> > > >> > > output? To b), it's not straight-forward to provide a
>>>>> meaningful
>>>>> > > >> output,
>>>>> > > >> > > e.g., if SourceReader supports split alignment but
>>>>> SplitReader
>>>>> > does not.
>>>>> > > >> This
>>>>> > > >> > is
>>>>> > > >> > > because with the current implementation, we can determine
>>>>> whether
>>>>> > > >> split
>>>>> > > >> > > alignment is fully supported only during runtime and
>>>>> specifically
>>>>> > > >> > actually
>>>>> > > >> > > only when calling alignSplits down the call hierarchy up to
>>>>> the
>>>>> > > actual
>>>>> > > >> > > SplitReaders.
>>>>> > > >> > >
>>>>> > > >> > > Therefore, I would suggest to either raise an error or
>>>>> warning if
>>>>> > > the
>>>>> > > >> > > alignment is called but not supported at some point. I know
>>>>> we
>>>>> > > should
>>>>> > > >> > > carefully think about when this could be the case because
>>>>> we don't
>>>>> > > >> want
>>>>> > > >> > to
>>>>> > > >> > > flood anybody with such warnings. However, warnings could
>>>>> be an
>>>>> > > >> indicator
>>>>> > > >> > > for the user that for watermark split alignment use case
>>>>> split
>>>>> > > >> reading is
>>>>> > > >> > > imbalanced with the conclusion to either disable the
>>>>> trigger for
>>>>> > > >> > watermark
>>>>> > > >> > > split alignment (see Top B) or to use/implement a source and
>>>>> > reader
>>>>> > > >> that
>>>>> > > >> > > fully supports split alignment.
>>>>> > > >> > >
>>>>> > > >> > > # Top D: How to design interfaces?
>>>>> > > >> > >
>>>>> > > >> > > Thanks for structuring the discussion with the various
>>>>> > > >> possibilities
>>>>> > > >> > > (a-d). From the discussion and emails, I would like to
>>>>> summarize
>>>>> > the
>>>>> > > >> > > following requirements:
>>>>> > > >> > > - Interfaces should be consistent ("symmetric"), i.e.,
>>>>> similar
>>>>> > > >> semantics
>>>>> > > >> > > should have similar interfaces with similar usage.
>>>>> > > >> > > - Make explicit which implementations implement
>>>>> interfaces/support
>>>>> > > >> > > behavior.
>>>>> > > >> > > - Make clear what are default implementations and how to
>>>>> implement
>>>>> > > >> > > interfaces with desired behavior.
>>>>> > > >> > >
>>>>> > > >> > > This is a simplified view of the relations between relevant
>>>>> > classes
>>>>> > > of
>>>>> > > >> > the
>>>>> > > >> > > PoC implementation:
>>>>> > > >> > >
>>>>> > > >> > > SourceReader (Public) <|-- SourceReaderBase (Internal) <|--
>>>>> ..
>>>>> > <|--
>>>>> > > >> > > MySourceReader
>>>>> > > >> > >
>>>>> > > >> > > MySourceReader <>-- SplitFetcherManager (Internal) <>--
>>>>> > SplitFetcher
>>>>> > > >> > > (Internal) <>-- SplitReader (Public) <|-- MySplitReader
>>>>> > > >> > >
>>>>> > > >> > > (A <|-- B: B inherits from A; A <>-- B: A "has a" B)
>>>>> > > >> > >
>>>>> > > >> > > Note that SourceReaderBase and SplitFetcherManager
>>>>> implement most
>>>>> > of
>>>>> > > >> the
>>>>> > > >> > > "logic" for split alignment just because we wanted to
>>>>> implement
>>>>> > > split
>>>>> > > >> > > alignment and wanted it to be available as kind of a
>>>>> default. As a
>>>>> > > >> > > consequence, we have a "default implementation" for
>>>>> SourceReader
>>>>> > > that
>>>>> > > >> > > implements the actual logic for split alignment. For that
>>>>> reason,
>>>>> > I
>>>>> > > >> find
>>>>> > > >> > it
>>>>> > > >> > > very confusing to have a NOOP default implementation in the
>>>>> > > interface
>>>>> > > >> for
>>>>> > > >> > > the SourceReader. As a consequence, interface strategy c) is
>>>>> > > difficult
>>>>> > > >> > > because this would require NOOP default implementations in
>>>>> the
>>>>> > > public
>>>>> > > >> > > interfaces of SourceReader and SplitReader. This is the
>>>>> same for
>>>>> > > >> strategy
>>>>> > > >> > > d) because it would require NOOP default implementation in
>>>>> the
>>>>> > > >> > > SourceReader. Further, as Dawid described method signatures
>>>>> of
>>>>> > > >> alignSplit
>>>>> > > >> > > for SourceReader and SplitReader differ and it would be
>>>>> extremely
>>>>> > > >> > difficult
>>>>> > > >> > > to make the signatures the same (with even potential
>>>>> performance
>>>>> > > >> impact
>>>>> > > >> > > because of additional look-ups of split ids). Therefore,
>>>>> having a
>>>>> > > >> > symmetric
>>>>> > > >> > > decorative interface as of strategy a) is actually not
>>>>> possible
>>>>> > and
>>>>> > > >> > having
>>>>> > > >> > > two decorative interfaces with different method signatures
>>>>> is
>>>>> > > >> confusing.
>>>>> > > >> > My
>>>>> > > >> > > conclusion is that we are best with strategy b) which means
>>>>> to
>>>>> > have
>>>>> > > >> > > specializing sub-interfaces that inherit from the parent
>>>>> > interface:
>>>>> > > >> > > SourceReader <|-- AlignedSourceReader, SplitReader <|--
>>>>> > > >> > AlignedSplitReader
>>>>> > > >> > > With this option, I'm not 100% sure what the implications
>>>>> are and
>>>>> > if
>>>>> > > >> this
>>>>> > > >> > > could get nasty. I would suggest that Dawid and I just try
>>>>> to
>>>>> > > >> implement
>>>>> > > >> > and
>>>>> > > >> > > see if we like it. :)
>>>>> > > >> > >
>>>>> > > >> > > # Summary
>>>>> > > >> > >
>>>>> > > >> > > In conclusion, please let me know your perspectives. Please
>>>>> > correct
>>>>> > > >> me,
>>>>> > > >> > if
>>>>> > > >> > > something is wrong or if I misunderstood something. My
>>>>> perspective
>>>>> > > >> would
>>>>> > > >> > be:
>>>>> > > >> > >
>>>>> > > >> > > Top A: Yes
>>>>> > > >> > > Top B: Yes (but disable watermark trigger for split
>>>>> alignment)
>>>>> > > >> > > Top C: No
>>>>> > > >> > > Top D: b)
>>>>> > > >> > >
>>>>> > > >> > > Best,
>>>>> > > >> > > Sebastian
>>>>> > > >> > >
>>>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <
>>>>> > > >> dwysakowicz@apache.org
>>>>> > > >> > >
>>>>> > > >> > > wrote:
>>>>> > > >> > >
>>>>> > > >> > >> @Arvid:
>>>>> > > >> > >>
>>>>> > > >> > >> While I also like Becket's capability approach, I fear
>>>>> that it
>>>>> > > >> doesn't
>>>>> > > >> > work
>>>>> > > >> > >> for this particular use case: Sources can always be aligned
>>>>> > > >> cross-task
>>>>> > > >> > and
>>>>> > > >> > >> this is just about intra-task alignment. So it's plausible
>>>>> to put
>>>>> > > >> > sources
>>>>> > > >> > >> into an alignment group even though they do not use any of
>>>>> the
>>>>> > > >> presented
>>>>> > > >> > >> API of FLIP-217. They should just issue a warning, if they
>>>>> handle
>>>>> > > >> > multiple
>>>>> > > >> > >> splits (see motivation section).
>>>>> > > >> > >>
>>>>> > > >> > >> Yes, but the "supportXXX" method would be for telling if it
>>>>> > > supports
>>>>> > > >> > that
>>>>> > > >> > >> intra-task alignment. Cross-task alignment would always be
>>>>> > > supported.
>>>>> > > >> > >>
>>>>> > > >> > >> I updated interfaces to what I believe to be closest to a
>>>>> > consensus
>>>>> > > >> > >> between all participants. Do you mind taking a look?
>>>>> > > >> > >>
>>>>> > > >> > >> @Sebastian Do you mind addressing the nits?
>>>>> > > >> > >>
>>>>> > > >> > >> Best,
>>>>> > > >> > >>
>>>>> > > >> > >> Dawid
>>>>> > > >> > >>
>>>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
>>>>> > > >> > >>
>>>>> > > >> > >> Thanks for pushing this effort.
>>>>> > > >> > >>
>>>>> > > >> > >> I'd actually be in favor of 1b). I fully agree that
>>>>> decorator
>>>>> > > >> interfaces
>>>>> > > >> > >> should be avoided but I'm also not a big fan of
>>>>> overloading the
>>>>> > > base
>>>>> > > >> > >> interfaces (they are hard to implement as is). The usual
>>>>> feedback
>>>>> > > to
>>>>> > > >> > >> Source-related interfaces is always that they are
>>>>> overwhelming
>>>>> > and
>>>>> > > >> too
>>>>> > > >> > >> hard to implement. However, I'd also not oppose 1c) as
>>>>> scattered
>>>>> > > >> > interfaces
>>>>> > > >> > >> also have drawbacks. I'd just dislike 1a) and 1d).
>>>>> > > >> > >> While I also like Becket's capability approach, I fear
>>>>> that it
>>>>> > > >> doesn't
>>>>> > > >> > work
>>>>> > > >> > >> for this particular use case: Sources can always be aligned
>>>>> > > >> cross-task
>>>>> > > >> > and
>>>>> > > >> > >> this is just about intra-task alignment. So it's plausible
>>>>> to put
>>>>> > > >> > sources
>>>>> > > >> > >> into an alignment group even though they do not use any of
>>>>> the
>>>>> > > >> presented
>>>>> > > >> > >> API of FLIP-217. They should just issue a warning, if they
>>>>> handle
>>>>> > > >> > multiple
>>>>> > > >> > >> splits (see motivation section).
>>>>> > > >> > >>
>>>>> > > >> > >> I think renaming alignSplits to facilitate future use
>>>>> cases makes
>>>>> > > >> sense
>>>>> > > >> > but
>>>>> > > >> > >> then all interfaces (if 1c) is chosen) should be adjusted
>>>>> > > >> accordingly.
>>>>> > > >> > >> AlignedSourceReader could be PausingSourceReader and I'd
>>>>> go for
>>>>> > > >> > >> pauseOrResumeSplits (Becket's proposal afaik). We could
>>>>> also
>>>>> > split
>>>>> > > it
>>>>> > > >> > into
>>>>> > > >> > >> pauseSplit and resumeSplit. While pauseOrResumeSplits may
>>>>> allow
>>>>> > > >> Sources
>>>>> > > >> > to
>>>>> > > >> > >> just use 1 instead of 2 library calls (as written in the
>>>>> > Javadoc),
>>>>> > > >> both
>>>>> > > >> > >> Kafka and Pulsar can't use it and I'm not sure if there is
>>>>> a
>>>>> > system
>>>>> > > >> that
>>>>> > > >> > >> can.
>>>>> > > >> > >>
>>>>> > > >> > >> Some nits for the FLIP:
>>>>> > > >> > >> - Please replace "stop" with "pause".
>>>>> > > >> > >> - Not sure if it's worth it in the capability section:
>>>>> Sources
>>>>> > that
>>>>> > > >> > adopt
>>>>> > > >> > >> this interface cannot be used in earlier versions. So it
>>>>> feels
>>>>> > like
>>>>> > > >> we
>>>>> > > >> > are
>>>>> > > >> > >> only forward compatible (old sources can be used after the
>>>>> > change);
>>>>> > > >> but
>>>>> > > >> > I
>>>>> > > >> > >> guess this holds for any API addition.
>>>>> > > >> > >> - You might want to add what happens when all splits are
>>>>> paused.
>>>>> > > >> > >> - You may want to describe how the 3 flavors of
>>>>> SourceReaderBase
>>>>> > > >> > interact
>>>>> > > >> > >> with the interface.
>>>>> > > >> > >> - I'm not sure if it makes sense to include Kafka and
>>>>> Pulsar in
>>>>> > the
>>>>> > > >> > FLIP.
>>>>> > > >> > >> For me, this is rather immediate follow-up work. (could be
>>>>> in the
>>>>> > > >> same
>>>>> > > >> > >> umbrella ticket)
>>>>> > > >> > >>
>>>>> > > >> > >> Best,
>>>>> > > >> > >>
>>>>> > > >> > >> Arvid
>>>>> > > >> > >>
>>>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz <
>>>>> > > >> > dwysakowicz@apache.org>
>>>>> > > >> > >> wrote:
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
>>>>> WithSplitsAlignment",
>>>>> > > >> along
>>>>> > > >> > >> with "MySplitReader implements SplitReader,
>>>>> WithSplitsAlignment",
>>>>> > > or
>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
>>>>> > > "MySplitReader
>>>>> > > >> > >> implements AlignedSplitReader", or
>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
>>>>> "MySplitReader
>>>>> > > >> > implements
>>>>> > > >> > >> SplitReader".
>>>>> > > >> > >>
>>>>> > > >> > >> I think the latest proposal according to Dawid would be:
>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
>>>>> "MySplitReader
>>>>> > > >> > implements
>>>>> > > >> > >> AlignedSplitReader".
>>>>> > > >> > >> I am fine with this API, although personally speaking I
>>>>> think it
>>>>> > is
>>>>> > > >> > simpler
>>>>> > > >> > >> to just add a new method to the split reader with default
>>>>> impl.
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >> I think that is a good idea to have it aligned as much as
>>>>> > possible.
>>>>> > > >> I'd
>>>>> > > >> > be
>>>>> > > >> > >> +1 for your option c). We can merge AlignedSplitReader with
>>>>> > > >> > SplitReader. We
>>>>> > > >> > >> will update the FLIP shortly.
>>>>> > > >> > >>
>>>>> > > >> > >> Best,
>>>>> > > >> > >>
>>>>> > > >> > >> Dawid
>>>>> > > >> > >>
>>>>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
>>>>> > > >> > >>
>>>>> > > >> > >> Thanks for the comment, Jark.
>>>>> > > >> > >>
>>>>> > > >> > >> 3. Interface/Method Name.
>>>>> > > >> > >>
>>>>> > > >> > >> Can the interface be used to align other things in the
>>>>> future?
>>>>> > For
>>>>> > > >> > example,
>>>>> > > >> > >> align read speed, I have
>>>>> > > >> > >> seen users requesting global rate limits. This feature may
>>>>> also
>>>>> > > need
>>>>> > > >> an
>>>>> > > >> > >> interface like this.
>>>>> > > >> > >> If we don't plan to extend this interface to support aligning
>>>>> other
>>>>> > > >> > things, I
>>>>> > > >> > >> suggest explicitly declaring
>>>>> > > >> > >> the purpose of the methods, such as
>>>>> `alignWatermarksForSplits`
>>>>> > > >> instead
>>>>> > > >> > of
>>>>> > > >> > >> `alignSplits`.
>>>>> > > >> > >>
>>>>> > > >> > >> This is a good point. Naming wise, it would usually be more
>>>>> > > >> extensible
>>>>> > > >> > to
>>>>> > > >> > >> just describe what the method actually does, instead of
>>>>> assuming
>>>>> > > the
>>>>> > > >> > >> purpose of doing this. For example, in this case,
>>>>> > > >> pauseOrResumeSplits()
>>>>> > > >> > >> would be more extensible because this can be used for any
>>>>> kind of
>>>>> > > >> flow
>>>>> > > >> > >> control, be it watermark alignment or simple rate limiting.
>>>>> > > >> > >>
>>>>> > > >> > >> 4. Interface or Method.
>>>>> > > >> > >>
>>>>> > > >> > >> I don't have a strong opinion on this. I think they have
>>>>> their
>>>>> > own
>>>>> > > >> > >> advantages.
>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending
>>>>> abilities
>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
>>>>> > > >> > >> and I prefer Interfaces rather than methods in this case.
>>>>> When
>>>>> > you
>>>>> > > >> have
>>>>> > > >> > a
>>>>> > > >> > >> bunch of abilities and each ability
>>>>> > > >> > >> has more than one method, Interfaces can help to organize
>>>>> them
>>>>> > and
>>>>> > > >> make it
>>>>> > > >> > >> clear to users which methods
>>>>> > > >> > >> need to be implemented when you want to have an ability.
>>>>> > > >> > >>
>>>>> > > >> > >> I am OK with decorative interfaces if this is a general
>>>>> design
>>>>> > > >> pattern
>>>>> > > >> > in
>>>>> > > >> > >> the other components in Flink. But it looks like the
>>>>> current API
>>>>> > > >> > proposal
>>>>> > > >> > >> is not symmetric.
>>>>> > > >> > >>
>>>>> > > >> > >> The current proposal is essentially "MySourceReader
>>>>> implements
>>>>> > > >> > >> SourceReader, WithSplitsAlignment", along with
>>>>> "MySplitReader
>>>>> > > >> implements
>>>>> > > >> > >> AlignedSplitReader".
>>>>> > > >> > >>
>>>>> > > >> > >> Should we make the API symmetric? I'd consider any one of
>>>>> the
>>>>> > > >> following
>>>>> > > >> > as
>>>>> > > >> > >> symmetric.
>>>>> > > >> > >>
>>>>> > > >> > >> a) "MySourceReader implements SourceReader,
>>>>> WithSplitsAlignment",
>>>>> > > >> along
>>>>> > > >> > >> with "MySplitReader implements SplitReader,
>>>>> WithSplitsAlignment",
>>>>> > > or
>>>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
>>>>> > > "MySplitReader
>>>>> > > >> > >> implements AlignedSplitReader", or
>>>>> > > >> > >> c) "MySourceReader implements SourceReader" and
>>>>> "MySplitReader
>>>>> > > >> > implements
>>>>> > > >> > >> SplitReader".
>>>>> > > >> > >>
>>>>> > > >> > >> I think the latest proposal according to Dawid would be:
>>>>> > > >> > >> d) "MySourceReader implements SourceReader" and
>>>>> "MySplitReader
>>>>> > > >> > implements
>>>>> > > >> > >> AlignedSplitReader".
>>>>> > > >> > >> I am fine with this API, although personally speaking I
>>>>> think it
>>>>> > is
>>>>> > > >> > simpler
>>>>> > > >> > >> to just add a new method to the split reader with default
>>>>> impl.
>>>>> > > >> > >>
>>>>> > > >> > >> @Dawid Wysakowicz <dwysakowicz@apache.org>, thanks for the
>>>>> > > >> > >> reply.
>>>>> > > >> > >>
>>>>> > > >> > >> Having said that, as I don't have a preference and I agree
>>>>> most
>>>>> > of
>>>>> > > >> the
>>>>> > > >> > >>
>>>>> > > >> > >> sources will support the alignment I am fine following your
>>>>> > > >> suggestion
>>>>> > > >> > to
>>>>> > > >> > >> have the SourceReader extending from
>>>>> > WithWatermarksSplitsAlignment,
>>>>> > > >> but
>>>>> > > >> > >> would put the "supportsXXX" there, not in the Source to
>>>>> keep the
>>>>> > > two
>>>>> > > >> > >> methods together.
>>>>> > > >> > >>
>>>>> > > >> > >> One benefit of having the "supportsXXX" in Source is that
>>>>> this
>>>>> > > allows
>>>>> > > >> > some
>>>>> > > >> > >> compile time check. For example, if a user enabled
>>>>> watermark
>>>>> > > >> alignment
>>>>> > > >> > >> while it is not supported by the Source, an exception can
>>>>> be
>>>>> > thrown
>>>>> > > >> at
>>>>> > > >> > >> compile time. It seems in general useful. That said, I
>>>>> agree that
>>>>> > > API
>>>>> > > >> > >> cleanliness wise it is better to put the two methods
>>>>> together.
>>>>> > > >> > >>
>>>>> > > >> > >> Thanks,
>>>>> > > >> > >>
>>>>> > > >> > >> Jiangjie (Becket) Qin
>>>>> > > >> > >>
>>>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <imjark@gmail.com>
>>>>> > > >> > >> wrote:
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >> Thanks, Dawid, for the reminder on FLIP-182. Sorry I did miss
>>>>> it.
>>>>> > > >> > >> I don't have other concerns then.
>>>>> > > >> > >>
>>>>> > > >> > >> Best,
>>>>> > > >> > >> Jark
>>>>> > > >> > >>
>>>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz <dwysakowicz@apache.org>
>>>>> > > >> > >> wrote:
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >> @Jark:
>>>>> > > >> > >>
>>>>> > > >> > >> 1. Will the framework always align with watermarks when the
>>>>> > source
>>>>> > > >> > >> implements the interface?
>>>>> > > >> > >> I'm afraid not every case needs watermark alignment even
>>>>> if Kafka
>>>>> > > >> > >> implements the interface,
>>>>> > > >> > >> and this will affect the throughput somehow. I agree with
>>>>> Becket
>>>>> > > >> > >> we may need a
>>>>> > > >> > >> `supportSplitsAlignment()` method for users to configure
>>>>> the
>>>>> > source
>>>>> > > >> to
>>>>> > > >> > >> enable/disable the alignment.
>>>>> > > >> > >>
>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly
>>>>> affect
>>>>> > > >> > >>
>>>>> > > >> > >> throughput
>>>>> > > >> > >>
>>>>> > > >> > >> if the reader is constantly
>>>>> > > >> > >>  switching between pause and resume. Can users configure
>>>>> the
>>>>> > > >> alignment
>>>>> > > >> > >> offset?
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >> This is covered in the previous FLIP[1] which has been
>>>>> already
>>>>> > > >> > >>
>>>>> > > >> > >> implemented
>>>>> > > >> > >>
>>>>> > > >> > >> in 1.15. In short, it must be enabled with the watermark
>>>>> strategy
>>>>> > > >> which
>>>>> > > >> > >> also configures drift and update interval.
>>>>> > > >> > >>
>>>>> > > >> > >> If we don't plan to extend this interface to support aligning
>>>>> other
>>>>> > > >> things,
>>>>> > > >> > >>
>>>>> > > >> > >> I
>>>>> > > >> > >>
>>>>> > > >> > >> suggest explicitly declaring
>>>>> > > >> > >> the purpose of the methods, such as
>>>>> `alignWatermarksForSplits`
>>>>> > > >> instead
>>>>> > > >> > of
>>>>> > > >> > >> `alignSplits`.
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >> Sure let's rename it.
>>>>> > > >> > >>
>>>>> > > >> > >> @Becket:
>>>>> > > >> > >>
>>>>> > > >> > >> I understand your point. On the other hand putting all
>>>>> methods,
>>>>> > > even
>>>>> > > >> > with
>>>>> > > >> > >> "supportsXXX" methods for enabling certain features, makes
>>>>> the
>>>>> > > entry
>>>>> > > >> > >> threshold for writing a new source higher. Instead of
>>>>> focusing on
>>>>> > > the
>>>>> > > >> > >>
>>>>> > > >> > >> basic
>>>>> > > >> > >>
>>>>> > > >> > >> and required properties of the Source, the person
>>>>> implementing a
>>>>> > > >> source
>>>>> > > >> > >> must bother with and need to figure out what all of the
>>>>> extra
>>>>> > > >> features
>>>>> > > >> > >>
>>>>> > > >> > >> are
>>>>> > > >> > >>
>>>>> > > >> > >> about and how to deal with them. It makes it also harder to
>>>>> > > organize
>>>>> > > >> > >> methods in coupled groups as Jark said.
>>>>> > > >> > >>
>>>>> > > >> > >> Having said that, as I don't have a preference and I agree
>>>>> most
>>>>> > of
>>>>> > > >> the
>>>>> > > >> > >> sources will support the alignment I am fine following your
>>>>> > > >> suggestion
>>>>> > > >> > to
>>>>> > > >> > >> have the SourceReader extending from
>>>>> > WithWatermarksSplitsAlignment,
>>>>> > > >> but
>>>>> > > >> > >> would put the "supportsXXX" there, not in the Source to
>>>>> keep the
>>>>> > > two
>>>>> > > >> > >> methods together.
>>>>> > > >> > >>
>>>>> > > >> > >> Lastly, I agree it is really unfortunate the "alignSplits"
>>>>> > methods
>>>>> > > >> > differ
>>>>> > > >> > >> slightly for SourceReader and SplitReader. The reason for
>>>>> that is
>>>>> > > >> > >> SourceReaderBase deals only with SplitIds, whereas
>>>>> SplitReader
>>>>> > > needs
>>>>> > > >> the
>>>>> > > >> > >> actual splits to pause them. I found the discrepancy
>>>>> acceptable
>>>>> > for
>>>>> > > >> the
>>>>> > > >> > >> sake of simplifying changes significantly, especially as
>>>>> they
>>>>> > would
>>>>> > > >> > >>
>>>>> > > >> > >> highly
>>>>> > > >> > >>
>>>>> > > >> > >> likely impact performance as we would have to perform
>>>>> additional
>>>>> > > >> > lookups.
>>>>> > > >> > >> Moreover the SplitReader is a secondary interface.
>>>>> > > >> > >>
>>>>> > > >> > >> Best,
>>>>> > > >> > >>
>>>>> > > >> > >> Dawid
>>>>> > > >> > >>
>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
>>>>> > > >> > >>
>>>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
>>>>> > > >> > >>
>>>>> > > >> > >> Thanks for the effort, Dawid and Sebastian!
>>>>> > > >> > >>
>>>>> > > >> > >> I just have some minor questions (maybe I missed
>>>>> something).
>>>>> > > >> > >>
>>>>> > > >> > >> 1. Will the framework always align with watermarks when the
>>>>> > source
>>>>> > > >> > >> implements the interface?
>>>>> > > >> > >> I'm afraid not every case needs watermark alignment even
>>>>> if Kafka
>>>>> > > >> > >> implements the interface,
>>>>> > > >> > >> and this will affect the throughput somehow. I agree with
>>>>> Becket
>>>>> > > >> > >> we may need a
>>>>> > > >> > >> `supportSplitsAlignment()` method for users to configure
>>>>> the
>>>>> > source
>>>>> > > >> to
>>>>> > > >> > >> enable/disable the alignment.
>>>>> > > >> > >>
>>>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
>>>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly
>>>>> affect
>>>>> > > >> > >>
>>>>> > > >> > >> throughput
>>>>> > > >> > >>
>>>>> > > >> > >> if the reader is constantly
>>>>> > > >> > >>  switching between pause and resume. Can users configure
>>>>> the
>>>>> > > >> alignment
>>>>> > > >> > >> offset?
>>>>> > > >> > >>
>>>>> > > >> > >> 3. Interface/Method Name.
>>>>> > > >> > >> Can the interface be used to align other things in the
>>>>> future?
>>>>> > For
>>>>> > > >> > >>
>>>>> > > >> > >> example,
>>>>> > > >> > >>
>>>>> > > >> > >> align read speed, I have
>>>>> > > >> > >> seen users requesting global rate limits. This feature may
>>>>> also
>>>>> > > need
>>>>> > > >> an
>>>>> > > >> > >> interface like this.
>>>>> > > >> > >> If we don't plan to extend this interface to support aligning
>>>>> other
>>>>> > > >> things,
>>>>> > > >> > >>
>>>>> > > >> > >> I
>>>>> > > >> > >>
>>>>> > > >> > >> suggest explicitly declaring
>>>>> > > >> > >> the purpose of the methods, such as
>>>>> `alignWatermarksForSplits`
>>>>> > > >> instead
>>>>> > > >> > of
>>>>> > > >> > >> `alignSplits`.
>>>>> > > >> > >>
>>>>> > > >> > >> 4. Interface or Method.
>>>>> > > >> > >> I don't have a strong opinion on this. I think they have
>>>>> their
>>>>> > own
>>>>> > > >> > >> advantages.
>>>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending
>>>>> abilities
>>>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
>>>>> > > >> > >> and I prefer Interfaces rather than methods in this case.
>>>>> When
>>>>> > you
>>>>> > > >> have
>>>>> > > >> > a
>>>>> > > >> > >> bunch of abilities and each ability
>>>>> > > >> > >> has more than one method, Interfaces can help to organize
>>>>> them
>>>>> > and
>>>>> > > >> make it
>>>>> > > >> > >> clear to users which methods
>>>>> > > >> > >> need to be implemented when you want to have an ability.
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >> Best,
>>>>> > > >> > >> Jark
>>>>> > > >> > >>
>>>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <becket.qin@gmail.com>
>>>>> > > >> > >> wrote:
>>>>> > > >> > >>
>>>>> > > >> > >> Hi Dawid,
>>>>> > > >> > >>
>>>>> > > >> > >> Thanks for the explanation. Apologies that I somehow
>>>>> misread a
>>>>> > > bunch
>>>>> > > >> of
>>>>> > > >> > >> "align" and thought they were "assign".
>>>>> > > >> > >>
>>>>> > > >> > >> Regarding 1, by default implementation, I was thinking of
>>>>> the
>>>>> > > default
>>>>> > > >> > >>
>>>>> > > >> > >> no-op
>>>>> > > >> > >>
>>>>> > > >> > >> implementation. I am a little worried about the
>>>>> proliferation of
>>>>> > > >> > >>
>>>>> > > >> > >> decorative
>>>>> > > >> > >>
>>>>> > > >> > >> interfaces. I think the most important thing about
>>>>> interfaces is
>>>>> > > that
>>>>> > > >> > >>
>>>>> > > >> > >> they
>>>>> > > >> > >>
>>>>> > > >> > >> are easy to understand. In this case, I prefer adding a new
>>>>> method
>>>>> > to
>>>>> > > >> the
>>>>> > > >> > >> existing interface for the following reasons:
>>>>> > > >> > >>
>>>>> > > >> > >> a) I feel the biggest drawback of decorative interfaces is
>>>>> which
>>>>> > > >> > >>
>>>>> > > >> > >> interface
>>>>> > > >> > >>
>>>>> > > >> > >> they can decorate and which combinations of multiple
>>>>> decorative
>>>>> > > >> > >>
>>>>> > > >> > >> interfaces
>>>>> > > >> > >>
>>>>> > > >> > >> are valid. In the current FLIP, the withSplitsAlignment
>>>>> interface
>>>>> > > is
>>>>> > > >> > only
>>>>> > > >> > >> applicable to the SourceReader which means it can't
>>>>> decorate any
>>>>> > > >> other
>>>>> > > >> > >> interface. From an interface design perspective, a natural
>>>>> > question
>>>>> > > >> is
>>>>> > > >> > >>
>>>>> > > >> > >> why
>>>>> > > >> > >>
>>>>> > > >> > >> not let "AlignedSplitReader" extend "withSplitsAlignment"?
>>>>> And it
>>>>> > > is
>>>>> > > >> > also
>>>>> > > >> > >> natural to assume that a split reader implementing both
>>>>> > SplitReader
>>>>> > > >> and
>>>>> > > >> > >> WithSplitsAlignment would work, because a source reader
>>>>> > implementing
>>>>> > > >> > >> SourceReader and withSplitsAlignment works. So why isn't
>>>>> there an
>>>>> > > >> > >>
>>>>> > > >> > >> interface
>>>>> > > >> > >>
>>>>> > > >> > >> of AlignedSourceReader? In the future, if there is a new
>>>>> feature
>>>>> > > >> added
>>>>> > > >> > >> (e.g. sorted or pre-partitioned data aware), are we going
>>>>> to
>>>>> > create
>>>>> > > >> > >>
>>>>> > > >> > >> another
>>>>> > > >> > >>
>>>>> > > >> > >> interface of SplitReader such as SortedSplitReader or
>>>>> > > >> > >>
>>>>> > > >> > >> PrePartitionedAware?
>>>>> > > >> > >>
>>>>> > > >> > >> Can they be combined? So I think the additional decorative
>>>>> > > interface
>>>>> > > >> > like
>>>>> > > >> > >> withSplitsAlignment actually increases the understanding
>>>>> cost of
>>>>> > > >> users
>>>>> > > >> > >> because they have to know what decorative interfaces are
>>>>> there,
>>>>> > > which
>>>>> > > >> > >> interface they can decorate and which combinations of the
>>>>> > > decorative
>>>>> > > >> > >> interfaces are valid and which are not. Ideally we want to
>>>>> avoid
>>>>> > > >> that.
>>>>> > > >> > To
>>>>> > > >> > >> be clear, I am not opposing having an interface of
>>>>> > > >> withSplitsAlignment,
>>>>> > > >> > >>
>>>>> > > >> > >> it
>>>>> > > >> > >>
>>>>> > > >> > >> is completely OK to have it as an internal interface and
>>>>> let
>>>>> > > >> > SourceReader
>>>>> > > >> > >> and SplitReader both extend it.
>>>>> > > >> > >>
>>>>> > > >> > >> b) Adding a new method to the SourceReader with a default
>>>>> > > >> implementation
>>>>> > > >> > >>
>>>>> > > >> > >> of
>>>>> > > >> > >>
>>>>> > > >> > >> no-op would help avoid logic branching in the source logic,
>>>>> > > >> especially
>>>>> > > >> > >> given that we agree that the vast majority of the
>>>>> SourceReader
>>>>> > > >> > >> implementations, if not all, would just extend from the
>>>>> > > >> > SourceReaderBase.
>>>>> > > >> > >> That means adding a new method to the interface would
>>>>> effectively
>>>>> > > >> give
>>>>> > > >> > >>
>>>>> > > >> > >> the
>>>>> > > >> > >>
>>>>> > > >> > >> same user experience, but simpler.
>>>>> > > >> > >>
>>>>> > > >> > >> c) A related design principle that may be worth discussing
>>>>> is how
>>>>> > > do
>>>>> > > >> we
>>>>> > > >> > >>
>>>>> > > >> > >> let
>>>>> > > >> > >>
>>>>> > > >> > >> the Source implementations tell Flink what capability is
>>>>> > supported
>>>>> > > >> and
>>>>> > > >> > >>
>>>>> > > >> > >> what
>>>>> > > >> > >>
>>>>> > > >> > >> is not. Personally speaking I feel the most intuitive
>>>>> place to me
>>>>> > > is
>>>>> > > >> in
>>>>> > > >> > >>
>>>>> > > >> > >> the
>>>>> > > >> > >>
>>>>> > > >> > >> Source itself, because that is the entrance of the entire
>>>>> Source
>>>>> > > >> > >>
>>>>> > > >> > >> connector
>>>>> > > >> > >>
>>>>> > > >> > >> logic.
>>>>> > > >> > >>
>>>>> > > >> > >> Based on the above thoughts, I am wondering if the
>>>>> following
>>>>> > > >> interface
>>>>> > > >> > >> would be easier to understand by the users.
>>>>> > > >> > >>
>>>>> > > >> > >> - Change "withSplitsAlignment" to internal interface, let
>>>>> both
>>>>> > > >> > >>
>>>>> > > >> > >> SourceReader
>>>>> > > >> > >>
>>>>> > > >> > >> and SplitReader extend from it, with a default no-op
>>>>> > > implementation.
>>>>> > > >> > >> - Add a new method "boolean supportSplitsAlignment()" to
>>>>> the
>>>>> > Source
>>>>> > > >> > >> interface, with a default implementation returning false.
>>>>> Sources
>>>>> > > >> that
>>>>> > > >> > >>
>>>>> > > >> > >> have
>>>>> > > >> > >>
>>>>> > > >> > >> implemented the alignment logic can change this to return
>>>>> true,
>>>>> > and
>>>>> > > >> > >> override the alignSplits() methods in the SourceReader /
>>>>> > > SplitReader
>>>>> > > >> if
>>>>> > > >> > >> needed.
>>>>> > > >> > >> - In the future, if a new optional feature is going to be
>>>>> added
>>>>> > to
>>>>> > > >> the
>>>>> > > >> > >> Source, and that feature requires the awareness from
>>>>> Flink, we
>>>>> > can
>>>>> > > >> add
>>>>> > > >> > >>
>>>>> > > >> > >> more
>>>>> > > >> > >>
>>>>> > > >> > >> such methods to the Source.
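
The proposal in the list above could look roughly like the following. This is a sketch under stated assumptions, not the FLIP's actual interfaces: Source and SourceReader are reduced to the two methods under discussion, splits are identified by strings, and LegacySource, AligningSource, LegacyReader and AlignmentValidator are hypothetical names introduced only for this example.

```java
import java.util.Collection;

// Simplified stand-in; the real Source interface is generic and has more methods.
interface Source {
    // By default a source does not claim to support split alignment.
    default boolean supportSplitsAlignment() {
        return false;
    }
}

// Simplified stand-in for the reader side.
interface SourceReader {
    // Default no-op, so existing implementations keep compiling unchanged.
    default void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
    }
}

// An existing source and reader that were written before the feature existed.
class LegacySource implements Source {
}

class LegacyReader implements SourceReader {
}

// A source that implemented the alignment logic and says so.
class AligningSource implements Source {
    @Override
    public boolean supportSplitsAlignment() {
        return true;
    }
}

class AlignmentValidator {
    // Hypothetical early check: fail when alignment is enabled on a source that cannot do it.
    static void validate(Source source, boolean alignmentEnabled) {
        if (alignmentEnabled && !source.supportSplitsAlignment()) {
            throw new UnsupportedOperationException("Source does not support split alignment");
        }
    }
}
```

With this shape the capability flag lives on the Source, so the check can run before any reader is created, while older readers simply inherit the no-op.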
>>>>> > > >> > >>
>>>>> > > >> > >> What do you think?
>>>>> > > >> > >>
>>>>> > > >> > >> Thanks,
>>>>> > > >> > >>
>>>>> > > >> > >> Jiangjie (Becket) Qin
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dwysakowicz@apache.org>
>>>>> > > >> > >> wrote:
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >> @Konstantin:
>>>>> > > >> > >>
>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface
>>>>> (aka the
>>>>> > > >> stop &
>>>>> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar
>>>>> only,
>>>>> > > >> correct?
>>>>> > > >> > >>
>>>>> > > >> > >> Correct, as far as I know though, those are the only
>>>>> sources
>>>>> > which
>>>>> > > >> > >>
>>>>> > > >> > >> consume
>>>>> > > >> > >>
>>>>> > > >> > >> concurrently from multiple splits and thus alignment
>>>>> applies.
>>>>> > > >> > >>
>>>>> > > >> > >> @Thomas:
>>>>> > > >> > >>
>>>>> > > >> > >> I wonder if "supporting" split alignment in
>>>>> SourceReaderBase and
>>>>> > > then
>>>>> > > >> > >>
>>>>> > > >> > >> doing
>>>>> > > >> > >>
>>>>> > > >> > >> nothing if the split reader does not implement
>>>>> AlignedSplitReader
>>>>> > > >> could
>>>>> > > >> > >>
>>>>> > > >> > >> be
>>>>> > > >> > >>
>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead be
>>>>> added to
>>>>> > the
>>>>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to make it
>>>>> > explicit
>>>>> > > >> that
>>>>> > > >> > >> the source actually supports it.
>>>>> > > >> > >>
>>>>> > > >> > >> I understand your concern. Hmm, I think we could actually
>>>>> do
>>>>> > that.
>>>>> > > >> Given
>>>>> > > >> > >> the actual implementation of the
>>>>> SourceReaderBase#alignSplits is
>>>>> > > >> rather
>>>>> > > >> > >> short (just a forward to the corresponding method of
>>>>> > SplitFetcher),
>>>>> > > >> we
>>>>> > > >> > >> could reimplement it in the actual source implementations.
>>>>> This
>>>>> > > >> solution
>>>>> > > >> > >> has a downside though. Authors of new sources would have
>>>>> to do
>>>>> > > two
>>>>> > > >> > >> things: extend from AlignedSplitReader and implement
>>>>> > > >> > >>
>>>>> > > >> > >> WithSplitsAlignment,
>>>>> > > >> > >>
>>>>> > > >> > >> instead of just extending AlignedSplitReader. I would be
>>>>> fine
>>>>> > with
>>>>> > > >> such
>>>>> > > >> > a
>>>>> > > >> > >> tradeoff though. What do others think?
>>>>> > > >> > >>
>>>>> > > >> > >> @Steven:
>>>>> > > >> > >>
>>>>> > > >> > >> For this part from the motivation section, is it accurate?
>>>>> Let's
>>>>> > > >> assume
>>>>> > > >> > >>
>>>>> > > >> > >> one
>>>>> > > >> > >>
>>>>> > > >> > >> source task consumes from 3 partitions and one of the
>>>>> partitions
>>>>> > is
>>>>> > > >> > >> significantly slower. In this situation, watermark for this
>>>>> > source
>>>>> > > >> task
>>>>> > > >> > >> won't hold back as it is reading recent data from other
>>>>> two Kafka
>>>>> > > >> > >> partitions. As a result, it won't hold back the overall
>>>>> > watermark.
>>>>> > > I
>>>>> > > >> > >> thought the problem is that we may have late data for this
>>>>> slow
>>>>> > > >> > >>
>>>>> > > >> > >> partition.
>>>>> > > >> > >>
>>>>> > > >> > >> It will hold back the watermark. Watermark of an operator
>>>>> is the
>>>>> > > >> minimum
>>>>> > > >> > >> of watermarks of all splits[1]
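
The rule stated in the answer above (the operator's watermark is the minimum over its splits) can be shown with a tiny example. OperatorWatermark and the partition names are hypothetical and only illustrate the arithmetic, not Flink's internal implementation.

```java
import java.util.Map;

class OperatorWatermark {
    // Combines per-split watermarks by taking the minimum.
    // For an empty map this returns Long.MAX_VALUE, a choice made only for this sketch.
    static long combine(Map<String, Long> splitWatermarks) {
        long min = Long.MAX_VALUE;
        for (long watermark : splitWatermarks.values()) {
            min = Math.min(min, watermark);
        }
        return min;
    }
}
```

For three partitions at 1000, 990 and 100, the combined watermark is 100, so the single slow partition does hold the task's watermark back.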
>>>>> > > >> > >>
>>>>> > > >> > >> I have another question about the restart. Say split
>>>>> alignment is
>>>>> > > >> > >> triggered. checkpoint is completed. job failed and
>>>>> restored from
>>>>> > > the
>>>>> > > >> > last
>>>>> > > >> > >> checkpoint. because alignment decision is not checkpointed,
>>>>> > > initially
>>>>> > > >> > >> alignment won't be enforced until we get a cycle of
>>>>> watermark
>>>>> > > >> > aggregation
>>>>> > > >> > >> and propagation, right? Not saying this corner is a
>>>>> problem. Just
>>>>> > > >> want
>>>>> > > >> > to
>>>>> > > >> > >> understand it more.
>>>>> > > >> > >>
>>>>> > > >> > >> Your understanding is correct.
>>>>> > > >> > >>
>>>>> > > >> > >> @Becket:
>>>>> > > >> > >>
>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use
>>>>> case, so
>>>>> > > >> should
>>>>> > > >> > >>
>>>>> > > >> > >> we
>>>>> > > >> > >>
>>>>> > > >> > >> just add the related methods to SourceReader directly
>>>>> instead of
>>>>> > > >> > >> introducing the new interface of WithSplitAssignment? We
>>>>> can
>>>>> > > provide
>>>>> > > >> > >> default implementations, so backwards compatibility won't
>>>>> be an
>>>>> > > >> issue.
>>>>> > > >> > >>
>>>>> > > >> > >> I don't think we can provide a default implementation. How
>>>>> would
>>>>> > we
>>>>> > > >> do
>>>>> > > >> > >> that? Would it be just a no-op? Is it better than having an
>>>>> > opt-in
>>>>> > > >> > >> interface? The default implementation would have to be
>>>>> added
>>>>> > > >> exclusively
>>>>> > > >> > >>
>>>>> > > >> > >> in
>>>>> > > >> > >>
>>>>> > > >> > >> a *Public* SourceReader interface. By the way notice
>>>>> > > SourceReaderBase
>>>>> > > >> > >> does extend from WithSplitsAlignment, so effectively all
>>>>> > > >> implementations
>>>>> > > >> > >>
>>>>> > > >> > >> do
>>>>> > > >> > >>
>>>>> > > >> > >> handle the alignment case. To be honest I think it is
>>>>> impossible
>>>>> > to
>>>>> > > >> > >> implement the SourceReader interface directly by end users.
>>>>> > > >> > >>
>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably
>>>>> also
>>>>> > needs
>>>>> > > >> some
>>>>> > > >> > >> change to support throttling at the split granularity. Can
>>>>> you
>>>>> > add
>>>>> > > >> that
>>>>> > > >> > >> interface change into the public interface section as well?
>>>>> > > >> > >>
>>>>> > > >> > >> It has been added from the beginning. See
>>>>> *AlignedSplitReader.*
>>>>> > > >> > >>
>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits
>>>>> here,
>>>>> > given
>>>>> > > >> that
>>>>> > > >> > >>
>>>>> > > >> > >> it
>>>>> > > >> > >>
>>>>> > > >> > >> is not actually changing the split assignments? It seems
>>>>> > something
>>>>> > > >> like
>>>>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more
>>>>> > > accurate.
>>>>> > > >> > >>
>>>>> > > >> > >> The method's called *alignSplits*, not assign. Do you still
>>>>> > prefer
>>>>> > > a
>>>>> > > >> > >> different name for that? Personally, I am open for
>>>>> suggestions
>>>>> > > here.
>>>>> > > >> > >>
>>>>> > > >> > >> Best,
>>>>> > > >> > >>
>>>>> > > >> > >> Dawid
>>>>> > > >> > >>
>>>>> > > >> > >> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
>>>>> > > >> > >>
>>>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
>>>>> > > >> > >>
>>>>> > > >> > >> Thanks for driving the effort, Sebastian. I think the
>>>>> motivation
>>>>> > > >> makes a
>>>>> > > >> > >> lot of sense. Just a few suggestions / questions.
>>>>> > > >> > >>
>>>>> > > >> > >> 1. I think watermark alignment is sort of a general use
>>>>> case, so
>>>>> > > >> should
>>>>> > > >> > >>
>>>>> > > >> > >> we
>>>>> > > >> > >>
>>>>> > > >> > >> just add the related methods to SourceReader directly
>>>>> instead of
>>>>> > > >> > >> introducing the new interface of WithSplitAssignment? We
>>>>> can
>>>>> > > provide
>>>>> > > >> > >> default implementations, so backwards compatibility won't
>>>>> be an
>>>>> > > >> issue.
>>>>> > > >> > >>
>>>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably
>>>>> also
>>>>> > needs
>>>>> > > >> some
>>>>> > > >> > >> change to support throttling at the split granularity. Can
>>>>> you
>>>>> > add
>>>>> > > >> that
>>>>> > > >> > >> interface change into the public interface section as well?
>>>>> > > >> > >>
>>>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits
>>>>> here,
>>>>> > given
>>>>> > > >> that
>>>>> > > >> > >>
>>>>> > > >> > >> it
>>>>> > > >> > >>
>>>>> > > >> > >> is not actually changing the split assignments? It seems
>>>>> > something
>>>>> > > >> like
>>>>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more
>>>>> > > accurate.
>>>>> > > >> > >>
>>>>> > > >> > >> Thanks,
>>>>> > > >> > >>
>>>>> > > >> > >> Jiangjie (Becket) Qin
>>>>> > > >> > >>
>>>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <stevenz3wu@gmail.com> wrote:
>>>>> > > >> > >>
>>>>> > > >> > >> However, a single source operator may read data from
>>>>> multiple
>>>>> > > >> > >>
>>>>> > > >> > >> splits/partitions, e.g., multiple Kafka partitions, such
>>>>> that
>>>>> > even
>>>>> > > >> with
>>>>> > > >> > >> watermark alignment the source operator may need to buffer
>>>>> > > excessive
>>>>> > > >> > >>
>>>>> > > >> > >> amount
>>>>> > > >> > >>
>>>>> > > >> > >> of data if one split emits data faster than another.
>>>>> > > >> > >>
>>>>> > > >> > >> For this part from the motivation section, is it accurate?
>>>>> Let's
>>>>> > > >> assume
>>>>> > > >> > >>
>>>>> > > >> > >> one
>>>>> > > >> > >>
>>>>> > > >> > >> source task consumes from 3 partitions and one of the
>>>>> partitions
>>>>> > is
>>>>> > > >> > >> significantly slower. In this situation, watermark for this
>>>>> > source
>>>>> > > >> task
>>>>> > > >> > >> won't hold back as it is reading recent data from other
>>>>> two Kafka
>>>>> > > >> > >> partitions. As a result, it won't hold back the overall
>>>>> > watermark.
>>>>> > > I
>>>>> > > >> > >> thought the problem is that we may have late data for this
>>>>> slow
>>>>> > > >> > >>
>>>>> > > >> > >> partition.
>>>>> > > >> > >>
>>>>> > > >> > >> I have another question about the restart. Say split
>>>>> alignment is
>>>>> > > >> > >> triggered. checkpoint is completed. job failed and
>>>>> restored from
>>>>> > > the
>>>>> > > >> > last
>>>>> > > >> > >> checkpoint. because alignment decision is not checkpointed,
>>>>> > > initially
>>>>> > > >> > >> alignment won't be enforced until we get a cycle of
>>>>> watermark
>>>>> > > >> > aggregation
>>>>> > > >> > >> and propagation, right? Not saying this corner is a
>>>>> problem. Just
>>>>> > > >> want
>>>>> > > >> > to
>>>>> > > >> > >> understand it more.
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <thw@apache.org> wrote:
>>>>> > > >> > >>
>>>>> > > >> > >> Thanks for working on this!
>>>>> > > >> > >>
>>>>> > > >> > >> I wonder if "supporting" split alignment in
>>>>> SourceReaderBase and
>>>>> > > then
>>>>> > > >> > >>
>>>>> > > >> > >> doing
>>>>> > > >> > >>
>>>>> > > >> > >> nothing if the split reader does not implement
>>>>> AlignedSplitReader
>>>>> > > >> could
>>>>> > > >> > >>
>>>>> > > >> > >> be
>>>>> > > >> > >>
>>>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead be
>>>>> added to
>>>>> > the
>>>>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to make it
>>>>> > explicit
>>>>> > > >> that
>>>>> > > >> > >> the source actually supports it.
>>>>> > > >> > >>
>>>>> > > >> > >> Thanks,
>>>>> > > >> > >> Thomas
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <knaufk@apache.org> wrote:
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >> Hi Sebastian, Hi Dawid,
>>>>> > > >> > >>
>>>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface
>>>>> (aka the
>>>>> > > >> stop
>>>>> > > >> > >>
>>>>> > > >> > >> &
>>>>> > > >> > >>
>>>>> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar
>>>>> only,
>>>>> > > >> > >>
>>>>> > > >> > >> correct?
>>>>> > > >> > >>
>>>>> > > >> > >> +1 in general. I believe it is valuable to complete the
>>>>> watermark
>>>>> > > >> > >>
>>>>> > > >> > >> aligned
>>>>> > > >> > >>
>>>>> > > >> > >> story with this FLIP.
>>>>> > > >> > >>
>>>>> > > >> > >> Cheers,
>>>>> > > >> > >>
>>>>> > > >> > >> Konstantin
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> > >> To be explicit, having worked on it, I support it ;) I
>>>>> think we
>>>>> > can
>>>>> > > >> > >> start a vote thread soonish, as there are no concerns so
>>>>> far.
>>>>> > > >> > >>
>>>>> > > >> > >> Best,
>>>>> > > >> > >>
>>>>> > > >> > >> Dawid
>>>>> > > >> > >>
>>>>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
>>>>> > > >> > >>
>>>>> > > >> > >> Dear Flink developers,
>>>>> > > >> > >>
>>>>> > > >> > >> I would like to open a discussion on FLIP 217 [1] for an
>>>>> > extension
>>>>> > > >> > >>
>>>>> > > >> > >> of
>>>>> > > >> > >>
>>>>> > > >> > >> Watermark Alignment to perform alignment also in
>>>>> SplitReaders. To
>>>>> > > >> > >>
>>>>> > > >> > >> do
>>>>> > > >> > >>
>>>>> > > >> > >> so,
>>>>> > > >> > >>
>>>>> > > >> > >> SplitReaders must be able to suspend and resume reading
>>>>> from
>>>>> > split
>>>>> > > >> > >>
>>>>> > > >> > >> sources
>>>>> > > >> > >>
>>>>> > > >> > >> where the SourceOperator coordinates and controls suspend
>>>>> and
>>>>> > > >> > >>
>>>>> > > >> > >> resume.
>>>>> > > >> > >>
>>>>> > > >> > >> To
>>>>> > > >> > >>
>>>>> > > >> > >> gather information about current watermarks of the
>>>>> SplitReaders,
>>>>> > we
>>>>> > > >> > >>
>>>>> > > >> > >> extend
>>>>> > > >> > >>
>>>>> > > >> > >> the internal WatermarkOutputMultiplexer and report
>>>>> watermarks to
>>>>> > > >> > >>
>>>>> > > >> > >> the
>>>>> > > >> > >>
>>>>> > > >> > >> SourceOperator.
>>>>> > > >> > >>
>>>>> > > >> > >> There is a PoC for this FLIP [2], prototyped by Arvid
>>>>> Heise and
>>>>> > > >> > >>
>>>>> > > >> > >> revised
>>>>> > > >> > >>
>>>>> > > >> > >> and
>>>>> > > >> > >>
>>>>> > > >> > >> reworked by Dawid Wysakowicz (He did most of the work.)
>>>>> and me.
>>>>> > The
>>>>> > > >> > >>
>>>>> > > >> > >> changes
>>>>> > > >> > >>
>>>>> > > >> > >> are backwards compatible in a way that if affected
>>>>> components do
>>>>> > > >> > >>
>>>>> > > >> > >> not
>>>>> > > >> > >>
>>>>> > > >> > >> support split alignment the behavior is as before.
>>>>> > > >> > >>
>>>>> > > >> > >> Best,
>>>>> > > >> > >> Sebastian
>>>>> > > >> > >>
>>>>> > > >> > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
>>>>> > > >> > >>
>>>>> > > >> > >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
>>>>> > > >> > >>
>>>>> > > >> > >> --
>>>>> > > >> > >>
>>>>> > > >> > >> Konstantin Knauf
>>>>> > > >> > >> https://twitter.com/snntrable
>>>>> > > >> > >> https://github.com/knaufk
>>>>> > > >> > >>
>>>>> > > >> > >>
>>>>> > > >> >
>>>>> > > >>
>>>>> > > >
>>>>> > >
>>>>> >
>>>>>
>>>>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Becket Qin <be...@gmail.com>.
Hi Sebastian,

Thanks for the reply and patient discussion. I agree this is a tricky
decision.


> Nevertheless, Piotr has valid concerns about Option c) which I see as
> follows:
> (1) An interface with default NOOP implementation makes the implementation
> optional. And in my opinion, a default implementation is and will remain a
> way of making implementation optional because even in future a developer
> can decide to implement the "old flavor" without support for pausable
> splits.
> (2) It may not be too critical but I also find it suboptimal that with a
> NOOP default implementation there is no way to check at runtime if
> SourceReader or SplitReader actually support pausing. (To do so, one would
> need a supportsX method which makes it again more complicated.)


Based on the last few messages on the mailing list, Piotr and I agreed
that the default implementation should just throw an
UnsupportedOperationException if the source is unpausable. This
basically tells Source developers that the feature is expected to be
supported. Because we cannot prevent end users from putting an unpausable
source into a watermark alignment group, watermark alignment is
effectively a non-optional feature from the end users' point of view. So
aligning the source developers' expectations with that seems reasonable.
If a source does not support this feature, the end users should
explicitly remove that source from the watermark alignment group.
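
A minimal sketch of the agreed behavior, under stated assumptions: the
interface and class names below (SketchReader, LegacyReader,
PausableReader) are hypothetical stand-ins and not Flink's actual
SourceReader or SplitReader, and the method signature is simplified to
take split IDs only. It shows a default method that exists purely so that
older implementations still compile, but fails loudly when called.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for a reader interface (not Flink's real API).
interface SketchReader {
    String name();

    // The default keeps old implementations compiling, but throws instead of
    // silently doing nothing when pausing is requested.
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException(
                name() + " does not support pausing splits; "
                        + "remove it from the watermark alignment group.");
    }
}

// An "old flavor" reader that never overrides the default method.
class LegacyReader implements SketchReader {
    @Override
    public String name() {
        return "LegacyReader";
    }
}

// A reader that supports the feature by tracking which split IDs are paused.
class PausableReader implements SketchReader {
    private final Set<String> paused = new HashSet<>();

    @Override
    public String name() {
        return "PausableReader";
    }

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.addAll(splitsToPause);
        paused.removeAll(splitsToResume);
    }

    boolean isPaused(String splitId) {
        return paused.contains(splitId);
    }
}
```

With this shape, a LegacyReader placed in an alignment group surfaces the
problem as an exception on the first pause request rather than as
unbounded buffering, while a PausableReader simply records the request.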

Personally speaking, I think this is a simple and clean solution from both
the end users' and the source developers' standpoint.

Does this address your concerns?

Thanks,

Jiangjie (Becket) Qin

On Wed, May 11, 2022 at 2:52 PM Sebastian Mattheis <se...@ververica.com>
wrote:

> Hi Piotr, Hi Becket, Hi everybody,
>
> we, Dawid and I, discussed the various suggestions/options and we would be
> okay either way because we find neither solution is perfect just because of
> the already present complexity.
>
> Option c) Adding methods to the interfaces of SourceReader and SplitReader
> Option a) Adding decorative interfaces to be used by SourceReader and
> SplitReader
>
> As of the current status (v. 12) of the FLIP [1], it is based on Option c)
> which we find acceptable because the complexity added is only a single
> method.
>
> Nevertheless, Piotr has valid concerns about Option c) which I see as
> follows:
> (1) An interface with default NOOP implementation makes the implementation
> optional. And in my opinion, a default implementation is and will remain a
> way of making implementation optional because even in future a developer
> can decide to implement the "old flavor" without support for pausable
> splits.
> (2) It may not be too critical but I also find it suboptimal that with a
> NOOP default implementation there is no way to check at runtime if
> SourceReader or SplitReader actually support pausing. (To do so, one would
> need a supportsX method which makes it again more complicated.)
>
> However, we haven't changed it because Option a) is also not optimal or
> straight-forward:
> (1) We need to add two distinct yet similar decorative interfaces since,
> as mentioned, the signatures of the methods are different. For example, we
> would need decorative interfaces like `SplitReaderWithPausableSplits` and
> `SourceReaderWithPausableSplits`.
> (2) As a consequence, we would need to somehow document how/where to
> implement both interfaces and how this relates to each other. This we could
> solve by adding a note in the interface of SourceReader and SplitReader and
> reference to the decorative interfaces but it still increases complexity
> too.
>
> In summary, we see both as acceptable and preferred over other options.
> The question is if we can find a solution or compromise that is acceptable
> for everybody to reach consensus.
>
> Please let us know what you think because we would be happy if we can
> conclude the discussion to avoid dropping the initiative on this FLIP.
>
> Regards,
> Sebastian
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
> (v. 12)
>
> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski <pn...@apache.org>
> wrote:
>
>> Hi Guowei,
>>
>> as Dawid wrote a couple of messages back:
>>
>> > This is covered in the previous FLIP[1] which has been already
>> implemented in 1.15. In short, it must be enabled with the watermark
>> strategy which also configures drift and update interval
>>
>> So by default watermark alignment is disabled, regardless if a source
>> supports it or not.
>>
>> Best,
>> Piotrek
>>
>> czw., 5 maj 2022 o 09:56 Guowei Ma <gu...@gmail.com> napisał(a):
>>
>>> Hi,
>>>
>>> We know that in the case of Bounded input Flink supports the Batch
>>> execution mode. Currently in Batch execution mode, flink is executed on a
>>> stage-by-stage basis. In this way, perhaps watermark alignment might not
>>> gain much.
>>>
>>> So my question is: Is watermark alignment the default behavior(for
>>> implemented source only)? If so, have you considered evaluating the
>>> impact
>>> of this behavior on the Batch execution mode? Or do you think it is not
>>> necessary?
>>>
>>> Correct me if I miss something.
>>>
>>> Best,
>>> Guowei
>>>
>>>
>>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <pi...@gmail.com>
>>> wrote:
>>>
>>> > Hi Becket and Dawid,
>>> >
>>> > > I feel that no matter which option we choose this can not be solved
>>> > entirely in either of the options, because of the point above and
>>> because
>>> > the signature of SplitReader#pauseOrResumeSplits and
>>> > SourceReader#pauseOrResumeSplits are slightly different (one identifies
>>> > splits with splitId the other one passes the splits directly).
>>> >
>>> > Yes, that's a good point in this case and for features that need to be
>>> > implemented in more than one place.
>>> >
>>> > > Is there any reason for pausing reading from a split an optional
>>> feature,
>>> > > other than that this was not included in the original interface?
>>> >
>>> > An additional argument in favor of making it optional is to simplify
>>> source
>>> > implementation. But on its own I'm not sure if that would be enough to
>>> > justify making this feature optional. Maybe.
>>> >
>>> > > I think it would be way simpler and clearer to just let end users and
>>> > Flink
>>> > > assume all the connectors will implement this feature.
>>> >
>>> > As I wrote above that would be an interesting choice to make (ease of
>>> > implementation for new users, vs system consistency). Regardless of
>>> that,
>>> > yes, for me the main argument is the API backward compatibility. But
>>> let's
>>> > clear a couple of points:
>>> > - The current proposal adding methods to the base interface with
>>> default
>>> > implementations is an OPTIONAL feature. Same as the decorative version
>>> > would be.
>>> > - Decorative version could implement "throw
>>> UnsupportedOperationException"
>>> > if user enabled watermark alignment just as well and I agree that's a
>>> > better option compared to logging a warning.
>>> >
>>> > Best,
>>> > Piotrek
>>> >
>>> >
>>> > śr., 4 maj 2022 o 15:40 Becket Qin <be...@gmail.com> napisał(a):
>>> >
>>> > > Thanks for the reply and patient discussion, Piotr and Dawid.
>>> > >
>>> > > Is there any reason for pausing reading from a split an optional
>>> feature,
>>> > > other than that this was not included in the original interface?
>>> > >
>>> > > To be honest I am really worried about the complexity of the user
>>> story
>>> > > here. Optional features like this have a high overhead. Imagine this
>>> > > feature is optional, now a user enabled watermark alignment and
>>> defined a
>>> > > few watermark groups. Would it work? Hmm, that depends on whether the
>>> > > involved Source has implemented this feature. If the Sources are
>>> well
>>> > > documented, good luck. Otherwise end users may have to look into the
>>> code
>>> > > of the Source to see whether the feature is supported. Which is
>>> something
>>> > > they shouldn't have to do.
>>> > >
>>> > > I think it would be way simpler and clearer to just let end users and
>>> > Flink
>>> > > assume all the connectors will implement this feature. After all the
>>> > > watermark group is not optional to the end users. If in some rare
>>> cases,
>>> > > the feature cannot be supported, a clear
>>> UnsupportedOperationException
>>> > will
>>> > > be thrown to tell users to explicitly remove this Source from the
>>> > watermark
>>> > > group. I don't think we should have a warning message here, as they
>>> tend
>>> > to
>>> > > be ignored in many cases. If we do this, we don't even need the
>>> > supportXXX
>>> > > method in the Source for this feature. In fact this is exactly how
>>> many
>>> > > interfaces work today. For example, SplitEnumerator#addSplitsBack()
>>> is
>>> > not
>>> > > supported by Pravega source because it does not support partial
>>> failover.
>>> > > In that case, it simply throws an exception to trigger a global
>>> recovery.
>>> > >
>>> > > The reason we add a default implementation in this case would just be
>>> for
>>> > the
>>> > > sake of backwards compatibility so the old source can still compile.
>>> > Sure,
>>> > > in short term, this feature might not be supported by many existing
>>> > > sources. That is OK, and it is quite visible to the source developers
>>> > that
>>> > > they did not override the default impl which throws an
>>> > > UnsupportedOperationException.
>>> > >
>>> > > @Dawid,
>>> > >
>>> > > the Java doc of the SupportXXX() method in the Source would be the
>>> single
>>> > > >> source of truth regarding how to implement this feature.
>>> > > >
>>> > > >
>>> > >
>>> > > I also don't find it entirely true. Half of the classes are
>>> theoretically
>>> > > > optional and are utility classes from the point of view how the
>>> > > interfaces
>>> > > > are organized. Theoretically users do not need to use any of
>>> > > > SourceReaderBase & SplitReader. Would be weird to list their
>>> methods in
>>> > > the
>>> > > > Source interface.
>>> > >
>>> > > I think the ultimate goal of java docs is to guide users to
>>> implement the
>>> > > Source. If SourceReaderBase is the preferred way to implement a
>>> > > SourceReader, it seems worth mentioning that. Even the Java language
>>> > > documentation interfaces lists the known implementations [1] so
>>> people
>>> > can
>>> > > leverage them. But for this particular case, if we make the feature
>>> > > non-optional, we don't even need the supportXXX() method for now.
>>> > >
>>> > > Thanks,
>>> > >
>>> > > Jiangjie (Becket) Qin
>>> > >
>>> > >
>>> > >
>>> > > On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <
>>> dwysakowicz@apache.org>
>>> > > wrote:
>>> > >
>>> > > > Hey Piotr and Becket,
>>> > > >
>>> > > > First of all, let me say I am happy with whichever option is
>>> agreed in
>>> > > the
>>> > > > discussion.
>>> > > >
>>> > > > I wanted to clarify a few points from the discussion though:
>>> > > >
>>> > > > @Becket:
>>> > > >
>>> > > > The main argument for adding the methods to the SourceReader is
>>> that
>>> > > these
>>> > > > methods are effectively NON-OPTIONAL to SourceReader impl, i.e.
>>> > starting
>>> > > > from this FLIP, all the SourceReaders impl are expected to support
>>> this
>>> > > > method, although some old implementations may not have implemented
>>> this
>>> > > > feature. I think we should distinguish the new features from the
>>> > optional
>>> > > > features. While the public decorative interface is a solution to
>>> the
>>> > > > optional features. We should not use it for the features that are
>>> > > > non-optional.
>>> > > >
>>> > > > I don't think that this feature is NON-OPTIONAL. Even though
>>> > preferred, I
>>> > > > still think it can be simply optional.
>>> > > >
>>> > > > the Java doc of the SupportXXX() method in the Source would be the
>>> > single
>>> > > > source of truth regarding how to implement this feature.
>>> > > >
>>> > > > I also don't find it entirely true. Half of the classes are
>>> > theoretically
>>> > > > optional and are utility classes from the point of view how the
>>> > > interfaces
>>> > > > are organized. Theoretically users do not need to use any of
>>> > > > SourceReaderBase & SplitReader. Would be weird to list their
>>> methods in
>>> > > the
>>> > > > Source interface.
>>> > > >
>>> > > > @Piotr
>>> > > >
>>> > > > If we have all of the methods with default implementation in the
>>> base
>>> > > > interface, the API doesn't give any clue to the user which set of
>>> > methods
>>> > > > are required to be implemented at the same time.
>>> > > >
>>> > > > I feel that no matter which option we choose this can not be solved
>>> > > > entirely in either of the options, because of the point above and
>>> > because
>>> > > > the signature of SplitReader#pauseOrResumeSplits and
>>> > > > SourceReader#pauseOrResumeSplits are slightly different (one
>>> identifies
>>> > > > splits with splitId the other one passes the splits directly).
>>> > > >
>>> > > > Best,
>>> > > >
>>> > > > Dawid
>>> > > > On 03/05/2022 14:30, Becket Qin wrote:
>>> > > >
>>> > > > Hi Piotr,
>>> > > >
>>> > > > Thanks for the comment.
>>> > > >
>>> > > > Just to clarify, I am not against the decorative interfaces, but I
>>> do
>>> > > > think we should use them with caution. The main argument for
>>> adding the
>>> > > > methods to the SourceReader is that these methods are
>>> > > > effectively NON-OPTIONAL to SourceReader impl, i.e. starting from
>>> this
>>> > > > FLIP, all the SourceReaders impl are expected to support this
>>> > > > method, although some old implementations may not have implemented
>>> this
>>> > > > feature. I think we should distinguish the new features from the
>>> > optional
>>> > > > features. While the public decorative interface is a solution to
>>> the
>>> > > > optional features. We should not use it for the features that are
>>> > > > non-optional.
>>> > > >
>>> > > > That said, this feature is optional for SplitReaders. Arguably we
>>> can
>>> > > have
>>> > > > a decorative interface for that, but for simplicity and symmetry
>>> of the
>>> > > > interface, personally I prefer just adding a new method.
>>> > > >
>>> > > > Regarding the advantages you mentioned about the decorative
>>> interfaces,
>>> > > > they would make sense if:
>>> > > > 1. The feature is optional.
>>> > > > 2. There is only one decorative interface involved for a feature.
>>> > > > Otherwise the argument that all the methods are grouped together
>>> will
>>> > not
>>> > > > stand.
>>> > > >
>>> > > > Compared with that, I think the current solution works fine in all
>>> > cases,
>>> > > > i.e. "having supportXXX() method in Source, and default methods /
>>> > > > decorative interfaces in base interfaces.".
>>> > > >
>>> > > > The advantages are:
>>> > > >> - clean and easy to implement base interface
>>> > > >
>>> > > > In the current approach, the Java doc of the SupportXXX() method
>>> in the
>>> > > > Source would be the single source of truth regarding how to
>>> implement
>>> > > this
>>> > > > feature. It lists the method that has to be implemented to support
>>> this
>>> > > > feature, regardless of how many classes / interfaces are involved.
>>> > > >
>>> > > > When implementing the base interface, users do not need to
>>> implement a
>>> > > > method with default implementation. If they are curious what the
>>> method
>>> > > is
>>> > > > for, the java doc of that method simply points users to the
>>> > SupportXXX()
>>> > > > method in the Source. It seems not adding work to the users
>>> compared
>>> > with
>>> > > > decorative interfaces, but gives much better discoverability.
>>> > > >
>>> > > > - all of the methods from a single feature are grouped in a single
>>> > > >> decorator interface, together with their dedicated java doc. It's
>>> also
>>> > > >> easier to google search for help using the decorator name
>>> > > >
>>> > > > - if an optional feature requires two methods to be implemented at
>>> > once,
>>> > > >> decorator can guarantee that
>>> > > >
>>> > > > These two points are not true when multiple components and classes
>>> are
>>> > > > involved collaboratively to provide a feature. In our case, we have
>>> > both
>>> > > > SourceReader and SplitReader involved. And there might be other
>>> > > interfaces
>>> > > > on the JM side involved for some future features. So the relevant
>>> > methods
>>> > > > can actually be scattered over the places. That said, we may still
>>> use
>>> > > > decorative interfaces for each component, if the feature is
>>> optional,
>>> > > given
>>> > > > there is a single source of truth for the feature.
>>> > > >
>>> > > > Here I would strongly lean towards making life easier for new
>>> users,
>>> > > >> lowering the entry barrier, at the (imo) slight expense for the
>>> power
>>> > > >> users.
>>> > > >
>>> > > > I actually think the current approach is simpler, more extensible
>>> and
>>> > > more
>>> > > > general for all the users. Can you articulate a bit more on which
>>> part
>>> > > you
>>> > > > think makes users harder to understand?
>>> > > >
>>> > > >
>>> > > > There is another benefit of the decorative interfaces which is not
>>> > > > mentioned, but might be worth considering here. Usually the
>>> decorative
>>> > > > interfaces give slightly better backwards compatibility than the
>>> new
>>> > > > default method in the interfaces. That is when users are using a
>>> jar
>>> > that
>>> > > > was compiled with an older version of Flink which does not have the
>>> > > default
>>> > > > method in the interfaces in question. A decorative interface may
>>> still
>>> > > > provide backwards compatibility in that case, while default method
>>> impl
>>> > > > cannot.
>>> > > >
>>> > > > I think in Flink we in general do not guarantee custom components
>>> > > compiled
>>> > > > with an older version can run with a newer version of Flink. A
>>> > recompile
>>> > > > with a newer version would be required. That said, if we do care
>>> about
>>> > > > this, we can just change the "supportXXX()" method in the Source
>>> > > interface
>>> > > > to use decorative interfaces, and leave the other parts unchanged.
>>> > > >
>>> > > > Thanks,
>>> > > >
>>> > > > Jiangjie (Becket) Qin
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <
>>> pnowojski@apache.org>
>>> > > > wrote:
>>> > > >
>>> > > >> Hi,
>>> > > >>
>>> > > >> Sorry for chipping in so late, but I was OoO for the last two
>>> weeks.
>>> > > >> Regarding the interfaces, I would be actually against adding those
>>> > > methods
>>> > > >> to the base interfaces for the reasons mentioned above. Clogging
>>> the
>>> > > base
>>> > > >> interface for new users with tons of methods that they do not
>>> need, do
>>> > > not
>>> > > >> understand and do not know what to do with them. Moreover, such
>>> > > decorative
>>> > > >> interfaces are solving a problem if a feature requires two or more
>>> > > methods
>>> > > >> to be implemented at the same time. If we have all of the methods
>>> with
>>> > > >> default implementation in the base interface, the API doesn't
>>> give any
>>> > > >> clue
>>> > > >> to the user which set of methods are required to be implemented
>>> at the
>>> > > >> same
>>> > > >> time.
>>> > > >>
>>> > > >> > a) I feel the biggest drawback of decorative interfaces is which
>>> > > >> interface
>>> > > >> > they can decorate and which combinations of multiple decorative
>>> > > >> interfaces
>>> > > >> > are valid. (...)
>>> > > >> > In the future, if there is a new feature added
>>> > > >> > (e.g. sorted or pre-partitioned data aware), are we going to
>>> create
>>> > > >> another
>>> > > >> > interface of SplitReader such as SortedSplitReader or
>>> > > >> PrePartitionedAware?
>>> > > >> > Can they be combined? So I think the additional decorative
>>> interface
>>> > > >> like
>>> > > >> > withSplitsAlignment actually increases the understanding cost of
>>> > users
>>> > > >> > because they have to know what decorative interfaces are there,
>>> > which
>>> > > >> > interface they can decorate and which combinations of the
>>> decorative
>>> > > >> > interfaces are valid and which are not. Ideally we want to avoid
>>> > that.
>>> > > >>
>>> > > >> I'm not sure if I understand how embedding default methods in the
>>> base
>>> > > >> interface is solving the problem: what can be combined or not? If
>>> > there
>>> > > >> are
>>> > > >> two conflicting features, having decorative interfaces that can
>>> not be
>>> > > >> mixed together actually makes much more sense to me rather than
>>> having
>>> > > >> them
>>> > > >> all in one base class. How would you allow users to implement
>>> only one
>>> > > of
>>> > > >> those two features?
>>> > > >>
>>> > > >> To reiterate on the issue. Yes, there are drawbacks:
>>> > > >> - how can a user discover what decorators are there?
>>> > > >> - how can a user know where the decorator can be applied to?
>>> > > >>
>>> > > >> However those are drawbacks for more power users, that can be
>>> > mitigated
>>> > > by
>>> > > >> the documentation. For example listing all of the decorators with
>>> > > >> detailed explanation both in the docs and in the java docs. More
>>> > > >> experienced users will be able to deal with those issues easier,
>>> as
>>> > they
>>> > > >> will already have some basic understanding of Flink. Also if user
>>> has
>>> > a
>>> > > >> problem that he wants to solve, he will google search a potential
>>> > > solution
>>> > > >> to his problem anyway, and while doing that he is very likely to
>>> > > discover
>>> > > >> the decorator that he needs anyway in the docs.
>>> > > >>
>>> > > >> The advantages are:
>>> > > >> - clean and easy to implement base interface
>>> > > >> - all of the methods from a single feature are grouped in a single
>>> > > >> decorator interface, together with their dedicated java doc. It's
>>> also
>>> > > >> easier to google search for help using the decorator name
>>> > > >> - if an optional feature requires two methods to be implemented at
>>> > once,
>>> > > >> decorator can guarantee that
>>> > > >>
>>> > > >> Here I would strongly lean towards making life easier for new
>>> users,
>>> > > >> lowering the entry barrier, at the (imo) slight expense for the
>>> power
>>> > > >> users.
>>> > > >>
>>> > > >> Best,
>>> > > >> Piotrek
>>> > > >>
>>> > > >>
>>> > > >> wt., 26 kwi 2022 o 15:32 Becket Qin <be...@gmail.com>
>>> > napisał(a):
>>> > > >>
>>> > > >> > Thanks for the reply Sebastian and Dawid,
>>> > > >> >
>>> > > >> > I think Sebastian has a good summary. This is a really helpful
>>> > > >> discussion.
>>> > > >> >
>>> > > >> > Thinking a bit more, I feel that it might still be better to
>>> add the
>>> > > >> > supportsXXX() method in the Source rather than SourceReader.
>>> > > >> >
>>> > > >> > Generally speaking, what we are trying to do here is to let the
>>> > Flink
>>> > > >> > framework know what the Source is capable of. In this FLIP, it
>>> > happens
>>> > > >> to
>>> > > >> > be the capability that only involves SourceReader. But in the
>>> > future,
>>> > > >> it is
>>> > > >> > possible that another functionality involves both the
>>> > SplitEnumerator
>>> > > >> and
>>> > > >> > SourceReader. In that case, following the current approach, we
>>> > should
>>> > > >> put
>>> > > >> > the "supportsXXX()" method in both SplitEnumerator and
>>> SourceReader.
>>> > > >> > Because if we only put this in the SourceReader, then the JM
>>> would
>>> > > have
>>> > > >> to
>>> > > >> > create a SourceReader in order to know whether this feature is
>>> > > >> supported,
>>> > > >> > which is a little ugly. But if we put the "supportsXXX()"
>>> method in
>>> > > the
>>> > > >> > Source, we will break the "symmetric" design because this FLIP
>>> > chose a
>>> > > >> > different way.
>>> > > >> >
>>> > > >> > This is also why I think supportsXXX() method seems a good
>>> thing to
>>> > > >> have,
>>> > > >> > because when there are a few interfaces / methods that are
>>> expected
>>> > to
>>> > > >> be
>>> > > >> > implemented at the same time in order to deliver a feature, it
>>> is
>>> > > always
>>> > > >> > good to have a single source of truth to tell the framework
>>> what to
>>> > > do,
>>> > > >> so
>>> > > >> > the framework can do consistent things in different parts.
>>> > > >> >
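A minimal Java sketch of the "single source of truth" argument above. The
types are hypothetical stand-ins, not the real Flink interfaces; the method
name supportsPausingSplits is taken from this thread, and its default value
and the validation step are assumptions. The point shown is that a flag on
the Source can be checked without ever creating a SourceReader.

```java
// Hypothetical, simplified stand-ins; these are not the real Flink interfaces.
interface SourceReader {}

interface Source {
    SourceReader createReader();

    // Capability flag on the Source itself. The name comes from the
    // discussion; returning false by default is an assumption.
    default boolean supportsPausingSplits() {
        return false;
    }
}

// Test double that counts how often a reader is created.
final class CountingSource implements Source {
    int readersCreated = 0;
    private final boolean pausable;

    CountingSource(boolean pausable) {
        this.pausable = pausable;
    }

    @Override
    public SourceReader createReader() {
        readersCreated++;
        return new SourceReader() {};
    }

    @Override
    public boolean supportsPausingSplits() {
        return pausable;
    }
}

final class JobValidator {
    // Runs where the job is assembled and only consults the Source,
    // so no reader has to be instantiated for the check.
    static void validate(Source source, boolean alignmentEnabled) {
        if (alignmentEnabled && !source.supportsPausingSplits()) {
            throw new IllegalStateException(
                    "Split alignment is enabled, but the source cannot pause splits.");
        }
    }
}
```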
>>> > > >> > @Sebastian Mattheis <se...@ververica.com>
>>> > > >> >
>>> > > >> > Regarding interface flavor b), i.e. AlignedSourceReader +
>>> > > >> > AlignedSplitReader, what I feel awkward about is that we are
>>> > > essentially
>>> > > >> > expecting almost all the SourceReader implementations to extend
>>> > > >> > SourceReaderBase, which effectively makes the SourceReader
>>> interface
>>> > > >> > without the pausing support useless. So this indicates that
>>> public
>>> > > >> > decorative interfaces (or sub-interfaces for the same purpose)
>>> only
>>> > > >> > make sense if the original interface is also expected to be
>>> used.
>>> > > >> > Otherwise, it seems to make more sense to add the method to the
>>> > original
>>> > > >> > interface itself.
>>> > > >> >
>>> > > >> > Cheers,
>>> > > >> >
>>> > > >> > Jiangjie (Becket) Qin
>>> > > >> >
>>> > > >> >
>>> > > >> >
>>> > > >> >
>>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <
>>> > > >> dwysakowicz@apache.org>
>>> > > >> > wrote:
>>> > > >> >
>>> > > >> > > Thanks @Sebastian for the nice summary.
>>> > > >> > >
>>> > > >> > > I think most of your points aligned with the suggestions I
>>> made to
>>> > > the
>>> > > >> > > FLIP, while you were writing your reply (I believe we hit
>>> enter
>>> > > >> nearly at
>>> > > >> > > the same time ;) )
>>> > > >> > >
>>> > > >> > > Two points after we synced offline
>>> > > >> > >
>>> > > >> > > 1. I also changed supportsWatermarksSplitAlignment to
>>> > > >> > > supportsPausingSplits to express the general capability of pausing.
>>> > > >> > >
>>> > > >> > > 2. As for whether we should add PausingSourceReader/PausingSplitReader
>>> > > >> > > (option b) or just add the methods (option c), I suggest simply
>>> > > >> > > adding the two methods, as I felt this is the approach Becket much
>>> > > >> > > prefers and to which others do not object. Unless there is
>>> > > >> > > opposition, let's go with option c.
>>> > > >> > >
>>> > > >> > > Best,
>>> > > >> > >
>>> > > >> > > Dawid
>>> > > >> > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
>>> > > >> > >
>>> > > >> > > Hi folks,
>>> > > >> > >
>>> > > >> > > Sorry for being a bit silent. Many thanks for all the input
>>> and
>>> > > >> > > suggestions. As I'm a bit new, I needed some time to catch up
>>> and
>>> > > >> > structure
>>> > > >> > > (for myself) the discussion and I wanted to find a way to
>>> > structure
>>> > > >> the
>>> > > >> > > conclusions. (Also because I had the feeling that some
>>> concerns
>>> > got
>>> > > >> lost
>>> > > >> > in
>>> > > >> > > the discussion.) This is my attempt and please correct me if
>>> > > >> something is
>>> > > >> > > wrong or misunderstood. I tried to collect and assemble the
>>> > > opinions,
>>> > > >> > > suggestions, and conclusions (to the best of my knowledge):
>>> > > >> > >
>>> > > >> > > # Top A: Should split alignment (pause/resume behavior) be a
>>> > general
>>> > > >> > > capability?
>>> > > >> > >
>>> > > >> > > I personally don't see any reason not to have it a general
>>> > capability
>>> > > >> > > because for the alignSplit method it is actually independent
>>> of
>>> > the
>>> > > >> > > watermarks. If we agree here to have it a general capability,
>>> we
>>> > > >> should
>>> > > >> > > also agree on the right wording. Does
>>> "alignSplits(splitsToResume,
>>> > > >> > > splitsToPause)" refer to what is then actually meant? (I see
>>> it as
>>> > > >> okay.
>>> > > >> > I
>>> > > >> > > don't have any better idea whilst Arvid suggested
>>> > > >> "pauseOrResumeSplits".)
>>> > > >> > >
>>> > > >> > > # Top B: Should it be possible to enable/disable split
>>> alignment?
>>> > > >> > >
>>> > > >> > > I would personally not disable the split alignment on the
>>> source
>>> > > >> reader
>>> > > >> > > side because if split alignment is used for some other use
>>> case
>>> > (see
>>> > > >> A)
>>> > > >> > it
>>> > > >> > > could have nasty side effects on other/future use cases.
>>> Instead,
>>> > I
>>> > > >> would
>>> > > >> > > disable "watermark split alignment" where I think it should
>>> > disable
>>> > > >> the
>>> > > >> > > watermark-dependent trigger for split alignment.
>>> > > >> > >
>>> > > >> > > # Top C: Should we add a supportsX method?
>>> > > >> > >
>>> > > >> > > I find it difficult to define the scope of a supportsX method w.r.t.
>>> > > >> > > the following questions: a) Where is it used? and b) What is
>>> the
>>> > > >> expected
>>> > > >> > > output? To b), it's not straight-forward to provide a
>>> meaningful
>>> > > >> output,
>>> > > >> > > e.g., if SourceReader supports split alignment but SplitReader does not.
>>> > > >> This
>>> > > >> > is
>>> > > >> > > because with the current implementation, we can determine
>>> whether
>>> > > >> split
>>> > > >> > > alignment is fully supported only during runtime and
>>> specifically
>>> > > >> > actually
>>> > > >> > > only when calling alignSplits down the call hierarchy up to
>>> the
>>> > > actual
>>> > > >> > > SplitReaders.
>>> > > >> > >
>>> > > >> > > Therefore, I would suggest to either raise an error or
>>> warning if
>>> > > the
>>> > > >> > > alignment is called but not supported at some point. I know we
>>> > > should
>>> > > >> > > carefully think about when this could be the case because we
>>> don't
>>> > > >> want
>>> > > >> > to
>>> > > >> > > flood anybody with such warnings. However, warnings could be
>>> an
>>> > > >> indicator
>>> > > >> > > for the user that for watermark split alignment use case split
>>> > > >> reading is
>>> > > >> > > imbalanced with the conclusion to either disable the trigger
>>> for
>>> > > >> > watermark
>>> > > >> > > split alignment (see Top B) or to use/implement a source and
>>> > reader
>>> > > >> that
>>> > > >> > > fully supports split alignment.
>>> > > >> > >
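The "error or warning" suggestion above could look roughly like the
following Java sketch. Everything here is an assumption made for
illustration: the SplitReader stand-in, its throwing default method, the
SplitFetcher wrapper, and the failIfUnsupported switch are hypothetical and
not the PoC code. The sketch logs a warning once instead of on every call,
which is one way to avoid flooding users.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in, not the real Flink interface.
interface SplitReader {
    // Assumed default: signal that pausing is not supported.
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        throw new UnsupportedOperationException("pausing splits is not supported");
    }
}

// Hypothetical wrapper that decides between failing and warning.
final class SplitFetcher {
    private final SplitReader reader;
    private final boolean failIfUnsupported;
    private boolean warned = false;
    int warningsLogged = 0;

    SplitFetcher(SplitReader reader, boolean failIfUnsupported) {
        this.reader = reader;
        this.failIfUnsupported = failIfUnsupported;
    }

    void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        try {
            reader.pauseOrResumeSplits(splitsToPause, splitsToResume);
        } catch (UnsupportedOperationException e) {
            if (failIfUnsupported) {
                throw e;
            }
            if (!warned) {
                // Warn only once so repeated alignment calls do not flood the log.
                warned = true;
                warningsLogged++;
                System.err.println("WARN: split reader cannot pause splits; reading may be imbalanced");
            }
        }
    }
}
```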
>>> > > >> > > # Top D: How to design interfaces?
>>> > > >> > >
>>> > > >> > > Thanks for structuring the discussion with the various
>>> > > >> possibilities
>>> > > >> > > (a-d). From the discussion and emails, I would like to
>>> summarize
>>> > the
>>> > > >> > > following requirements:
>>> > > >> > > - Interfaces should be consistent ("symmetric"), i.e., similar
>>> > > >> semantics
>>> > > >> > > should have similar interfaces with similar usage.
>>> > > >> > > - Make explicit which implementations implement
>>> interfaces/support
>>> > > >> > > behavior.
>>> > > >> > > - Make clear what are default implementations and how to
>>> implement
>>> > > >> > > interfaces with desired behavior.
>>> > > >> > >
>>> > > >> > > This is a simplified view of the relations between relevant
>>> > classes
>>> > > of
>>> > > >> > the
>>> > > >> > > PoC implementation:
>>> > > >> > >
>>> > > >> > > SourceReader (Public) <|-- SourceReaderBase (Internal) <|-- ..
>>> > <|--
>>> > > >> > > MySourceReader
>>> > > >> > >
>>> > > >> > > MySourceReader <>-- SplitFetcherManager (Internal) <>--
>>> > SplitFetcher
>>> > > >> > > (Internal) <>-- SplitReader (Public) <|-- MySplitReader
>>> > > >> > >
>>> > > >> > > (A <|-- B: B inherits from A; A <>-- B: A "has a" B)
>>> > > >> > >
>>> > > >> > > Note that SourceReaderBase and SplitFetcherManager implement
>>> most
>>> > of
>>> > > >> the
>>> > > >> > > "logic" for split alignment just because we wanted to
>>> implement
>>> > > split
>>> > > >> > > alignment and wanted it to be available as kind of a default.
>>> As a
>>> > > >> > > consequence, we have a "default implementation" for
>>> SourceReader
>>> > > that
>>> > > >> > > implements the actual logic for split alignment. For that
>>> reason,
>>> > I
>>> > > >> find
>>> > > >> > it
>>> > > >> > > very confusing to have a NOOP default implementation in the
>>> > > interface
>>> > > >> for
>>> > > >> > > the SourceReader. As a consequence, interface strategy c) is
>>> > > difficult
>>> > > >> > > because this would require NOOP default implementations in the
>>> > > public
>>> > > >> > > interfaces of SourceReader and SplitReader. This is the same
>>> for
>>> > > >> strategy
>>> > > >> > > d) because it would require NOOP default implementation in the
>>> > > >> > > SourceReader. Further, as Dawid described method signatures of
>>> > > >> alignSplit
>>> > > >> > > for SourceReader and SplitReader differ and it would be
>>> extremely
>>> > > >> > difficult
>>> > > >> > > to make the signatures the same (with even potential
>>> performance
>>> > > >> impact
>>> > > >> > > because of additional lookups of split ids). Therefore,
>>> having a
>>> > > >> > symmetric
>>> > > >> > > decorative interface as of strategy a) is actually not
>>> possible
>>> > and
>>> > > >> > having
>>> > > >> > > two decorative interfaces with different method signatures is
>>> > > >> confusing.
>>> > > >> > My
>>> > > >> > > conclusion is that we are best with strategy b) which means to
>>> > have
>>> > > >> > > specializing sub-interfaces that inherit from the parent
>>> > interface:
>>> > > >> > > SourceReader <|-- AlignedSourceReader, SplitReader <|--
>>> > > >> > AlignedSplitReader
>>> > > >> > > With this option, I'm not 100% sure what the implications are
>>> and
>>> > if
>>> > > >> this
>>> > > >> > > could get nasty. I would suggest that Dawid and I just try to
>>> > > >> implement
>>> > > >> > and
>>> > > >> > > see if we like it. :)
>>> > > >> > >
>>> > > >> > > # Summary
>>> > > >> > >
>>> > > >> > > In conclusion, please let me know your perspectives. Please
>>> > correct
>>> > > >> me,
>>> > > >> > if
>>> > > >> > > something is wrong or if I misunderstood something. My
>>> perspective
>>> > > >> would
>>> > > >> > be:
>>> > > >> > >
>>> > > >> > > Top A: Yes
>>> > > >> > > Top B: Yes (but disable watermark trigger for split alignment)
>>> > > >> > > Top C: No
>>> > > >> > > Top D: b)
>>> > > >> > >
>>> > > >> > > Best,
>>> > > >> > > Sebastian
>>> > > >> > >
>>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <
>>> > > >> dwysakowicz@apache.org
>>> > > >> > >
>>> > > >> > > wrote:
>>> > > >> > >
>>> > > >> > >> @Arvid:
>>> > > >> > >>
>>> > > >> > >> While I also like Becket's capability approach, I fear that
>>> it
>>> > > >> doesn't
>>> > > >> > work
>>> > > >> > >> for this particular use case: Sources can always be aligned
>>> > > >> cross-task
>>> > > >> > and
>>> > > >> > >> this is just about intra-task alignment. So it's plausible
>>> to put
>>> > > >> > sources
>>> > > >> > >> into an alignment group even though they do not use any of
>>> the
>>> > > >> presented
>>> > > >> > >> API of FLIP-217. They should just issue a warning, if they
>>> handle
>>> > > >> > multiple
>>> > > >> > >> splits (see motivation section).
>>> > > >> > >>
>>> > > >> > >> Yes, but the "supportsXXX" method would be for telling if it
>>> > > supports
>>> > > >> > that
>>> > > >> > >> intra-task alignment. Cross-task alignment would always be
>>> > > supported.
>>> > > >> > >>
>>> > > >> > >> I updated interfaces to what I believe to be closest to a
>>> > consensus
>>> > > >> > >> between all participants. Do you mind taking a look?
>>> > > >> > >>
>>> > > >> > >> @Sebastian Do you mind addressing the nits?
>>> > > >> > >>
>>> > > >> > >> Best,
>>> > > >> > >>
>>> > > >> > >> Dawid
>>> > > >> > >>
>>> > > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
>>> > > >> > >>
>>> > > >> > >> Thanks for pushing this effort.
>>> > > >> > >>
>>> > > >> > >> I'd actually be in favor of 1b). I fully agree that decorator
>>> > > >> interfaces
>>> > > >> > >> should be avoided but I'm also not a big fan of overloading
>>> the
>>> > > base
>>> > > >> > >> interfaces (they are hard to implement as is). The usual
>>> feedback
>>> > > to
>>> > > >> > >> Source-related interfaces is always that they are
>>> overwhelming
>>> > and
>>> > > >> too
>>> > > >> > >> hard to implement. However, I'd also not oppose 1c) as
>>> scattered
>>> > > >> > interfaces
>>> > > >> > >> also have drawbacks. I'd just dislike 1a) and 1d).
>>> > > >> > >> While I also like Becket's capability approach, I fear that
>>> it
>>> > > >> doesn't
>>> > > >> > work
>>> > > >> > >> for this particular use case: Sources can always be aligned
>>> > > >> cross-task
>>> > > >> > and
>>> > > >> > >> this is just about intra-task alignment. So it's plausible
>>> to put
>>> > > >> > sources
>>> > > >> > >> into an alignment group even though they do not use any of
>>> the
>>> > > >> presented
>>> > > >> > >> API of FLIP-217. They should just issue a warning, if they
>>> handle
>>> > > >> > multiple
>>> > > >> > >> splits (see motivation section).
>>> > > >> > >>
>>> > > >> > >> I think renaming alignSplits to facilitate future use cases
>>> makes
>>> > > >> sense
>>> > > >> > but
>>> > > >> > >> then all interfaces (if 1c) is chosen) should be adjusted
>>> > > >> accordingly.
>>> > > >> > >> AlignedSourceReader could be PausingSourceReader and I'd go
>>> for
>>> > > >> > >> pauseOrResumeSplits (Becket's proposal afaik). We could also
>>> > split
>>> > > it
>>> > > >> > into
>>> > > >> > >> pauseSplit and resumeSplit. While pauseOrResumeSplits may
>>> allow
>>> > > >> Sources
>>> > > >> > to
>>> > > >> > >> just use 1 instead of 2 library calls (as written in the
>>> > Javadoc),
>>> > > >> both
>>> > > >> > >> Kafka and Pulsar can't use it and I'm not sure if there is a
>>> > system
>>> > > >> that
>>> > > >> > >> can.
>>> > > >> > >>
>>> > > >> > >> Some nits for the FLIP:
>>> > > >> > >> - Please replace "stop" with "pause".
>>> > > >> > >> - Not sure if it's worth it in the capability section:
>>> Sources
>>> > that
>>> > > >> > adopt
>>> > > >> > >> this interface cannot be used in earlier versions. So it
>>> feels
>>> > like
>>> > > >> we
>>> > > >> > are
>>> > > >> > >> only forward compatible (old sources can be used after the
>>> > change);
>>> > > >> but
>>> > > >> > I
>>> > > >> > >> guess this holds for any API addition.
>>> > > >> > >> - You might want to add what happens when all splits are
>>> paused.
>>> > > >> > >> - You may want to describe how the 3 flavors of
>>> SourceReaderBase
>>> > > >> > interact
>>> > > >> > >> with the interface.
>>> > > >> > >> - I'm not sure if it makes sense to include Kafka and Pulsar
>>> in
>>> > the
>>> > > >> > FLIP.
>>> > > >> > >> For me, this is rather immediate follow-up work. (could be
>>> in the
>>> > > >> same
>>> > > >> > >> umbrella ticket)
>>> > > >> > >>
>>> > > >> > >> Best,
>>> > > >> > >>
>>> > > >> > >> Arvid
>>> > > >> > >>
>>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz
>>> > > >> > >> <dwysakowicz@apache.org> wrote:
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >> a) "MySourceReader implements SourceReader,
>>> WithSplitsAlignment",
>>> > > >> along
>>> > > >> > >> with "MySplitReader implements SplitReader,
>>> WithSplitsAlignment",
>>> > > or
>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
>>> > > "MySplitReader
>>> > > >> > >> implements AlignedSplitReader", or
>>> > > >> > >> c) "MySourceReader implements SourceReader" and
>>> "MySplitReader
>>> > > >> > implements
>>> > > >> > >> SplitReader".
>>> > > >> > >>
>>> > > >> > >> I think the latest proposal according to Dawid would be:
>>> > > >> > >> d) "MySourceReader implements SourceReader" and
>>> "MySplitReader
>>> > > >> > implements
>>> > > >> > >> AlignedSplitReader".
>>> > > >> > >> I am fine with this API, although personally speaking I
>>> think it
>>> > is
>>> > > >> > simpler
>>> > > >> > >> to just add a new method to the split reader with default
>>> impl.
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >> I think that is a good idea to have it aligned as much as
>>> > possible.
>>> > > >> I'd
>>> > > >> > be
>>> > > >> > >> +1 for your option c). We can merge AlignedSplitReader with
>>> > > >> > SplitReader. We
>>> > > >> > >> will update the FLIP shortly.
>>> > > >> > >>
>>> > > >> > >> Best,
>>> > > >> > >>
>>> > > >> > >> Dawid
>>> > > >> > >>
>>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
>>> > > >> > >>
>>> > > >> > >> Thanks for the comment, Jark.
>>> > > >> > >>
>>> > > >> > >> 3. Interface/Method Name.
>>> > > >> > >>
>>> > > >> > >> Can the interface be used to align other things in the
>>> future?
>>> > For
>>> > > >> > example,
>>> > > >> > >> align read speed, I have
>>> > > >> > >> seen users requesting global rate limits. This feature may
>>> also
>>> > > need
>>> > > >> an
>>> > > >> > >> interface like this.
>>> > > >> > >> If we don't plan to extend this interface to support aligning
>>> other
>>> > > >> > things, I
>>> > > >> > >> suggest explicitly declaring
>>> > > >> > >> the purpose of the methods, such as
>>> `alignWatermarksForSplits`
>>> > > >> instead
>>> > > >> > of
>>> > > >> > >> `alignSplits`.
>>> > > >> > >>
>>> > > >> > >> This is a good point. Naming wise, it would usually be more
>>> > > >> extensible
>>> > > >> > to
>>> > > >> > >> just describe what the method actually does, instead of
>>> assuming
>>> > > the
>>> > > >> > >> purpose of doing this. For example, in this case,
>>> > > >> pauseOrResumeSplits()
>>> > > >> > >> would be more extensible because this can be used for any
>>> kind of
>>> > > >> flow
>>> > > >> > >> control, be it watermark alignment or simple rate limiting.
>>> > > >> > >>
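To illustrate why a name that describes the action is more extensible, here
is a minimal Java sketch in which one pauseOrResumeSplits method serves two
different triggers. PausableReader and FlowControl are hypothetical classes
written for this example, and the parameter order (splits to pause first) is
an assumption; only the method name and the notion of a maximum desired
watermark come from the thread.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical reader that only knows how to pause and resume splits.
final class PausableReader {
    private final Set<String> paused = new HashSet<>();

    void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.addAll(splitsToPause);
        paused.removeAll(splitsToResume);
    }

    boolean isPaused(String splitId) {
        return paused.contains(splitId);
    }
}

// Two independent triggers that reuse the same reader method.
final class FlowControl {
    // Watermark alignment: pause splits that are ahead of the allowed watermark.
    static void alignToWatermark(
            PausableReader reader, Map<String, Long> splitWatermarks, long maxDesiredWatermark) {
        List<String> toPause = new ArrayList<>();
        List<String> toResume = new ArrayList<>();
        for (Map.Entry<String, Long> entry : splitWatermarks.entrySet()) {
            if (entry.getValue() > maxDesiredWatermark) {
                toPause.add(entry.getKey());
            } else {
                toResume.add(entry.getKey());
            }
        }
        reader.pauseOrResumeSplits(toPause, toResume);
    }

    // Simple rate limiting: pause splits that exceeded their record budget.
    static void limitRate(
            PausableReader reader, Map<String, Integer> recordsInInterval, int maxRecordsPerSplit) {
        List<String> toPause = new ArrayList<>();
        List<String> toResume = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : recordsInInterval.entrySet()) {
            if (entry.getValue() > maxRecordsPerSplit) {
                toPause.add(entry.getKey());
            } else {
                toResume.add(entry.getKey());
            }
        }
        reader.pauseOrResumeSplits(toPause, toResume);
    }
}
```

The reader never learns why a split is paused, which is the property that a
purpose-specific name such as alignWatermarksForSplits would obscure.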
>>> > > >> > >> 4. Interface or Method.
>>> > > >> > >>
>>> > > >> > >> I don't have a strong opinion on this. I think they have
>>> their
>>> > own
>>> > > >> > >> advantages.
>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending
>>> abilities
>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
>>> > > >> > >> and I prefer Interfaces rather than methods in this case.
>>> When
>>> > you
>>> > > >> have
>>> > > >> > a
>>> > > >> > >> bunch of abilities and each ability
>>> > > >> > >> has more than one method, Interfaces can help to organize them and
>>> > > >> > >> make it clear to users which methods they need to implement
>>> > > >> > >> when you want to have an ability.
>>> > > >> > >>
>>> > > >> > >> I am OK with decorative interfaces if this is a general
>>> design
>>> > > >> pattern
>>> > > >> > in
>>> > > >> > >> the other components in Flink. But it looks like the current
>>> API
>>> > > >> > proposal
>>> > > >> > >> is not symmetric.
>>> > > >> > >>
>>> > > >> > >> The current proposal is essentially "MySourceReader
>>> implements
>>> > > >> > >> SourceReader, WithSplitsAlignment", along with "MySplitReader
>>> > > >> implements
>>> > > >> > >> AlignedSplitsReader".
>>> > > >> > >>
>>> > > >> > >> Should we make the API symmetric? I'd consider any one of the
>>> > > >> following
>>> > > >> > as
>>> > > >> > >> symmetric.
>>> > > >> > >>
>>> > > >> > >> a) "MySourceReader implements SourceReader,
>>> WithSplitsAlignment",
>>> > > >> along
>>> > > >> > >> with "MySplitReader implements SplitReader,
>>> WithSplitsAlignment",
>>> > > or
>>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
>>> > > "MySplitReader
>>> > > >> > >> implements AlignedSplitReader", or
>>> > > >> > >> c) "MySourceReader implements SourceReader" and
>>> "MySplitReader
>>> > > >> > implements
>>> > > >> > >> SplitReader".
>>> > > >> > >>
>>> > > >> > >> I think the latest proposal according to Dawid would be:
>>> > > >> > >> d) "MySourceReader implements SourceReader" and
>>> "MySplitReader
>>> > > >> > implements
>>> > > >> > >> AlignedSplitReader".
>>> > > >> > >> I am fine with this API, although personally speaking I
>>> think it
>>> > is
>>> > > >> > simpler
>>> > > >> > >> to just add a new method to the split reader with default
>>> impl.
>>> > > >> > >>
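For readers who prefer code, the four flavors a) to d) listed above can be
written down as a type-level Java sketch. All types are hypothetical,
simplified stand-ins for the interfaces named in the thread. As a further
simplification, every alignSplits here takes split IDs, whereas the thread
notes that the real SourceReader and SplitReader signatures differ. The "C"
suffix on the base interfaces in flavor c) exists only so that all flavors
fit in one file.

```java
import java.util.Collection;

// Hypothetical base interfaces without any alignment support.
interface SourceReader {}

interface SplitReader {}

// a) One decorative interface mixed into both readers.
interface WithSplitsAlignment {
    void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

final class SourceReaderA implements SourceReader, WithSplitsAlignment {
    @Override
    public void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {}
}

final class SplitReaderA implements SplitReader, WithSplitsAlignment {
    @Override
    public void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {}
}

// b) Specializing sub-interfaces of the base interfaces.
interface AlignedSourceReader extends SourceReader {
    void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

interface AlignedSplitReader extends SplitReader {
    void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume);
}

final class SourceReaderB implements AlignedSourceReader {
    @Override
    public void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {}
}

final class SplitReaderB implements AlignedSplitReader {
    @Override
    public void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {}
}

// c) The method lives on both base interfaces with a no-op default.
interface SourceReaderC {
    default void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {}
}

interface SplitReaderC {
    default void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {}
}

final class MySourceReaderC implements SourceReaderC {}

final class MySplitReaderC implements SplitReaderC {}

// d) Default method on the source reader, sub-interface for the split reader.
final class MySourceReaderD implements SourceReaderC {}

final class MySplitReaderD implements AlignedSplitReader {
    @Override
    public void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {}
}
```

Flavors a), b) and c) treat both readers the same way; d) is the asymmetric
combination discussed above.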
>>> > > >> > >> @Dawid Wysakowicz <dwysakowicz@apache.org>, thanks for the reply.
>>> > > >> > >>
>>> > > >> > >> Having said that, as I don't have a preference and I agree
>>> most
>>> > of
>>> > > >> the
>>> > > >> > >>
>>> > > >> > >> sources will support the alignment I am fine following your
>>> > > >> suggestion
>>> > > >> > to
>>> > > >> > >> have the SourceReader extending from
>>> > WithWatermarksSplitsAlignment,
>>> > > >> but
>>> > > >> > >> would put the "supportsXXX" there, not in the Source to keep
>>> the
>>> > > two
>>> > > >> > >> methods together.
>>> > > >> > >>
>>> > > >> > >> One benefit of having the "supportsXXX" in Source is that
>>> this
>>> > > allows
>>> > > >> > some
>>> > > >> > >> compile time check. For example, if a user enabled watermark
>>> > > >> alignment
>>> > > >> > >> while it is not supported by the Source, an exception can be
>>> > thrown
>>> > > >> at
>>> > > >> > >> compile time. It seems in general useful. That said, I agree
>>> that
>>> > > API
>>> > > >> > >> cleanliness wise it is better to put the two methods
>>> together.
>>> > > >> > >>
>>> > > >> > >> Thanks,
>>> > > >> > >>
>>> > > >> > >> Jiangjie (Becket) Qin
>>> > > >> > >>
>>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <imjark@gmail.com> wrote:
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >> Thanks, Dawid, for the reminder on FLIP-182. Sorry I did miss
>>> it.
>>> > > >> > >> I don't have other concerns then.
>>> > > >> > >>
>>> > > >> > >> Best,
>>> > > >> > >> Jark
>>> > > >> > >>
>>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz
>>> > > >> > >> <dwysakowicz@apache.org> wrote:
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >> @Jark:
>>> > > >> > >>
>>> > > >> > >> 1. Will the framework always align with watermarks when the
>>> > source
>>> > > >> > >> implements the interface?
>>> > > >> > >> I'm afraid not every case needs watermark alignment even if
>>> Kafka
>>> > > >> > >> implements the interface,
>>> > > >> > >> and this will affect the throughput somehow. I agree with
>>> Becket
>>> > > >> > >> we may need a
>>> > > >> > >> `supportSplitsAlignment()` method for users to configure the
>>> > source
>>> > > >> to
>>> > > >> > >> enable/disable the alignment.
>>> > > >> > >>
>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly
>>> affect
>>> > > >> > >>
>>> > > >> > >> throughput
>>> > > >> > >>
>>> > > >> > >> if the reader is constantly
>>> > > >> > >>  switching between pause and resume. Can users configure the
>>> > > >> alignment
>>> > > >> > >> offset?
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >> This is covered in the previous FLIP[1] which has been
>>> already
>>> > > >> > >>
>>> > > >> > >> implemented
>>> > > >> > >>
>>> > > >> > >> in 1.15. In short, it must be enabled with the watermark
>>> strategy
>>> > > >> which
>>> > > >> > >> also configures drift and update interval.
>>> > > >> > >>
>>> > > >> > >> If we don't plan to extend this interface to support aligning
>>> other
>>> > > >> things,
>>> > > >> > >>
>>> > > >> > >> I
>>> > > >> > >>
>>> > > >> > >> suggest explicitly declaring
>>> > > >> > >> the purpose of the methods, such as
>>> `alignWatermarksForSplits`
>>> > > >> instead
>>> > > >> > of
>>> > > >> > >> `alignSplits`.
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >> Sure let's rename it.
>>> > > >> > >>
>>> > > >> > >> @Becket:
>>> > > >> > >>
>>> > > >> > >> I understand your point. On the other hand putting all
>>> methods,
>>> > > even
>>> > > >> > with
>>> > > >> > >> "supportsXXX" methods for enabling certain features, makes
>>> the
>>> > > entry
>>> > > >> > >> threshold for writing a new source higher. Instead of
>>> focusing on
>>> > > the
>>> > > >> > >>
>>> > > >> > >> basic
>>> > > >> > >>
>>> > > >> > >> and required properties of the Source, the person
>>> implementing a
>>> > > >> source
>>> > > >> > >> must bother with and needs to figure out what all of the extra
>>> > > >> features
>>> > > >> > >>
>>> > > >> > >> are
>>> > > >> > >>
>>> > > >> > >> about and how to deal with them. It also makes it harder to
>>> > > organize
>>> > > >> > >> methods in coupled groups as Jark said.
>>> > > >> > >>
>>> > > >> > >> Having said that, as I don't have a preference and I agree
>>> most
>>> > of
>>> > > >> the
>>> > > >> > >> sources will support the alignment I am fine following your
>>> > > >> suggestion
>>> > > >> > to
>>> > > >> > >> have the SourceReader extending from
>>> > WithWatermarksSplitsAlignment,
>>> > > >> but
>>> > > >> > >> would put the "supportsXXX" there, not in the Source to keep
>>> the
>>> > > two
>>> > > >> > >> methods together.
>>> > > >> > >>
>>> > > >> > >> Lastly, I agree it is really unfortunate the "alignSplits"
>>> > methods
>>> > > >> > differ
>>> > > >> > >> slightly for SourceReader and SplitReader. The reason for
>>> that is
>>> > > >> > >> SourceReaderBase deals only with SplitIds, whereas
>>> SplitReader
>>> > > needs
>>> > > >> the
>>> > > >> > >> actual splits to pause them. I found the discrepancy
>>> acceptable
>>> > for
>>> > > >> the
>>> > > >> > >> sake of simplifying changes significantly, especially as they
>>> > would
>>> > > >> > >>
>>> > > >> > >> highly
>>> > > >> > >>
>>> > > >> > >> likely impact performance as we would have to perform
>>> additional
>>> > > >> > lookups.
>>> > > >> > >> Moreover the SplitReader is a secondary interface.
>>> > > >> > >>
>>> > > >> > >> Best,
>>> > > >> > >>
>>> > > >> > >> Dawid
>>> > > >> > >>
>>> > > >> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
>>> > > >> > >>
>>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
>>> > > >> > >>
>>> > > >> > >> Thanks for the effort, Dawid and Sebastian!
>>> > > >> > >>
>>> > > >> > >> I just have some minor questions (maybe I missed something).
>>> > > >> > >>
>>> > > >> > >> 1. Will the framework always align with watermarks when the
>>> > source
>>> > > >> > >> implements the interface?
>>> > > >> > >> I'm afraid not every case needs watermark alignment even if
>>> Kafka
>>> > > >> > >> implements the interface,
>>> > > >> > >> and this will affect the throughput somehow. I agree with
>>> Becket
>>> > > >> > >> we may need a
>>> > > >> > >> `supportSplitsAlignment()` method for users to configure the
>>> > source
>>> > > >> to
>>> > > >> > >> enable/disable the alignment.
>>> > > >> > >>
>>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
>>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly
>>> affect
>>> > > >> > >>
>>> > > >> > >> throughput
>>> > > >> > >>
>>> > > >> > >> if the reader is constantly
>>> > > >> > >>  switching between pause and resume. Can users configure the
>>> > > >> alignment
>>> > > >> > >> offset?
>>> > > >> > >>
>>> > > >> > >> 3. Interface/Method Name.
>>> > > >> > >> Can the interface be used to align other things in the
>>> future?
>>> > For
>>> > > >> > >>
>>> > > >> > >> example,
>>> > > >> > >>
>>> > > >> > >> align read speed, I have
>>> > > >> > >> seen users requesting global rate limits. This feature may
>>> also
>>> > > need
>>> > > >> an
>>> > > >> > >> interface like this.
>>> > > >> > >> If we don't plan to extend this interface to support aligning
>>> other
>>> > > >> things,
>>> > > >> > >>
>>> > > >> > >> I
>>> > > >> > >>
>>> > > >> > >> suggest explicitly declaring
>>> > > >> > >> the purpose of the methods, such as
>>> `alignWatermarksForSplits`
>>> > > >> instead
>>> > > >> > of
>>> > > >> > >> `alignSplits`.
>>> > > >> > >>
>>> > > >> > >> 4. Interface or Method.
>>> > > >> > >> I don't have a strong opinion on this. I think they have
>>> their
>>> > own
>>> > > >> > >> advantages.
>>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending
>>> abilities
>>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
>>> > > >> > >> and I prefer Interfaces rather than methods in this case.
>>> When
>>> > you
>>> > > >> have
>>> > > >> > a
>>> > > >> > >> bunch of abilities and each ability
>>> > > >> > >> has more than one method, Interfaces can help to organize them and
>>> > > >> > >> make it clear to users which methods they need to implement
>>> > > >> > >> when you want to have an ability.
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >> Best,
>>> > > >> > >> Jark
>>> > > >> > >>
>>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <becket.qin@gmail.com> wrote:
>>> > > >> > >>
>>> > > >> > >> Hi Dawid,
>>> > > >> > >>
>>> > > >> > >> Thanks for the explanation. Apologies that I somehow misread
>>> a
>>> > > bunch
>>> > > >> of
>>> > > >> > >> "align" and thought they were "assign".
>>> > > >> > >>
>>> > > >> > >> Regarding 1, by default implementation, I was thinking of the
>>> > > default
>>> > > >> > >>
>>> > > >> > >> no-op
>>> > > >> > >>
>>> > > >> > >> implementation. I am a little worried about the
>>> proliferation of
>>> > > >> > >>
>>> > > >> > >> decorative
>>> > > >> > >>
>>> > > >> > >> interfaces. I think the most important thing about
>>> interfaces is
>>> > > that
>>> > > >> > >>
>>> > > >> > >> they
>>> > > >> > >>
>>> > > >> > >> are easy to understand. In this case, I prefer adding a new
>>> method
>>> > to
>>> > > >> the
>>> > > >> > >> existing interface for the following reasons:
>>> > > >> > >>
>>> > > >> > >> a) I feel the biggest drawback of decorative interfaces is
>>> which
>>> > > >> > >>
>>> > > >> > >> interface
>>> > > >> > >>
>>> > > >> > >> they can decorate and which combinations of multiple
>>> decorative
>>> > > >> > >>
>>> > > >> > >> interfaces
>>> > > >> > >>
>>> > > >> > >> are valid. In the current FLIP, the withSplitsAlignment
>>> interface
>>> > > is
>>> > > >> > only
>>> > > >> > >> applicable to the SourceReader which means it can't decorate
>>> any
>>> > > >> other
>>> > > >> > >> interface. From an interface design perspective, a natural
>>> > question
>>> > > >> is
>>> > > >> > >>
>>> > > >> > >> why
>>> > > >> > >>
>>> > > >> > >> not let "AlignedSplitReader" extend "withSplitsAlignment"?
>>> And it
>>> > > is
>>> > > >> > also
>>> > > >> > >> natural to assume that a split reader implementing both
>>> > SplitReader
>>> > > >> and
>>> > > >> > >> WithSplitAlignment would work, because a source reader
>>> > implementing
>>> > > >> > >> SourceReader and withSplitsAlignment works. So why isn't
>>> there an
>>> > > >> > >>
>>> > > >> > >> interface
>>> > > >> > >>
>>> > > >> > >> of AlignedSourceReader? In the future, if there is a new
>>> feature
>>> > > >> added
>>> > > >> > >> (e.g. sorted or pre-partitioned data aware), are we going to
>>> > create
>>> > > >> > >>
>>> > > >> > >> another
>>> > > >> > >>
>>> > > >> > >> interface of SplitReader such as SortedSplitReader or
>>> > > >> > >>
>>> > > >> > >> PrePartitionedAware?
>>> > > >> > >>
>>> > > >> > >> Can they be combined? So I think the additional decorative
>>> > > interface
>>> > > >> > like
>>> > > >> > >> withSplitsAlignment actually increases the understanding
>>> cost of
>>> > > >> users
>>> > > >> > >> because they have to know what decorative interfaces are
>>> there,
>>> > > which
>>> > > >> > >> interface they can decorate and which combinations of the
>>> > > decorative
>>> > > >> > >> interfaces are valid and which are not. Ideally we want to
>>> avoid
>>> > > >> that.
>>> > > >> > To
>>> > > >> > >> be clear, I am not opposing having an interface of
>>> > > >> withSplitsAlignment,
>>> > > >> > >>
>>> > > >> > >> it
>>> > > >> > >>
>>> > > >> > >> is completely OK to have it as an internal interface and let
>>> > > >> > SourceReader
>>> > > >> > >> and SplitReader both extend it.
>>> > > >> > >>
>>> > > >> > >> b) Adding a new method to the SourceReader with a default
>>> > > >> implementation
>>> > > >> > >>
>>> > > >> > >> of
>>> > > >> > >>
>>> > > >> > >> no-op would help avoid logic branching in the source logic,
>>> > > >> especially
>>> > > >> > >> given that we agree that the vast majority of the
>>> SourceReader
>>> > > >> > >> implementations, if not all, would just extend from the
>>> > > >> > SourceReaderBase.
>>> > > >> > >> That means adding a new method to the interface would
>>> effectively
>>> > > >> give
>>> > > >> > >>
>>> > > >> > >> the
>>> > > >> > >>
>>> > > >> > >> same user experience, but simpler.
>>> > > >> > >>
>>> > > >> > >> c) A related design principle that may be worth discussing
>>> is how
>>> > > do
>>> > > >> we
>>> > > >> > >>
>>> > > >> > >> let
>>> > > >> > >>
>>> > > >> > >> the Source implementations tell Flink what capability is
>>> > supported
>>> > > >> and
>>> > > >> > >>
>>> > > >> > >> what
>>> > > >> > >>
>>> > > >> > >> is not. Personally speaking I feel the most intuitive place
>>> to me
>>> > > is
>>> > > >> in
>>> > > >> > >>
>>> > > >> > >> the
>>> > > >> > >>
>>> > > >> > >> Source itself, because that is the entrance of the entire
>>> Source
>>> > > >> > >>
>>> > > >> > >> connector
>>> > > >> > >>
>>> > > >> > >> logic.
>>> > > >> > >>
>>> > > >> > >> Based on the above thoughts, I am wondering if the following
>>> > > >> interface
>>> > > >> > >> would be easier to understand by the users.
>>> > > >> > >>
>>> > > >> > >> - Change "withSplitsAlignment" to internal interface, let
>>> both
>>> > > >> > >>
>>> > > >> > >> SourceReader
>>> > > >> > >>
>>> > > >> > >> and SplitReader extend from it, with a default no-op
>>> > > implementation.
>>> > > >> > >> - Add a new method "boolean supportSplitsAlignment()" to the
>>> > Source
>>> > > >> > >> interface, with a default implementation returning false.
>>> Sources
>>> > > >> that
>>> > > >> > >>
>>> > > >> > >> have
>>> > > >> > >>
>>> > > >> > >> implemented the alignment logic can change this to return
>>> true,
>>> > and
>>> > > >> > >> override the alignSplits() methods in the SourceReader /
>>> > > SplitReader
>>> > > >> if
>>> > > >> > >> needed.
>>> > > >> > >> - In the future, if a new optional feature is going to be
>>> added
>>> > to
>>> > > >> the
>>> > > >> > >> Source, and that feature requires the awareness from Flink,
>>> we
>>> > can
>>> > > >> add
>>> > > >> > >>
>>> > > >> > >> more
>>> > > >> > >>
>>> > > >> > >> such methods to the Source.
>>> > > >> > >>
>>> > > >> > >> What do you think?
>>> > > >> > >>
>>> > > >> > >> Thanks,
>>> > > >> > >>
>>> > > >> > >> Jiangjie (Becket) Qin
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >> @Konstantin:
>>> > > >> > >>
>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface
>>> (aka the
>>> > > >> stop &
>>> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar
>>> only,
>>> > > >> correct?
>>> > > >> > >>
>>> > > >> > >> Correct, as far as I know though, those are the only sources
>>> > which
>>> > > >> > >>
>>> > > >> > >> consume
>>> > > >> > >>
>>> > > >> > >> concurrently from multiple splits and thus alignment applies.
>>> > > >> > >>
>>> > > >> > >> @Thomas:
>>> > > >> > >>
>>> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase
>>> and
>>> > > then
>>> > > >> > >>
>>> > > >> > >> doing
>>> > > >> > >>
>>> > > >> > >> nothing if the split reader does not implement
>>> AlignedSplitReader
>>> > > >> could
>>> > > >> > >>
>>> > > >> > >> be
>>> > > >> > >>
>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead be added
>>> to
>>> > the
>>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to make it
>>> > explicit
>>> > > >> that
>>> > > >> > >> the source actually supports it.
>>> > > >> > >>
>>> > > >> > >> I understand your concern. Hmm, I think we could actually do
>>> > that.
>>> > > >> Given
>>> > > >> > >> the actual implementation of the
>>> SourceReaderBase#alignSplits is
>>> > > >> rather
>>> > > >> > >> short (just a forward to the corresponding method of
>>> > SplitFetcher),
>>> > > >> we
>>> > > >> > >> could reimplement it in the actual source implementations. This solution
>>> > > >> > >> has a downside though. Authors of new sources would have to do two
>>> > > >> > >> things: extend from AlignedSplitReader and implement WithSplitsAlignment,
>>> > > >> > >> instead of just extending AlignedSplitReader. I would be fine with such a
>>> > > >> > >> tradeoff though. What do others think?
>>> > > >> > >>
>>> > > >> > >> @Steven:
>>> > > >> > >>
>>> > > >> > >> For this part from the motivation section, is it accurate?
>>> Let's
>>> > > >> assume
>>> > > >> > >>
>>> > > >> > >> one
>>> > > >> > >>
>>> > > >> > >> source task consumes from 3 partitions and one of the
>>> partition
>>> > is
>>> > > >> > >> significantly slower. In this situation, watermark for this
>>> > source
>>> > > >> task
>>> > > >> > >> won't hold back as it is reading recent data from other two
>>> Kafka
>>> > > >> > >> partitions. As a result, it won't hold back the overall
>>> > watermark.
>>> > > I
>>> > > >> > >> thought the problem is that we may have late data for this
>>> slow
>>> > > >> > >>
>>> > > >> > >> partition.
>>> > > >> > >>
>>> > > >> > >> It will hold back the watermark. Watermark of an operator is
>>> the
>>> > > >> minimum
>>> > > >> > >> of watermarks of all splits[1]
>>> > > >> > >>
>>> > > >> > >> I have another question about the restart. Say split
>>> alignment is
>>> > > >> > >> triggered. checkpoint is completed. job failed and restored
>>> from
>>> > > the
>>> > > >> > last
>>> > > >> > >> checkpoint. because alignment decision is not checkpointed,
>>> > > initially
>>> > > >> > >> alignment won't be enforced until we get a cycle of watermark
>>> > > >> > aggregation
>>> > > >> > >> and propagation, right? Not saying this corner is a problem.
>>> Just
>>> > > >> want
>>> > > >> > to
>>> > > >> > >> understand it more.
>>> > > >> > >>
>>> > > >> > >> Your understanding is correct.
>>> > > >> > >>
>>> > > >> > >> @Becket:
>>> > > >> > >>
>>> > > >> > >> 1. I think watermark alignment is sort of a general use
>>> case, so
>>> > > >> should
>>> > > >> > >>
>>> > > >> > >> we
>>> > > >> > >>
>>> > > >> > >> just add the related methods to SourceReader directly
>>> instead of
>>> > > >> > >> introducing the new interface of WithSplitAssignment? We can
>>> > > provide
>>> > > >> > >> default implementations, so backwards compatibility won't be
>>> an
>>> > > >> issue.
>>> > > >> > >>
>>> > > >> > >> I don't think we can provide a default implementation. How
>>> would
>>> > we
>>> > > >> do
>>> > > >> > >> that? Would it be just a no-op? Is it better than having an
>>> > opt-in
>>> > > >> > >> interface? The default implementation would have to be added
>>> > > >> exclusively
>>> > > >> > >>
>>> > > >> > >> in
>>> > > >> > >>
>>> > > >> > >> a *Public* SourceReader interface. By the way notice
>>> > > SourceReaderBase
>>> > > >> > >> does extend from WithSplitsAlignment, so effectively all
>>> > > >> implementations
>>> > > >> > >>
>>> > > >> > >> do
>>> > > >> > >>
>>> > > >> > >> handle the alignment case. To be honest I think it is
>>> impossible
>>> > to
>>> > > >> > >> implement the SourceReader interface directly by end users.
>>> > > >> > >>
>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably also
>>> > needs
>>> > > >> some
>>> > > >> > >> change to support throttling at the split granularity. Can
>>> you
>>> > add
>>> > > >> that
>>> > > >> > >> interface change into the public interface section as well?
>>> > > >> > >>
>>> > > >> > >> It has been added from the beginning. See
>>> *AlignedSplitReader.*
>>> > > >> > >>
>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits here,
>>> > given
>>> > > >> that
>>> > > >> > >>
>>> > > >> > >> it
>>> > > >> > >>
>>> > > >> > >> is not actually changing the split assignments? It seems
>>> > something
>>> > > >> like
>>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more
>>> > > accurate.
>>> > > >> > >>
>>> > > >> > >> The method's called *alignSplits*, not assign. Do you still
>>> > prefer
>>> > > a
>>> > > >> > >> different name for that? Personally, I am open for
>>> suggestions
>>> > > here.
>>> > > >> > >>
>>> > > >> > >> Best,
>>> > > >> > >>
>>> > > >> > >> Dawid
>>> > > >> > >>
>>> > > >> > >> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
>>> > > >> > >>
>>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
>>> > > >> > >>
>>> > > >> > >> Thanks for driving the effort, Sebastian. I think the motivation makes a
>>> > > >> > >> lot of sense. Just a few suggestions / questions.
>>> > > >> > >>
>>> > > >> > >> 1. I think watermark alignment is sort of a general use
>>> case, so
>>> > > >> should
>>> > > >> > >>
>>> > > >> > >> we
>>> > > >> > >>
>>> > > >> > >> just add the related methods to SourceReader directly
>>> instead of
>>> > > >> > >> introducing the new interface of WithSplitAssignment? We can
>>> > > provide
>>> > > >> > >> default implementations, so backwards compatibility won't be
>>> an
>>> > > >> issue.
>>> > > >> > >>
>>> > > >> > >> 2. As you mentioned, the SplitReader interface probably also
>>> > needs
>>> > > >> some
>>> > > >> > >> change to support throttling at the split granularity. Can
>>> you
>>> > add
>>> > > >> that
>>> > > >> > >> interface change into the public interface section as well?
>>> > > >> > >>
>>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits here,
>>> > given
>>> > > >> that
>>> > > >> > >>
>>> > > >> > >> it
>>> > > >> > >>
>>> > > >> > >> is not actually changing the split assignments? It seems
>>> > something
>>> > > >> like
>>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more
>>> > > accurate.
>>> > > >> > >>
>>> > > >> > >> Thanks,
>>> > > >> > >>
>>> > > >> > >> Jiangjie (Becket) Qin
>>> > > >> > >>
>>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <stevenz3wu@gmail.com> wrote:
>>> > > >> > >>
>>> > > >> > >> However, a single source operator may read data from multiple
>>> > > >> > >>
>>> > > >> > >> splits/partitions, e.g., multiple Kafka partitions, such that
>>> > even
>>> > > >> with
>>> > > >> > >> watermark alignment the source operator may need to buffer
>>> > > excessive
>>> > > >> > >>
>>> > > >> > >> amount
>>> > > >> > >>
>>> > > >> > >> of data if one split emits data faster than another.
>>> > > >> > >>
>>> > > >> > >> For this part from the motivation section, is it accurate?
>>> Let's
>>> > > >> assume
>>> > > >> > >>
>>> > > >> > >> one
>>> > > >> > >>
>>> > > >> > >> source task consumes from 3 partitions and one of the
>>> partition
>>> > is
>>> > > >> > >> significantly slower. In this situation, watermark for this
>>> > source
>>> > > >> task
>>> > > >> > >> won't hold back as it is reading recent data from other two
>>> Kafka
>>> > > >> > >> partitions. As a result, it won't hold back the overall
>>> > watermark.
>>> > > I
>>> > > >> > >> thought the problem is that we may have late data for this
>>> slow
>>> > > >> > >>
>>> > > >> > >> partition.
>>> > > >> > >>
>>> > > >> > >> I have another question about the restart. Say split
>>> alignment is
>>> > > >> > >> triggered. checkpoint is completed. job failed and restored
>>> from
>>> > > the
>>> > > >> > last
>>> > > >> > >> checkpoint. because alignment decision is not checkpointed,
>>> > > initially
>>> > > >> > >> alignment won't be enforced until we get a cycle of watermark
>>> > > >> > aggregation
>>> > > >> > >> and propagation, right? Not saying this corner is a problem.
>>> Just
>>> > > >> want
>>> > > >> > to
>>> > > >> > >> understand it more.
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <thw@apache.org> wrote:
>>> > > >> > >>
>>> > > >> > >> Thanks for working on this!
>>> > > >> > >>
>>> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase
>>> and
>>> > > then
>>> > > >> > >>
>>> > > >> > >> doing
>>> > > >> > >>
>>> > > >> > >> nothing if the split reader does not implement
>>> AlignedSplitReader
>>> > > >> could
>>> > > >> > >>
>>> > > >> > >> be
>>> > > >> > >>
>>> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead be added
>>> to
>>> > the
>>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to make it
>>> > explicit
>>> > > >> that
>>> > > >> > >> the source actually supports it.
>>> > > >> > >>
>>> > > >> > >> Thanks,
>>> > > >> > >> Thomas
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <knaufk@apache.org> wrote:
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >> Hi Sebastian, Hi Dawid,
>>> > > >> > >>
>>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface
>>> (aka the
>>> > > >> stop
>>> > > >> > >>
>>> > > >> > >> &
>>> > > >> > >>
>>> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar
>>> only,
>>> > > >> > >>
>>> > > >> > >> correct?
>>> > > >> > >>
>>> > > >> > >> +1 in general. I believe it is valuable to complete the
>>> watermark
>>> > > >> > >>
>>> > > >> > >> aligned
>>> > > >> > >>
>>> > > >> > >> story with this FLIP.
>>> > > >> > >>
>>> > > >> > >> Cheers,
>>> > > >> > >>
>>> > > >> > >> Konstantin
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> > >> To be explicit, having worked on it, I support it ;) I think
>>> we
>>> > can
>>> > > >> > >> start a vote thread soonish, as there are no concerns so far.
>>> > > >> > >>
>>> > > >> > >> Best,
>>> > > >> > >>
>>> > > >> > >> Dawid
>>> > > >> > >>
>>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
>>> > > >> > >>
>>> > > >> > >> Dear Flink developers,
>>> > > >> > >>
>>> > > >> > >> I would like to open a discussion on FLIP 217 [1] for an extension of
>>> > > >> > >> Watermark Alignment to perform alignment also in SplitReaders. To do so,
>>> > > >> > >> SplitReaders must be able to suspend and resume reading from split sources
>>> > > >> > >> where the SourceOperator coordinates and controls suspend and resume. To
>>> > > >> > >> gather information about current watermarks of the SplitReaders, we extend
>>> > > >> > >> the internal WatermarkOutputMultiplexer and report watermarks to the
>>> > > >> > >> SourceOperator.
>>> > > >> > >>
>>> > > >> > >> There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised and
>>> > > >> > >> reworked by Dawid Wysakowicz (he did most of the work) and me. The changes
>>> > > >> > >> are backwards compatible in a way that if affected components do not
>>> > > >> > >> support split alignment the behavior is as before.
>>> > > >> > >>
>>> > > >> > >> Best,
>>> > > >> > >> Sebastian
>>> > > >> > >>
>>> > > >> > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
>>> > > >> > >>
>>> > > >> > >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
>>> > > >> > >>
>>> > > >> > >> --
>>> > > >> > >>
>>> > > >> > >> Konstantin Knauf
>>> > > >> > >> https://twitter.com/snntrable
>>> > > >> > >> https://github.com/knaufk
>>> > > >> > >>
>>> > > >> > >>
>>> > > >> >
>>> > > >>
>>> > > >
>>> > >
>>> >
>>>
>>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Sebastian Mattheis <se...@ververica.com>.
Hi Piotr, Hi Becket, Hi everybody,

We (Dawid and I) discussed the various suggestions/options, and we would be
okay either way: neither solution is perfect, simply because of the
complexity that is already present.

Option c) Adding methods to the interfaces of SourceReader and SplitReader
Option a) Adding decorative interfaces to be used by SourceReader and
SplitReader

As of the current status (v. 12) of the FLIP [1], it is based on Option c)
which we find acceptable because the complexity added is only a single
method.

Nevertheless, Piotr has valid concerns about Option c) which I see as
follows:
(1) An interface with default NOOP implementation makes the implementation
optional. And in my opinion, a default implementation is and will remain a
way of making the implementation optional, because even in the future a
developer can decide to implement the "old flavor" without support for
pausable splits.
(2) It may not be too critical but I also find it suboptimal that with a
NOOP default implementation there is no way to check at runtime if
SourceReader or SplitReader actually support pausing. (To do so, one would
need a supportsX method which makes it again more complicated.)
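
To make concern (2) concrete, here is a minimal Java sketch of Option c).
It does not use the real Flink interfaces: BaseSplitReader, LegacyReader,
PausableReader and OptionCSketch are hypothetical names, and the signature
of pauseOrResumeSplits is only assumed from this discussion. It shows that
a caller cannot tell from the type whether the call had any effect.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for a reader base interface (not the real Flink API).
interface BaseSplitReader {
    // Option c): the new method sits in the base interface with a NOOP default,
    // so existing implementations keep compiling without any change.
    default void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        // intentionally empty
    }
}

// "Old flavor" reader: it inherits the NOOP and silently ignores pause requests.
final class LegacyReader implements BaseSplitReader {}

// Reader that opts in by overriding the default.
final class PausableReader implements BaseSplitReader {
    private final Set<String> paused = new HashSet<>();

    @Override
    public void pauseOrResumeSplits(
            Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.addAll(splitsToPause);
        paused.removeAll(splitsToResume);
    }

    Set<String> pausedSplits() {
        return paused;
    }
}

final class OptionCSketch {
    public static void main(String[] args) {
        BaseSplitReader legacy = new LegacyReader();
        PausableReader pausable = new PausableReader();

        // Both calls compile and return normally; the static type gives the
        // caller no hint that the first one did nothing.
        legacy.pauseOrResumeSplits(List.of("split-0"), List.of());
        pausable.pauseOrResumeSplits(List.of("split-0"), List.of());

        System.out.println(pausable.pausedSplits());
    }
}
```

Telling the two readers apart at runtime would need something extra, for
example the supportsX method mentioned above.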

However, we haven't changed it because Option a) is also not optimal or
straightforward:
(1) We need to add two distinct yet similar decorative interfaces since, as
mentioned, the signatures of the methods are different. For example, we
would need decorative interfaces like `SplitReaderWithPausableSplits` and
`SourceReaderWithPausableSplits`.
(2) As a consequence, we would need to somehow document how/where to
implement both interfaces and how they relate to each other. We could
solve this by adding a note to the SourceReader and SplitReader interfaces
that references the decorative interfaces, but it still increases
complexity.
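
For comparison, a minimal Java sketch of Option a). Again these are not the
real Flink interfaces: PlainSourceReader, LegacySourceReader,
PausableSourceReader and OptionASketch are hypothetical names, the two
decorative interfaces only borrow the example names from point (1), and
their signatures are assumed from this discussion. The sketch shows the
two distinct decorative interfaces and the runtime check they make
possible.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for the source reader base interface.
interface PlainSourceReader {}

// Option a): one decorative interface per component, because the signatures differ.
// The source reader identifies splits by their id ...
interface SourceReaderWithPausableSplits {
    void pauseOrResumeSplits(Collection<String> idsToPause, Collection<String> idsToResume);
}

// ... while the split reader receives the split objects themselves.
interface SplitReaderWithPausableSplits<SplitT> {
    void pauseOrResumeSplits(Collection<SplitT> toPause, Collection<SplitT> toResume);
}

// Reader that does not opt in: it simply does not implement the decorator.
final class LegacySourceReader implements PlainSourceReader {}

// Reader that opts in by implementing the decorative interface as well.
final class PausableSourceReader implements PlainSourceReader, SourceReaderWithPausableSplits {
    private final Set<String> paused = new HashSet<>();

    @Override
    public void pauseOrResumeSplits(
            Collection<String> idsToPause, Collection<String> idsToResume) {
        paused.addAll(idsToPause);
        paused.removeAll(idsToResume);
    }

    Set<String> pausedSplitIds() {
        return paused;
    }
}

final class OptionASketch {
    // Runtime check that is not available with a NOOP default method.
    static boolean supportsPausing(PlainSourceReader reader) {
        return reader instanceof SourceReaderWithPausableSplits;
    }

    // Pauses the given splits if the reader supports it; returns false otherwise.
    static boolean tryPause(PlainSourceReader reader, Collection<String> ids) {
        if (reader instanceof SourceReaderWithPausableSplits) {
            ((SourceReaderWithPausableSplits) reader).pauseOrResumeSplits(ids, List.of());
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(supportsPausing(new LegacySourceReader()));
        System.out.println(supportsPausing(new PausableSourceReader()));
    }
}
```

The price is visible in the sketch as well: two similar interfaces whose
relationship has to be explained somewhere.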

In summary, we see both as acceptable and preferred over other options. The
question is if we can find a solution or compromise that is acceptable for
everybody to reach consensus.

Please let us know what you think because we would be happy if we can
conclude the discussion to avoid dropping the initiative on this FLIP.

Regards,
Sebastian

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
(v. 12)

On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski <pn...@apache.org> wrote:

> Hi Guowei,
>
> as Dawid wrote a couple of messages back:
>
> > This is covered in the previous FLIP[1] which has been already
> implemented in 1.15. In short, it must be enabled with the watermark
> strategy which also configures drift and update interval
>
> So by default watermark alignment is disabled, regardless of whether a
> source supports it or not.
>
> Best,
> Piotrek
>
> czw., 5 maj 2022 o 09:56 Guowei Ma <gu...@gmail.com> napisał(a):
>
>> Hi,
>>
>> We know that in the case of bounded input Flink supports the Batch
>> execution mode. Currently, in Batch execution mode, Flink is executed on a
>> stage-by-stage basis. In that case, watermark alignment might not gain
>> much.
>>
>> So my question is: Is watermark alignment the default behavior (only for
>> sources that implement it)? If so, have you considered evaluating the
>> impact of this behavior on the Batch execution mode? Or do you think it is
>> not necessary?
>>
>> Correct me if I miss something.
>>
>> Best,
>> Guowei
>>
>>
>> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <pi...@gmail.com>
>> wrote:
>>
>> > Hi Becket and Dawid,
>> >
>> > > I feel that no matter which option we choose this can not be solved
>> > entirely in either of the options, because of the point above and
>> because
>> > the signature of SplitReader#pauseOrResumeSplits and
>> > SourceReader#pauseOrResumeSplits are slightly different (one identifies
>> > splits with splitId the other one passes the splits directly).
>> >
>> > Yes, that's a good point in this case and for features that need to be
>> > implemented in more than one place.
>> >
>> > > Is there any reason for pausing reading from a split to be an optional
>> > > feature, other than that this was not included in the original interface?
>> >
>> > An additional argument in favor of making it optional is to simplify
>> source
>> > implementation. But on its own I'm not sure if that would be enough to
>> > justify making this feature optional. Maybe.
>> >
>> > > I think it would be way simpler and clearer to just let end users and
>> > Flink
>> > > assume all the connectors will implement this feature.
>> >
>> > As I wrote above that would be an interesting choice to make (ease of
>> > implementation for new users, vs system consistency). Regardless of
>> that,
>> > yes, for me the main argument is the API backward compatibility. But
>> let's
>> > clear a couple of points:
>> > - The current proposal adding methods to the base interface with default
>> > implementations is an OPTIONAL feature. Same as the decorative version
>> > would be.
>> > - Decorative version could implement "throw
>> UnsupportedOperationException"
>> > if user enabled watermark alignment just as well and I agree that's a
>> > better option compared to logging a warning.
>> >
>> > Best,
>> > Piotrek
>> >
>> >
>> > śr., 4 maj 2022 o 15:40 Becket Qin <be...@gmail.com> napisał(a):
>> >
>> > > Thanks for the reply and patient discussion, Piotr and Dawid.
>> > >
>> > > Is there any reason for pausing reading from a split to be an optional
>> > > feature, other than that this was not included in the original interface?
>> > >
>> > > To be honest I am really worried about the complexity of the user
>> story
>> > > here. Optional features like this have a high overhead. Imagine this
>> > > feature is optional, now a user enabled watermark alignment and
>> defined a
>> > > few watermark groups. Would it work? Hmm, that depends on whether the
>> > > involved Source has implemented this feature. If the Sources are well
>> > > documented, good luck. Otherwise end users may have to look into the
>> code
>> > > of the Source to see whether the feature is supported. Which is
>> something
>> > > they shouldn't have to do.
>> > >
>> > > I think it would be way simpler and clearer to just let end users and
>> > Flink
>> > > assume all the connectors will implement this feature. After all the
>> > > watermark group is not optional to the end users. If in some rare
>> cases,
>> > > the feature cannot be supported, a clear UnsupportedOperationException
>> > will
>> > > be thrown to tell users to explicitly remove this Source from the
>> > watermark
>> > > group. I don't think we should have a warning message here, as they
>> tend
>> > to
>> > > be ignored in many cases. If we do this, we don't even need the
>> > supportXXX
>> > > method in the Source for this feature. In fact this is exactly how
>> many
>> > > interfaces work today. For example, SplitEnumerator#addSplitsBack()
>> is
>> > not
>> > > supported by Pravega source because it does not support partial
>> failover.
>> > > In that case, it simply throws an exception to trigger a global
>> recovery.
>> > >
>> > > The reason we add a default implementation in this case would be just for
>> > the
>> > > sake of backwards compatibility so the old source can still compile.
>> > Sure,
>> > > in short term, this feature might not be supported by many existing
>> > > sources. That is OK, and it is quite visible to the source developers
>> > that
>> > > they did not override the default impl which throws an
>> > > UnsupportedOperationException.
>> > >
>> > > @Dawid,
>> > >
>> > > the Java doc of the SupportXXX() method in the Source would be the
>> single
>> > > >> source of truth regarding how to implement this feature.
>> > > >
>> > > >
>> > >
>> > > I also don't find it entirely true. Half of the classes are
>> theoretically
>> > > > optional and are utility classes from the point of view how the
>> > > interfaces
>> > > > are organized. Theoretically users do not need to use any of
>> > > > SourceReaderBase & SplitReader. Would be weird to list their
>> methods in
>> > > the
>> > > > Source interface.
>> > >
>> > > I think the ultimate goal of java docs is to guide users to implement
>> the
>> > > Source. If SourceReaderBase is the preferred way to implement a
>> > > SourceReader, it seems worth mentioning that. Even the Java language
>> > > documentation of interfaces lists the known implementations [1] so people
>> > can
>> > > leverage them. But for this particular case, if we make the feature
>> > > non-optional, we don't even need the supportXXX() method for now.
>> > >
>> > > Thanks,
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > >
>> > >
>> > > On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <
>> dwysakowicz@apache.org>
>> > > wrote:
>> > >
>> > > > Hey Piotr and Becket,
>> > > >
>> > > > First of all, let me say I am happy with whichever option is agreed
>> in
>> > > the
>> > > > discussion.
>> > > >
>> > > > I wanted to clarify a few points from the discussion though:
>> > > >
>> > > > @Becket:
>> > > >
>> > > > The main argument for adding the methods to the SourceReader is that
>> > > these
>> > > > methods are effectively NON-OPTIONAL to SourceReader impl, i.e.
>> > starting
>> > > > from this FLIP, all the SourceReaders impl are expected to support
>> this
>> > > > method, although some old implementations may not have implemented
>> this
>> > > > feature. I think we should distinguish the new features from the
>> > optional
>> > > > features. While the public decorative interface is a solution to the
>> > > > optional features. We should not use it for the features that are
>> > > > non-optional.
>> > > >
>> > > > I don't think that this feature is NON-OPTIONAL. Even though
>> > preferred, I
>> > > > still think it can be simply optional.
>> > > >
>> > > > the Java doc of the SupportXXX() method in the Source would be the
>> > single
>> > > > source of truth regarding how to implement this feature.
>> > > >
>> > > > I also don't find it entirely true. Half of the classes are
>> > theoretically
>> > > > optional and are utility classes from the point of view how the
>> > > interfaces
>> > > > are organized. Theoretically users do not need to use any of
>> > > > SourceReaderBase & SplitReader. Would be weird to list their
>> methods in
>> > > the
>> > > > Source interface.
>> > > >
>> > > > @Piotr
>> > > >
>> > > > If we have all of the methods with default implementation in the
>> base
>> > > > interface, the API doesn't give any clue to the user which set of
>> > methods
>> > > > are required to be implemented at the same time.
>> > > >
>> > > > I feel that no matter which option we choose this can not be solved
>> > > > entirely in either of the options, because of the point above and
>> > because
>> > > > the signature of SplitReader#pauseOrResumeSplits and
>> > > > SourceReader#pauseOrResumeSplits are slightly different (one
>> identifies
>> > > > splits with splitId the other one passes the splits directly).
>> > > >
>> > > > Best,
>> > > >
>> > > > Dawid
>> > > > On 03/05/2022 14:30, Becket Qin wrote:
>> > > >
>> > > > Hi Piotr,
>> > > >
>> > > > Thanks for the comment.
>> > > >
>> > > > Just to clarify, I am not against the decorative interfaces, but I
>> do
>> > > > think we should use them with caution. The main argument for adding
>> the
>> > > > methods to the SourceReader is that these methods are
>> > > > effectively NON-OPTIONAL to SourceReader impl, i.e. starting from
>> this
>> > > > FLIP, all the SourceReaders impl are expected to support this
>> > > > method, although some old implementations may not have implemented
>> this
>> > > > feature. I think we should distinguish the new features from the
>> > optional
>> > > > features. While the public decorative interface is a solution to the
>> > > > optional features. We should not use it for the features that are
>> > > > non-optional.
>> > > >
>> > > > That said, this feature is optional for SplitReaders. Arguably we
>> can
>> > > have
>> > > > a decorative interface for that, but for simplicity and symmetry of
>> the
>> > > > interface, personally I prefer just adding a new method.
>> > > >
>> > > > Regarding the advantages you mentioned about the decorative
>> interfaces,
>> > > > they would make sense if:
>> > > > 1. The feature is optional.
>> > > > 2. There is only one decorative interface involved for a feature.
>> > > > Otherwise the argument that all the methods are grouped together
>> will
>> > not
>> > > > stand.
>> > > >
>> > > > Compared with that, I think the current solution works fine in all
>> > cases,
>> > > > i.e. "having supportXXX() method in Source, and default methods /
>> > > > decorative interfaces in base interfaces.".
>> > > >
>> > > > The advantages are:
>> > > >> - clean and easy to implement base interface
>> > > >
>> > > > In the current approach, the Java doc of the SupportXXX() method in
>> the
>> > > > Source would be the single source of truth regarding how to
>> implement
>> > > this
>> > > > feature. It lists the method that has to be implemented to support
>> this
>> > > > feature, regardless of how many classes / interfaces are involved.
>> > > >
>> > > > When implementing the base interface, users do not need to
>> implement a
>> > > > method with default implementation. If they are curious what the
>> method
>> > > is
>> > > > for, the java doc of that method simply points users to the
>> > SupportXXX()
>> > > > method in the Source. This does not seem to add work for users
>> > > > compared with decorative interfaces, but it gives much better
>> > > > discoverability.
>> > > >
>> > > > - all of the methods from a single feature are grouped in a single
>> > > >> decorator interface, together with their dedicated java doc. It's
>> also
>> > > >> easier to google search for help using the decorator name
>> > > >
>> > > > - if an optional feature requires two methods to be implemented at
>> > once,
>> > > >> decorator can guarantee that
>> > > >
>> > > > These two points are not true when multiple components and classes
>> are
>> > > > involved collaboratively to provide a feature. In our case, we have
>> > both
>> > > > SourceReader and SplitReader involved. And there might be other
>> > > interfaces
>> > > > on the JM side involved for some future features. So the relevant
>> > methods
>> > > > can actually be scattered over the places. That said, we may still
>> use
>> > > > decorative interfaces for each component, if the feature is
>> optional,
>> > > given
>> > > > there is a single source of truth for the feature.
>> > > >
>> > > > Here I would strongly lean towards making life easier for new users,
>> > > >> lowering the entry barrier, at the (imo) slight expense for the
>> power
>> > > >> users.
>> > > >
>> > > > I actually think the current approach is simpler, more extensible,
>> > > > and more general for all users. Can you elaborate on which part
>> > > > you think is harder for users to understand?
>> > > >
>> > > >
>> > > > There is another benefit of the decorative interfaces which is not
>> > > > mentioned, but might be worth considering here. Usually the
>> decorative
>> > > > interfaces give slightly better backwards compatibility than the new
>> > > > default method in the interfaces. That is when users are using a jar
>> > that
>> > > > was compiled with an older version of Flink which does not have the
>> > > default
>> > > > method in the interfaces in question. A decorative interface may
>> still
>> > > > provide backwards compatibility in that case, while default method
>> impl
>> > > > cannot.
>> > > >
>> > > > I think in Flink we in general do not guarantee custom components
>> > > compiled
>> > > > with an older version can run with a newer version of Flink. A
>> > recompile
>> > > > with a newer version would be required. That said, if we do care
>> about
>> > > > this, we can just change the "supportXXX()" method in the Source
>> > > interface
>> > > > to use decorative interfaces, and leave the other parts unchanged.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jiangjie (Becket) Qin
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <pnowojski@apache.org
>> >
>> > > > wrote:
>> > > >
>> > > >> Hi,
>> > > >>
>> > > >> Sorry for chipping in so late, but I was OoO for the last two
>> weeks.
>> > > >> Regarding the interfaces, I would actually be against adding those
>> > > >> methods to the base interfaces for the reasons mentioned above.
>> > > >> Doing so clogs the base interface for new users with tons of
>> > > >> methods that they do not need, do not understand, and do not know
>> > > >> what to do with. Moreover, such decorative interfaces solve a
>> > > >> problem when a feature requires two or more methods to be
>> > > >> implemented at the same time. If we have all of the methods with
>> > > >> default implementations in the base interface, the API doesn't give
>> > > >> the user any clue about which set of methods has to be implemented
>> > > >> together.
>> > > >>
>> > > >> > a) I feel the biggest drawback of decorative interfaces is which
>> > > >> interface
>> > > >> > they can decorate and which combinations of multiple decorative
>> > > >> interfaces
>> > > >> > are valid. (...)
>> > > >> > In the future, if there is a new feature added
>> > > >> > (e.g. sorted or pre-partitioned data aware), are we going to
>> create
>> > > >> another
>> > > >> > interface of SplitReader such as SortedSplitReader or
>> > > >> PrePartitionedAware?
>> > > >> > Can they be combined? So I think the additional decorative
>> interface
>> > > >> like
>> > > >> > withSplitsAlignment actually increases the understanding cost of
>> > users
>> > > >> > because they have to know what decorative interfaces are there,
>> > which
>> > > >> > interface they can decorate and which combinations of the
>> decorative
>> > > >> > interfaces are valid and which are not. Ideally we want to avoid
>> > that.
>> > > >>
>> > > >> I'm not sure if I understand how embedding default methods in the
>> base
>> > > >> interface is solving the problem: what can be combined or not? If
>> > there
>> > > >> are
>> > > >> two conflicting features, having decorative interfaces that can
>> not be
>> > > >> mixed together actually makes much more sense to me rather than
>> having
>> > > >> them
>> > > >> all in one base class. How would you allow users to implement only
>> one
>> > > of
>> > > >> those two features?
>> > > >>
>> > > >> To reiterate on the issue. Yes, there are drawbacks:
>> > > >> - how can a user discover what decorators are there?
>> > > >> - how can a user know where the decorator can be applied to?
>> > > >>
>> > > >> However, those are drawbacks for power users, which can be
>> > > >> mitigated by the documentation, for example by listing all of the
>> > > >> decorators with detailed explanations both in the docs and in the
>> > > >> java docs. More experienced users will deal with those issues more
>> > > >> easily, as they will already have some basic understanding of
>> > > >> Flink. Also, if a user has a problem to solve, they will google
>> > > >> for a potential solution anyway, and while doing that they are
>> > > >> very likely to discover the decorator they need in the docs.
>> > > >>
>> > > >> The advantages are:
>> > > >> - clean and easy to implement base interface
>> > > >> - all of the methods from a single feature are grouped in a single
>> > > >> decorator interface, together with their dedicated java doc. It's
>> also
>> > > >> easier to google search for help using the decorator name
>> > > >> - if an optional feature requires two methods to be implemented at
>> > once,
>> > > >> decorator can guarantee that
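The advantages listed above can be illustrated with a small sketch. Everything here is hypothetical (Reader, SupportsPausing, tryPause are illustration-only names, not Flink API): the base interface stays minimal, the two methods of the optional feature can only be implemented together, and the framework discovers the feature with an instanceof check.

```java
import java.util.ArrayList;
import java.util.List;

public class DecoratorSketch {

    /** Hypothetical minimal base interface: nothing optional in it. */
    interface Reader {
        String poll();
    }

    /** Hypothetical decorator: both methods of the optional feature live together. */
    interface SupportsPausing {
        void pause(String splitId);

        void resume(String splitId);
    }

    /** A reader written by a new user who only cares about the basics. */
    static class SimpleReader implements Reader {
        public String poll() {
            return "record";
        }
    }

    /** A reader that opts in; the compiler forces it to implement pause AND resume. */
    static class PausableReader implements Reader, SupportsPausing {
        final List<String> paused = new ArrayList<>();

        public String poll() {
            return "record";
        }

        public void pause(String splitId) {
            paused.add(splitId);
        }

        public void resume(String splitId) {
            paused.remove(splitId);
        }
    }

    /** Framework-side discovery: returns true if the pause request was honored. */
    static boolean tryPause(Reader reader, String splitId) {
        if (reader instanceof SupportsPausing) {
            ((SupportsPausing) reader).pause(splitId);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(tryPause(new SimpleReader(), "split-0"));
        System.out.println(tryPause(new PausableReader(), "split-0"));
    }
}
```

What the sketch does not solve is the discoverability concern raised earlier in the thread: nothing in Reader tells an implementer that SupportsPausing exists.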
>> > > >>
>> > > >> Here I would strongly lean towards making life easier for new
>> users,
>> > > >> lowering the entry barrier, at the (imo) slight expense for the
>> power
>> > > >> users.
>> > > >>
>> > > >> Best,
>> > > >> Piotrek
>> > > >>
>> > > >>
>> > > >> On Tue, Apr 26, 2022 at 15:32, Becket Qin <be...@gmail.com>
>> > wrote:
>> > > >>
>> > > >> > Thanks for the reply Sebastian and Dawid,
>> > > >> >
>> > > >> > I think Sebastian has a good summary. This is a really helpful
>> > > >> discussion.
>> > > >> >
>> > > >> > Thinking a bit more, I feel that it might still be better to add
>> the
>> > > >> > supportsXXX() method in the Source rather than SourceReader.
>> > > >> >
>> > > >> > Generally speaking, what we are trying to do here is to let the
>> > Flink
>> > > >> > framework know what the Source is capable of. In this FLIP, it
>> > happens
>> > > >> to
>> > > >> > be the capability that only involves SourceReader. But in the
>> > future,
>> > > >> it is
>> > > >> > possible that another functionality involves both the
>> > SplitEnumerator
>> > > >> and
>> > > >> > SourceReader. In that case, following the current approach, we
>> > should
>> > > >> put
>> > > >> > the "supportsXXX()" method in both SplitEnumerator and
>> SourceReader.
>> > > >> > Because if we only put this in the SourceReader, then the JM
>> would
>> > > have
>> > > >> to
>> > > >> > create a SourceReader in order to know whether this feature is
>> > > >> supported,
>> > > >> > which is a little ugly. But if we put the "supportsXXX()" method
>> in
>> > > the
>> > > >> > Source, we will break the "symmetric" design because this FLIP
>> > chose a
>> > > >> > different way.
>> > > >> >
>> > > >> > This is also why I think supportsXXX() method seems a good thing
>> to
>> > > >> have,
>> > > >> > because when there are a few interfaces / methods that are
>> expected
>> > to
>> > > >> be
>> > > >> > implemented at the same time in order to deliver a feature, it is
>> > > always
>> > > >> > good to have a single source of truth to tell the framework what
>> to
>> > > do,
>> > > >> so
>> > > >> > the framework can do consistent things in different parts.
>> > > >> >
>> > > >> > @Sebastian Mattheis <se...@ververica.com>
>> > > >> >
>> > > >> > Regarding interface flavor b), i.e. AlignedSourceReader +
>> > > >> > AlignedSplitReader, what I feel awkward about is that we are
>> > > essentially
>> > > >> > expecting almost all the SourceReader implementations to extend
>> > > >> > SourceReaderBase, which effectively makes the SourceReader
>> interface
>> > > >> > without the pausing support useless. So this indicates that
>> public
>> > > >> > decorative interfaces (or sub-interfaces for the same purpose)
>> only
>> > > >> > make sense if the original interface is also expected to be used.
>> > > >> > Otherwise, it seems to make more sense to add the method to the
>> > original
>> > > >> > interface itself.
>> > > >> >
>> > > >> > Cheers,
>> > > >> >
>> > > >> > Jiangjie (Becket) Qin
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <
>> > > >> dwysakowicz@apache.org>
>> > > >> > wrote:
>> > > >> >
>> > > >> > > Thanks @Sebastian for the nice summary.
>> > > >> > >
>> > > >> > > I think most of your points aligned with the suggestions I
>> made to
>> > > the
>> > > >> > > FLIP, while you were writing your reply (I believe we hit enter
>> > > >> nearly at
>> > > >> > > the same time ;) )
>> > > >> > >
>> > > >> > > Two points after we synced offline
>> > > >> > >
>> > > >> > > 1. I changed also the supportsWatermarksSplitAlignment to
>> > > >> > > supportsPausingSplits to express the general capability of
>> > pausing.
>> > > >> > >
>> > > >> > > 2. As for whether we should have
>> > > >> > > PausingSourceReader/PausingSplitReader (option b) or just add the
>> > > >> > > methods (option c), I suggest simply adding the two methods, as I
>> > > >> > > felt this is the approach Becket much prefers and others do not
>> > > >> > > object to. Unless there is opposition, let's go with option c.
>> > > >> > >
>> > > >> > > Best,
>> > > >> > >
>> > > >> > > Dawid
>> > > >> > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
>> > > >> > >
>> > > >> > > Hi folks,
>> > > >> > >
>> > > >> > > Sorry for being a bit silent. Many thanks for all the input and
>> > > >> > > suggestions. As I'm a bit new, I needed some time to catch up
>> and
>> > > >> > structure
>> > > >> > > (for myself) the discussion and I wanted to find a way to
>> > structure
>> > > >> the
>> > > >> > > conclusions. (Also because I had the feeling that some concerns
>> > got
>> > > >> lost
>> > > >> > in
>> > > >> > > the discussion.) This is my attempt and please correct me if
>> > > >> something is
>> > > >> > > wrong or misunderstood. I tried to collect and assemble the
>> > > opinions,
>> > > >> > > suggestions, and conclusions (to the best of my knowledge):
>> > > >> > >
>> > > >> > > # Top A: Should split alignment (pause/resume behavior) be a
>> > general
>> > > >> > > capability?
>> > > >> > >
>> > > >> > > I personally don't see any reason not to make it a general
>> > capability
>> > > >> > > because for the alignSplit method it is actually independent of
>> > the
>> > > >> > > watermarks. If we agree here to have it a general capability,
>> we
>> > > >> should
>> > > >> > > also agree on the right wording. Does
>> "alignSplits(splitsToResume,
>> > > >> > > splitsToPause)" refer to what is then actually meant? (I see
>> it as
>> > > >> okay.
>> > > >> > I
>> > > >> > > don't have any better idea whilst Arvid suggested
>> > > >> "pauseOrResumeSplits".)
>> > > >> > >
>> > > >> > > # Top B: Should it be possible to enable/disable split
>> alignment?
>> > > >> > >
>> > > >> > > I would personally not disable the split alignment on the
>> source
>> > > >> reader
>> > > >> > > side because if split alignment is used for some other use case
>> > (see
>> > > >> A)
>> > > >> > it
>> > > >> > > could have nasty side effects on other/future use cases.
>> Instead,
>> > I
>> > > >> would
>> > > >> > > disable "watermark split alignment" where I think it should
>> > disable
>> > > >> the
>> > > >> > > watermark-dependent trigger for split alignment.
>> > > >> > >
>> > > >> > > # Top C: Should we add a supportsX method?
>> > > >> > >
>> > > >> > > I find it difficult to define the scope of a supportsX method
>> > > >> > > w.r.t. the following questions: a) Where is it used? and b) What
>> > > >> > > is the expected output? To b), it's not straightforward to provide
>> > > >> > > a meaningful output, e.g., if SourceReader supports split
>> > > >> > > alignment but SplitReader does not. This is because with the
>> > > >> > > current implementation, we can determine whether split alignment
>> > > >> > > is fully supported only at runtime, and specifically only when
>> > > >> > > calling alignSplits down the call hierarchy to the actual
>> > > >> > > SplitReaders.
>> > > >> > > Therefore, I would suggest raising either an error or a warning if
>> > > >> > > alignment is called but not supported at some point. I know we
>> > > >> > > should think carefully about when this could be the case because
>> > > >> > > we don't want to flood anybody with such warnings. However,
>> > > >> > > warnings could indicate to the user that, for the watermark split
>> > > >> > > alignment use case, split reading is imbalanced, with the
>> > > >> > > conclusion to either disable the trigger for watermark split
>> > > >> > > alignment (see Top B) or to use/implement a source and reader that
>> > > >> > > fully support split alignment.
>> > > >> > >
>> > > >> > > # Top D: How to design interfaces?
>> > > >> > >
>> > > >> > > Thanks for structuring the discussion with the various
>> > > >> > > possibilities (a-d). From the discussion and emails, I would like
>> > > >> > > to summarize the following requirements:
>> > > >> > > - Interfaces should be consistent ("symmetric"), i.e., similar
>> > > >> semantics
>> > > >> > > should have similar interfaces with similar usage.
>> > > >> > > - Make explicit which implementations implement
>> interfaces/support
>> > > >> > > behavior.
>> > > >> > > - Make clear what are default implementations and how to
>> implement
>> > > >> > > interfaces with desired behavior.
>> > > >> > >
>> > > >> > > This is a simplified view of the relations between relevant
>> > classes
>> > > of
>> > > >> > the
>> > > >> > > PoC implementation:
>> > > >> > >
>> > > >> > > SourceReader (Public) <|-- SourceReaderBase (Internal) <|-- ..
>> > <|--
>> > > >> > > MySourceReader
>> > > >> > >
>> > > >> > > MySourceReader <>-- SplitFetcherManager (Internal) <>--
>> > SplitFetcher
>> > > >> > > (Internal) <>-- SplitReader (Public) <|-- MySplitReader
>> > > >> > >
>> > > >> > > (A <|-- B: B inherits from A; A <>-- B: A "has a" B)
>> > > >> > >
>> > > >> > > Note that SourceReaderBase and SplitFetcherManager implement
>> most
>> > of
>> > > >> the
>> > > >> > > "logic" for split alignment just because we wanted to implement
>> > > split
>> > > >> > > alignment and wanted it to be available as kind of a default.
>> As a
>> > > >> > > consequence, we have a "default implementation" for
>> SourceReader
>> > > that
>> > > >> > > implements the actual logic for split alignment. For that
>> reason,
>> > I
>> > > >> find
>> > > >> > it
>> > > >> > > very confusing to have a NOOP default implementation in the
>> > > interface
>> > > >> for
>> > > >> > > the SourceReader. As a consequence, interface strategy c) is
>> > > difficult
>> > > >> > > because this would require NOOP default implementations in the
>> > > public
>> > > >> > > interfaces of SourceReader and SplitReader. This is the same
>> for
>> > > >> strategy
>> > > >> > > d) because it would require NOOP default implementation in the
>> > > >> > > SourceReader. Further, as Dawid described, the method signatures
>> > > >> > > of alignSplit for SourceReader and SplitReader differ, and it
>> > > >> > > would be extremely difficult to make the signatures the same (with
>> > > >> > > a potential performance impact because of additional lookups of
>> > > >> > > split ids). Therefore, having a symmetric decorative interface as
>> > > >> > > in strategy a) is actually not possible, and having two decorative
>> > > >> > > interfaces with different method signatures is confusing. My
>> > > >> > > conclusion is that we are best off with strategy b), which means
>> > > >> > > having specializing sub-interfaces that inherit from the parent
>> > > >> > > interface:
>> > > >> > > SourceReader <|-- AlignedSourceReader, SplitReader <|--
>> > > >> > > AlignedSplitReader
>> > > >> > > With this option, I'm not 100% sure what the implications are and
>> > > >> > > if this could get nasty. I would suggest that Dawid and I just try
>> > > >> > > to implement it and see if we like it. :)
>> > > >> > >
>> > > >> > > # Summary
>> > > >> > >
>> > > >> > > In conclusion, please let me know your perspectives. Please
>> > correct
>> > > >> me,
>> > > >> > if
>> > > >> > > something is wrong or if I misunderstood something. My
>> perspective
>> > > >> would
>> > > >> > be:
>> > > >> > >
>> > > >> > > Top A: Yes
>> > > >> > > Top B: Yes (but disable watermark trigger for split alignment)
>> > > >> > > Top C: No
>> > > >> > > Top D: b)
>> > > >> > >
>> > > >> > > Best,
>> > > >> > > Sebastian
>> > > >> > >
>> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <
>> > > >> dwysakowicz@apache.org
>> > > >> > >
>> > > >> > > wrote:
>> > > >> > >
>> > > >> > >> @Arvid:
>> > > >> > >>
>> > > >> > >> While I also like Becket's capability approach, I fear that it
>> > > >> doesn't
>> > > >> > work
>> > > >> > >> for this particular use case: Sources can always be aligned
>> > > >> cross-task
>> > > >> > and
>> > > >> > >> this is just about intra-task alignment. So it's plausible to
>> put
>> > > >> > sources
>> > > >> > >> into an alignment group even though they do not use any of the
>> > > >> presented
>> > > >> > >> API of FLIP-217. They should just issue a warning, if they
>> handle
>> > > >> > multiple
>> > > >> > >> splits (see motivation section).
>> > > >> > >>
>> > > >> > >> Yes, but the "supportXXX" method would be for telling if it
>> > > supports
>> > > >> > that
>> > > >> > >> intra-task alignment. Cross-task alignment would always be
>> > > supported.
>> > > >> > >>
>> > > >> > >> I updated interfaces to what I believe to be closest to a
>> > consensus
>> > > >> > >> between all participants. Do you mind taking a look?
>> > > >> > >>
>> > > >> > >> @Sebastian Do you mind addressing the nits?
>> > > >> > >>
>> > > >> > >> Best,
>> > > >> > >>
>> > > >> > >> Dawid
>> > > >> > >>
>> > > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
>> > > >> > >>
>> > > >> > >> Thanks for pushing this effort.
>> > > >> > >>
>> > > >> > >> I'd actually be in favor of 1b). I fully agree that decorator
>> > > >> interfaces
>> > > >> > >> should be avoided but I'm also not a big fan of overloading
>> the
>> > > base
>> > > >> > >> interfaces (they are hard to implement as is). The usual feedback
>> > > >> > >> on Source-related interfaces is that they are overwhelming and too
>> > > >> > >> hard to implement. However, I'd also not oppose 1c) as scattered
>> > > >> > >> interfaces also have drawbacks. I'd just dislike 1a) and 1d).
>> > > >> > >> While I also like Becket's capability approach, I fear that it
>> > > >> doesn't
>> > > >> > work
>> > > >> > >> for this particular use case: Sources can always be aligned
>> > > >> cross-task
>> > > >> > and
>> > > >> > >> this is just about intra-task alignment. So it's plausible to
>> put
>> > > >> > sources
>> > > >> > >> into an alignment group even though they do not use any of the
>> > > >> presented
>> > > >> > >> API of FLIP-217. They should just issue a warning, if they
>> handle
>> > > >> > multiple
>> > > >> > >> splits (see motivation section).
>> > > >> > >>
>> > > >> > >> I think renaming alignSplits to facilitate future use cases
>> makes
>> > > >> sense
>> > > >> > but
>> > > >> > >> then all interfaces (if 1c) is chosen) should be adjusted
>> > > >> accordingly.
>> > > >> > >> AlignedSourceReader could be PausingSourceReader and I'd go
>> for
>> > > >> > >> pauseOrResumeSplits (Becket's proposal afaik). We could also
>> > split
>> > > it
>> > > >> > into
>> > > >> > >> pauseSplit and resumeSplit. While pauseOrResumeSplits may
>> allow
>> > > >> Sources
>> > > >> > to
>> > > >> > >> just use 1 instead of 2 library calls (as written in the
>> > Javadoc),
>> > > >> both
>> > > >> > >> Kafka and Pulsar can't use it and I'm not sure if there is a
>> > system
>> > > >> that
>> > > >> > >> can.
>> > > >> > >>
>> > > >> > >> Some nit for the FLIP:
>> > > >> > >> - Please replace "stop" with "pause".
>> > > >> > >> - Not sure if it's worth it in the capability section: Sources
>> > that
>> > > >> > adopt
>> > > >> > >> this interface cannot be used in earlier versions. So it feels
>> > like
>> > > >> we
>> > > >> > are
>> > > >> > >> only forward compatible (old sources can be used after the
>> > change);
>> > > >> but
>> > > >> > I
>> > > >> > >> guess this holds for any API addition.
>> > > >> > >> - You might want to add what happens when all splits are
>> paused.
>> > > >> > >> - You may want to describe how the 3 flavors of
>> SourceReaderBase
>> > > >> > interact
>> > > >> > >> with the interface.
>> > > >> > >> - I'm not sure if it makes sense to include Kafka and Pulsar
>> in
>> > the
>> > > >> > FLIP.
>> > > >> > >> For me, this is rather immediate follow-up work. (could be in
>> the
>> > > >> same
>> > > >> > >> umbrella ticket)
>> > > >> > >>
>> > > >> > >> Best,
>> > > >> > >>
>> > > >> > >> Arvid
>> > > >> > >>
>> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz
>> > > >> > >> <dwysakowicz@apache.org> wrote:
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> a) "MySourceReader implements SourceReader,
>> WithSplitsAlignment",
>> > > >> along
>> > > >> > >> with "MySplitReader implements SplitReader,
>> WithSplitsAlignment",
>> > > or
>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
>> > > "MySplitReader
>> > > >> > >> implements AlignedSplitReader", or
>> > > >> > >> c) "MySourceReader implements SourceReader" and "MySplitReader
>> > > >> > implements
>> > > >> > >> SplitReader".
>> > > >> > >>
>> > > >> > >> I think the latest proposal according to Dawid would be:
>> > > >> > >> d) "MySourceReader implements SourceReader" and "MySplitReader
>> > > >> > implements
>> > > >> > >> AlignedSplitReader".
>> > > >> > >> I am fine with this API, although personally speaking I think
>> it
>> > is
>> > > >> > simpler
>> > > >> > >> to just add a new method to the split reader with default
>> impl.
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> I think that is a good idea to have it aligned as much as
>> > possible.
>> > > >> I'd
>> > > >> > be
>> > > >> > >> +1 for your option c). We can merge AlignedSplitReader with
>> > > >> > SplitReader. We
>> > > >> > >> will update the FLIP shortly.
>> > > >> > >>
>> > > >> > >> Best,
>> > > >> > >>
>> > > >> > >> Dawid
>> > > >> > >>
>> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
>> > > >> > >>
>> > > >> > >> Thanks for the comment, Jark.
>> > > >> > >>
>> > > >> > >> 3. Interface/Method Name.
>> > > >> > >>
>> > > >> > >> Can the interface be used to align other things in the future?
>> > For
>> > > >> > example,
>> > > >> > >> align read speed, I have
>> > > >> > >> seen users requesting global rate limits. This feature may
>> also
>> > > need
>> > > >> an
>> > > >> > >> interface like this.
>> > > >> > >> If we don't plan to extend this interface to support aligning
>> > > >> > >> other things, I suggest explicitly declaring
>> > > >> > >> the purpose of the methods, such as `alignWatermarksForSplits`
>> > > >> instead
>> > > >> > of
>> > > >> > >> `alignSplits`.
>> > > >> > >>
>> > > >> > >> This is a good point. Naming wise, it would usually be more
>> > > >> extensible
>> > > >> > to
>> > > >> > >> just describe what the method actually does, instead of
>> assuming
>> > > the
>> > > >> > >> purpose of doing this. For example, in this case,
>> > > >> pauseOrResumeSplits()
>> > > >> > >> would be more extensible because this can be used for any
>> kind of
>> > > >> flow
>> > > >> > >> control, be it watermark alignment or simple rate limiting.
>> > > >> > >>
>> > > >> > >> 4. Interface or Method.
>> > > >> > >>
>> > > >> > >> I don't have a strong opinion on this. I think they have their
>> > own
>> > > >> > >> advantages.
>> > > >> > >> In Flink SQL, we heavily use Interfaces for extending
>> abilities
>> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
>> > > >> > >> and I prefer Interfaces rather than methods in this case. When
>> > you
>> > > >> have
>> > > >> > a
>> > > >> > >> bunch of abilities and each ability
>> > > >> > >> has more than one method, Interfaces can help to organize them and
>> > > >> > >> make clear to users which methods
>> > > >> > >> need to be implemented when you want to have an ability.
>> > > >> > >>
>> > > >> > >> I am OK with decorative interfaces if this is a general design
>> > > >> pattern
>> > > >> > in
>> > > >> > >> the other components in Flink. But it looks like the current
>> API
>> > > >> > proposal
>> > > >> > >> is not symmetric.
>> > > >> > >>
>> > > >> > >> The current proposal is essentially "MySourceReader implements
>> > > >> > >> SourceReader, WithSplitsAlignment", along with "MySplitReader
>> > > >> implements
>> > > >> > >> AlignedSplitsReader".
>> > > >> > >>
>> > > >> > >> Should we make the API symmetric? I'd consider any one of the
>> > > >> following
>> > > >> > as
>> > > >> > >> symmetric.
>> > > >> > >>
>> > > >> > >> a) "MySourceReader implements SourceReader,
>> WithSplitsAlignment",
>> > > >> along
>> > > >> > >> with "MySplitReader implements SplitReader,
>> WithSplitsAlignment",
>> > > or
>> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
>> > > "MySplitReader
>> > > >> > >> implements AlignedSplitReader", or
>> > > >> > >> c) "MySourceReader implements SourceReader" and "MySplitReader
>> > > >> > implements
>> > > >> > >> SplitReader".
>> > > >> > >>
>> > > >> > >> I think the latest proposal according to Dawid would be:
>> > > >> > >> d) "MySourceReader implements SourceReader" and "MySplitReader
>> > > >> > implements
>> > > >> > >> AlignedSplitReader".
>> > > >> > >> I am fine with this API, although personally speaking I think
>> it
>> > is
>> > > >> > simpler
>> > > >> > >> to just add a new method to the split reader with default
>> impl.
>> > > >> > >>
>> > > >> > >> @Dawid Wysakowicz <dw...@apache.org>, thanks for the reply.
>> > > >> > >>
>> > > >> > >> Having said that, as I don't have a preference and I agree
>> most
>> > of
>> > > >> the
>> > > >> > >>
>> > > >> > >> sources will support the alignment I am fine following your
>> > > >> suggestion
>> > > >> > to
>> > > >> > >> have the SourceReader extending from
>> > WithWatermarksSplitsAlignment,
>> > > >> but
>> > > >> > >> would put the "supportsXXX" there, not in the Source to keep
>> the
>> > > two
>> > > >> > >> methods together.
>> > > >> > >>
>> > > >> > >> One benefit of having the "supportsXXX" in Source is that this
>> > > allows
>> > > >> > some
>> > > >> > >> compile time check. For example, if a user enabled watermark
>> > > >> alignment
>> > > >> > >> while it is not supported by the Source, an exception can be
>> > thrown
>> > > >> at
>> > > >> > >> compile time. It seems in general useful. That said, I agree
>> that
>> > > API
>> > > >> > >> cleanliness wise it is better to put the two methods together.
>> > > >> > >>
>> > > >> > >> Thanks,
>> > > >> > >>
>> > > >> > >> Jiangjie (Becket) Qin
>> > > >> > >>
>> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <im...@gmail.com> wrote:
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> Thanks Dawid for the reminder on FLIP-182. Sorry I did miss it.
>> > > >> > >> I don't have other concerns then.
>> > > >> > >>
>> > > >> > >> Best,
>> > > >> > >> Jark
>> > > >> > >>
>> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz
>> > > >> > >> <dwysakowicz@apache.org> wrote:
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> @Jark:
>> > > >> > >>
>> > > >> > >> 1. Will the framework always align with watermarks when the
>> > source
>> > > >> > >> implements the interface?
>> > > >> > >> I'm afraid not every case needs watermark alignment even if
>> Kafka
>> > > >> > >> implements the interface,
>> > > >> > >> and this will affect the throughput somehow. I agree with
>> Becket
>> > > >> > >> we may need a
>> > > >> > >> `supportSplitsAlignment()` method for users to configure the
>> > source
>> > > >> to
>> > > >> > >> enable/disable the alignment.
>> > > >> > >>
>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly affect
>> > > >> > >> throughput if the reader is constantly
>> > > >> > >> switching between pause and resume. Can users configure the
>> > > >> > >> alignment offset?
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> This is covered in the previous FLIP[1], which has already been
>> > > >> > >> implemented in 1.15. In short, it must be enabled with the
>> > > >> > >> watermark strategy, which also configures drift and update
>> > > >> > >> interval.
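As a rough illustration of what a drift-based rule could look like at the split level, here is a minimal sketch. The thread only states that drift and update interval are configured through the watermark strategy; the concrete rule below (pause every split whose watermark is ahead of the group minimum plus the allowed drift) and all names (splitsToPause, groupMinWatermark, maxAllowedDrift) are assumptions for illustration, not the actual Flink implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DriftSketch {

    /**
     * Returns the ids of the splits whose watermark is ahead of
     * groupMinWatermark + maxAllowedDrift (all values in milliseconds).
     * The group minimum is passed in because it would be computed across tasks.
     */
    static List<String> splitsToPause(
            Map<String, Long> splitWatermarks, long groupMinWatermark, long maxAllowedDrift) {
        long maxAllowedWatermark = groupMinWatermark + maxAllowedDrift;
        List<String> toPause = new ArrayList<>();
        for (Map.Entry<String, Long> entry : splitWatermarks.entrySet()) {
            if (entry.getValue() > maxAllowedWatermark) {
                toPause.add(entry.getKey());
            }
        }
        Collections.sort(toPause); // deterministic order for the example
        return toPause;
    }

    public static void main(String[] args) {
        Map<String, Long> watermarks = new TreeMap<>();
        watermarks.put("split-0", 1_000L);
        watermarks.put("split-1", 5_000L);
        watermarks.put("split-2", 1_500L);
        // group minimum 1000 + drift 1000 = 2000, so only split-1 is ahead
        System.out.println(splitsToPause(watermarks, 1_000L, 1_000L)); // prints [split-1]
    }
}
```

With a small drift the result flips between "pause" and "resume" more often as watermarks advance, which is the throughput concern Jark raises in question 2.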
>> > > >> > >>
>> > > >> > >> If we don't plan to extend this interface to support aligning
>> > > >> > >> other things, I suggest explicitly declaring
>> > > >> > >> the purpose of the methods, such as `alignWatermarksForSplits`
>> > > >> instead
>> > > >> > of
>> > > >> > >> `alignSplits`.
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> Sure let's rename it.
>> > > >> > >>
>> > > >> > >> @Becket:
>> > > >> > >>
>> > > >> > >> I understand your point. On the other hand, putting all methods in
>> > > >> > >> the base interface, even with "supportsXXX" methods for enabling
>> > > >> > >> certain features, raises the entry threshold for writing a new
>> > > >> > >> source. Instead of focusing on the basic and required properties
>> > > >> > >> of the Source, the person implementing a source has to figure out
>> > > >> > >> what all of the extra features are about and how to deal with
>> > > >> > >> them. It also makes it harder to organize methods in coupled
>> > > >> > >> groups, as Jark said.
>> > > >> > >>
>> > > >> > >> Having said that, as I don't have a preference and I agree most
>> > > >> > >> of the sources will support the alignment, I am fine following
>> > > >> > >> your suggestion to have the SourceReader extend
>> > > >> > >> WithWatermarksSplitsAlignment, but I would put the
>> > > >> > >> "supportsXXX" method there, not in the Source, to keep the two
>> > > >> > >> methods together.
>> > > >> > >>
>> > > >> > >> Lastly, I agree it is really unfortunate that the "alignSplits"
>> > > >> > >> methods differ slightly for SourceReader and SplitReader. The
>> > > >> > >> reason is that SourceReaderBase deals only with SplitIds,
>> > > >> > >> whereas SplitReader needs the actual splits to pause them. I
>> > > >> > >> found the discrepancy acceptable for the sake of simplifying
>> > > >> > >> the changes significantly, especially as unifying the
>> > > >> > >> signatures would highly likely impact performance, since we
>> > > >> > >> would have to perform additional lookups. Moreover, the
>> > > >> > >> SplitReader is a secondary interface.
>> > > >> > >>
>> > > >> > >> Best,
>> > > >> > >>
>> > > >> > >> Dawid
>> > > >> > >>
>> > > >> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
>> > > >> > >>
>> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
>> > > >> > >>
>> > > >> > >> Thanks for the effort, Dawid and Sebastian!
>> > > >> > >>
>> > > >> > >> I just have some minor questions (maybe I missed something).
>> > > >> > >>
>> > > >> > >> 1. Will the framework always align with watermarks when the
>> > source
>> > > >> > >> implements the interface?
>> > > >> > >> I'm afraid not every case needs watermark alignment even if
>> Kafka
>> > > >> > >> implements the interface,
>> > > >> > >> and this will affect the throughput somehow. I agree with
>> Becket
>> > > >> > >> we may need a
>> > > >> > >> `supportSplitsAlignment()` method for users to configure the
>> > source
>> > > >> to
>> > > >> > >> enable/disable the alignment.
>> > > >> > >>
>> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
>> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly
>> affect
>> > > >> > >>
>> > > >> > >> throughput
>> > > >> > >>
>> > > >> > >> if the reader is constantly
>> > > >> > >>  switching between pause and resume. Can users configure the
>> > > >> alignment
>> > > >> > >> offset?
>> > > >> > >>
>> > > >> > >> 3. Interface/Method Name.
>> > > >> > >> Can the interface be used to align other things in the future?
>> > > >> > >> For example, to align read speed; I have seen users requesting
>> > > >> > >> global rate limits. This feature may also need an interface
>> > > >> > >> like this. If we don't plan to extend this interface to support
>> > > >> > >> aligning other things, I suggest explicitly declaring the
>> > > >> > >> purpose of the methods, such as `alignWatermarksForSplits`
>> > > >> > >> instead of `alignSplits`.
>> > > >> > >>
>> > > >> > >> 4. Interface or Method.
>> > > >> > >> I don't have a strong opinion on this. I think they have their
>> > > >> > >> own advantages. In Flink SQL, we heavily use interfaces for
>> > > >> > >> extending abilities (SupportsXxxx) for TableSource/TableSink,
>> > > >> > >> and I prefer interfaces rather than methods in this case. When
>> > > >> > >> you have a bunch of abilities and each ability has more than
>> > > >> > >> one method, interfaces help to organize them and make it clear
>> > > >> > >> to users which methods need to be implemented to provide an
>> > > >> > >> ability.
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> Best,
>> > > >> > >> Jark
>> > > >> > >>
>> > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <be...@gmail.com> wrote:
>> > > >> > >>
>> > > >> > >> Hi Dawid,
>> > > >> > >>
>> > > >> > >> Thanks for the explanation. Apologies that I somehow misread a
>> > > >> > >> bunch of "align" and thought they were "assign".
>> > > >> > >>
>> > > >> > >> Regarding 1, by default implementation, I was thinking of the
>> > > >> > >> default no-op implementation. I am a little worried about the
>> > > >> > >> proliferation of decorative interfaces. I think the most
>> > > >> > >> important thing about interfaces is that they are easy to
>> > > >> > >> understand. In this case, I prefer adding a new method to the
>> > > >> > >> existing interface for the following reasons:
>> > > >> > >>
>> > > >> > >> a) I feel the biggest drawback of decorative interfaces is
>> which
>> > > >> > >>
>> > > >> > >> interface
>> > > >> > >>
>> > > >> > >> they can decorate and which combinations of multiple
>> decorative
>> > > >> > >>
>> > > >> > >> interfaces
>> > > >> > >>
>> > > >> > >> are valid. In the current FLIP, the withSplitsAlignment
>> interface
>> > > is
>> > > >> > only
>> > > >> > >> applicable to the SourceReader which means it can't decorate
>> any
>> > > >> other
>> > > >> > >> interface. From an interface design perspective, a natural
>> > question
>> > > >> is
>> > > >> > >>
>> > > >> > >> why
>> > > >> > >>
>> > > >> > >> not let "AlignedSplitReader" extend "withSplitsAlignment"?
>> And it
>> > > is
>> > > >> > also
>> > > >> > >> natural to assume that a split reader implementing both
>> > SplitReader
>> > > >> and
>> > > >> > >> WithSplitAlignment would work, because a source reader
>> > implementing
>> > > >> > >> SourceReader and withSplitsAlignment works. So why isn't
>> there an
>> > > >> > >>
>> > > >> > >> interface
>> > > >> > >>
>> > > >> > >> of AlignedSourceReader? In the future, if there is a new
>> feature
>> > > >> added
>> > > >> > >> (e.g. sorted or pre-partitioned data aware), are we going to
>> > create
>> > > >> > >>
>> > > >> > >> another
>> > > >> > >>
>> > > >> > >> interface of SplitReader such as SortedSplitReader or
>> > > >> > >>
>> > > >> > >> PrePartitionedAware?
>> > > >> > >>
>> > > >> > >> Can they be combined? So I think the additional decorative
>> > > interface
>> > > >> > like
>> > > >> > >> withSplitsAlignment actually increases the understanding cost
>> of
>> > > >> users
>> > > >> > >> because they have to know what decorative interfaces are
>> there,
>> > > which
>> > > >> > >> interface they can decorate and which combinations of the
>> > > decorative
>> > > >> > >> interfaces are valid and which are not. Ideally we want to
>> avoid
>> > > >> that.
>> > > >> > To
>> > > >> > >> be clear, I am not opposing having an interface of
>> > > >> withSplitsAlignment,
>> > > >> > >>
>> > > >> > >> it
>> > > >> > >>
>> > > >> > >> is completely OK to have it as an internal interface and let
>> > > >> > SourceReader
>> > > >> > >> and SplitReader both extend it.
>> > > >> > >>
>> > > >> > >> b) Adding a new method to the SourceReader with a default
>> > > >> implementation
>> > > >> > >>
>> > > >> > >> of
>> > > >> > >>
>> > > >> > >> no-op would help avoid logic branching in the source logic,
>> > > >> especially
>> > > >> > >> given that we agree that the vast majority of the SourceReader
>> > > >> > >> implementations, if not all, would just extend from the
>> > > >> > SourceReaderBase.
>> > > >> > >> That means adding a new method to the interface would
>> effectively
>> > > >> give
>> > > >> > >>
>> > > >> > >> the
>> > > >> > >>
>> > > >> > >> same user experience, but simpler.
>> > > >> > >>
>> > > >> > >> c) A related design principle that may be worth discussing is
>> how
>> > > do
>> > > >> we
>> > > >> > >>
>> > > >> > >> let
>> > > >> > >>
>> > > >> > >> the Source implementations tell Flink what capability is
>> > supported
>> > > >> and
>> > > >> > >>
>> > > >> > >> what
>> > > >> > >>
>> > > >> > >> is not. Personally speaking I feel the most intuitive place
>> to me
>> > > is
>> > > >> in
>> > > >> > >>
>> > > >> > >> the
>> > > >> > >>
>> > > >> > >> Source itself, because that is the entrance of the entire
>> Source
>> > > >> > >>
>> > > >> > >> connector
>> > > >> > >>
>> > > >> > >> logic.
>> > > >> > >>
>> > > >> > >> Based on the above thoughts, I am wondering if the following
>> > > >> interface
>> > > >> > >> would be easier to understand by the users.
>> > > >> > >>
>> > > >> > >> - Change "withSplitsAlignment" to internal interface, let both
>> > > >> > >>
>> > > >> > >> SourceReader
>> > > >> > >>
>> > > >> > >> and SplitReader extend from it, with a default no-op
>> > > implementation.
>> > > >> > >> - Add a new method "boolean supportSplitsAlignment()" to the
>> > Source
>> > > >> > >> interface, with a default implementation returning false.
>> Sources
>> > > >> that
>> > > >> > >>
>> > > >> > >> have
>> > > >> > >>
>> > > >> > >> implemented the alignment logic can change this to return
>> true,
>> > and
>> > > >> > >> override the alignSplits() methods in the SourceReader /
>> > > SplitReader
>> > > >> if
>> > > >> > >> needed.
>> > > >> > >> - In the future, if a new optional feature is going to be
>> added
>> > to
>> > > >> the
>> > > >> > >> Source, and that feature requires the awareness from Flink, we
>> > can
>> > > >> add
>> > > >> > >>
>> > > >> > >> more
>> > > >> > >>
>> > > >> > >> such methods to the Source.
>> > > >> > >>
>> > > >> > >> What do you think?
>> > > >> > >>
>> > > >> > >> Thanks,
>> > > >> > >>
>> > > >> > >> Jiangjie (Becket) Qin
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dw...@apache.org> wrote:
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> @Konstantin:
>> > > >> > >>
>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka
>> the
>> > > >> stop &
>> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar
>> only,
>> > > >> correct?
>> > > >> > >>
>> > > >> > >> Correct, as far as I know though, those are the only sources
>> > which
>> > > >> > >>
>> > > >> > >> consume
>> > > >> > >>
>> > > >> > >> concurrently from multiple splits and thus alignment applies.
>> > > >> > >>
>> > > >> > >> @Thomas:
>> > > >> > >>
>> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase
>> and
>> > > then
>> > > >> > >>
>> > > >> > >> doing
>> > > >> > >>
>> > > >> > >> nothing if the split reader does not implement
>> AlignedSplitReader
>> > > >> could
>> > > >> > >>
>> > > >> > >> be
>> > > >> > >>
>> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead be added
>> to
>> > the
>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to make it
>> > explicit
>> > > >> that
>> > > >> > >> the source actually supports it.
>> > > >> > >>
>> > > >> > >> I understand your concern. Hmm, I think we could actually do
>> > > >> > >> that. Given that the actual implementation of
>> > > >> > >> SourceReaderBase#alignSplits is rather short (just a forward to
>> > > >> > >> the corresponding method of SplitFetcher), we could reimplement
>> > > >> > >> it in the actual source implementations. This solution has a
>> > > >> > >> downside, though. Authors of new sources would have to do two
>> > > >> > >> things: extend from AlignedSplitReader and implement
>> > > >> > >> WithSplitsAlignment, instead of just extending
>> > > >> > >> AlignedSplitReader. I would be fine with such a tradeoff,
>> > > >> > >> though. What do others think?
>> > > >> > >>
>> > > >> > >> @Steven:
>> > > >> > >>
>> > > >> > >> For this part from the motivation section, is it accurate?
>> Let's
>> > > >> assume
>> > > >> > >>
>> > > >> > >> one
>> > > >> > >>
>> > > >> > >> source task consumes from 3 partitions and one of the
>> partition
>> > is
>> > > >> > >> significantly slower. In this situation, watermark for this
>> > source
>> > > >> task
>> > > >> > >> won't hold back as it is reading recent data from other two
>> Kafka
>> > > >> > >> partitions. As a result, it won't hold back the overall
>> > watermark.
>> > > I
>> > > >> > >> thought the problem is that we may have late data for this
>> slow
>> > > >> > >>
>> > > >> > >> partition.
>> > > >> > >>
>> > > >> > >> It will hold back the watermark. Watermark of an operator is
>> the
>> > > >> minimum
>> > > >> > >> of watermarks of all splits[1]
>> > > >> > >>
>> > > >> > >> I have another question about the restart. Say split
>> alignment is
>> > > >> > >> triggered. checkpoint is completed. job failed and restored
>> from
>> > > the
>> > > >> > last
>> > > >> > >> checkpoint. because alignment decision is not checkpointed,
>> > > initially
>> > > >> > >> alignment won't be enforced until we get a cycle of watermark
>> > > >> > aggregation
>> > > >> > >> and propagation, right? Not saying this corner is a problem.
>> Just
>> > > >> want
>> > > >> > to
>> > > >> > >> understand it more.
>> > > >> > >>
>> > > >> > >> Your understanding is correct.
>> > > >> > >>
>> > > >> > >> @Becket:
>> > > >> > >>
>> > > >> > >> 1. I think watermark alignment is sort of a general use case,
>> so
>> > > >> should
>> > > >> > >>
>> > > >> > >> we
>> > > >> > >>
>> > > >> > >> just add the related methods to SourceReader directly instead
>> of
>> > > >> > >> introducing the new interface of WithSplitAssignment? We can
>> > > provide
>> > > >> > >> default implementations, so backwards compatibility won't be
>> an
>> > > >> issue.
>> > > >> > >>
>> > > >> > >> I don't think we can provide a default implementation. How
>> would
>> > we
>> > > >> do
>> > > >> > >> that? Would it be just a no-op? Is it better than having an
>> > opt-in
>> > > >> > >> interface? The default implementation would have to be added
>> > > >> exclusively
>> > > >> > >>
>> > > >> > >> in
>> > > >> > >>
>> > > >> > >> a *Public* SourceReader interface. By the way notice
>> > > SourceReaderBase
>> > > >> > >> does extend from WithSplitsAlignment, so effectively all
>> > > >> implementations
>> > > >> > >>
>> > > >> > >> do
>> > > >> > >>
>> > > >> > >> handle the alignment case. To be honest I think it is
>> impossible
>> > to
>> > > >> > >> implement the SourceReader interface directly by end users.
>> > > >> > >>
>> > > >> > >> 2. As you mentioned, the SplitReader interface probably also
>> > needs
>> > > >> some
>> > > >> > >> change to support throttling at the split granularity. Can you
>> > add
>> > > >> that
>> > > >> > >> interface change into the public interface section as well?
>> > > >> > >>
>> > > >> > >> It has been added from the beginning. See
>> *AlignedSplitReader.*
>> > > >> > >>
>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits here,
>> > given
>> > > >> that
>> > > >> > >>
>> > > >> > >> it
>> > > >> > >>
>> > > >> > >> is not actually changing the split assignments? It seems
>> > something
>> > > >> like
>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more
>> > > accurate.
>> > > >> > >>
>> > > >> > >> The method's called *alignSplits*, not assign. Do you still
>> > prefer
>> > > a
>> > > >> > >> different name for that? Personally, I am open for suggestions
>> > > here.
>> > > >> > >>
>> > > >> > >> Best,
>> > > >> > >>
>> > > >> > >> Dawid
>> > > >> > >>
>> > > >> > >> [1]
>> > > >> > >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
>> > > >> > >>
>> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
>> > > >> > >>
>> > > >> > >> Thanks for driving the effort, Sebastian. I think the motivation
>> > > >> > >> makes a lot of sense. Just a few suggestions / questions.
>> > > >> > >>
>> > > >> > >> 1. I think watermark alignment is sort of a general use case,
>> so
>> > > >> should
>> > > >> > >>
>> > > >> > >> we
>> > > >> > >>
>> > > >> > >> just add the related methods to SourceReader directly instead
>> of
>> > > >> > >> introducing the new interface of WithSplitAssignment? We can
>> > > provide
>> > > >> > >> default implementations, so backwards compatibility won't be
>> an
>> > > >> issue.
>> > > >> > >>
>> > > >> > >> 2. As you mentioned, the SplitReader interface probably also
>> > needs
>> > > >> some
>> > > >> > >> change to support throttling at the split granularity. Can you
>> > add
>> > > >> that
>> > > >> > >> interface change into the public interface section as well?
>> > > >> > >>
>> > > >> > >> 3. Nit, can we avoid using the method name assignSplits here,
>> > given
>> > > >> that
>> > > >> > >>
>> > > >> > >> it
>> > > >> > >>
>> > > >> > >> is not actually changing the split assignments? It seems
>> > something
>> > > >> like
>> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more
>> > > accurate.
>> > > >> > >>
>> > > >> > >> Thanks,
>> > > >> > >>
>> > > >> > >> Jiangjie (Becket) Qin
>> > > >> > >>
>> > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <st...@gmail.com> wrote:
>> > > >> > >>
>> > > >> > >> However, a single source operator may read data from multiple
>> > > >> > >>
>> > > >> > >> splits/partitions, e.g., multiple Kafka partitions, such that
>> > even
>> > > >> with
>> > > >> > >> watermark alignment the source operator may need to buffer
>> > > excessive
>> > > >> > >>
>> > > >> > >> amount
>> > > >> > >>
>> > > >> > >> of data if one split emits data faster than another.
>> > > >> > >>
>> > > >> > >> For this part from the motivation section, is it accurate?
>> > > >> > >> Let's assume one source task consumes from 3 partitions and one
>> > > >> > >> of the partitions is significantly slower. In this situation,
>> > > >> > >> the watermark for this source task won't hold back, as it is
>> > > >> > >> reading recent data from the other two Kafka partitions. As a
>> > > >> > >> result, it won't hold back the overall watermark. I thought the
>> > > >> > >> problem is that we may have late data for this slow partition.
>> > > >> > >>
>> > > >> > >> I have another question about the restart. Say split alignment
>> > > >> > >> is triggered, a checkpoint is completed, and the job fails and
>> > > >> > >> is restored from the last checkpoint. Because the alignment
>> > > >> > >> decision is not checkpointed, alignment initially won't be
>> > > >> > >> enforced until we get a cycle of watermark aggregation and
>> > > >> > >> propagation, right? I am not saying this corner case is a
>> > > >> > >> problem. I just want to understand it more.
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <th...@apache.org> wrote:
>> > > >> > >>
>> > > >> > >> Thanks for working on this!
>> > > >> > >>
>> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase
>> and
>> > > then
>> > > >> > >>
>> > > >> > >> doing
>> > > >> > >>
>> > > >> > >> nothing if the split reader does not implement
>> AlignedSplitReader
>> > > >> could
>> > > >> > >>
>> > > >> > >> be
>> > > >> > >>
>> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead be added
>> to
>> > the
>> > > >> > >> specific source reader (i.e. KafkaSourceReader) to make it
>> > explicit
>> > > >> that
>> > > >> > >> the source actually supports it.
>> > > >> > >>
>> > > >> > >> Thanks,
>> > > >> > >> Thomas
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <kn...@apache.org> wrote:
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> Hi Sebastian, Hi Dawid,
>> > > >> > >>
>> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the
>> > > >> > >> stop & resume behavior) will be implemented for Kafka and Pulsar
>> > > >> > >> only, correct?
>> > > >> > >>
>> > > >> > >> +1 in general. I believe it is valuable to complete the watermark
>> > > >> > >> alignment story with this FLIP.
>> > > >> > >>
>> > > >> > >> Cheers,
>> > > >> > >>
>> > > >> > >> Konstantin
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dw...@apache.org> wrote:
>> > > >> > >>
>> > > >> > >>
>> > > >> > >> To be explicit, having worked on it, I support it ;) I think
>> we
>> > can
>> > > >> > >> start a vote thread soonish, as there are no concerns so far.
>> > > >> > >>
>> > > >> > >> Best,
>> > > >> > >>
>> > > >> > >> Dawid
>> > > >> > >>
>> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
>> > > >> > >>
>> > > >> > >> Dear Flink developers,
>> > > >> > >>
>> > > >> > >> I would like to open a discussion on FLIP-217 [1] for an
>> > > >> > >> extension of Watermark Alignment to perform alignment also in
>> > > >> > >> SplitReaders. To do so, SplitReaders must be able to suspend and
>> > > >> > >> resume reading from split sources, where the SourceOperator
>> > > >> > >> coordinates and controls suspend and resume. To gather
>> > > >> > >> information about the current watermarks of the SplitReaders, we
>> > > >> > >> extend the internal WatermarkOutputMultiplexer and report
>> > > >> > >> watermarks to the SourceOperator.
>> > > >> > >>
>> > > >> > >> There is a PoC for this FLIP [2], prototyped by Arvid Heise and
>> > > >> > >> revised and reworked by Dawid Wysakowicz (he did most of the
>> > > >> > >> work) and me. The changes are backwards compatible in the sense
>> > > >> > >> that if the affected components do not support split alignment,
>> > > >> > >> the behavior is as before.
>> > > >> > >>
>> > > >> > >> Best,
>> > > >> > >> Sebastian
>> > > >> > >>
>> > > >> > >> [1]
>> > > >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
>> > > >> > >>
>> > > >> > >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
>> > > >> > >>
>> > > >> > >> --
>> > > >> > >>
>> > > >> > >> Konstantin Knauf
>> > > >> > >> https://twitter.com/snntrable
>> > > >> > >> https://github.com/knaufk
>> > > >> > >>
>> > > >> > >>
>> > > >> >
>> > > >>
>> > > >
>> > >
>> >
>>
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Piotr Nowojski <pn...@apache.org>.
Hi Guowei,

as Dawid wrote a couple of messages back:

> This is covered in the previous FLIP[1] which has been already
implemented in 1.15. In short, it must be enabled with the watermark
strategy which also configures drift and update interval

So by default watermark alignment is disabled, regardless of whether a
source supports it or not.
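
To make the opt-in concrete, here is a minimal, self-contained sketch (it is
not Flink code). In Flink itself the switch is
WatermarkStrategy#withWatermarkAlignment, which takes the group name, the
maximum allowed drift and, optionally, the update interval. The
AlignmentParams class and the splitsToPause helper below are hypothetical
names, and the rule "pause every split whose watermark exceeds the group
minimum plus the allowed drift" is only my assumption about how per-split
alignment would decide what to pause:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class AlignmentSketch {

    /** Hypothetical stand-in for the alignment settings of a watermark strategy. */
    static final class AlignmentParams {
        final String group;
        final Duration maxAllowedDrift;
        final Duration updateInterval;

        AlignmentParams(String group, Duration maxAllowedDrift, Duration updateInterval) {
            this.group = group;
            this.maxAllowedDrift = maxAllowedDrift;
            this.updateInterval = updateInterval;
        }
    }

    /**
     * Returns the ids of the splits to pause. An empty Optional models the
     * default (alignment not configured): nothing is paused, whether or not
     * the source could support pausing.
     */
    static List<String> splitsToPause(
            Optional<AlignmentParams> alignment,
            long groupMinWatermark,
            Map<String, Long> splitWatermarks) {
        List<String> toPause = new ArrayList<>();
        if (!alignment.isPresent()) {
            return toPause;
        }
        // Assumed rule: a split may run ahead of the group minimum by at most the drift.
        long maxDesiredWatermark =
                groupMinWatermark + alignment.get().maxAllowedDrift.toMillis();
        for (Map.Entry<String, Long> entry : splitWatermarks.entrySet()) {
            if (entry.getValue() > maxDesiredWatermark) {
                toPause.add(entry.getKey());
            }
        }
        return toPause;
    }

    public static void main(String[] args) {
        Map<String, Long> splitWatermarks = new LinkedHashMap<>();
        splitWatermarks.put("split-0", 1_000L);
        splitWatermarks.put("split-1", 1_500L);
        splitWatermarks.put("split-2", 9_000L);

        AlignmentParams params =
                new AlignmentParams("group-1", Duration.ofSeconds(5), Duration.ofSeconds(1));

        // Alignment not configured: prints []
        System.out.println(splitsToPause(Optional.empty(), 1_000L, splitWatermarks));
        // Alignment configured with 5s drift: prints [split-2]
        System.out.println(splitsToPause(Optional.of(params), 1_000L, splitWatermarks));
    }
}
```

The point of the sketch is only that the decision to pause anything hangs on
the user-provided alignment settings, not on what the source implements.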

Best,
Piotrek

czw., 5 maj 2022 o 09:56 Guowei Ma <gu...@gmail.com> napisał(a):

> Hi,
>
> We know that in the case of bounded input Flink supports the batch
> execution mode. Currently, in batch execution mode, Flink executes jobs on
> a stage-by-stage basis. In this way, watermark alignment might not gain
> much.
>
> So my question is: is watermark alignment the default behavior (only for
> sources that implement it)? If so, have you considered evaluating the
> impact of this behavior on the batch execution mode? Or do you think that
> is not necessary?
>
> Correct me if I missed something.
>
> Best,
> Guowei
>
>
> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <pi...@gmail.com>
> wrote:
>
> > Hi Becket and Dawid,
> >
> > > I feel that no matter which option we choose this can not be solved
> > entirely in either of the options, because of the point above and because
> > the signature of SplitReader#pauseOrResumeSplits and
> > SourceReader#pauseOrResumeSplits are slightly different (one identifies
> > splits with splitId the other one passes the splits directly).
> >
> > Yes, that's a good point in this case and for features that need to be
> > implemented in more than one place.
> >
> > > Is there any reason for pausing reading from a split to be an optional
> > > feature, other than that this was not included in the original interface?
> >
> > An additional argument in favor of making it optional is to simplify
> > source implementation. But on its own I'm not sure if that would be
> > enough to justify making this feature optional. Maybe.
> >
> > > I think it would be way simpler and clearer to just let end users and
> > > Flink assume all the connectors will implement this feature.
> >
> > As I wrote above, that would be an interesting choice to make (ease of
> > implementation for new users vs. system consistency). Regardless of that,
> > yes, for me the main argument is API backward compatibility. But let's
> > clear up a couple of points:
> > - The current proposal of adding methods to the base interface with
> > default implementations makes this an OPTIONAL feature, the same as the
> > decorative version would.
> > - The decorative version could just as well throw an
> > UnsupportedOperationException if the user enabled watermark alignment,
> > and I agree that's a better option compared to logging a warning.
> >
> > Best,
> > Piotrek
> >
> >
> > śr., 4 maj 2022 o 15:40 Becket Qin <be...@gmail.com> napisał(a):
> >
> > > Thanks for the reply and patient discussion, Piotr and Dawid.
> > >
> > > Is there any reason for pausing reading from a split to be an optional
> > > feature, other than that this was not included in the original interface?
> > >
> > > To be honest I am really worried about the complexity of the user story
> > > here. Optional features like this have a high overhead. Imagine this
> > > feature is optional, and a user has enabled watermark alignment and
> > > defined a few watermark groups. Would it work? Hmm, that depends on
> > > whether the involved Source has implemented this feature. If the Sources
> > > are well documented, good luck. Otherwise end users may have to look
> > > into the code of the Source to see whether the feature is supported,
> > > which is something they shouldn't have to do.
> > >
> > > I think it would be way simpler and clearer to just let end users and
> > > Flink assume all the connectors will implement this feature. After all,
> > > the watermark group is not optional to the end users. If in some rare
> > > cases the feature cannot be supported, a clear
> > > UnsupportedOperationException will be thrown to tell users to explicitly
> > > remove this Source from the watermark group. I don't think we should
> > > have a warning message here, as they tend to be ignored in many cases.
> > > If we do this, we don't even need the supportXXX method in the Source
> > > for this feature. In fact, this is exactly how many interfaces work
> > > today. For example, SplitEnumerator#addSplitsBack() is not supported by
> > > the Pravega source because it does not support partial failover. In that
> > > case, it simply throws an exception to trigger a global recovery.
> > >
> > > The reason to add a default implementation in this case would just be
> > > for the sake of backwards compatibility, so the old sources can still
> > > compile. Sure, in the short term, this feature might not be supported by
> > > many existing sources. That is OK, and it is quite visible to the source
> > > developers that they did not override the default impl, which throws an
> > > UnsupportedOperationException.
> > >
> > > @Dawid,
> > >
> > > the Java doc of the SupportXXX() method in the Source would be the
> single
> > > >> source of truth regarding how to implement this feature.
> > > >
> > > >
> > >
> > > I also don't find it entirely true. Half of the classes are
> theoretically
> > > > optional and are utility classes from the point of view how the
> > > interfaces
> > > > are organized. Theoretically users do not need to use any of
> > > > SourceReaderBase & SplitReader. Would be weird to list their methods
> in
> > > the
> > > > Source interface.
> > >
> > > I think the ultimate goal of Java docs is to guide users to implement
> > > the Source. If SourceReaderBase is the preferred way to implement a
> > > SourceReader, it seems worth mentioning that. Even the Java language
> > > documentation for interfaces lists the known implementations [1] so
> > > people can leverage them. But for this particular case, if we make the
> > > feature non-optional, we don't even need the supportXXX() method for now.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > >
> > > On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <
> dwysakowicz@apache.org>
> > > wrote:
> > >
> > > > Hey Piotr and Becket,
> > > >
> > > > First of all, let me say I am happy with whichever option is agreed
> in
> > > the
> > > > discussion.
> > > >
> > > > I wanted to clarify a few points from the discussion though:
> > > >
> > > > @Becket:
> > > >
> > > > The main argument for adding the methods to the SourceReader is that
> > > > these methods are effectively NON-OPTIONAL to SourceReader impls, i.e.
> > > > starting from this FLIP, all SourceReader impls are expected to
> > > > support this method, although some old implementations may not have
> > > > implemented this feature. I think we should distinguish the new
> > > > features from the optional features. The public decorative interface
> > > > is a solution for the optional features; we should not use it for
> > > > features that are non-optional.
> > > >
> > > > I don't think that this feature is NON-OPTIONAL. Even though
> > preferred, I
> > > > still think it can be simply optional.
> > > >
> > > > the Java doc of the SupportXXX() method in the Source would be the
> > single
> > > > source of truth regarding how to implement this feature.
> > > >
> > > > I also don't find it entirely true. Half of the classes are
> > > > theoretically optional and are utility classes from the point of view
> > > > of how the interfaces are organized. Theoretically, users do not need
> > > > to use any of SourceReaderBase & SplitReader. It would be weird to list
> > > > their methods in the Source interface.
> > > >
> > > > @Piotr
> > > >
> > > > If we have all of the methods with default implementation in the base
> > > > interface, the API doesn't give any clue to the user which set of
> > methods
> > > > are required to be implemented at the same time.
> > > >
> > > > I feel that no matter which option we choose this can not be solved
> > > > entirely in either of the options, because of the point above and
> > because
> > > > the signature of SplitReader#pauseOrResumeSplits and
> > > > SourceReader#pauseOrResumeSplits are slightly different (one
> identifies
> > > > splits with splitId the other one passes the splits directly).
> > > >
> > > > Best,
> > > >
> > > > Dawid
> > > > On 03/05/2022 14:30, Becket Qin wrote:
> > > >
> > > > Hi Piotr,
> > > >
> > > > Thanks for the comment.
> > > >
> > > > Just to clarify, I am not against the decorative interfaces, but I do
> > > > think we should use them with caution. The main argument for adding
> the
> > > > methods to the SourceReader is that these methods are
> > > > effectively NON-OPTIONAL to SourceReader impl, i.e. starting from
> this
> > > > FLIP, all the SourceReaders impl are expected to support this
> > > > method, although some old implementations may not have implemented
> this
> > > > feature. I think we should distinguish the new features from the
> > optional
> > > > features. While the public decorative interface is a solution to the
> > > > optional features. We should not use it for the features that are
> > > > non-optional.
> > > >
> > > > That said, this feature is optional for SplitReaders. Arguably we can
> > > have
> > > > a decorative interface for that, but for simplicity and symmetry of
> the
> > > > interface, personally I prefer just adding a new method.
> > > >
> > > > Regarding the advantages you mentioned about the decorative
> interfaces,
> > > > they would make sense if:
> > > > 1. The feature is optional.
> > > > 2. There is only one decorative interface involved for a feature.
> > > > Otherwise the argument that all the methods are grouped together will
> > not
> > > > stand.
> > > >
> > > > Compared with that, I think the current solution works fine in all
> > cases,
> > > > i.e. "having supportXXX() method in Source, and default methods /
> > > > decorative interfaces in base interfaces.".
> > > >
> > > > The advantages are:
> > > >> - clean and easy to implement base interface
> > > >
> > > > In the current approach, the Java doc of the SupportXXX() method in
> the
> > > > Source would be the single source of truth regarding how to implement
> > > this
> > > > feature. It lists the method that has to be implemented to support
> this
> > > > feature, regardless of how many classes / interfaces are involved.
> > > >
> > > > When implementing the base interface, users do not need to implement
> a
> > > > method with default implementation. If they are curious what the
> method
> > > is
> > > > for, the java doc of that method simply points users to the
> > SupportXXX()
> > > > > method in the Source. This seems to add no work for users compared
> > with
> > > > decorative interfaces, but gives much better discoverability.
> > > >
> > > > - all of the methods from a single feature are grouped in a single
> > > >> decorator interface, together with their dedicated java doc. It's
> also
> > > >> easier to google search for help using the decorator name
> > > >
> > > > - if an optional feature requires two methods to be implemented at
> > once,
> > > >> decorator can guarantee that
> > > >
> > > > These two points are not true when multiple components and classes
> are
> > > > involved collaboratively to provide a feature. In our case, we have
> > both
> > > > SourceReader and SplitReader involved. And there might be other
> > > interfaces
> > > > on the JM side involved for some future features. So the relevant
> > methods
> > > > can actually be scattered over the places. That said, we may still
> use
> > > > decorative interfaces for each component, if the feature is optional,
> > > given
> > > > there is a single source of truth for the feature.
> > > >
> > > > Here I would strongly lean towards making life easier for new users,
> > > >> lowering the entry barrier, at the (imo) slight expense for the
> power
> > > >> users.
> > > >
> > > > I actually think the current approach is simpler, more extensible and
> > > more
> > > > > general for all the users. Can you elaborate a bit more on which part
> > > > > you think makes it harder for users to understand?
> > > >
> > > >
> > > > There is another benefit of the decorative interfaces which is not
> > > > mentioned, but might be worth considering here. Usually the
> decorative
> > > > interfaces give slightly better backwards compatibility than the new
> > > > default method in the interfaces. That is when users are using a jar
> > that
> > > > was compiled with an older version of Flink which does not have the
> > > default
> > > > method in the interfaces in question. A decorative interface may
> still
> > > > provide backwards compatibility in that case, while default method
> impl
> > > > cannot.
> > > >
> > > > I think in Flink we in general do not guarantee custom components
> > > compiled
> > > > with an older version can run with a newer version of Flink. A
> > recompile
> > > > with a newer version would be required. That said, if we do care
> about
> > > > this, we can just change the "supportXXX()" method in the Source
> > > interface
> > > > to use decorative interfaces, and leave the other parts unchanged.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <pn...@apache.org>
> > > > wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> Sorry for chipping in so late, but I was OoO for the last two weeks.
> > > >> Regarding the interfaces, I would be actually against adding those
> > > methods
> > > >> to the base interfaces for the reasons mentioned above. Clogging the
> > > base
> > > >> interface for new users with tons of methods that they do not need,
> do
> > > not
> > > >> understand and do not know what to do with them. Moreover, such
> > > decorative
> > > >> interfaces are solving a problem if a feature requires two or more
> > > methods
> > > >> to be implemented at the same time. If we have all of the methods
> with
> > > >> default implementation in the base interface, the API doesn't give
> any
> > > >> clue
> > > >> to the user which set of methods are required to be implemented at
> the
> > > >> same
> > > >> time.
> > > >>
> > > >> > a) I feel the biggest drawback of decorative interfaces is which
> > > >> interface
> > > >> > they can decorate and which combinations of multiple decorative
> > > >> interfaces
> > > >> > are valid. (...)
> > > >> > In the future, if there is a new feature added
> > > >> > (e.g. sorted or pre-partitioned data aware), are we going to
> create
> > > >> another
> > > >> > interface of SplitReader such as SortedSplitReader or
> > > >> PrePartitionedAware?
> > > >> > Can they be combined? So I think the additional decorative
> interface
> > > >> like
> > > >> > withSplitsAlignment actually increases the understanding cost of
> > users
> > > >> > because they have to know what decorative interfaces are there,
> > which
> > > >> > interface they can decorate and which combinations of the
> decorative
> > > >> > interfaces are valid and which are not. Ideally we want to avoid
> > that.
> > > >>
> > > >> I'm not sure if I understand how embedding default methods in the
> base
> > > >> interface is solving the problem: what can be combined or not? If
> > there
> > > >> are
> > > >> two conflicting features, having decorative interfaces that can not
> be
> > > >> mixed together actually makes much more sense to me rather than
> having
> > > >> them
> > > >> all in one base class. How would you allow users to implement only
> one
> > > of
> > > >> those two features?
> > > >>
> > > >> To reiterate on the issue. Yes, there are drawbacks:
> > > >> - how can a user discover what decorators are there?
> > > >> - how can a user know where the decorator can be applied to?
> > > >>
> > > >> However those are drawbacks for more power users, that can be
> > mitigated
> > > by
> > > >> the documentation. For example listing all of the decorators with
> > > >> detailed explanation both in the docs and in the java docs. More
> > > >> experienced users will be able to deal with those issues easier, as
> > they
> > > >> will already have some basic understanding of Flink. Also if user
> has
> > a
> > > >> problem that he wants to solve, he will google search a potential
> > > solution
> > > >> to his problem anyway, and while doing that he is very likely to
> > > discover
> > > >> the decorator that he needs anyway in the docs.
> > > >>
> > > >> The advantages are:
> > > >> - clean and easy to implement base interface
> > > >> - all of the methods from a single feature are grouped in a single
> > > >> decorator interface, together with their dedicated java doc. It's
> also
> > > >> easier to google search for help using the decorator name
> > > >> - if an optional feature requires two methods to be implemented at
> > once,
> > > >> decorator can guarantee that
> > > >>
> > > >> Here I would strongly lean towards making life easier for new users,
> > > >> lowering the entry barrier, at the (imo) slight expense for the
> power
> > > >> users.
> > > >>
> > > >> Best,
> > > >> Piotrek
> > > >>
> > > >>
> > > > >> On Tue, 26 Apr 2022 at 15:32, Becket Qin <be...@gmail.com> wrote:
> > > >>
> > > >> > Thanks for the reply Sebastian and Dawid,
> > > >> >
> > > > >> > I think Sebastian has a good summary. This is a really helpful
> > > >> discussion.
> > > >> >
> > > >> > Thinking a bit more, I feel that it might still be better to add
> the
> > > >> > supportsXXX() method in the Source rather than SourceReader.
> > > >> >
> > > >> > Generally speaking, what we are trying to do here is to let the
> > Flink
> > > >> > framework know what the Source is capable of. In this FLIP, it
> > happens
> > > >> to
> > > >> > be the capability that only involves SourceReader. But in the
> > future,
> > > >> it is
> > > >> > possible that another functionality involves both the
> > SplitEnumerator
> > > >> and
> > > >> > SourceReader. In that case, following the current approach, we
> > should
> > > >> put
> > > >> > the "supportsXXX()" method in both SplitEnumerator and
> SourceReader.
> > > >> > Because if we only put this in the SourceReader, then the JM would
> > > have
> > > >> to
> > > >> > create a SourceReader in order to know whether this feature is
> > > >> supported,
> > > >> > which is a little ugly. But if we put the "supportsXXX()" method
> in
> > > the
> > > >> > Source, we will break the "symmetric" design because this FLIP
> > chose a
> > > >> > different way.
> > > >> >
> > > >> > This is also why I think supportsXXX() method seems a good thing
> to
> > > >> have,
> > > >> > because when there are a few interfaces / methods that are
> expected
> > to
> > > >> be
> > > >> > implemented at the same time in order to deliver a feature, it is
> > > always
> > > >> > good to have a single source of truth to tell the framework what
> to
> > > do,
> > > >> so
> > > >> > the framework can do consistent things in different parts.
> > > >> >
> > > >> > @Sebastian Mattheis <se...@ververica.com>
> > > >> >
> > > >> > Regarding interface flavor b), i.e. AlignedSourceReader +
> > > >> > AlignedSplitReader, what I feel awkward about is that we are
> > > essentially
> > > >> > expecting almost all the SourceReader implementations to extend
> > > >> > SourceReaderBase, which effectively makes the SourceReader
> interface
> > > >> > without the pausing support useless. So this indicates that public
> > > >> > decorative interfaces (or sub-interfaces for the same purpose)
> only
> > > >> > make sense if the original interface is also expected to be used.
> > > > >> > Otherwise, it seems to make more sense to add the method to the
> > original
> > > >> > interface itself.
> > > >> >
> > > >> > Cheers,
> > > >> >
> > > >> > Jiangjie (Becket) Qin
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <
> > > >> dwysakowicz@apache.org>
> > > >> > wrote:
> > > >> >
> > > >> > > Thanks @Sebastian for the nice summary.
> > > >> > >
> > > >> > > I think most of your points aligned with the suggestions I made
> to
> > > the
> > > >> > > FLIP, while you were writing your reply (I believe we hit enter
> > > >> nearly at
> > > >> > > the same time ;) )
> > > >> > >
> > > >> > > Two points after we synced offline
> > > >> > >
> > > >> > > 1. I changed also the supportsWatermarksSplitAlignment to
> > > >> > > supportsPausingSplits to express the general capability of
> > pausing.
> > > >> > >
> > > > >> > > 2. As for whether we should add PausingSourceReader/PausingSplitReader
> > > > >> > > (option b) or just add the methods (option c), I suggest simply
> > > > >> > > adding the two methods, as I felt this is the approach Becket much
> > > > >> > > prefers and the others do not object to. Unless there is opposition,
> > > > >> > > let's go with option c.
> > > >> > >
> > > >> > > Best,
> > > >> > >
> > > >> > > Dawid
> > > >> > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
> > > >> > >
> > > >> > > Hi folks,
> > > >> > >
> > > >> > > Sorry for being a bit silent. Many thanks for all the input and
> > > >> > > suggestions. As I'm a bit new, I needed some time to catch up
> and
> > > >> > structure
> > > >> > > (for myself) the discussion and I wanted to find a way to
> > structure
> > > >> the
> > > >> > > conclusions. (Also because I had the feeling that some concerns
> > got
> > > >> lost
> > > >> > in
> > > >> > > the discussion.) This is my attempt and please correct me if
> > > >> something is
> > > >> > > wrong or misunderstood. I tried to collect and assemble the
> > > opinions,
> > > >> > > suggestions, and conclusions (to the best of my knowledge):
> > > >> > >
> > > >> > > # Top A: Should split alignment (pause/resume behavior) be a
> > general
> > > >> > > capability?
> > > >> > >
> > > > >> > > I personally don't see any reason not to have it a general
> > capability
> > > >> > > because for the alignSplit method it is actually independent of
> > the
> > > >> > > watermarks. If we agree here to have it a general capability, we
> > > >> should
> > > >> > > also agree on the right wording. Does
> "alignSplits(splitsToResume,
> > > >> > > splitsToPause)" refer to what is then actually meant? (I see it
> as
> > > >> okay.
> > > >> > I
> > > >> > > don't have any better idea whilst Arvid suggested
> > > >> "pauseOrResumeSplits".)
> > > >> > >
> > > > >> > > # Top B: Should it be possible to enable/disable split
> alignment?
> > > >> > >
> > > >> > > I would personally not disable the split alignment on the source
> > > >> reader
> > > >> > > side because if split alignment is used for some other use case
> > (see
> > > >> A)
> > > >> > it
> > > >> > > could have nasty side effects on other/future use cases.
> Instead,
> > I
> > > >> would
> > > >> > > disable "watermark split alignment" where I think it should
> > disable
> > > >> the
> > > >> > > watermark-dependent trigger for split alignment.
> > > >> > >
> > > >> > > # Top C: Should we add a supportsX method?
> > > >> > >
> > > > >> > > I find it difficult to define the scope of a supportsX method w.r.t.
> > > > >> > > the following questions: a) Where is it used? and b) What is the
> > > >> expected
> > > >> > > output? To b), it's not straight-forward to provide a meaningful
> > > >> output,
> > > >> > > e.g., if SourceReader supports split alignment but SplitReader
> > not.
> > > >> This
> > > >> > is
> > > >> > > because with the current implementation, we can determine
> whether
> > > >> split
> > > >> > > alignment is fully supported only during runtime and
> specifically
> > > >> > actually
> > > >> > > only when calling alignSplits down the call hierarchy up to the
> > > actual
> > > >> > > SplitReaders.
> > > >> > >
> > > >> > > Therefore, I would suggest to either raise an error or warning
> if
> > > the
> > > >> > > alignment is called but not supported at some point. I know we
> > > should
> > > >> > > carefully think about when this could be the case because we
> don't
> > > >> want
> > > >> > to
> > > >> > > flood anybody with such warnings. However, warnings could be an
> > > >> indicator
> > > >> > > for the user that for watermark split alignment use case split
> > > >> reading is
> > > >> > > imbalanced with the conclusion to either disable the trigger for
> > > >> > watermark
> > > >> > > split alignment (see Top B) or to use/implement a source and
> > reader
> > > >> that
> > > >> > > fully supports split alignment.
> > > >> > >
> > > >> > > # Top D: How to design interfaces?
> > > >> > >
> > > > >> > > Thanks for structuring the discussion with the various
> > > >> possibilities
> > > >> > > (a-d). From the discussion and emails, I would like to summarize
> > the
> > > >> > > following requirements:
> > > >> > > - Interfaces should be consistent ("symmetric"), i.e., similar
> > > >> semantics
> > > >> > > should have similar interfaces with similar usage.
> > > >> > > - Make explicit which implementations implement
> interfaces/support
> > > >> > > behavior.
> > > >> > > - Make clear what are default implementations and how to
> implement
> > > >> > > interfaces with desired behavior.
> > > >> > >
> > > >> > > This is a simplified view of the relations between relevant
> > classes
> > > of
> > > >> > the
> > > >> > > PoC implementation:
> > > >> > >
> > > >> > > SourceReader (Public) <|-- SourceReaderBase (Internal) <|-- ..
> > <|--
> > > >> > > MySourceReader
> > > >> > >
> > > >> > > MySourceReader <>-- SplitFetcherManager (Internal) <>--
> > SplitFetcher
> > > >> > > (Internal) <>-- SplitReader (Public) <|-- MySplitReader
> > > >> > >
> > > >> > > (A <|-- B: B inherits from A; A <>-- B: A "has a" B)
> > > >> > >
> > > >> > > Note that SourceReaderBase and SplitFetcherManager implement
> most
> > of
> > > >> the
> > > >> > > "logic" for split alignment just because we wanted to implement
> > > split
> > > >> > > alignment and wanted it to be available as kind of a default.
> As a
> > > >> > > consequence, we have a "default implementation" for SourceReader
> > > that
> > > >> > > implements the actual logic for split alignment. For that
> reason,
> > I
> > > >> find
> > > >> > it
> > > >> > > very confusing to have a NOOP default implementation in the
> > > interface
> > > >> for
> > > >> > > the SourceReader. As a consequence, interface strategy c) is
> > > difficult
> > > >> > > because this would require NOOP default implementations in the
> > > public
> > > >> > > interfaces of SourceReader and SplitReader. This is the same for
> > > >> strategy
> > > >> > > d) because it would require NOOP default implementation in the
> > > >> > > SourceReader. Further, as Dawid described method signatures of
> > > >> alignSplit
> > > >> > > for SourceReader and SplitReader differ and it would be
> extremely
> > > >> > difficult
> > > >> > > to make the signatures the same (with even potential performance
> > > >> impact
> > > > >> > > because of additional look-ups of split ids). Therefore, having
> a
> > > >> > symmetric
> > > >> > > decorative interface as of strategy a) is actually not possible
> > and
> > > >> > having
> > > >> > > two decorative interfaces with different method signatures is
> > > >> confusing.
> > > >> > My
> > > >> > > conclusion is that we are best with strategy b) which means to
> > have
> > > >> > > specializing sub-interfaces that inherit from the parent
> > interface:
> > > >> > > SourceReader <|-- AlignedSourceReader, SplitReader <|--
> > > >> > AlignedSplitReader
> > > >> > > With this option, I'm not 100% sure what the implications are
> and
> > if
> > > >> this
> > > >> > > could get nasty. I would suggest that Dawid and I just try to
> > > >> implement
> > > >> > and
> > > >> > > see if we like it. :)
> > > >> > >
> > > >> > > # Summary
> > > >> > >
> > > >> > > In conclusion, please let me know your perspectives. Please
> > correct
> > > >> me,
> > > >> > if
> > > >> > > something is wrong or if I misunderstood something. My
> perspective
> > > >> would
> > > >> > be:
> > > >> > >
> > > >> > > Top A: Yes
> > > >> > > Top B: Yes (but disable watermark trigger for split alignment)
> > > >> > > Top C: No
> > > >> > > Top D: b)
> > > >> > >
> > > >> > > Best,
> > > >> > > Sebastian
> > > >> > >
> > > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <
> > > >> dwysakowicz@apache.org
> > > >> > >
> > > >> > > wrote:
> > > >> > >
> > > >> > >> @Arvid:
> > > >> > >>
> > > >> > >> While I also like Becket's capability approach, I fear that it
> > > >> doesn't
> > > >> > work
> > > >> > >> for this particular use case: Sources can always be aligned
> > > >> cross-task
> > > >> > and
> > > >> > >> this is just about intra-task alignment. So it's plausible to
> put
> > > >> > sources
> > > >> > >> into an alignment group even though they do not use any of the
> > > >> presented
> > > >> > >> API of FLIP-217. They should just issue a warning, if they
> handle
> > > >> > multiple
> > > >> > >> splits (see motivation section).
> > > >> > >>
> > > >> > >> Yes, but the "supportXXX" method would be for telling if it
> > > supports
> > > >> > that
> > > >> > >> intra-task alignment. Cross-task alignment would always be
> > > supported.
> > > >> > >>
> > > >> > >> I updated interfaces to what I believe to be closest to a
> > consensus
> > > >> > >> between all participants. Do you mind taking a look?
> > > >> > >>
> > > >> > >> @Sebastian Do you mind addressing the nits?
> > > >> > >>
> > > >> > >> Best,
> > > >> > >>
> > > >> > >> Dawid
> > > >> > >>
> > > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
> > > >> > >>
> > > >> > >> Thanks for pushing this effort.
> > > >> > >>
> > > >> > >> I'd actually be in favor of 1b). I fully agree that decorator
> > > >> interfaces
> > > >> > >> should be avoided but I'm also not a big fan of overloading the
> > > base
> > > >> > >> interfaces (they are hard to implement as is). The usual
> feedback
> > > to
> > > > >> > >> Source-related interfaces is always that they are overwhelming
> > and
> > > >> too
> > > >> > >> hard to implement. However, I'd also not oppose 1c) as
> scattered
> > > >> > interfaces
> > > >> > >> also have drawbacks. I'd just dislike 1a) and 1d).
> > > >> > >> While I also like Becket's capability approach, I fear that it
> > > >> doesn't
> > > >> > work
> > > >> > >> for this particular use case: Sources can always be aligned
> > > >> cross-task
> > > >> > and
> > > >> > >> this is just about intra-task alignment. So it's plausible to
> put
> > > >> > sources
> > > >> > >> into an alignment group even though they do not use any of the
> > > >> presented
> > > >> > >> API of FLIP-217. They should just issue a warning, if they
> handle
> > > >> > multiple
> > > >> > >> splits (see motivation section).
> > > >> > >>
> > > >> > >> I think renaming alignSplits to facilitate future use cases
> makes
> > > >> sense
> > > >> > but
> > > >> > >> then all interfaces (if 1c) is chosen) should be adjusted
> > > >> accordingly.
> > > >> > >> AlignedSourceReader could be PausingSourceReader and I'd go for
> > > >> > >> pauseOrResumeSplits (Becket's proposal afaik). We could also
> > split
> > > it
> > > >> > into
> > > >> > >> pauseSplit and resumeSplit. While pauseOrResumeSplits may allow
> > > >> Sources
> > > >> > to
> > > >> > >> just use 1 instead of 2 library calls (as written in the
> > Javadoc),
> > > >> both
> > > >> > >> Kafka and Pulsar can't use it and I'm not sure if there is a
> > system
> > > >> that
> > > >> > >> can.
> > > >> > >>
> > > > >> > >> Some nits for the FLIP:
> > > >> > >> - Please replace "stop" with "pause".
> > > >> > >> - Not sure if it's worth it in the capability section: Sources
> > that
> > > >> > adopt
> > > >> > >> this interface cannot be used in earlier versions. So it feels
> > like
> > > >> we
> > > >> > are
> > > >> > >> only forward compatible (old sources can be used after the
> > change);
> > > >> but
> > > >> > I
> > > >> > >> guess this holds for any API addition.
> > > >> > >> - You might want to add what happens when all splits are
> paused.
> > > >> > >> - You may want to describe how the 3 flavors of
> SourceReaderBase
> > > >> > interact
> > > >> > >> with the interface.
> > > >> > >> - I'm not sure if it makes sense to include Kafka and Pulsar in
> > the
> > > >> > FLIP.
> > > >> > >> For me, this is rather immediate follow-up work. (could be in
> the
> > > >> same
> > > >> > >> umbrella ticket)
> > > >> > >>
> > > >> > >> Best,
> > > >> > >>
> > > >> > >> Arvid
> > > >> > >>
> > > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz <dwysakowicz@apache.org>
> > > > >> > >> wrote:
> > > >> > >>
> > > >> > >>
> > > >> > >> a) "MySourceReader implements SourceReader,
> WithSplitsAlignment",
> > > >> along
> > > >> > >> with "MySplitReader implements SplitReader,
> WithSplitsAlignment",
> > > or
> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
> > > "MySplitReader
> > > >> > >> implements AlignedSplitReader", or
> > > >> > >> c) "MySourceReader implements SourceReader" and "MySplitReader
> > > >> > implements
> > > >> > >> SplitReader".
> > > >> > >>
> > > >> > >> I think the latest proposal according to Dawid would be:
> > > >> > >> d) "MySourceReader implements SourceReader" and "MySplitReader
> > > >> > implements
> > > >> > >> AlignedSplitReader".
> > > >> > >> I am fine with this API, although personally speaking I think
> it
> > is
> > > >> > simpler
> > > >> > >> to just add a new method to the split reader with default impl.
> > > >> > >>
> > > >> > >>
> > > >> > >> I think that is a good idea to have it aligned as much as
> > possible.
> > > >> I'd
> > > >> > be
> > > >> > >> +1 for your option c). We can merge AlignedSplitReader with
> > > >> > SplitReader. We
> > > >> > >> will update the FLIP shortly.
> > > >> > >>
> > > >> > >> Best,
> > > >> > >>
> > > >> > >> Dawid
> > > >> > >>
> > > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
> > > >> > >>
> > > >> > >> Thanks for the comment, Jark.
> > > >> > >>
> > > >> > >> 3. Interface/Method Name.
> > > >> > >>
> > > >> > >> Can the interface be used to align other things in the future?
> > For
> > > >> > example,
> > > >> > >> align read speed, I have
> > > >> > >> seen users requesting global rate limits. This feature may also
> > > need
> > > >> an
> > > >> > >> interface like this.
> > > > >> > >> If we don't plan to extend this interface to support aligning
> other
> > > >> > things, I
> > > >> > >> suggest explicitly declaring
> > > >> > >> the purpose of the methods, such as `alignWatermarksForSplits`
> > > >> instead
> > > >> > of
> > > >> > >> `alignSplits`.
> > > >> > >>
> > > >> > >> This is a good point. Naming wise, it would usually be more
> > > >> extensible
> > > >> > to
> > > >> > >> just describe what the method actually does, instead of
> assuming
> > > the
> > > >> > >> purpose of doing this. For example, in this case,
> > > >> pauseOrResumeSplits()
> > > >> > >> would be more extensible because this can be used for any kind
> of
> > > >> flow
> > > >> > >> control, be it watermark alignment or simple rate limiting.
> > > >> > >>
> > > >> > >> 4. Interface or Method.
> > > >> > >>
> > > >> > >> I don't have a strong opinion on this. I think they have their
> > own
> > > >> > >> advantages.
> > > >> > >> In Flink SQL, we heavily use Interfaces for extending abilities
> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
> > > >> > >> and I prefer Interfaces rather than methods in this case. When
> > you
> > > >> have
> > > >> > a
> > > >> > >> bunch of abilities and each ability
> > > > >> > >> has more than one method, Interfaces can help to organize them and
> > > > >> > >> make it clear to users which methods they need to implement
> > > > >> > >> when they want to have an ability.
> > > >> > >>
> > > >> > >> I am OK with decorative interfaces if this is a general design
> > > >> pattern
> > > >> > in
> > > >> > >> the other components in Flink. But it looks like the current
> API
> > > >> > proposal
> > > >> > >> is not symmetric.
> > > >> > >>
> > > >> > >> The current proposal is essentially "MySourceReader implements
> > > >> > >> SourceReader, WithSplitsAlignment", along with "MySplitReader
> > > >> implements
> > > >> > >> AlignedSplitsReader".
> > > >> > >>
> > > >> > >> Should we make the API symmetric? I'd consider any one of the
> > > >> following
> > > >> > as
> > > >> > >> symmetric.
> > > >> > >>
> > > >> > >> a) "MySourceReader implements SourceReader,
> WithSplitsAlignment",
> > > >> along
> > > >> > >> with "MySplitReader implements SplitReader,
> WithSplitsAlignment",
> > > or
> > > >> > >> b) "MySourceReader implements AlignedSourceReader" and
> > > "MySplitReader
> > > >> > >> implements AlignedSplitReader", or
> > > >> > >> c) "MySourceReader implements SourceReader" and "MySplitReader
> > > >> > implements
> > > >> > >> SplitReader".
> > > >> > >>
> > > >> > >> I think the latest proposal according to Dawid would be:
> > > >> > >> d) "MySourceReader implements SourceReader" and "MySplitReader
> > > >> > implements
> > > >> > >> AlignedSplitReader".
> > > >> > >> I am fine with this API, although personally speaking I think
> it
> > is
> > > >> > simpler
> > > >> > >> to just add a new method to the split reader with default impl.
> > > >> > >>
> > > > >> > >> @Dawid Wysakowicz <dwysakowicz@apache.org>, thanks for the reply.
> > > >> > >>
> > > >> > >> Having said that, as I don't have a preference and I agree most
> > of
> > > >> the
> > > >> > >>
> > > >> > >> sources will support the alignment I am fine following your
> > > >> suggestion
> > > >> > to
> > > >> > >> have the SourceReader extending from
> > WithWatermarksSplitsAlignment,
> > > >> but
> > > >> > >> would put the "supportsXXX" there, not in the Source to keep
> the
> > > two
> > > >> > >> methods together.
> > > >> > >>
> > > >> > >> One benefit of having the "supportsXXX" in Source is that this
> > > allows
> > > >> > some
> > > >> > >> compile time check. For example, if a user enabled watermark
> > > >> alignment
> > > >> > >> while it is not supported by the Source, an exception can be
> > thrown
> > > >> at
> > > >> > >> compile time. It seems in general useful. That said, I agree
> that
> > > API
> > > >> > >> cleanliness wise it is better to put the two methods together.
> > > >> > >>
> > > >> > >> Thanks,
> > > >> > >>
> > > >> > >> Jiangjie (Becket) Qin
> > > >> > >>
> > > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <imjark@gmail.com> wrote:
> > > >> > >>
> > > >> > >>
> > > > >> > >> Thanks Dawid for the reminder on FLIP-182. Sorry I did miss it.
> > > >> > >> I don't have other concerns then.
> > > >> > >>
> > > >> > >> Best,
> > > >> > >> Jark
> > > >> > >>
> > > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz <dwysakowicz@apache.org>
> > > > >> > >> wrote:
> > > >> > >>
> > > >> > >>
> > > >> > >> @Jark:
> > > >> > >>
> > > >> > >> 1. Will the framework always align with watermarks when the
> > source
> > > >> > >> implements the interface?
> > > >> > >> I'm afraid not every case needs watermark alignment even if
> Kafka
> > > >> > >> implements the interface,
> > > >> > >> and this will affect the throughput somehow. I agree with
> Becket
> > > >> > >> we may need a
> > > >> > >> `supportSplitsAlignment()` method for users to configure the
> > source
> > > >> to
> > > >> > >> enable/disable the alignment.
> > > >> > >>
> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly
> affect
> > > >> > >>
> > > >> > >> throughput
> > > >> > >>
> > > >> > >> if the reader is constantly
> > > >> > >>  switching between pause and resume. Can users configure the
> > > >> alignment
> > > >> > >> offset?
> > > >> > >>
> > > >> > >>
> > > >> > >> This is covered in the previous FLIP[1] which has been already
> > > >> > >>
> > > >> > >> implemented
> > > >> > >>
> > > >> > >> in 1.15. In short, it must be enabled with the watermark
> strategy
> > > >> which
> > > >> > >> also configures drift and update interval.
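To make the role of the drift concrete, here is a minimal sketch of how a bound on the watermark could be derived and which splits would then be paused. It assumes the bound is the slowest watermark plus the configured drift; the class and method names are hypothetical and are not taken from Flink or from the FLIP.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; these names are hypothetical and not part of Flink.
final class DriftSketch {

    // Highest watermark a split may reach: the slowest watermark plus the allowed drift.
    static long maxAllowedWatermark(Map<String, Long> watermarkBySplit, long maxDriftMillis) {
        if (watermarkBySplit.isEmpty()) {
            return Long.MAX_VALUE; // nothing to align against
        }
        long min = Long.MAX_VALUE;
        for (long wm : watermarkBySplit.values()) {
            min = Math.min(min, wm);
        }
        return min + maxDriftMillis;
    }

    // Splits that are ahead of the bound and would be paused until the others catch up.
    static List<String> splitsToPause(Map<String, Long> watermarkBySplit, long maxDriftMillis) {
        long bound = maxAllowedWatermark(watermarkBySplit, maxDriftMillis);
        List<String> toPause = new ArrayList<>();
        for (Map.Entry<String, Long> entry : watermarkBySplit.entrySet()) {
            if (entry.getValue() > bound) {
                toPause.add(entry.getKey());
            }
        }
        return toPause;
    }

    public static void main(String[] args) {
        Map<String, Long> watermarks = new LinkedHashMap<>();
        watermarks.put("split-0", 1_000L);
        watermarks.put("split-1", 1_500L);
        watermarks.put("split-2", 9_000L);
        // With a drift of 2000 ms the bound is 3000, so only split-2 is ahead of it.
        System.out.println(splitsToPause(watermarks, 2_000L));
    }
}
```

A larger drift pauses fewer splits and so reduces the switching between pause and resume that the question above worries about, at the cost of more buffered data downstream.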
> > > >> > >>
> > > >> > >> If we don't plan to extend this interface to support align
> other
> > > >> things,
> > > >> > >>
> > > >> > >> I
> > > >> > >>
> > > >> > >> suggest explicitly declaring
> > > >> > >> the purpose of the methods, such as `alignWatermarksForSplits`
> > > >> instead
> > > >> > of
> > > >> > >> `alignSplits`.
> > > >> > >>
> > > >> > >>
> > > >> > >> Sure let's rename it.
> > > >> > >>
> > > >> > >> @Becket:
> > > >> > >>
> > > >> > >> I understand your point. On the other hand putting all methods,
> > > even
> > > >> > with
> > > >> > >> "supportsXXX" methods for enabling certain features, makes the
> > > entry
> > > >> > >> threshold for writing a new source higher. Instead of focusing
> on
> > > the
> > > >> > >>
> > > >> > >> basic
> > > >> > >>
> > > >> > >> and required properties of the Source, the person implementing
> a
> > > >> source
> > > >> > >> must bother with and need to figure out what all of the extra
> > > >> features
> > > >> > >>
> > > >> > >> are
> > > >> > >>
> > > >> > >> about and how to deal with them. It makes it also harder to
> > > organize
> > > >> > >> methods in coupled groups as Jark said.
> > > >> > >>
> > > >> > >> Having said that, as I don't have a preference and I agree most
> > of
> > > >> the
> > > >> > >> sources will support the alignment I am fine following your
> > > >> suggestion
> > > >> > to
> > > >> > >> have the SourceReader extending from
> > WithWatermarksSplitsAlignment,
> > > >> but
> > > >> > >> would put the "supportsXXX" there, not in the Source to keep
> the
> > > two
> > > >> > >> methods together.
> > > >> > >>
> > > >> > >> Lastly, I agree it is really unfortunate the "alignSplits"
> > methods
> > > >> > differ
> > > > >> > >> slightly for SourceReader and SplitReader. The reason for that
> is
> > > >> > >> SourceReaderBase deals only with SplitIds, whereas SplitReader
> > > needs
> > > >> the
> > > >> > >> actual splits to pause them. I found the discrepancy acceptable
> > for
> > > >> the
> > > >> > >> sake of simplifying changes significantly, especially as they
> > would
> > > >> > >>
> > > >> > >> highly
> > > >> > >>
> > > >> > >> likely impact performance as we would have to perform
> additional
> > > >> > lookups.
> > > >> > >> Moreover the SplitReader is a secondary interface.
> > > >> > >>
> > > >> > >> Best,
> > > >> > >>
> > > >> > >> Dawid
> > > >> > >>
> > > >> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
> > > >> > >>
> > > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
> > > >> > >>
> > > >> > >> Thanks for the effort, Dawid and Sebastian!
> > > >> > >>
> > > >> > >> I just have some minor questions (maybe I missed something).
> > > >> > >>
> > > >> > >> 1. Will the framework always align with watermarks when the
> > source
> > > >> > >> implements the interface?
> > > >> > >> I'm afraid not every case needs watermark alignment even if
> Kafka
> > > >> > >> implements the interface,
> > > >> > >> and this will affect the throughput somehow. I agree with
> Becket
> > > >> > >> we may need a
> > > >> > >> `supportSplitsAlignment()` method for users to configure the
> > source
> > > >> to
> > > >> > >> enable/disable the alignment.
> > > >> > >>
> > > >> > >> 2. How does the framework calculate maxDesiredWatermark?
> > > >> > >> I think the algorithm of maxDesiredWatermark will greatly
> affect
> > > >> > >>
> > > >> > >> throughput
> > > >> > >>
> > > >> > >> if the reader is constantly
> > > >> > >>  switching between pause and resume. Can users configure the
> > > >> alignment
> > > >> > >> offset?
> > > >> > >>
> > > >> > >> 3. Interface/Method Name.
> > > >> > >> Can the interface be used to align other things in the future?
> > For
> > > >> > >>
> > > >> > >> example,
> > > >> > >>
> > > >> > >> align read speed, I have
> > > >> > >> seen users requesting global rate limits. This feature may also
> > > need
> > > >> an
> > > >> > >> interface like this.
> > > >> > >> If we don't plan to extend this interface to support align
> other
> > > >> things,
> > > >> > >>
> > > >> > >> I
> > > >> > >>
> > > >> > >> suggest explicitly declaring
> > > >> > >> the purpose of the methods, such as `alignWatermarksForSplits`
> > > >> instead
> > > >> > of
> > > >> > >> `alignSplits`.
> > > >> > >>
> > > >> > >> 4. Interface or Method.
> > > >> > >> I don't have a strong opinion on this. I think they have their
> > own
> > > >> > >> advantages.
> > > >> > >> In Flink SQL, we heavily use Interfaces for extending abilities
> > > >> > >> (SupportsXxxx) for TableSource/TableSink,
> > > >> > >> and I prefer Interfaces rather than methods in this case. When
> > you
> > > >> have
> > > >> > a
> > > >> > >> bunch of abilities and each ability
> > > > >> > >> has more than one method, Interfaces can help to organize them and make it
> > > > >> > >> clear to users which methods they need to implement to provide an ability.
> > > >> > >>
> > > >> > >>
> > > >> > >> Best,
> > > >> > >> Jark
> > > >> > >>
> > > > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <be...@gmail.com> wrote:
> > > >> > >>
> > > >> > >> Hi Dawid,
> > > >> > >>
> > > >> > >> Thanks for the explanation. Apologies that I somehow misread a
> > > bunch
> > > >> of
> > > >> > >> "align" and thought they were "assign".
> > > >> > >>
> > > >> > >> Regarding 1, by default implementation, I was thinking of the
> > > default
> > > >> > >>
> > > >> > >> no-op
> > > >> > >>
> > > >> > >> implementation. I am a little worried about the proliferation
> of
> > > >> > >>
> > > >> > >> decorative
> > > >> > >>
> > > >> > >> interfaces. I think the most important thing about interfaces
> is
> > > that
> > > >> > >>
> > > >> > >> they
> > > >> > >>
> > > > >> > >> are easy to understand. In this case, I prefer adding a new
> method
> > to
> > > >> the
> > > >> > >> existing interface for the following reasons:
> > > >> > >>
> > > >> > >> a) I feel the biggest drawback of decorative interfaces is
> which
> > > >> > >>
> > > >> > >> interface
> > > >> > >>
> > > >> > >> they can decorate and which combinations of multiple decorative
> > > >> > >>
> > > >> > >> interfaces
> > > >> > >>
> > > >> > >> are valid. In the current FLIP, the withSplitsAlignment
> interface
> > > is
> > > >> > only
> > > >> > >> applicable to the SourceReader which means it can't decorate
> any
> > > >> other
> > > >> > >> interface. From an interface design perspective, a natural
> > question
> > > >> is
> > > >> > >>
> > > >> > >> why
> > > >> > >>
> > > >> > >> not let "AlignedSplitReader" extend "withSplitsAlignment"? And
> it
> > > is
> > > >> > also
> > > >> > >> natural to assume that a split reader implementing both
> > SplitReader
> > > >> and
> > > >> > >> WithSplitAlignment would work, because a source reader
> > implementing
> > > >> > >> SourceReader and withSplitsAlignment works. So why isn't there
> an
> > > >> > >>
> > > >> > >> interface
> > > >> > >>
> > > >> > >> of AlignedSourceReader? In the future, if there is a new
> feature
> > > >> added
> > > >> > >> (e.g. sorted or pre-partitioned data aware), are we going to
> > create
> > > >> > >>
> > > >> > >> another
> > > >> > >>
> > > >> > >> interface of SplitReader such as SortedSplitReader or
> > > >> > >>
> > > >> > >> PrePartitionedAware?
> > > >> > >>
> > > >> > >> Can they be combined? So I think the additional decorative
> > > interface
> > > >> > like
> > > >> > >> withSplitsAlignment actually increases the understanding cost
> of
> > > >> users
> > > >> > >> because they have to know what decorative interfaces are there,
> > > which
> > > >> > >> interface they can decorate and which combinations of the
> > > decorative
> > > >> > >> interfaces are valid and which are not. Ideally we want to
> avoid
> > > >> that.
> > > >> > To
> > > >> > >> be clear, I am not opposing having an interface of
> > > >> withSplitsAlignment,
> > > >> > >>
> > > >> > >> it
> > > >> > >>
> > > >> > >> is completely OK to have it as an internal interface and let
> > > >> > SourceReader
> > > >> > >> and SplitReader both extend it.
> > > >> > >>
> > > >> > >> b) Adding a new method to the SourceReader with a default
> > > >> implementation
> > > >> > >>
> > > >> > >> of
> > > >> > >>
> > > >> > >> no-op would help avoid logic branching in the source logic,
> > > >> especially
> > > >> > >> given that we agree that the vast majority of the SourceReader
> > > >> > >> implementations, if not all, would just extend from the
> > > >> > SourceReaderBase.
> > > >> > >> That means adding a new method to the interface would
> effectively
> > > >> give
> > > >> > >>
> > > >> > >> the
> > > >> > >>
> > > >> > >> same user experience, but simpler.
> > > >> > >>
> > > >> > >> c) A related design principle that may be worth discussing is
> how
> > > do
> > > >> we
> > > >> > >>
> > > >> > >> let
> > > >> > >>
> > > >> > >> the Source implementations tell Flink what capability is
> > supported
> > > >> and
> > > >> > >>
> > > >> > >> what
> > > >> > >>
> > > >> > >> is not. Personally speaking I feel the most intuitive place to
> me
> > > is
> > > >> in
> > > >> > >>
> > > >> > >> the
> > > >> > >>
> > > >> > >> Source itself, because that is the entrance of the entire
> Source
> > > >> > >>
> > > >> > >> connector
> > > >> > >>
> > > >> > >> logic.
> > > >> > >>
> > > >> > >> Based on the above thoughts, I am wondering if the following
> > > >> interface
> > > >> > >> would be easier to understand by the users.
> > > >> > >>
> > > >> > >> - Change "withSplitsAlignment" to internal interface, let both
> > > >> > >>
> > > >> > >> SourceReader
> > > >> > >>
> > > >> > >> and SplitReader extend from it, with a default no-op
> > > implementation.
> > > >> > >> - Add a new method "boolean supportSplitsAlignment()" to the
> > Source
> > > >> > >> interface, with a default implementation returning false.
> Sources
> > > >> that
> > > >> > >>
> > > >> > >> have
> > > >> > >>
> > > >> > >> implemented the alignment logic can change this to return true,
> > and
> > > >> > >> override the alignSplits() methods in the SourceReader /
> > > SplitReader
> > > >> if
> > > >> > >> needed.
> > > >> > >> - In the future, if a new optional feature is going to be added
> > to
> > > >> the
> > > >> > >> Source, and that feature requires the awareness from Flink, we
> > can
> > > >> add
> > > >> > >>
> > > >> > >> more
> > > >> > >>
> > > >> > >> such methods to the Source.
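The proposal above can be sketched roughly as follows. The interfaces are simplified, hypothetical stand-ins (ReaderSketch, SourceSketch) rather than the real Flink types; only the method names supportSplitsAlignment() and alignSplits() come from the thread, and the validation helper is an assumption about how a framework-side check might look.

```java
import java.util.Collection;

// Hypothetical, simplified stand-ins for the interfaces named in the thread; not the real Flink API.
interface ReaderSketch {
    // Default no-op, so readers written before the feature keep compiling.
    default void alignSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {}
}

interface SourceSketch {
    // Defaults to false; a source with alignment logic overrides this to return true.
    default boolean supportSplitsAlignment() {
        return false;
    }

    ReaderSketch createReader();
}

final class AlignmentValidation {
    // Fails early, before any reader runs, if alignment is requested but not supported.
    static void validate(SourceSketch source, boolean alignmentEnabled) {
        if (alignmentEnabled && !source.supportSplitsAlignment()) {
            throw new IllegalStateException(
                    "Watermark alignment is enabled but the source does not support it");
        }
    }
}
```

Keeping the flag on the source object is what makes the early check possible: the framework can inspect it while the job is being built, without instantiating a reader.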
> > > >> > >>
> > > >> > >> What do you think?
> > > >> > >>
> > > >> > >> Thanks,
> > > >> > >>
> > > >> > >> Jiangjie (Becket) Qin
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dw...@apache.org> wrote:
> > > >> > >>
> > > >> > >>
> > > >> > >> @Konstantin:
> > > >> > >>
> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka
> the
> > > >> stop &
> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar only,
> > > >> correct?
> > > >> > >>
> > > >> > >> Correct, as far as I know though, those are the only sources
> > which
> > > >> > >>
> > > >> > >> consume
> > > >> > >>
> > > >> > >> concurrently from multiple splits and thus alignment applies.
> > > >> > >>
> > > >> > >> @Thomas:
> > > >> > >>
> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase
> and
> > > then
> > > >> > >>
> > > >> > >> doing
> > > >> > >>
> > > >> > >> nothing if the split reader does not implement
> AlignedSplitReader
> > > >> could
> > > >> > >>
> > > >> > >> be
> > > >> > >>
> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead be added to
> > the
> > > >> > >> specific source reader (i.e. KafkaSourceReader) to make it
> > explicit
> > > >> that
> > > >> > >> the source actually supports it.
> > > >> > >>
> > > >> > >> I understand your concern. Hmm, I think we could actually do
> > that.
> > > >> Given
> > > >> > >> the actual implementation of the SourceReaderBase#alignSplits
> is
> > > >> rather
> > > >> > >> short (just a forward to the corresponding method of
> > SplitFetcher),
> > > >> we
> > > >> > >> could reimplement it in the actual source implementations. This
> > > >> solution
> > > >> > >> has the downside though. Authors of new sources would have to
> do
> > > two
> > > >> > >> things: extend from AlignedSplitReader and implement
> > > >> > >>
> > > > >> > >> WithSplitsAlignment,
> > > >> > >>
> > > >> > >> instead of just extending AlignedSplitReader. I would be fine
> > with
> > > >> such
> > > >> > a
> > > > >> > >> tradeoff though. What do others think?
> > > >> > >>
> > > >> > >> @Steven:
> > > >> > >>
> > > >> > >> For this part from the motivation section, is it accurate?
> Let's
> > > >> assume
> > > >> > >>
> > > >> > >> one
> > > >> > >>
> > > > >> > >> source task consumes from 3 partitions and one of the partitions
> > is
> > > >> > >> significantly slower. In this situation, watermark for this
> > source
> > > >> task
> > > >> > >> won't hold back as it is reading recent data from other two
> Kafka
> > > >> > >> partitions. As a result, it won't hold back the overall
> > watermark.
> > > I
> > > >> > >> thought the problem is that we may have late data for this slow
> > > >> > >>
> > > >> > >> partition.
> > > >> > >>
> > > >> > >> It will hold back the watermark. Watermark of an operator is
> the
> > > >> minimum
> > > >> > >> of watermarks of all splits[1]
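The three-partition scenario from the question can be worked through with a tiny sketch. The class name and the numbers are made up for illustration; the only rule taken from the reply is that the combined watermark is the minimum over all splits.

```java
import java.util.Collections;
import java.util.List;

// Illustrative only; not Flink code.
final class MinWatermarkSketch {

    // The combined watermark can only advance as far as the slowest split.
    // Collections.min throws NoSuchElementException for an empty list.
    static long operatorWatermark(List<Long> splitWatermarks) {
        return Collections.min(splitWatermarks);
    }

    public static void main(String[] args) {
        // Two partitions read recent data, one lags far behind.
        System.out.println(operatorWatermark(List.of(10_000L, 9_800L, 2_000L))); // prints 2000
    }
}
```

So the slow partition does hold the watermark back; what grows without alignment is the amount of data from the fast partitions that sits ahead of that watermark.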
> > > >> > >>
> > > >> > >> I have another question about the restart. Say split alignment
> is
> > > >> > >> triggered. checkpoint is completed. job failed and restored
> from
> > > the
> > > >> > last
> > > >> > >> checkpoint. because alignment decision is not checkpointed,
> > > initially
> > > >> > >> alignment won't be enforced until we get a cycle of watermark
> > > >> > aggregation
> > > >> > >> and propagation, right? Not saying this corner is a problem.
> Just
> > > >> want
> > > >> > to
> > > >> > >> understand it more.
> > > >> > >>
> > > >> > >> Your understanding is correct.
> > > >> > >>
> > > >> > >> @Becket:
> > > >> > >>
> > > >> > >> 1. I think watermark alignment is sort of a general use case,
> so
> > > >> should
> > > >> > >>
> > > >> > >> we
> > > >> > >>
> > > >> > >> just add the related methods to SourceReader directly instead
> of
> > > >> > >> introducing the new interface of WithSplitAssignment? We can
> > > provide
> > > >> > >> default implementations, so backwards compatibility won't be an
> > > >> issue.
> > > >> > >>
> > > >> > >> I don't think we can provide a default implementation. How
> would
> > we
> > > >> do
> > > >> > >> that? Would it be just a no-op? Is it better than having an
> > opt-in
> > > >> > >> interface? The default implementation would have to be added
> > > >> exclusively
> > > >> > >>
> > > >> > >> in
> > > >> > >>
> > > >> > >> a *Public* SourceReader interface. By the way notice
> > > SourceReaderBase
> > > >> > >> does extend from WithSplitsAlignment, so effectively all
> > > >> implementations
> > > >> > >>
> > > >> > >> do
> > > >> > >>
> > > >> > >> handle the alignment case. To be honest I think it is
> impossible
> > to
> > > >> > >> implement the SourceReader interface directly by end users.
> > > >> > >>
> > > >> > >> 2. As you mentioned, the SplitReader interface probably also
> > needs
> > > >> some
> > > >> > >> change to support throttling at the split granularity. Can you
> > add
> > > >> that
> > > >> > >> interface change into the public interface section as well?
> > > >> > >>
> > > >> > >> It has been added from the beginning. See *AlignedSplitReader.*
> > > >> > >>
> > > >> > >> 3. Nit, can we avoid using the method name assignSplits here,
> > given
> > > >> that
> > > >> > >>
> > > >> > >> it
> > > >> > >>
> > > >> > >> is not actually changing the split assignments? It seems
> > something
> > > >> like
> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more
> > > accurate.
> > > >> > >>
> > > >> > >> The method's called *alignSplits*, not assign. Do you still
> > prefer
> > > a
> > > >> > >> different name for that? Personally, I am open for suggestions
> > > here.
> > > >> > >>
> > > >> > >> Best,
> > > >> > >>
> > > >> > >> Dawid
> > > >> > >>
> > > >> > >> [1]
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > >> >
> > > >>
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> > > >> > >>
> > > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
> > > >> > >>
> > > > >> > >> Thanks for driving the effort, Sebastian. I think the
> motivation
> > > >> makes a
> > > >> > >> lot of sense. Just a few suggestions / questions.
> > > >> > >>
> > > >> > >> 1. I think watermark alignment is sort of a general use case,
> so
> > > >> should
> > > >> > >>
> > > >> > >> we
> > > >> > >>
> > > >> > >> just add the related methods to SourceReader directly instead
> of
> > > >> > >> introducing the new interface of WithSplitAssignment? We can
> > > provide
> > > >> > >> default implementations, so backwards compatibility won't be an
> > > >> issue.
> > > >> > >>
> > > >> > >> 2. As you mentioned, the SplitReader interface probably also
> > needs
> > > >> some
> > > >> > >> change to support throttling at the split granularity. Can you
> > add
> > > >> that
> > > >> > >> interface change into the public interface section as well?
> > > >> > >>
> > > >> > >> 3. Nit, can we avoid using the method name assignSplits here,
> > given
> > > >> that
> > > >> > >>
> > > >> > >> it
> > > >> > >>
> > > >> > >> is not actually changing the split assignments? It seems
> > something
> > > >> like
> > > >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more
> > > accurate.
> > > >> > >>
> > > >> > >> Thanks,
> > > >> > >>
> > > >> > >> Jiangjie (Becket) Qin
> > > >> > >>
> > > > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <st...@gmail.com> wrote:
> > > >> > >>
> > > >> > >> However, a single source operator may read data from multiple
> > > >> > >>
> > > >> > >> splits/partitions, e.g., multiple Kafka partitions, such that
> > even
> > > >> with
> > > >> > >> watermark alignment the source operator may need to buffer
> > > excessive
> > > >> > >>
> > > >> > >> amount
> > > >> > >>
> > > >> > >> of data if one split emits data faster than another.
> > > >> > >>
> > > >> > >> For this part from the motivation section, is it accurate?
> Let's
> > > >> assume
> > > >> > >>
> > > >> > >> one
> > > >> > >>
> > > > >> > >> source task consumes from 3 partitions and one of the partitions
> > is
> > > >> > >> significantly slower. In this situation, watermark for this
> > source
> > > >> task
> > > >> > >> won't hold back as it is reading recent data from other two
> Kafka
> > > >> > >> partitions. As a result, it won't hold back the overall
> > watermark.
> > > I
> > > >> > >> thought the problem is that we may have late data for this slow
> > > >> > >>
> > > >> > >> partition.
> > > >> > >>
> > > >> > >> I have another question about the restart. Say split alignment
> is
> > > >> > >> triggered. checkpoint is completed. job failed and restored
> from
> > > the
> > > >> > last
> > > >> > >> checkpoint. because alignment decision is not checkpointed,
> > > initially
> > > >> > >> alignment won't be enforced until we get a cycle of watermark
> > > >> > aggregation
> > > >> > >> and propagation, right? Not saying this corner is a problem.
> Just
> > > >> want
> > > >> > to
> > > >> > >> understand it more.
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <th...@apache.org> wrote:
> > > >> > >>
> > > >> > >> Thanks for working on this!
> > > >> > >>
> > > >> > >> I wonder if "supporting" split alignment in SourceReaderBase
> and
> > > then
> > > >> > >>
> > > >> > >> doing
> > > >> > >>
> > > >> > >> nothing if the split reader does not implement
> AlignedSplitReader
> > > >> could
> > > >> > >>
> > > >> > >> be
> > > >> > >>
> > > >> > >> misleading? Perhaps WithSplitsAlignment can instead be added to
> > the
> > > >> > >> specific source reader (i.e. KafkaSourceReader) to make it
> > explicit
> > > >> that
> > > >> > >> the source actually supports it.
> > > >> > >>
> > > >> > >> Thanks,
> > > >> > >> Thomas
> > > >> > >>
> > > >> > >>
> > > > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <kn...@apache.org> wrote:
> > > >> > >>
> > > >> > >>
> > > >> > >> Hi Sebastian, Hi Dawid,
> > > >> > >>
> > > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka
> the
> > > >> stop
> > > >> > >>
> > > >> > >> &
> > > >> > >>
> > > >> > >> resume behavior) will be implemented for Kafka and Pulsar only,
> > > >> > >>
> > > >> > >> correct?
> > > >> > >>
> > > >> > >> +1 in general. I believe it is valuable to complete the
> watermark
> > > >> > >>
> > > >> > >> aligned
> > > >> > >>
> > > >> > >> story with this FLIP.
> > > >> > >>
> > > >> > >> Cheers,
> > > >> > >>
> > > >> > >> Konstantin
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dw...@apache.org> wrote:
> > > >> > >>
> > > >> > >>
> > > >> > >> To be explicit, having worked on it, I support it ;) I think we
> > can
> > > >> > >> start a vote thread soonish, as there are no concerns so far.
> > > >> > >>
> > > >> > >> Best,
> > > >> > >>
> > > >> > >> Dawid
> > > >> > >>
> > > >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
> > > >> > >>
> > > >> > >> Dear Flink developers,
> > > >> > >>
> > > > >> > >> I would like to open a discussion on FLIP 217 [1] for an extension of
> > > > >> > >> Watermark Alignment to perform alignment also in SplitReaders. To do so,
> > > > >> > >> SplitReaders must be able to suspend and resume reading from split sources,
> > > > >> > >> where the SourceOperator coordinates and controls suspend and resume. To
> > > > >> > >> gather information about the current watermarks of the SplitReaders, we
> > > > >> > >> extend the internal WatermarkOutputMultiplexer and report watermarks to the
> > > > >> > >> SourceOperator.
> > > > >> > >>
> > > > >> > >> There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised and
> > > > >> > >> reworked by Dawid Wysakowicz (he did most of the work) and me. The changes
> > > > >> > >> are backwards compatible in the sense that if affected components do not
> > > > >> > >> support split alignment, the behavior is as before.
> > > >> > >>
> > > >> > >> Best,
> > > >> > >> Sebastian
> > > >> > >>
> > > >> > >> [1]
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > >> > >>
> > > >> >
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > > >> > >>
> > > >> > >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
> > > >> > >>
> > > >> > >> --
> > > >> > >>
> > > > >> > >> Konstantin Knauf
> > > > >> > >> https://twitter.com/snntrable
> > > > >> > >> https://github.com/knaufk
> > > >> > >>
> > > >> > >>
> > > >> >
> > > >>
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Guowei Ma <gu...@gmail.com>.
Hi,

We know that for bounded input Flink supports the Batch
execution mode. Currently, in Batch execution mode, Flink executes a job on a
stage-by-stage basis. In that mode, watermark alignment might not
gain much.

So my question is: Is watermark alignment the default behavior (only for
sources that implement it)? If so, have you considered evaluating the impact
of this behavior on the Batch execution mode? Or do you think it is not necessary?

Correct me if I missed something.

Best,
Guowei


On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski <pi...@gmail.com>
wrote:

> Hi Becket and Dawid,
>
> > I feel that no matter which option we choose this can not be solved
> entirely in either of the options, because of the point above and because
> the signature of SplitReader#pauseOrResumeSplits and
> SourceReader#pauseOrResumeSplits are slightly different (one identifies
> splits with splitId the other one passes the splits directly).
>
> Yes, that's a good point in this case and for features that need to be
> implemented in more than one place.
>
> > Is there any reason to make pausing reading from a split an optional feature,
> > other than that this was not included in the original interface?
>
> An additional argument in favor of making it optional is to simplify source
> implementation. But on its own I'm not sure if that would be enough to
> justify making this feature optional. Maybe.
>
> > I think it would be way simpler and clearer to just let end users and
> Flink
> > assume all the connectors will implement this feature.
>
> As I wrote above that would be an interesting choice to make (ease of
> implementation for new users, vs system consistency). Regardless of that,
> yes, for me the main argument is the API backward compatibility. But let's
> clear a couple of points:
> - The current proposal adding methods to the base interface with default
> implementations is an OPTIONAL feature. Same as the decorative version
> would be.
> - Decorative version could implement "throw UnsupportedOperationException"
> if user enabled watermark alignment just as well and I agree that's a
> better option compared to logging a warning.
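The two bullet points above can be compared side by side in a small sketch. All types here are simplified and hypothetical; only the method name pauseOrResumeSplits and the interface name WithSplitsAlignment appear in the thread, and the real signatures may differ.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical, simplified types for illustration; not the real Flink interfaces.

// Option 1: the method lives on the base interface with a throwing default.
interface BaseReaderWithDefault {
    default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
        throw new UnsupportedOperationException("This reader does not support pausing splits");
    }
}

// Option 2: the base interface stays minimal and the feature is a separate decorative interface.
interface PlainReader {}

interface WithSplitsAlignment {
    void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume);
}

final class AlignmentCaller {

    // Option 1: call directly; readers that did not override the default throw.
    static void alignOption1(BaseReaderWithDefault reader) {
        reader.pauseOrResumeSplits(List.of("split-0"), List.of());
    }

    // Option 2: check for the decorative interface and throw the same exception otherwise.
    static void alignOption2(PlainReader reader) {
        if (reader instanceof WithSplitsAlignment) {
            ((WithSplitsAlignment) reader).pauseOrResumeSplits(List.of("split-0"), List.of());
        } else {
            throw new UnsupportedOperationException("This reader does not support pausing splits");
        }
    }
}
```

From the end user's point of view both variants fail in the same way when alignment is enabled on a reader that lacks it; the difference is only in what the source author sees on the interface.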
>
> Best,
> Piotrek
>
>
> śr., 4 maj 2022 o 15:40 Becket Qin <be...@gmail.com> napisał(a):
>
> > Thanks for the reply and patient discussion, Piotr and Dawid.
> >
> > Is there any reason to make pausing reading from a split an optional feature,
> > other than that this was not included in the original interface?
> >
> > To be honest I am really worried about the complexity of the user story
> > here. Optional features like this have a high overhead. Imagine this
> > feature is optional, now a user enabled watermark alignment and defined a
> > few watermark groups. Would it work? Hmm, that depends on whether the
> > involved Source has implemented this feature. If the Sources are well
> > documented, good luck. Otherwise end users may have to look into the code
> > of the Source to see whether the feature is supported. Which is something
> > they shouldn't have to do.
> >
> > I think it would be way simpler and clearer to just let end users and
> Flink
> > assume all the connectors will implement this feature. After all the
> > watermark group is not optional to the end users. If in some rare cases,
> > the feature cannot be supported, a clear UnsupportedOperationException
> will
> > be thrown to tell users to explicitly remove this Source from the
> watermark
> > group. I don't think we should have a warning message here, as they tend
> to
> > be ignored in many cases. If we do this, we don't even need the
> supportXXX
> > method in the Source for this feature. In fact this is exactly how many
> > interfaces work today. For example, SplitEnumerator#addSplitsBack() is
> not
> > supported by Pravega source because it does not support partial failover.
> > In that case, it simply throws an exception to trigger a global recovery.
> >
> > The reason we add a default implementation in this case would be just for
> the
> > sake of backwards compatibility so the old source can still compile.
> Sure,
> > in short term, this feature might not be supported by many existing
> > sources. That is OK, and it is quite visible to the source developers
> that
> > they did not override the default impl which throws an
> > UnsupportedOperationException.
> >
> > @Dawid,
> >
> > the Java doc of the SupportXXX() method in the Source would be the single
> > >> source of truth regarding how to implement this feature.
> > >
> > >
> >
> > I also don't find it entirely true. Half of the classes are theoretically
> > > optional and are utility classes from the point of view how the
> > interfaces
> > > are organized. Theoretically users do not need to use any of
> > > SourceReaderBase & SplitReader. Would be weird to list their methods in
> > the
> > > Source interface.
> >
> > I think the ultimate goal of java docs is to guide users to implement the
> > Source. If SourceReaderBase is the preferred way to implement a
> > SourceReader, it seems worth mentioning that. Even the Java language
> > documentation for interfaces lists the known implementations [1] so people
> can
> > leverage them. But for this particular case, if we make the feature
> > non-optional, we don't even need the supportXXX() method for now.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> > On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <dw...@apache.org>
> > wrote:
> >
> > > Hey Piotr and Becket,
> > >
> > > First of all, let me say I am happy with whichever option is agreed in
> > the
> > > discussion.
> > >
> > > I wanted to clarify a few points from the discussion though:
> > >
> > > @Becket:
> > >
> > > The main argument for adding the methods to the SourceReader is that
> > these
> > > methods are effectively NON-OPTIONAL to SourceReader impl, i.e.
> starting
> > > from this FLIP, all the SourceReaders impl are expected to support this
> > > method, although some old implementations may not have implemented this
> > > feature. I think we should distinguish the new features from the
> optional
> > > features. While the public decorative interface is a solution to the
> > > optional features. We should not use it for the features that are
> > > non-optional.
> > >
> > > I don't think that this feature is NON-OPTIONAL. Even though
> preferred, I
> > > still think it can be simply optional.
> > >
> > > the Java doc of the SupportXXX() method in the Source would be the
> single
> > > source of truth regarding how to implement this feature.
> > >
> > > I also don't find it entirely true. Half of the classes are
> theoretically
> > > optional and are utility classes from the point of view how the
> > interfaces
> > > are organized. Theoretically users do not need to use any of
> > > SourceReaderBase & SplitReader. Would be weird to list their methods in
> > the
> > > Source interface.
> > >
> > > @Piotr
> > >
> > > If we have all of the methods with default implementation in the base
> > > interface, the API doesn't give any clue to the user which set of
> methods
> > > are required to be implemented at the same time.
> > >
> > > I feel that no matter which option we choose this can not be solved
> > > entirely in either of the options, because of the point above and
> because
> > > the signature of SplitReader#pauseOrResumeSplits and
> > > SourceReader#pauseOrResumeSplits are slightly different (one identifies
> > > splits with splitId the other one passes the splits directly).
> > >
> > > Best,
> > >
> > > Dawid
> > > On 03/05/2022 14:30, Becket Qin wrote:
> > >
> > > Hi Piotr,
> > >
> > > Thanks for the comment.
> > >
> > > Just to clarify, I am not against the decorative interfaces, but I do
> > > think we should use them with caution. The main argument for adding the
> > > methods to the SourceReader is that these methods are
> > > effectively NON-OPTIONAL to SourceReader impl, i.e. starting from this
> > > FLIP, all the SourceReaders impl are expected to support this
> > > method, although some old implementations may not have implemented this
> > > feature. I think we should distinguish the new features from the
> optional
> > > features. While the public decorative interface is a solution to the
> > > > optional features, we should not use it for the features that are
> > > non-optional.
> > >
> > > That said, this feature is optional for SplitReaders. Arguably we can
> > have
> > > a decorative interface for that, but for simplicity and symmetry of the
> > > interface, personally I prefer just adding a new method.
> > >
> > > Regarding the advantages you mentioned about the decorative interfaces,
> > > they would make sense if:
> > > 1. The feature is optional.
> > > 2. There is only one decorative interface involved for a feature.
> > > Otherwise the argument that all the methods are grouped together will
> not
> > > stand.
> > >
> > > Compared with that, I think the current solution works fine in all
> cases,
> > > i.e. "having supportXXX() method in Source, and default methods /
> > > decorative interfaces in base interfaces.".
> > >
> > > The advantages are:
> > >> - clean and easy to implement base interface
> > >
> > > In the current approach, the Java doc of the SupportXXX() method in the
> > > Source would be the single source of truth regarding how to implement
> > this
> > > feature. It lists the method that has to be implemented to support this
> > > feature, regardless of how many classes / interfaces are involved.
> > >
> > > When implementing the base interface, users do not need to implement a
> > > method with default implementation. If they are curious what the method
> > is
> > > for, the java doc of that method simply points users to the
> SupportXXX()
> > > > method in the Source. This does not seem to add work for users compared
> with
> > > decorative interfaces, but gives much better discoverability.
> > >
> > > - all of the methods from a single feature are grouped in a single
> > >> decorator interface, together with their dedicated java doc. It's also
> > >> easier to google search for help using the decorator name
> > >
> > > - if an optional feature requires two methods to be implemented at
> once,
> > >> decorator can guarantee that
> > >
> > > These two points are not true when multiple components and classes are
> > > involved collaboratively to provide a feature. In our case, we have
> both
> > > SourceReader and SplitReader involved. And there might be other
> > interfaces
> > > on the JM side involved for some future features. So the relevant
> methods
> > > can actually be scattered over the places. That said, we may still use
> > > decorative interfaces for each component, if the feature is optional,
> > given
> > > there is a single source of truth for the feature.
> > >
> > > Here I would strongly lean towards making life easier for new users,
> > >> lowering the entry barrier, at the (imo) slight expense for the power
> > >> users.
> > >
> > > I actually think the current approach is simpler, more extensible and
> > more
> > > general for all the users. Can you articulate a bit more on which part
> > you
> > > > think makes it harder for users to understand?
> > >
> > >
> > > There is another benefit of the decorative interfaces which is not
> > > mentioned, but might be worth considering here. Usually the decorative
> > > interfaces give slightly better backwards compatibility than the new
> > > default method in the interfaces. That is when users are using a jar
> that
> > > was compiled with an older version of Flink which does not have the
> > default
> > > method in the interfaces in question. A decorative interface may still
> > > provide backwards compatibility in that case, while default method impl
> > > cannot.
> > >
> > > I think in Flink we in general do not guarantee custom components
> > compiled
> > > with an older version can run with a newer version of Flink. A
> recompile
> > > with a newer version would be required. That said, if we do care about
> > > this, we can just change the "supportXXX()" method in the Source
> > interface
> > > to use decorative interfaces, and leave the other parts unchanged.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > >
> > >
> > > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <pn...@apache.org>
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> Sorry for chipping in so late, but I was OoO for the last two weeks.
> > >> Regarding the interfaces, I would be actually against adding those
> > methods
> > >> to the base interfaces for the reasons mentioned above. Clogging the
> > base
> > >> interface for new users with tons of methods that they do not need, do
> > not
> > >> understand and do not know what to do with them. Moreover, such
> > decorative
> > >> interfaces are solving a problem if a feature requires two or more
> > methods
> > >> to be implemented at the same time. If we have all of the methods with
> > >> default implementation in the base interface, the API doesn't give any
> > >> clue
> > >> to the user which set of methods are required to be implemented at the
> > >> same
> > >> time.
> > >>
> > >> > a) I feel the biggest drawback of decorative interfaces is which
> > >> interface
> > >> > they can decorate and which combinations of multiple decorative
> > >> interfaces
> > >> > are valid. (...)
> > >> > In the future, if there is a new feature added
> > >> > (e.g. sorted or pre-partitioned data aware), are we going to create
> > >> another
> > >> > interface of SplitReader such as SortedSplitReader or
> > >> PrePartitionedAware?
> > >> > Can they be combined? So I think the additional decorative interface
> > >> like
> > >> > withSplitsAlignment actually increases the understanding cost of
> users
> > >> > because they have to know what decorative interfaces are there,
> which
> > >> > interface they can decorate and which combinations of the decorative
> > >> > interfaces are valid and which are not. Ideally we want to avoid
> that.
> > >>
> > >> I'm not sure if I understand how embedding default methods in the base
> > >> interface is solving the problem: what can be combined or not? If
> there
> > >> are
> > >> two conflicting features, having decorative interfaces that can not be
> > >> mixed together actually makes much more sense to me rather than having
> > >> them
> > >> all in one base class. How would you allow users to implement only one
> > of
> > >> those two features?
> > >>
> > >> To reiterate on the issue. Yes, there are drawbacks:
> > >> - how can a user discover what decorators are there?
> > >> - how can a user know where the decorator can be applied to?
> > >>
> > > >> However, those are drawbacks mainly for power users, which can be
> mitigated
> > by
> > >> the documentation. For example listing all of the decorators with
> > >> detailed explanation both in the docs and in the java docs. More
> > >> experienced users will be able to deal with those issues easier, as
> they
> > > >> will already have some basic understanding of Flink. Also, if a user has
> a
> > >> problem that he wants to solve, he will google search a potential
> > solution
> > >> to his problem anyway, and while doing that he is very likely to
> > discover
> > >> the decorator that he needs anyway in the docs.
> > >>
> > >> The advantages are:
> > >> - clean and easy to implement base interface
> > >> - all of the methods from a single feature are grouped in a single
> > >> decorator interface, together with their dedicated java doc. It's also
> > >> easier to google search for help using the decorator name
> > >> - if an optional feature requires two methods to be implemented at
> once,
> > >> decorator can guarantee that
> > >>
> > >> Here I would strongly lean towards making life easier for new users,
> > >> lowering the entry barrier, at the (imo) slight expense for the power
> > >> users.
> > >>
> > >> Best,
> > >> Piotrek
> > >>
> > >>
> > > >> On Tue, 26 Apr 2022 at 15:32, Becket Qin <be...@gmail.com> wrote:
> > >>
> > >> > Thanks for the reply Sebastian and Dawid,
> > >> >
> > > >> > I think Sebastian has a good summary. This is a really helpful
> > >> discussion.
> > >> >
> > >> > Thinking a bit more, I feel that it might still be better to add the
> > >> > supportsXXX() method in the Source rather than SourceReader.
> > >> >
> > >> > Generally speaking, what we are trying to do here is to let the
> Flink
> > >> > framework know what the Source is capable of. In this FLIP, it
> happens
> > >> to
> > >> > be the capability that only involves SourceReader. But in the
> future,
> > >> it is
> > >> > possible that another functionality involves both the
> SplitEnumerator
> > >> and
> > >> > SourceReader. In that case, following the current approach, we
> should
> > >> put
> > >> > the "supportsXXX()" method in both SplitEnumerator and SourceReader.
> > >> > Because if we only put this in the SourceReader, then the JM would
> > have
> > >> to
> > >> > create a SourceReader in order to know whether this feature is
> > >> supported,
> > >> > which is a little ugly. But if we put the "supportsXXX()" method in
> > the
> > >> > Source, we will break the "symmetric" design because this FLIP
> chose a
> > >> > different way.
> > >> >
> > >> > This is also why I think supportsXXX() method seems a good thing to
> > >> have,
> > >> > because when there are a few interfaces / methods that are expected
> to
> > >> be
> > >> > implemented at the same time in order to deliver a feature, it is
> > always
> > >> > good to have a single source of truth to tell the framework what to
> > do,
> > >> so
> > >> > the framework can do consistent things in different parts.
> > >> >
> > >> > @Sebastian Mattheis <se...@ververica.com>
> > >> >
> > >> > Regarding interface flavor b), i.e. AlignedSourceReader +
> > >> > AlignedSplitReader, what I feel awkward about is that we are
> > essentially
> > >> > expecting almost all the SourceReader implementations to extend
> > >> > SourceReaderBase, which effectively makes the SourceReader interface
> > >> > without the pausing support useless. So this indicates that public
> > >> > decorative interfaces (or sub-interfaces for the same purpose) only
> > >> > make sense if the original interface is also expected to be used.
> > > >> > Otherwise, it seems to make more sense to add the method to the
> original
> > >> > interface itself.
> > >> >
> > >> > Cheers,
> > >> >
> > >> > Jiangjie (Becket) Qin
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <
> > >> dwysakowicz@apache.org>
> > >> > wrote:
> > >> >
> > >> > > Thanks @Sebastian for the nice summary.
> > >> > >
> > >> > > I think most of your points aligned with the suggestions I made to
> > the
> > >> > > FLIP, while you were writing your reply (I believe we hit enter
> > >> nearly at
> > >> > > the same time ;) )
> > >> > >
> > >> > > Two points after we synced offline
> > >> > >
> > >> > > 1. I changed also the supportsWatermarksSplitAlignment to
> > >> > > supportsPausingSplits to express the general capability of
> pausing.
> > >> > >
> > > >> > > 2. As for whether we should have PausingSourceReader/PausingSplitReader
> > (option
> > >> b)
> > >> > > or if we should just add the methods (option c), I suggest to
> simply
> > >> add
> > > >> > > the two methods as I felt this is the approach much preferred by Becket,
> > > >> which
> > > >> > > others do not object to. Unless there is opposition, let's go with
> > this
> > >> > > option c.
> > >> > >
> > >> > > Best,
> > >> > >
> > >> > > Dawid
> > >> > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
> > >> > >
> > >> > > Hi folks,
> > >> > >
> > >> > > Sorry for being a bit silent. Many thanks for all the input and
> > >> > > suggestions. As I'm a bit new, I needed some time to catch up and
> > >> > structure
> > >> > > (for myself) the discussion and I wanted to find a way to
> structure
> > >> the
> > >> > > conclusions. (Also because I had the feeling that some concerns
> got
> > >> lost
> > >> > in
> > >> > > the discussion.) This is my attempt and please correct me if
> > >> something is
> > >> > > wrong or misunderstood. I tried to collect and assemble the
> > opinions,
> > >> > > suggestions, and conclusions (to the best of my knowledge):
> > >> > >
> > >> > > # Top A: Should split alignment (pause/resume behavior) be a
> general
> > >> > > capability?
> > >> > >
> > > >> > > I personally don't see any reason not to have it a general
> capability
> > >> > > because for the alignSplit method it is actually independent of
> the
> > >> > > watermarks. If we agree here to have it a general capability, we
> > >> should
> > >> > > also agree on the right wording. Does "alignSplits(splitsToResume,
> > >> > > splitsToPause)" refer to what is then actually meant? (I see it as
> > >> okay.
> > >> > I
> > >> > > don't have any better idea whilst Arvid suggested
> > >> "pauseOrResumeSplits".)
> > >> > >
> > > >> > > # Top B: Should it be possible to enable/disable split alignment?
> > >> > >
> > >> > > I would personally not disable the split alignment on the source
> > >> reader
> > >> > > side because if split alignment is used for some other use case
> (see
> > >> A)
> > >> > it
> > >> > > could have nasty side effects on other/future use cases. Instead,
> I
> > >> would
> > >> > > disable "watermark split alignment" where I think it should
> disable
> > >> the
> > >> > > watermark-dependent trigger for split alignment.
> > >> > >
> > >> > > # Top C: Should we add a supportsX method?
> > >> > >
> > >> > > I find it difficult to define the scope of a supportsX method
> w.r.t.
> > >> > > the following questions: a) Where is it used? and b) What is the
> > >> expected
> > >> > > output? To b), it's not straight-forward to provide a meaningful
> > >> output,
> > >> > > e.g., if SourceReader supports split alignment but SplitReader
> does not.
> > >> This
> > >> > is
> > >> > > because with the current implementation, we can determine whether
> > >> split
> > >> > > alignment is fully supported only during runtime and specifically
> > >> > actually
> > >> > > only when calling alignSplits down the call hierarchy up to the
> > actual
> > >> > > SplitReaders.
> > >> > >
> > >> > > Therefore, I would suggest to either raise an error or warning if
> > the
> > >> > > alignment is called but not supported at some point. I know we
> > should
> > >> > > carefully think about when this could be the case because we don't
> > >> want
> > >> > to
> > >> > > flood anybody with such warnings. However, warnings could be an
> > >> indicator
> > >> > > for the user that for watermark split alignment use case split
> > >> reading is
> > >> > > imbalanced with the conclusion to either disable the trigger for
> > >> > watermark
> > >> > > split alignment (see Top B) or to use/implement a source and
> reader
> > >> that
> > >> > > fully supports split alignment.
> > >> > >
> > >> > > # Top D: How to design interfaces?
> > >> > >
> > > >> > > Thanks for structuring the discussion with the various
> > >> possibilities
> > >> > > (a-d). From the discussion and emails, I would like to summarize
> the
> > >> > > following requirements:
> > >> > > - Interfaces should be consistent ("symmetric"), i.e., similar
> > >> semantics
> > >> > > should have similar interfaces with similar usage.
> > >> > > - Make explicit which implementations implement interfaces/support
> > >> > > behavior.
> > >> > > - Make clear what are default implementations and how to implement
> > >> > > interfaces with desired behavior.
> > >> > >
> > >> > > This is a simplified view of the relations between relevant
> classes
> > of
> > >> > the
> > >> > > PoC implementation:
> > >> > >
> > >> > > SourceReader (Public) <|-- SourceReaderBase (Internal) <|-- ..
> <|--
> > >> > > MySourceReader
> > >> > >
> > >> > > MySourceReader <>-- SplitFetcherManager (Internal) <>--
> SplitFetcher
> > >> > > (Internal) <>-- SplitReader (Public) <|-- MySplitReader
> > >> > >
> > >> > > (A <|-- B: B inherits from A; A <>-- B: A "has a" B)
> > >> > >
> > >> > > Note that SourceReaderBase and SplitFetcherManager implement most
> of
> > >> the
> > >> > > "logic" for split alignment just because we wanted to implement
> > split
> > >> > > alignment and wanted it to be available as kind of a default. As a
> > >> > > consequence, we have a "default implementation" for SourceReader
> > that
> > >> > > implements the actual logic for split alignment. For that reason,
> I
> > >> find
> > >> > it
> > >> > > very confusing to have a NOOP default implementation in the
> > interface
> > >> for
> > >> > > the SourceReader. As a consequence, interface strategy c) is
> > difficult
> > >> > > because this would require NOOP default implementations in the
> > public
> > >> > > interfaces of SourceReader and SplitReader. This is the same for
> > >> strategy
> > >> > > d) because it would require NOOP default implementation in the
> > >> > > SourceReader. Further, as Dawid described method signatures of
> > >> alignSplit
> > >> > > for SourceReader and SplitReader differ and it would be extremely
> > >> > difficult
> > >> > > to make the signatures the same (with even potential performance
> > >> impact
> > > >> > > because of additional look-ups of split ids). Therefore, having a
> > >> > symmetric
> > >> > > decorative interface as of strategy a) is actually not possible
> and
> > >> > having
> > >> > > two decorative interfaces with different method signatures is
> > >> confusing.
> > >> > My
> > >> > > conclusion is that we are best with strategy b) which means to
> have
> > >> > > specializing sub-interfaces that inherit from the parent
> interface:
> > >> > > SourceReader <|-- AlignedSourceReader, SplitReader <|--
> > >> > AlignedSplitReader
> > >> > > With this option, I'm not 100% sure what the implications are and
> if
> > >> this
> > >> > > could get nasty. I would suggest that Dawid and I just try to
> > >> implement
> > >> > and
> > >> > > see if we like it. :)
> > >> > >
> > >> > > # Summary
> > >> > >
> > >> > > In conclusion, please let me know your perspectives. Please
> correct
> > >> me,
> > >> > if
> > >> > > something is wrong or if I misunderstood something. My perspective
> > >> would
> > >> > be:
> > >> > >
> > >> > > Top A: Yes
> > >> > > Top B: Yes (but disable watermark trigger for split alignment)
> > >> > > Top C: No
> > >> > > Top D: b)
> > >> > >
> > >> > > Best,
> > >> > > Sebastian
> > >> > >
> > >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <
> > >> dwysakowicz@apache.org
> > >> > >
> > >> > > wrote:
> > >> > >
> > >> > >> @Arvid:
> > >> > >>
> > >> > >> While I also like Becket's capability approach, I fear that it
> > >> doesn't
> > >> > work
> > >> > >> for this particular use case: Sources can always be aligned
> > >> cross-task
> > >> > and
> > >> > >> this is just about intra-task alignment. So it's plausible to put
> > >> > sources
> > >> > >> into an alignment group even though they do not use any of the
> > >> presented
> > >> > >> API of FLIP-217. They should just issue a warning, if they handle
> > >> > multiple
> > >> > >> splits (see motivation section).
> > >> > >>
> > >> > >> Yes, but the "supportXXX" method would be for telling if it
> > supports
> > >> > that
> > >> > >> intra-task alignment. Cross-task alignment would always be
> > supported.
> > >> > >>
> > >> > >> I updated interfaces to what I believe to be closest to a
> consensus
> > >> > >> between all participants. Do you mind taking a look?
> > >> > >>
> > >> > >> @Sebastian Do you mind addressing the nits?
> > >> > >>
> > >> > >> Best,
> > >> > >>
> > >> > >> Dawid
> > >> > >>
> > >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
> > >> > >>
> > >> > >> Thanks for pushing this effort.
> > >> > >>
> > >> > >> I'd actually be in favor of 1b). I fully agree that decorator
> > >> interfaces
> > >> > >> should be avoided but I'm also not a big fan of overloading the
> > base
> > >> > >> interfaces (they are hard to implement as is). The usual feedback
> > to
> > > >> > >> Source-related interfaces is always that they are overwhelming
> and
> > >> too
> > >> > >> hard to implement. However, I'd also not oppose 1c) as scattered
> > >> > interfaces
> > >> > >> also have drawbacks. I'd just dislike 1a) and 1d).
> > >> > >> While I also like Becket's capability approach, I fear that it
> > >> doesn't
> > >> > work
> > >> > >> for this particular use case: Sources can always be aligned
> > >> cross-task
> > >> > and
> > >> > >> this is just about intra-task alignment. So it's plausible to put
> > >> > sources
> > >> > >> into an alignment group even though they do not use any of the
> > >> presented
> > >> > >> API of FLIP-217. They should just issue a warning, if they handle
> > >> > multiple
> > >> > >> splits (see motivation section).
> > >> > >>
> > >> > >> I think renaming alignSplits to facilitate future use cases makes
> > >> sense
> > >> > but
> > >> > >> then all interfaces (if 1c) is chosen) should be adjusted
> > >> accordingly.
> > >> > >> AlignedSourceReader could be PausingSourceReader and I'd go for
> > >> > >> pauseOrResumeSplits (Becket's proposal afaik). We could also
> split
> > it
> > >> > into
> > >> > >> pauseSplit and resumeSplit. While pauseOrResumeSplits may allow
> > >> Sources
> > >> > to
> > >> > >> just use 1 instead of 2 library calls (as written in the
> Javadoc),
> > >> both
> > >> > >> Kafka and Pulsar can't use it and I'm not sure if there is a
> system
> > >> that
> > >> > >> can.
> > >> > >>
> > >> > >> Some nit for the FLIP:
> > >> > >> - Please replace "stop" with "pause".
> > >> > >> - Not sure if it's worth it in the capability section: Sources
> that
> > >> > adopt
> > >> > >> this interface cannot be used in earlier versions. So it feels
> like
> > >> we
> > >> > are
> > >> > >> only forward compatible (old sources can be used after the
> change);
> > >> but
> > >> > I
> > >> > >> guess this holds for any API addition.
> > >> > >> - You might want to add what happens when all splits are paused.
> > >> > >> - You may want to describe how the 3 flavors of SourceReaderBase
> > >> > interact
> > >> > >> with the interface.
> > >> > >> - I'm not sure if it makes sense to include Kafka and Pulsar in
> the
> > >> > FLIP.
> > >> > >> For me, this is rather immediate follow-up work. (could be in the
> > >> same
> > >> > >> umbrella ticket)
> > >> > >>
> > >> > >> Best,
> > >> > >>
> > >> > >> Arvid
> > >> > >>
> > > >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz <dw...@apache.org>
> > > >> > >> wrote:
> > >> > >>
> > >> > >>
> > >> > >> a) "MySourceReader implements SourceReader, WithSplitsAlignment",
> > >> along
> > >> > >> with "MySplitReader implements SplitReader, WithSplitsAlignment",
> > or
> > >> > >> b) "MySourceReader implements AlignedSourceReader" and
> > "MySplitReader
> > >> > >> implements AlignedSplitReader", or
> > >> > >> c) "MySourceReader implements SourceReader" and "MySplitReader
> > >> > implements
> > >> > >> SplitReader".
> > >> > >>
> > >> > >> I think the latest proposal according to Dawid would be:
> > >> > >> d) "MySourceReader implements SourceReader" and "MySplitReader
> > >> > implements
> > >> > >> AlignedSplitReader".
> > >> > >> I am fine with this API, although personally speaking I think it
> is
> > >> > simpler
> > >> > >> to just add a new method to the split reader with default impl.
> > >> > >>
> > >> > >>
> > >> > >> I think that is a good idea to have it aligned as much as
> possible.
> > >> I'd
> > >> > be
> > >> > >> +1 for your option c). We can merge AlignedSplitReader with
> > >> > SplitReader. We
> > >> > >> will update the FLIP shortly.
> > >> > >>
> > >> > >> Best,
> > >> > >>
> > >> > >> Dawid
> > >> > >>
> > >> > >> On 25/04/2022 12:43, Becket Qin wrote:
> > >> > >>
> > >> > >> Thanks for the comment, Jark.
> > >> > >>
> > >> > >> 3. Interface/Method Name.
> > >> > >>
> > >> > >> Can the interface be used to align other things in the future?
> For
> > >> > example,
> > >> > >> align read speed, I have
> > >> > >> seen users requesting global rate limits. This feature may also
> > need
> > >> an
> > >> > >> interface like this.
> > > >> > >> If we don't plan to extend this interface to support aligning other
> > >> > things, I
> > >> > >> suggest explicitly declaring
> > >> > >> the purpose of the methods, such as `alignWatermarksForSplits`
> > >> instead
> > >> > of
> > >> > >> `alignSplits`.
> > >> > >>
> > >> > >> This is a good point. Naming wise, it would usually be more
> > >> extensible
> > >> > to
> > >> > >> just describe what the method actually does, instead of assuming
> > the
> > >> > >> purpose of doing this. For example, in this case,
> > >> pauseOrResumeSplits()
> > >> > >> would be more extensible because this can be used for any kind of
> > >> flow
> > >> > >> control, be it watermark alignment or simple rate limiting.
> > >> > >>
> > >> > >> 4. Interface or Method.
> > >> > >>
> > >> > >> I don't have a strong opinion on this. I think they have their
> own
> > >> > >> advantages.
> > >> > >> In Flink SQL, we heavily use Interfaces for extending abilities
> > >> > >> (SupportsXxxx) for TableSource/TableSink,
> > >> > >> and I prefer Interfaces rather than methods in this case. When
> you
> > >> have
> > >> > a
> > >> > >> bunch of abilities and each ability
> > >> > >> has more than one method, Interfaces can help to organize them
> and
> > > >> make
> > > >> > >> it clear to users which methods
> > > >> > >> they need to implement when they want to have an ability.
> > >> > >>
> > >> > >> I am OK with decorative interfaces if this is a general design
> > >> pattern
> > >> > in
> > >> > >> the other components in Flink. But it looks like the current API
> > >> > proposal
> > >> > >> is not symmetric.
> > >> > >>
> > >> > >> The current proposal is essentially "MySourceReader implements
> > >> > >> SourceReader, WithSplitsAlignment", along with "MySplitReader
> > >> implements
> > >> > >> AlignedSplitsReader".
> > >> > >>
> > >> > >> Should we make the API symmetric? I'd consider any one of the
> > >> following
> > >> > as
> > >> > >> symmetric.
> > >> > >>
> > >> > >> a) "MySourceReader implements SourceReader, WithSplitsAlignment",
> > >> along
> > >> > >> with "MySplitReader implements SplitReader, WithSplitsAlignment",
> > or
> > >> > >> b) "MySourceReader implements AlignedSourceReader" and
> > "MySplitReader
> > >> > >> implements AlignedSplitReader", or
> > >> > >> c) "MySourceReader implements SourceReader" and "MySplitReader
> > >> > implements
> > >> > >> SplitReader".
> > >> > >>
> > >> > >> I think the latest proposal according to Dawid would be:
> > >> > >> d) "MySourceReader implements SourceReader" and "MySplitReader
> > >> > implements
> > >> > >> AlignedSplitReader".
> > >> > >> I am fine with this API, although personally speaking I think it
> is
> > >> > simpler
> > >> > >> to just add a new method to the split reader with default impl.
> > >> > >>
> > > >> > >> @Dawid Wysakowicz <dw...@apache.org>, thanks for the reply.
> > >> > >>
> > >> > >> Having said that, as I don't have a preference and I agree most
> of
> > >> the
> > >> > >>
> > >> > >> sources will support the alignment I am fine following your
> > >> suggestion
> > >> > to
> > >> > >> have the SourceReader extending from
> WithWatermarksSplitsAlignment,
> > >> but
> > >> > >> would put the "supportsXXX" there, not in the Source to keep the
> > two
> > >> > >> methods together.
> > >> > >>
> > >> > >> One benefit of having the "supportsXXX" in Source is that this
> > allows
> > >> > some
> > >> > >> compile time check. For example, if a user enabled watermark
> > >> alignment
> > >> > >> while it is not supported by the Source, an exception can be
> thrown
> > >> at
> > >> > >> compile time. It seems in general useful. That said, I agree that
> > API
> > >> > >> cleanliness wise it is better to put the two methods together.
> > >> > >>
> > >> > >> Thanks,
> > >> > >>
> > >> > >> Jiangjie (Becket) Qin
> > >> > >>
> > > >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <im...@gmail.com> wrote:
> > >> > >>
> > >> > >>
> > > >> > >> Thanks, Dawid, for the reminder about FLIP-182. Sorry, I missed it.
> > >> > >> I don't have other concerns then.
> > >> > >>
> > >> > >> Best,
> > >> > >> Jark
> > >> > >>
> > > >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz <dw...@apache.org>
> > > >> > >> wrote:
> > >> > >>
> > >> > >>
> > >> > >> @Jark:
> > >> > >>
> > >> > >> 1. Will the framework always align with watermarks when the
> source
> > >> > >> implements the interface?
> > >> > >> I'm afraid not every case needs watermark alignment even if Kafka
> > >> > >> implements the interface,
> > >> > >> and this will affect the throughput somehow. I agree with Becket
> > >> > >> we may need a
> > >> > >> `supportSplitsAlignment()` method for users to configure the
> source
> > >> to
> > >> > >> enable/disable the alignment.
> > >> > >>
> > >> > >> 2. How does the framework calculate maxDesiredWatermark?
> > >> > >> I think the algorithm of maxDesiredWatermark will greatly affect
> > >> > >>
> > >> > >> throughput
> > >> > >>
> > >> > >> if the reader is constantly
> > >> > >>  switching between pause and resume. Can users configure the
> > >> alignment
> > >> > >> offset?
> > >> > >>
> > >> > >>
> > >> > >> This is covered in the previous FLIP[1] which has been already
> > >> > >>
> > >> > >> implemented
> > >> > >>
> > >> > >> in 1.15. In short, it must be enabled with the watermark strategy
> > >> which
> > >> > >> also configures drift and update interval.
> > >> > >>
> > > >> > >> If we don't plan to extend this interface to support aligning other
> > >> things,
> > >> > >>
> > >> > >> I
> > >> > >>
> > >> > >> suggest explicitly declaring
> > >> > >> the purpose of the methods, such as `alignWatermarksForSplits`
> > >> instead
> > >> > of
> > >> > >> `alignSplits`.
> > >> > >>
> > >> > >>
> > >> > >> Sure let's rename it.
> > >> > >>
> > >> > >> @Becket:
> > >> > >>
> > >> > >> I understand your point. On the other hand putting all methods,
> > even
> > >> > with
> > >> > >> "supportsXXX" methods for enabling certain features, makes the
> > entry
> > >> > >> threshold for writing a new source higher. Instead of focusing on
> > the
> > >> > >>
> > >> > >> basic
> > >> > >>
> > >> > >> and required properties of the Source, the person implementing a
> > >> source
> > > >> > >> must bother with and figure out what all of the extra
> > >> features
> > >> > >>
> > >> > >> are
> > >> > >>
> > >> > >> about and how to deal with them. It makes it also harder to
> > organize
> > >> > >> methods in coupled groups as Jark said.
> > >> > >>
> > >> > >> Having said that, as I don't have a preference and I agree most of the
> > >> > >> sources will support the alignment, I am fine following your suggestion
> > >> > >> to have the SourceReader extending from WithWatermarksSplitsAlignment,
> > >> > >> but would put the "supportsXXX" there, not in the Source, to keep the
> > >> > >> two methods together.
> > >> > >>
> > >> > >> Lastly, I agree it is really unfortunate the "alignSplits" methods
> > >> > >> differ slightly for SourceReader and SplitReader. The reason for that
> > >> > >> is SourceReaderBase deals only with SplitIds, whereas SplitReader needs
> > >> > >> the actual splits to pause them. I found the discrepancy acceptable for
> > >> > >> the sake of simplifying changes significantly, especially as they would
> > >> > >> highly likely impact performance as we would have to perform additional
> > >> > >> lookups. Moreover the SplitReader is a secondary interface.
> > >> > >>
> > >> > >> Best,
> > >> > >>
> > >> > >> Dawid
> > >> > >>
> > >> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
> > >> > >>
> > >> > >> On 24/04/2022 17:15, Jark Wu wrote:
> > >> > >>
> > >> > >> Thanks for the effort, Dawid and Sebastian!
> > >> > >>
> > >> > >> I just have some minor questions (maybe I missed something).
> > >> > >>
> > >> > >> 1. Will the framework always align with watermarks when the source
> > >> > >> implements the interface?
> > >> > >> I'm afraid not every case needs watermark alignment even if Kafka
> > >> > >> implements the interface, and this will affect the throughput somehow.
> > >> > >> I agree with Becket we may need a `supportSplitsAlignment()` method for
> > >> > >> users to configure the source to enable/disable the alignment.
> > >> > >>
> > >> > >> 2. How does the framework calculate maxDesiredWatermark?
> > >> > >> I think the algorithm of maxDesiredWatermark will greatly affect
> > >> > >> throughput if the reader is constantly switching between pause and
> > >> > >> resume. Can users configure the alignment offset?
> > >> > >>
> > >> > >> 3. Interface/Method Name.
> > >> > >> Can the interface be used to align other things in the future? For
> > >> > >> example, align read speed, I have seen users requesting global rate
> > >> > >> limits. This feature may also need an interface like this.
> > >> > >> If we don't plan to extend this interface to support aligning other
> > >> > >> things, I suggest explicitly declaring the purpose of the methods,
> > >> > >> such as `alignWatermarksForSplits` instead of `alignSplits`.
> > >> > >>
> > >> > >> 4. Interface or Method.
> > >> > >> I don't have a strong opinion on this. I think they have their own
> > >> > >> advantages.
> > >> > >> In Flink SQL, we heavily use Interfaces for extending abilities
> > >> > >> (SupportsXxxx) for TableSource/TableSink, and I prefer Interfaces
> > >> > >> rather than methods in this case. When you have a bunch of abilities
> > >> > >> and each ability has more than one method, Interfaces can help to
> > >> > >> organize them and make it clear to users which methods they need to
> > >> > >> implement when they want to have an ability.
> > >> > >>
> > >> > >>
> > >> > >> Best,
> > >> > >> Jark
> > >> > >>
> > >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <be...@gmail.com> wrote:
> > >> > >>
> > >> > >> Hi Dawid,
> > >> > >>
> > >> > >> Thanks for the explanation. Apologies that I somehow misread a bunch of
> > >> > >> "align" and thought they were "assign".
> > >> > >>
> > >> > >> Regarding 1, by default implementation, I was thinking of the default
> > >> > >> no-op implementation. I am a little worried about the proliferation of
> > >> > >> decorative interfaces. I think the most important thing about
> > >> > >> interfaces is that they are easy to understand. In this case, I prefer
> > >> > >> adding a new method to the existing interface for the following
> > >> > >> reasons:
> > >> > >>
> > >> > >> a) I feel the biggest drawback of decorative interfaces is which
> > >> > >> interface they can decorate and which combinations of multiple
> > >> > >> decorative interfaces are valid. In the current FLIP, the
> > >> > >> withSplitsAlignment interface is only applicable to the SourceReader
> > >> > >> which means it can't decorate any other interface. From an interface
> > >> > >> design perspective, a natural question is why not let
> > >> > >> "AlignedSplitReader" extend "withSplitsAlignment"? And it is also
> > >> > >> natural to assume that a split reader implementing both SplitReader and
> > >> > >> WithSplitAlignment would work, because a source reader implementing
> > >> > >> SourceReader and withSplitsAlignment works. So why isn't there an
> > >> > >> interface of AlignedSourceReader? In the future, if there is a new
> > >> > >> feature added (e.g. sorted or pre-partitioned data aware), are we going
> > >> > >> to create another interface of SplitReader such as SortedSplitReader or
> > >> > >> PrePartitionedAware? Can they be combined? So I think the additional
> > >> > >> decorative interface like withSplitsAlignment actually increases the
> > >> > >> understanding cost of users because they have to know what decorative
> > >> > >> interfaces are there, which interface they can decorate and which
> > >> > >> combinations of the decorative interfaces are valid and which are not.
> > >> > >> Ideally we want to avoid that. To be clear, I am not opposing having an
> > >> > >> interface of withSplitsAlignment, it is completely OK to have it as an
> > >> > >> internal interface and let SourceReader and SplitReader both extend it.
> > >> > >>
> > >> > >> b) Adding a new method to the SourceReader with a default
> > >> > >> implementation of no-op would help avoid logic branching in the source
> > >> > >> logic, especially given that we agree that the vast majority of the
> > >> > >> SourceReader implementations, if not all, would just extend from the
> > >> > >> SourceReaderBase. That means adding a new method to the interface would
> > >> > >> effectively give the same user experience, but simpler.
> > >> > >>
> > >> > >> c) A related design principle that may be worth discussing is how do we
> > >> > >> let the Source implementations tell Flink what capability is supported
> > >> > >> and what is not. Personally speaking I feel the most intuitive place to
> > >> > >> me is in the Source itself, because that is the entrance of the entire
> > >> > >> Source connector logic.
> > >> > >>
> > >> > >> Based on the above thoughts, I am wondering if the following interface
> > >> > >> would be easier to understand by the users.
> > >> > >>
> > >> > >> - Change "withSplitsAlignment" to internal interface, let both
> > >> > >> SourceReader and SplitReader extend from it, with a default no-op
> > >> > >> implementation.
> > >> > >> - Add a new method "boolean supportSplitsAlignment()" to the Source
> > >> > >> interface, with a default implementation returning false. Sources that
> > >> > >> have implemented the alignment logic can change this to return true,
> > >> > >> and override the alignSplits() methods in the SourceReader /
> > >> > >> SplitReader if needed.
> > >> > >> - In the future, if a new optional feature is going to be added to the
> > >> > >> Source, and that feature requires the awareness from Flink, we can add
> > >> > >> more such methods to the Source.
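
For readers following the archive, the three bullets above could look roughly like the sketch below. It is only an illustration of the shape of the proposal: `DemoSource`, `DemoSourceReader` and `maybeAlign` are made-up names, the interfaces are simplified stand-ins rather than the real Flink types, and only `supportSplitsAlignment()` and `alignSplits()` are taken from the mail.

```java
import java.util.Collection;
import java.util.Collections;

class CapabilityFlagDemo {

    // Hypothetical internal interface; the default does nothing (no-op).
    interface WithSplitsAlignment {
        default void alignSplits(Collection<String> toPause, Collection<String> toResume) {}
    }

    // Simplified stand-in for a reader that inherits the no-op default.
    interface DemoSourceReader extends WithSplitsAlignment {}

    // Simplified stand-in for the Source, carrying the capability flag.
    interface DemoSource {
        // Sources that implemented the alignment logic override this to return true.
        default boolean supportSplitsAlignment() {
            return false;
        }

        DemoSourceReader createReader();
    }

    // Framework side (hypothetical): forward alignment requests only if the source opted in.
    static boolean maybeAlign(DemoSource source, DemoSourceReader reader, Collection<String> toPause) {
        if (!source.supportSplitsAlignment()) {
            return false;
        }
        reader.alignSplits(toPause, Collections.<String>emptyList());
        return true;
    }

    public static void main(String[] args) {
        // A source written before the feature existed: nothing overridden.
        DemoSource legacy = () -> new DemoSourceReader() {};

        // A source that opted in and overrides the reader method.
        DemoSource aligned = new DemoSource() {
            @Override
            public boolean supportSplitsAlignment() {
                return true;
            }

            @Override
            public DemoSourceReader createReader() {
                return new DemoSourceReader() {
                    @Override
                    public void alignSplits(Collection<String> toPause, Collection<String> toResume) {
                        System.out.println("pausing " + toPause);
                    }
                };
            }
        };

        Collection<String> toPause = Collections.singletonList("split-0");
        System.out.println(maybeAlign(legacy, legacy.createReader(), toPause));
        System.out.println(maybeAlign(aligned, aligned.createReader(), toPause));
    }
}
```

The legacy source still compiles and is simply skipped, which is the backwards-compatibility argument; the cost is that the flag and the overridden method live in two different types.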
> > >> > >>
> > >> > >> What do you think?
> > >> > >>
> > >> > >> Thanks,
> > >> > >>
> > >> > >> Jiangjie (Becket) Qin
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dw...@apache.org> wrote:
> > >> > >>
> > >> > >>
> > >> > >> @Konstantin:
> > >> > >>
> > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> > >> > >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
> > >> > >>
> > >> > >> Correct, as far as I know though, those are the only sources which
> > >> > >> consume concurrently from multiple splits and thus alignment applies.
> > >> > >>
> > >> > >> @Thomas:
> > >> > >>
> > >> > >> I wonder if "supporting" split alignment in SourceReaderBase and then
> > >> > >> doing nothing if the split reader does not implement AlignedSplitReader
> > >> > >> could be misleading? Perhaps WithSplitsAlignment can instead be added
> > >> > >> to the specific source reader (i.e. KafkaSourceReader) to make it
> > >> > >> explicit that the source actually supports it.
> > >> > >>
> > >> > >> I understand your concern. Hmm, I think we could actually do that.
> > >> > >> Given the actual implementation of the SourceReaderBase#alignSplits is
> > >> > >> rather short (just a forward to the corresponding method of
> > >> > >> SplitFetcher), we could reimplement it in the actual source
> > >> > >> implementations. This solution has a downside though. Authors of new
> > >> > >> sources would have to do two things: extend from AlignedSplitReader and
> > >> > >> implement WithSplitsAlignment, instead of just extending
> > >> > >> AlignedSplitReader. I would be fine with such a tradeoff though. What
> > >> > >> do others think?
> > >> > >>
> > >> > >> @Steven:
> > >> > >>
> > >> > >> For this part from the motivation section, is it accurate? Let's assume
> > >> > >> one source task consumes from 3 partitions and one of the partitions is
> > >> > >> significantly slower. In this situation, watermark for this source task
> > >> > >> won't hold back as it is reading recent data from other two Kafka
> > >> > >> partitions. As a result, it won't hold back the overall watermark. I
> > >> > >> thought the problem is that we may have late data for this slow
> > >> > >> partition.
> > >> > >>
> > >> > >> It will hold back the watermark. Watermark of an operator is the minimum
> > >> > >> of watermarks of all splits[1]
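
A small worked example of that rule, using the three-partition scenario from the question. This is a plain-Java illustration with made-up watermark values, not Flink code; `operatorWatermark` is a hypothetical helper name.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class OperatorWatermarkDemo {

    // The operator watermark is the minimum over the per-split watermarks.
    static long operatorWatermark(Map<String, Long> splitWatermarks) {
        if (splitWatermarks.isEmpty()) {
            return Long.MIN_VALUE; // sketch-only choice: nothing to report yet
        }
        return Collections.min(splitWatermarks.values());
    }

    public static void main(String[] args) {
        Map<String, Long> watermarks = new LinkedHashMap<>();
        watermarks.put("partition-0", 10_000L);
        watermarks.put("partition-1", 10_050L);
        watermarks.put("partition-2", 2_000L); // the slow partition
        System.out.println(operatorWatermark(watermarks)); // prints 2000
    }
}
```

So the slow partition does hold back the task's watermark, while the two fast partitions keep producing records far ahead of it, which is the buffering problem the FLIP is about.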
> > >> > >>
> > >> > >> I have another question about the restart. Say split alignment is
> > >> > >> triggered. checkpoint is completed. job failed and restored from the
> > >> > >> last checkpoint. because alignment decision is not checkpointed,
> > >> > >> initially alignment won't be enforced until we get a cycle of watermark
> > >> > >> aggregation and propagation, right? Not saying this corner is a
> > >> > >> problem. Just want to understand it more.
> > >> > >>
> > >> > >> Your understanding is correct.
> > >> > >>
> > >> > >> @Becket:
> > >> > >>
> > >> > >> 1. I think watermark alignment is sort of a general use case, so should
> > >> > >> we just add the related methods to SourceReader directly instead of
> > >> > >> introducing the new interface of WithSplitAssignment? We can provide
> > >> > >> default implementations, so backwards compatibility won't be an issue.
> > >> > >>
> > >> > >> I don't think we can provide a default implementation. How would we do
> > >> > >> that? Would it be just a no-op? Is it better than having an opt-in
> > >> > >> interface? The default implementation would have to be added
> > >> > >> exclusively in a *Public* SourceReader interface. By the way notice
> > >> > >> SourceReaderBase does extend from WithSplitsAlignment, so effectively
> > >> > >> all implementations do handle the alignment case. To be honest I think
> > >> > >> it is impossible to implement the SourceReader interface directly by
> > >> > >> end users.
> > >> > >>
> > >> > >> 2. As you mentioned, the SplitReader interface probably also needs some
> > >> > >> change to support throttling at the split granularity. Can you add that
> > >> > >> interface change into the public interface section as well?
> > >> > >>
> > >> > >> It has been added from the beginning. See *AlignedSplitReader.*
> > >> > >>
> > >> > >> 3. Nit, can we avoid using the method name assignSplits here, given that
> > >> > >> it is not actually changing the split assignments? It seems something
> > >> > >> like pauseOrResumeSplits(), or adjustSplitsThrottling() is more
> > >> > >> accurate.
> > >> > >>
> > >> > >> The method's called *alignSplits*, not assign. Do you still prefer a
> > >> > >> different name for that? Personally, I am open for suggestions here.
> > >> > >>
> > >> > >> Best,
> > >> > >>
> > >> > >> Dawid
> > >> > >>
> > >> > >> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> > >> > >>
> > >> > >> On 22/04/2022 05:59, Becket Qin wrote:
> > >> > >>
> > >> > >> Thanks for driving the effort, Sebastian. I think the motivation makes
> > >> > >> a lot of sense. Just a few suggestions / questions.
> > >> > >>
> > >> > >> 1. I think watermark alignment is sort of a general use case, so should
> > >> > >> we just add the related methods to SourceReader directly instead of
> > >> > >> introducing the new interface of WithSplitAssignment? We can provide
> > >> > >> default implementations, so backwards compatibility won't be an issue.
> > >> > >>
> > >> > >> 2. As you mentioned, the SplitReader interface probably also needs some
> > >> > >> change to support throttling at the split granularity. Can you add that
> > >> > >> interface change into the public interface section as well?
> > >> > >>
> > >> > >> 3. Nit, can we avoid using the method name assignSplits here, given that
> > >> > >> it is not actually changing the split assignments? It seems something
> > >> > >> like pauseOrResumeSplits(), or adjustSplitsThrottling() is more
> > >> > >> accurate.
> > >> > >>
> > >> > >> Thanks,
> > >> > >>
> > >> > >> Jiangjie (Becket) Qin
> > >> > >>
> > >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <st...@gmail.com> wrote:
> > >> > >>
> > >> > >> However, a single source operator may read data from multiple
> > >> > >> splits/partitions, e.g., multiple Kafka partitions, such that even with
> > >> > >> watermark alignment the source operator may need to buffer excessive
> > >> > >> amount of data if one split emits data faster than another.
> > >> > >>
> > >> > >> For this part from the motivation section, is it accurate? Let's assume
> > >> > >> one source task consumes from 3 partitions and one of the partitions is
> > >> > >> significantly slower. In this situation, watermark for this source task
> > >> > >> won't hold back as it is reading recent data from other two Kafka
> > >> > >> partitions. As a result, it won't hold back the overall watermark. I
> > >> > >> thought the problem is that we may have late data for this slow
> > >> > >> partition.
> > >> > >>
> > >> > >> I have another question about the restart. Say split alignment is
> > >> > >> triggered. checkpoint is completed. job failed and restored from the
> > >> > >> last checkpoint. because alignment decision is not checkpointed,
> > >> > >> initially alignment won't be enforced until we get a cycle of watermark
> > >> > >> aggregation and propagation, right? Not saying this corner is a
> > >> > >> problem. Just want to understand it more.
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <th...@apache.org> wrote:
> > >> > >>
> > >> > >> Thanks for working on this!
> > >> > >>
> > >> > >> I wonder if "supporting" split alignment in SourceReaderBase and then
> > >> > >> doing nothing if the split reader does not implement AlignedSplitReader
> > >> > >> could be misleading? Perhaps WithSplitsAlignment can instead be added
> > >> > >> to the specific source reader (i.e. KafkaSourceReader) to make it
> > >> > >> explicit that the source actually supports it.
> > >> > >>
> > >> > >> Thanks,
> > >> > >> Thomas
> > >> > >>
> > >> > >>
> > >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <kn...@apache.org> wrote:
> > >> > >>
> > >> > >>
> > >> > >> Hi Sebastian, Hi Dawid,
> > >> > >>
> > >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> > >> > >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
> > >> > >>
> > >> > >> +1 in general. I believe it is valuable to complete the watermark
> > >> > >> aligned story with this FLIP.
> > >> > >>
> > >> > >> Cheers,
> > >> > >>
> > >> > >> Konstantin
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dw...@apache.org> wrote:
> > >> > >>
> > >> > >>
> > >> > >> To be explicit, having worked on it, I support it ;) I think we can
> > >> > >> start a vote thread soonish, as there are no concerns so far.
> > >> > >>
> > >> > >> Best,
> > >> > >>
> > >> > >> Dawid
> > >> > >>
> > >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
> > >> > >>
> > >> > >> Dear Flink developers,
> > >> > >>
> > >> > >> I would like to open a discussion on FLIP 217 [1] for an extension of
> > >> > >> Watermark Alignment to perform alignment also in SplitReaders. To do
> > >> > >> so, SplitReaders must be able to suspend and resume reading from split
> > >> > >> sources where the SourceOperator coordinates and controls suspend and
> > >> > >> resume. To gather information about current watermarks of the
> > >> > >> SplitReaders, we extend the internal WatermarkOutputMultiplexer and
> > >> > >> report watermarks to the SourceOperator.
> > >> > >>
> > >> > >> There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised
> > >> > >> and reworked by Dawid Wysakowicz (He did most of the work.) and me. The
> > >> > >> changes are backwards compatible in a way that if affected components
> > >> > >> do not support split alignment the behavior is as before.
> > >> > >>
> > >> > >> Best,
> > >> > >> Sebastian
> > >> > >>
> > >> > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > >> > >>
> > >> > >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
> > >> > >>
> > >> > >> --
> > >> > >>
> > >> > >> Konstantin Knauf
> > >> > >> https://twitter.com/snntrable
> > >> > >> https://github.com/knaufk
> > >> > >>
> > >> > >>
> > >> >
> > >>
> > >
> >
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Piotr Nowojski <pi...@gmail.com>.
Hi Becket and Dawid,

> I feel that no matter which option we choose this can not be solved
> entirely in either of the options, because of the point above and because
> the signature of SplitReader#pauseOrResumeSplits and
> SourceReader#pauseOrResumeSplits are slightly different (one identifies
> splits with splitId the other one passes the splits directly).

Yes, that's a good point in this case and for features that need to be
implemented in more than one place.
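
To make the signature difference concrete, here is a minimal sketch. The interfaces below are simplified stand-ins written for illustration, not the actual Flink classes: `SimpleSourceReader`, `SimpleSplitReader`, `DemoSplit` and `DemoReaderBase` are hypothetical names, and only the method name `pauseOrResumeSplits` comes from the discussion. The reader-level method receives split IDs, the split-reader-level method receives the split objects, so something in between has to do the lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PauseResumeSignatures {

    // Hypothetical stand-in for a source split; only the ID matters here.
    static final class DemoSplit {
        final String splitId;

        DemoSplit(String splitId) {
            this.splitId = splitId;
        }
    }

    // Reader-level view: splits are identified by their IDs.
    interface SimpleSourceReader {
        void pauseOrResumeSplits(Collection<String> idsToPause, Collection<String> idsToResume);
    }

    // Split-reader-level view: the split objects themselves are passed.
    interface SimpleSplitReader<SplitT> {
        void pauseOrResumeSplits(Collection<SplitT> toPause, Collection<SplitT> toResume);
    }

    // Hypothetical bridge: resolves IDs to splits before delegating.
    static final class DemoReaderBase implements SimpleSourceReader {
        private final Map<String, DemoSplit> assigned = new HashMap<>();
        private final SimpleSplitReader<DemoSplit> splitReader;

        DemoReaderBase(SimpleSplitReader<DemoSplit> splitReader) {
            this.splitReader = splitReader;
        }

        void addSplit(DemoSplit split) {
            assigned.put(split.splitId, split);
        }

        @Override
        public void pauseOrResumeSplits(Collection<String> idsToPause, Collection<String> idsToResume) {
            splitReader.pauseOrResumeSplits(lookup(idsToPause), lookup(idsToResume));
        }

        private List<DemoSplit> lookup(Collection<String> ids) {
            List<DemoSplit> result = new ArrayList<>();
            for (String id : ids) {
                DemoSplit split = assigned.get(id);
                if (split != null) { // IDs not assigned to this reader are skipped
                    result.add(split);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        DemoReaderBase reader = new DemoReaderBase((toPause, toResume) ->
                System.out.println("pause " + toPause.size() + ", resume " + toResume.size()));
        reader.addSplit(new DemoSplit("partition-0"));
        reader.addSplit(new DemoSplit("partition-1"));
        reader.pauseOrResumeSplits(Arrays.asList("partition-0"), Collections.<String>emptyList());
    }
}
```

Whichever way the methods are packaged, a source author who does not use a shared base class has to know about both levels, which is why neither option fully answers the "which methods belong together" question.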

> Is there any reason for pausing reading from a split to be an optional
> feature, other than that this was not included in the original interface?

An additional argument in favor of making it optional is to simplify source
implementation. But on its own I'm not sure if that would be enough to
justify making this feature optional. Maybe.

> I think it would be way simpler and clearer to just let end users and Flink
> assume all the connectors will implement this feature.

As I wrote above that would be an interesting choice to make (ease of
implementation for new users, vs system consistency). Regardless of that,
yes, for me the main argument is the API backward compatibility. But let's
clear a couple of points:
- The current proposal, adding methods to the base interface with default
implementations, also makes this an OPTIONAL feature, same as the decorative
version would.
- The decorative version could just as well throw an
UnsupportedOperationException if the user enabled watermark alignment, and I
agree that's a better option compared to logging a warning.
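
A minimal sketch of the two points above, to show that both variants can fail in the same loud way. Everything here is a simplified stand-in: `ReaderWithDefault`, `PlainReader`, `SupportsPausingSplits` and the two `pauseWith...` helpers are hypothetical names, not Flink API.

```java
import java.util.Collection;
import java.util.Collections;

class OptionalPauseDemo {

    // Variant 1: the method lives on the base interface; the default throws.
    interface ReaderWithDefault {
        default void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
            throw new UnsupportedOperationException("pausing splits is not supported");
        }
    }

    // Variant 2: a bare base interface plus an opt-in decorative interface.
    interface PlainReader {}

    interface SupportsPausingSplits { // hypothetical decorator name
        void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume);
    }

    // Framework side, variant 1: simply call; readers that did not override throw.
    static void pauseWithDefault(ReaderWithDefault reader, Collection<String> toPause) {
        reader.pauseOrResumeSplits(toPause, Collections.<String>emptyList());
    }

    // Framework side, variant 2: check for the decorator and throw the same exception.
    static void pauseWithDecorator(PlainReader reader, Collection<String> toPause) {
        if (!(reader instanceof SupportsPausingSplits)) {
            throw new UnsupportedOperationException("pausing splits is not supported");
        }
        ((SupportsPausingSplits) reader)
                .pauseOrResumeSplits(toPause, Collections.<String>emptyList());
    }

    // Readers written before the feature existed; neither opts in.
    static final class OldDefaultReader implements ReaderWithDefault {}

    static final class OldPlainReader implements PlainReader {}

    public static void main(String[] args) {
        Collection<String> toPause = Collections.singletonList("split-0");
        try {
            pauseWithDefault(new OldDefaultReader(), toPause);
        } catch (UnsupportedOperationException e) {
            System.out.println("default method: " + e.getMessage());
        }
        try {
            pauseWithDecorator(new OldPlainReader(), toPause);
        } catch (UnsupportedOperationException e) {
            System.out.println("decorator: " + e.getMessage());
        }
    }
}
```

In both variants an old reader compiles unchanged and the user gets an exception once alignment is actually requested, so the remaining difference is where the method is declared, not how failure is reported.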

Best,
Piotrek


On Wed, 4 May 2022 at 15:40, Becket Qin <be...@gmail.com> wrote:

> Thanks for the reply and patient discussion, Piotr and Dawid.
>
> Is there any reason for pausing reading from a split to be an optional
> feature, other than that this was not included in the original interface?
>
> To be honest I am really worried about the complexity of the user story
> here. Optional features like this have a high overhead. Imagine this
> feature is optional, now a user enabled watermark alignment and defined a
> few watermark groups. Would it work? Hmm, that depends on whether the
> involved Source has implemented this feature. If the Sources are well
> documented, good luck. Otherwise end users may have to look into the code
> of the Source to see whether the feature is supported. Which is something
> they shouldn't have to do.
>
> I think it would be way simpler and clearer to just let end users and Flink
> assume all the connectors will implement this feature. After all the
> watermark group is not optional to the end users. If in some rare cases,
> the feature cannot be supported, a clear UnsupportedOperationException will
> be thrown to tell users to explicitly remove this Source from the watermark
> group. I don't think we should have a warning message here, as they tend to
> be ignored in many cases. If we do this, we don't even need the supportXXX
> method in the Source for this feature. In fact this is exactly how many
> interfaces work today. For example, SplitEnumerator#addSplitsBack() is not
> supported by Pravega source because it does not support partial failover.
> In that case, it simply throws an exception to trigger a global recovery.
>
> The reason we add a default implementation in this case would just be for
> the sake of backwards compatibility so the old source can still compile. Sure,
> in short term, this feature might not be supported by many existing
> sources. That is OK, and it is quite visible to the source developers that
> they did not override the default impl which throws an
> UnsupportedOperationException.
>
> @Dawid,
>
> >> the Java doc of the SupportXXX() method in the Source would be the single
> >> source of truth regarding how to implement this feature.
> >
> > I also don't find it entirely true. Half of the classes are theoretically
> > optional and are utility classes from the point of view how the interfaces
> > are organized. Theoretically users do not need to use any of
> > SourceReaderBase & SplitReader. Would be weird to list their methods in the
> > Source interface.
>
> I think the ultimate goal of java docs is to guide users to implement the
> Source. If SourceReaderBase is the preferred way to implement a
> SourceReader, it seems worth mentioning that. Even the Java language
> documentation of interfaces lists the known implementations [1] so people can
> leverage them. But for this particular case, if we make the feature
> non-optional, we don't even need the supportXXX() method for now.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <dw...@apache.org>
> wrote:
>
> > Hey Piotr and Becket,
> >
> > First of all, let me say I am happy with whichever option is agreed in the
> > discussion.
> >
> > I wanted to clarify a few points from the discussion though:
> >
> > @Becket:
> >
> > The main argument for adding the methods to the SourceReader is that these
> > methods are effectively NON-OPTIONAL to SourceReader impl, i.e. starting
> > from this FLIP, all the SourceReaders impl are expected to support this
> > method, although some old implementations may not have implemented this
> > feature. I think we should distinguish the new features from the optional
> > features. While the public decorative interface is a solution to the
> > optional features. We should not use it for the features that are
> > non-optional.
> >
> > I don't think that this feature is NON-OPTIONAL. Even though preferred, I
> > still think it can be simply optional.
> >
> > the Java doc of the SupportXXX() method in the Source would be the single
> > source of truth regarding how to implement this feature.
> >
> > I also don't find it entirely true. Half of the classes are theoretically
> > optional and are utility classes from the point of view how the interfaces
> > are organized. Theoretically users do not need to use any of
> > SourceReaderBase & SplitReader. Would be weird to list their methods in the
> > Source interface.
> >
> > @Piotr
> >
> > If we have all of the methods with default implementation in the base
> > interface, the API doesn't give any clue to the user which set of methods
> > are required to be implemented at the same time.
> >
> > I feel that no matter which option we choose this can not be solved
> > entirely in either of the options, because of the point above and because
> > the signature of SplitReader#pauseOrResumeSplits and
> > SourceReader#pauseOrResumeSplits are slightly different (one identifies
> > splits with splitId the other one passes the splits directly).
> >
> > Best,
> >
> > Dawid
> > On 03/05/2022 14:30, Becket Qin wrote:
> >
> > Hi Piotr,
> >
> > Thanks for the comment.
> >
> > Just to clarify, I am not against the decorative interfaces, but I do
> > think we should use them with caution. The main argument for adding the
> > methods to the SourceReader is that these methods are
> > effectively NON-OPTIONAL to SourceReader impl, i.e. starting from this
> > FLIP, all the SourceReaders impl are expected to support this
> > method, although some old implementations may not have implemented this
> > feature. I think we should distinguish the new features from the optional
> > features. While the public decorative interface is a solution to the
> > optional features. We should not use it for the features that are
> > non-optional.
> >
> > That said, this feature is optional for SplitReaders. Arguably we can have
> > a decorative interface for that, but for simplicity and symmetry of the
> > interface, personally I prefer just adding a new method.
> >
> > Regarding the advantages you mentioned about the decorative interfaces,
> > they would make sense if:
> > 1. The feature is optional.
> > 2. There is only one decorative interface involved for a feature.
> > Otherwise the argument that all the methods are grouped together will not
> > stand.
> >
> > Compared with that, I think the current solution works fine in all cases,
> > i.e. "having supportXXX() method in Source, and default methods /
> > decorative interfaces in base interfaces.".
> >
> > The advantages are:
> >> - clean and easy to implement base interface
> >
> > In the current approach, the Java doc of the SupportXXX() method in the
> > Source would be the single source of truth regarding how to implement this
> > feature. It lists the method that has to be implemented to support this
> > feature, regardless of how many classes / interfaces are involved.
> >
> > When implementing the base interface, users do not need to implement a
> > method with default implementation. If they are curious what the method is
> > for, the java doc of that method simply points users to the SupportXXX()
> > method in the Source. It seems not adding work to the users compared with
> > decorative interfaces, but gives much better discoverability.
> >
> > - all of the methods from a single feature are grouped in a single
> >> decorator interface, together with their dedicated java doc. It's also
> >> easier to google search for help using the decorator name
> >
> > - if an optional feature requires two methods to be implemented at once,
> >> decorator can guarantee that
> >
> > These two points are not true when multiple components and classes are
> > involved collaboratively to provide a feature. In our case, we have both
> > SourceReader and SplitReader involved. And there might be other interfaces
> > on the JM side involved for some future features. So the relevant methods
> > can actually be scattered over the places. That said, we may still use
> > decorative interfaces for each component, if the feature is optional, given
> > there is a single source of truth for the feature.
> >
> > Here I would strongly lean towards making life easier for new users,
> >> lowering the entry barrier, at the (imo) slight expense for the power
> >> users.
> >
> > I actually think the current approach is simpler, more extensible and more
> > general for all the users. Can you articulate a bit more on which part you
> > think makes users harder to understand?
> >
> >
> > There is another benefit of the decorative interfaces which is not
> > mentioned, but might be worth considering here. Usually the decorative
> > interfaces give slightly better backwards compatibility than the new
> > default method in the interfaces. That is when users are using a jar that
> > was compiled with an older version of Flink which does not have the
> default
> > method in the interfaces in question. A decorative interface may still
> > provide backwards compatibility in that case, while default method impl
> > cannot.
> >
> > I think in Flink we in general do not guarantee custom components
> compiled
> > with an older version can run with a newer version of Flink. A recompile
> > with a newer version would be required. That said, if we do care about
> > this, we can just change the "supportXXX()" method in the Source
> interface
> > to use decorative interfaces, and leave the other parts unchanged.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> >
> > On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <pn...@apache.org>
> > wrote:
> >
> >> Hi,
> >>
> >> Sorry for chipping in so late, but I was OoO for the last two weeks.
> >> Regarding the interfaces, I would be actually against adding those
> methods
> >> to the base interfaces for the reasons mentioned above. Clogging the
> base
> >> interface for new users with tons of methods that they do not need, do
> not
> >> understand and do not know what to do with them. Moreover, such
> decorative
> >> interfaces are solving a problem if a feature requires two or more
> methods
> >> to be implemented at the same time. If we have all of the methods with
> >> default implementation in the base interface, the API doesn't give any
> >> clue
> >> to the user which set of methods are required to be implemented at the
> >> same
> >> time.
> >>
> >> > a) I feel the biggest drawback of decorative interfaces is which
> >> interface
> >> > they can decorate and which combinations of multiple decorative
> >> interfaces
> >> > are valid. (...)
> >> > In the future, if there is a new feature added
> >> > (e.g. sorted or pre-partitioned data aware), are we going to create
> >> another
> >> > interface of SplitReader such as SortedSplitReader or
> >> PrePartitionedAware?
> >> > Can they be combined? So I think the additional decorative interface
> >> like
> >> > withSplitsAlignment actually increases the understanding cost of users
> >> > because they have to know what decorative interfaces are there, which
> >> > interface they can decorate and which combinations of the decorative
> >> > interfaces are valid and which are not. Ideally we want to avoid that.
> >>
> >> I'm not sure if I understand how embedding default methods in the base
> >> interface is solving the problem: what can be combined or not? If there
> >> are
> >> two conflicting features, having decorative interfaces that can not be
> >> mixed together actually makes much more sense to me rather than having
> >> them
> >> all in one base class. How would you allow users to implement only one
> of
> >> those two features?
> >>
> >> To reiterate on the issue. Yes, there are drawbacks:
> >> - how can a user discover what decorators are there?
> >> - how can a user know where the decorator can be applied to?
> >>
> >> However those are drawbacks for more power users, that can be mitigated
> by
> >> the documentation. For example listing all of the decorators with
> >> detailed explanation both in the docs and in the java docs. More
> >> experienced users will be able to deal with those issues easier, as they
> >> will already have some basic understanding of Flink. Also, if a user has a
> >> problem that he wants to solve, he will google search a potential
> solution
> >> to his problem anyway, and while doing that he is very likely to
> discover
> >> the decorator that he needs anyway in the docs.
> >>
> >> The advantages are:
> >> - clean and easy to implement base interface
> >> - all of the methods from a single feature are grouped in a single
> >> decorator interface, together with their dedicated java doc. It's also
> >> easier to google search for help using the decorator name
> >> - if an optional feature requires two methods to be implemented at once,
> >> decorator can guarantee that
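
To make the last advantage concrete, here is a minimal sketch in Java. All type and method names (Reader, SupportsPausing, pauseSplit, resumeSplit) are hypothetical and are not taken from the FLIP or from Flink; the sketch only contrasts a decorator whose two methods are abstract with a base interface that offers the same two methods as no-op defaults.

```java
import java.util.ArrayList;
import java.util.List;

final class DecoratorSketch {
    /** Hypothetical minimal base interface; not Flink's SourceReader. */
    interface Reader {
        String poll();
    }

    /** Hypothetical decorator: both methods are abstract, so the compiler requires both. */
    interface SupportsPausing {
        void pauseSplit(String splitId);

        void resumeSplit(String splitId);
    }

    /** Alternative: the same two methods as no-op defaults on the base interface. */
    interface ReaderWithDefaults extends Reader {
        default void pauseSplit(String splitId) {}

        default void resumeSplit(String splitId) {}
    }

    /** Must implement pause AND resume, otherwise it does not compile. */
    static final class PausingReader implements Reader, SupportsPausing {
        final List<String> paused = new ArrayList<>();

        @Override public String poll() { return "record"; }

        @Override public void pauseSplit(String splitId) { paused.add(splitId); }

        @Override public void resumeSplit(String splitId) { paused.remove(splitId); }
    }

    /** Compiles although resumeSplit was forgotten and silently stays a no-op. */
    static final class HalfImplementedReader implements ReaderWithDefaults {
        final List<String> paused = new ArrayList<>();

        @Override public String poll() { return "record"; }

        @Override public void pauseSplit(String splitId) { paused.add(splitId); }
    }

    public static void main(String[] args) {
        PausingReader complete = new PausingReader();
        complete.pauseSplit("split-1");
        complete.resumeSplit("split-1");
        System.out.println("decorator, still paused: " + complete.paused);

        HalfImplementedReader half = new HalfImplementedReader();
        half.pauseSplit("split-1");
        half.resumeSplit("split-1"); // no-op default, the split stays paused
        System.out.println("defaults, still paused: " + half.paused);
    }
}
```

With the decorator, forgetting one of the two methods is a compile error; with defaults, it only shows up at runtime as a split that never resumes.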
> >>
> >> Here I would strongly lean towards making life easier for new users,
> >> lowering the entry barrier, at the (imo) slight expense for the power
> >> users.
> >>
> >> Best,
> >> Piotrek
> >>
> >>
> >> wt., 26 kwi 2022 o 15:32 Becket Qin <be...@gmail.com> napisał(a):
> >>
> >> > Thanks for the reply Sebastian and Dawid,
> >> >
> >> > I think Sebastian has a good summary. This is a really helpful
> >> discussion.
> >> >
> >> > Thinking a bit more, I feel that it might still be better to add the
> >> > supportsXXX() method in the Source rather than SourceReader.
> >> >
> >> > Generally speaking, what we are trying to do here is to let the Flink
> >> > framework know what the Source is capable of. In this FLIP, it happens
> >> to
> >> > be the capability that only involves SourceReader. But in the future,
> >> it is
> >> > possible that another functionality involves both the SplitEnumerator
> >> and
> >> > SourceReader. In that case, following the current approach, we should
> >> put
> >> > the "supportsXXX()" method in both SplitEnumerator and SourceReader.
> >> > Because if we only put this in the SourceReader, then the JM would
> have
> >> to
> >> > create a SourceReader in order to know whether this feature is
> >> supported,
> >> > which is a little ugly. But if we put the "supportsXXX()" method in
> the
> >> > Source, we will break the "symmetric" design because this FLIP chose a
> >> > different way.
> >> >
> >> > This is also why I think supportsXXX() method seems a good thing to
> >> have,
> >> > because when there are a few interfaces / methods that are expected to
> >> be
> >> > implemented at the same time in order to deliver a feature, it is
> always
> >> > good to have a single source of truth to tell the framework what to
> do,
> >> so
> >> > the framework can do consistent things in different parts.
> >> >
> >> > @Sebastian Mattheis <se...@ververica.com>
> >> >
> >> > Regarding interface flavor b), i.e. AlignedSourceReader +
> >> > AlignedSplitReader, what I feel awkward about is that we are
> essentially
> >> > expecting almost all the SourceReader implementations to extend
> >> > SourceReaderBase, which effectively makes the SourceReader interface
> >> > without the pausing support useless. So this indicates that public
> >> > decorative interfaces (or sub-interfaces for the same purpose) only
> >> > make sense if the original interface is also expected to be used.
> >> > Otherwise, it seems to make more sense to add the method to the original
> >> > interface itself.
> >> >
> >> > Cheers,
> >> >
> >> > Jiangjie (Becket) Qin
> >> >
> >> >
> >> >
> >> >
> >> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <
> >> dwysakowicz@apache.org>
> >> > wrote:
> >> >
> >> > > Thanks @Sebastian for the nice summary.
> >> > >
> >> > > I think most of your points aligned with the suggestions I made to
> the
> >> > > FLIP, while you were writing your reply (I believe we hit enter
> >> nearly at
> >> > > the same time ;) )
> >> > >
> >> > > Two points after we synced offline
> >> > >
> >> > > 1. I changed also the supportsWatermarksSplitAlignment to
> >> > > supportsPausingSplits to express the general capability of pausing.
> >> > >
> >> > > 2. As for whether we should add PausingSourceReader/PausingSplitReader
> >> > > (option b) or just add the methods (option c), I suggest simply adding
> >> > > the two methods, as I felt this is the approach Becket much prefers and
> >> > > others do not object to. Unless there is opposition, let's go with
> >> > > this option c.
> >> > >
> >> > > Best,
> >> > >
> >> > > Dawid
> >> > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
> >> > >
> >> > > Hi folks,
> >> > >
> >> > > Sorry for being a bit silent. Many thanks for all the input and
> >> > > suggestions. As I'm a bit new, I needed some time to catch up and
> >> > structure
> >> > > (for myself) the discussion and I wanted to find a way to structure
> >> the
> >> > > conclusions. (Also because I had the feeling that some concerns got
> >> lost
> >> > in
> >> > > the discussion.) This is my attempt and please correct me if
> >> something is
> >> > > wrong or misunderstood. I tried to collect and assemble the
> opinions,
> >> > > suggestions, and conclusions (to the best of my knowledge):
> >> > >
> >> > > # Top A: Should split alignment (pause/resume behavior) be a general
> >> > > capability?
> >> > >
> >> > > I personally don't see any reason not to have it a general capability
> >> > > because for the alignSplit method it is actually independent of the
> >> > > watermarks. If we agree here to have it a general capability, we
> >> should
> >> > > also agree on the right wording. Does "alignSplits(splitsToResume,
> >> > > splitsToPause)" refer to what is then actually meant? (I see it as
> >> okay.
> >> > I
> >> > > don't have any better idea whilst Arvid suggested
> >> "pauseOrResumeSplits".)
> >> > >
> >> > > # Top B: Should it be possible to enable/disable split alignment?
> >> > >
> >> > > I would personally not disable the split alignment on the source
> >> reader
> >> > > side because if split alignment is used for some other use case (see
> >> A)
> >> > it
> >> > > could have nasty side effects on other/future use cases. Instead, I
> >> would
> >> > > disable "watermark split alignment" where I think it should disable
> >> the
> >> > > watermark-dependent trigger for split alignment.
> >> > >
> >> > > # Top C: Should we add a supportsX method?
> >> > >
> >> > > I find it difficult to define the scope of a supportsX method w.r.t.
> >> > > the following questions: a) Where is it used? and b) What is the
> >> expected
> >> > > output? To b), it's not straight-forward to provide a meaningful
> >> output,
> >> > > e.g., if SourceReader supports split alignment but SplitReader not.
> >> This
> >> > is
> >> > > because with the current implementation, we can determine whether
> >> split
> >> > > alignment is fully supported only during runtime and specifically
> >> > actually
> >> > > only when calling alignSplits down the call hierarchy up to the
> actual
> >> > > SplitReaders.
> >> > >
> >> > > Therefore, I would suggest to either raise an error or warning if
> the
> >> > > alignment is called but not supported at some point. I know we
> should
> >> > > carefully think about when this could be the case because we don't
> >> want
> >> > to
> >> > > flood anybody with such warnings. However, warnings could be an
> >> indicator
> >> > > for the user that for watermark split alignment use case split
> >> reading is
> >> > > imbalanced with the conclusion to either disable the trigger for
> >> > watermark
> >> > > split alignment (see Top B) or to use/implement a source and reader
> >> that
> >> > > fully supports split alignment.
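
A minimal sketch of the "raise an error or warning" idea, in Java. This is an illustration under assumptions, not the PoC code: ReaderLike and AlignmentTrigger are hypothetical names, the default method throwing UnsupportedOperationException is one possible way to signal missing support, and the parameter order follows the "alignSplits(splitsToResume, splitsToPause)" wording used above. The caller warns only once so that nobody gets flooded.

```java
import java.util.Arrays;
import java.util.Collection;

final class UnsupportedAlignmentSketch {
    /** Hypothetical stand-in for a reader interface; not Flink's actual API. */
    interface ReaderLike {
        default void alignSplits(
                Collection<String> splitsToResume, Collection<String> splitsToPause) {
            throw new UnsupportedOperationException("split alignment is not supported");
        }
    }

    /** Hypothetical caller that warns once instead of on every alignment attempt. */
    static final class AlignmentTrigger {
        private final ReaderLike reader;
        private boolean warned;
        int warnings;

        AlignmentTrigger(ReaderLike reader) {
            this.reader = reader;
        }

        void trigger(Collection<String> splitsToResume, Collection<String> splitsToPause) {
            try {
                reader.alignSplits(splitsToResume, splitsToPause);
            } catch (UnsupportedOperationException e) {
                if (!warned) {
                    warned = true;
                    warnings++;
                    System.err.println("WARN: " + e.getMessage());
                }
            }
        }
    }

    public static void main(String[] args) {
        // A reader that does not override alignSplits, i.e. has no alignment support.
        AlignmentTrigger trigger = new AlignmentTrigger(new ReaderLike() {});
        trigger.trigger(Arrays.asList("split-1"), Arrays.asList("split-2"));
        trigger.trigger(Arrays.asList("split-1"), Arrays.asList("split-2"));
        System.out.println("warnings logged: " + trigger.warnings);
    }
}
```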
> >> > >
> >> > > # Top D: How to design interfaces?
> >> > >
> >> > > Thanks for structuring the discussion with the various
> >> possibilities
> >> > > (a-d). From the discussion and emails, I would like to summarize the
> >> > > following requirements:
> >> > > - Interfaces should be consistent ("symmetric"), i.e., similar
> >> semantics
> >> > > should have similar interfaces with similar usage.
> >> > > - Make explicit which implementations implement interfaces/support
> >> > > behavior.
> >> > > - Make clear what are default implementations and how to implement
> >> > > interfaces with desired behavior.
> >> > >
> >> > > This is a simplified view of the relations between relevant classes
> of
> >> > the
> >> > > PoC implementation:
> >> > >
> >> > > SourceReader (Public) <|-- SourceReaderBase (Internal) <|-- .. <|--
> >> > > MySourceReader
> >> > >
> >> > > MySourceReader <>-- SplitFetcherManager (Internal) <>-- SplitFetcher
> >> > > (Internal) <>-- SplitReader (Public) <|-- MySplitReader
> >> > >
> >> > > (A <|-- B: B inherits from A; A <>-- B: A "has a" B)
> >> > >
> >> > > Note that SourceReaderBase and SplitFetcherManager implement most of
> >> the
> >> > > "logic" for split alignment just because we wanted to implement
> split
> >> > > alignment and wanted it to be available as kind of a default. As a
> >> > > consequence, we have a "default implementation" for SourceReader
> that
> >> > > implements the actual logic for split alignment. For that reason, I
> >> find
> >> > it
> >> > > very confusing to have a NOOP default implementation in the
> interface
> >> for
> >> > > the SourceReader. As a consequence, interface strategy c) is
> difficult
> >> > > because this would require NOOP default implementations in the
> public
> >> > > interfaces of SourceReader and SplitReader. This is the same for
> >> strategy
> >> > > d) because it would require NOOP default implementation in the
> >> > > SourceReader. Further, as Dawid described method signatures of
> >> alignSplit
> >> > > for SourceReader and SplitReader differ and it would be extremely
> >> > difficult
> >> > > to make the signatures the same (with even potential performance
> >> impact
> >> > > because of additional lookups of split ids). Therefore, having a
> >> > symmetric
> >> > > decorative interface as of strategy a) is actually not possible and
> >> > having
> >> > > two decorative interfaces with different method signatures is
> >> confusing.
> >> > My
> >> > > conclusion is that we are best with strategy b) which means to have
> >> > > specializing sub-interfaces that inherit from the parent interface:
> >> > > SourceReader <|-- AlignedSourceReader, SplitReader <|--
> >> > AlignedSplitReader
> >> > > With this option, I'm not 100% sure what the implications are and if
> >> this
> >> > > could get nasty. I would suggest that Dawid and I just try to
> >> implement
> >> > and
> >> > > see if we like it. :)
> >> > >
> >> > > # Summary
> >> > >
> >> > > In conclusion, please let me know your perspectives. Please correct
> >> me,
> >> > if
> >> > > something is wrong or if I misunderstood something. My perspective
> >> would
> >> > be:
> >> > >
> >> > > Top A: Yes
> >> > > Top B: Yes (but disable watermark trigger for split alignment)
> >> > > Top C: No
> >> > > Top D: b)
> >> > >
> >> > > Best,
> >> > > Sebastian
> >> > >
> >> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <
> >> dwysakowicz@apache.org
> >> > >
> >> > > wrote:
> >> > >
> >> > >> @Arvid:
> >> > >>
> >> > >> While I also like Becket's capability approach, I fear that it
> >> doesn't
> >> > work
> >> > >> for this particular use case: Sources can always be aligned
> >> cross-task
> >> > and
> >> > >> this is just about intra-task alignment. So it's plausible to put
> >> > sources
> >> > >> into an alignment group even though they do not use any of the
> >> presented
> >> > >> API of FLIP-217. They should just issue a warning, if they handle
> >> > multiple
> >> > >> splits (see motivation section).
> >> > >>
> >> > >> Yes, but the "supportXXX" method would be for telling if it
> supports
> >> > that
> >> > >> intra-task alignment. Cross-task alignment would always be
> supported.
> >> > >>
> >> > >> I updated interfaces to what I believe to be closest to a consensus
> >> > >> between all participants. Do you mind taking a look?
> >> > >>
> >> > >> @Sebastian Do you mind addressing the nits?
> >> > >>
> >> > >> Best,
> >> > >>
> >> > >> Dawid
> >> > >>
> >> > >> On 25/04/2022 13:39, Arvid Heise wrote:
> >> > >>
> >> > >> Thanks for pushing this effort.
> >> > >>
> >> > >> I'd actually be in favor of 1b). I fully agree that decorator
> >> interfaces
> >> > >> should be avoided but I'm also not a big fan of overloading the
> base
> >> > >> interfaces (they are hard to implement as is). The usual feedback
> to
> >> > >> Source-related interfaces are always that they are overwhelming and
> >> too
> >> > >> hard to implement. However, I'd also not oppose 1c) as scattered
> >> > interfaces
> >> > >> also have drawbacks. I'd just dislike 1a) and 1d).
> >> > >> While I also like Becket's capability approach, I fear that it
> >> doesn't
> >> > work
> >> > >> for this particular use case: Sources can always be aligned
> >> cross-task
> >> > and
> >> > >> this is just about intra-task alignment. So it's plausible to put
> >> > sources
> >> > >> into an alignment group even though they do not use any of the
> >> presented
> >> > >> API of FLIP-217. They should just issue a warning, if they handle
> >> > multiple
> >> > >> splits (see motivation section).
> >> > >>
> >> > >> I think renaming alignSplits to facilitate future use cases makes
> >> sense
> >> > but
> >> > >> then all interfaces (if 1c) is chosen) should be adjusted
> >> accordingly.
> >> > >> AlignedSourceReader could be PausingSourceReader and I'd go for
> >> > >> pauseOrResumeSplits (Becket's proposal afaik). We could also split
> it
> >> > into
> >> > >> pauseSplit and resumeSplit. While pauseOrResumeSplits may allow
> >> Sources
> >> > to
> >> > >> just use 1 instead of 2 library calls (as written in the Javadoc),
> >> both
> >> > >> Kafka and Pulsar can't use it and I'm not sure if there is a system
> >> that
> >> > >> can.
> >> > >>
> >> > >> Some nits for the FLIP:
> >> > >> - Please replace "stop" with "pause".
> >> > >> - Not sure if it's worth it in the capability section: Sources that
> >> > adopt
> >> > >> this interface cannot be used in earlier versions. So it feels like
> >> we
> >> > are
> >> > >> only forward compatible (old sources can be used after the change);
> >> but
> >> > I
> >> > >> guess this holds for any API addition.
> >> > >> - You might want to add what happens when all splits are paused.
> >> > >> - You may want to describe how the 3 flavors of SourceReaderBase
> >> > interact
> >> > >> with the interface.
> >> > >> - I'm not sure if it makes sense to include Kafka and Pulsar in the
> >> > FLIP.
> >> > >> For me, this is rather immediate follow-up work. (could be in the
> >> same
> >> > >> umbrella ticket)
> >> > >>
> >> > >> Best,
> >> > >>
> >> > >> Arvid
> >> > >>
> >> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz <dw...@apache.org>
> >> > >> wrote:
> >> > >>
> >> > >>
> >> > >> a) "MySourceReader implements SourceReader, WithSplitsAlignment",
> >> along
> >> > >> with "MySplitReader implements SplitReader, WithSplitsAlignment",
> or
> >> > >> b) "MySourceReader implements AlignedSourceReader" and
> "MySplitReader
> >> > >> implements AlignedSplitReader", or
> >> > >> c) "MySourceReader implements SourceReader" and "MySplitReader
> >> > implements
> >> > >> SplitReader".
> >> > >>
> >> > >> I think the latest proposal according to Dawid would be:
> >> > >> d) "MySourceReader implements SourceReader" and "MySplitReader
> >> > implements
> >> > >> AlignedSplitReader".
> >> > >> I am fine with this API, although personally speaking I think it is
> >> > simpler
> >> > >> to just add a new method to the split reader with default impl.
> >> > >>
> >> > >>
> >> > >> I think that is a good idea to have it aligned as much as possible.
> >> I'd
> >> > be
> >> > >> +1 for your option c). We can merge AlignedSplitReader with
> >> > SplitReader. We
> >> > >> will update the FLIP shortly.
> >> > >>
> >> > >> Best,
> >> > >>
> >> > >> Dawid
> >> > >>
> >> > >> On 25/04/2022 12:43, Becket Qin wrote:
> >> > >>
> >> > >> Thanks for the comment, Jark.
> >> > >>
> >> > >> 3. Interface/Method Name.
> >> > >>
> >> > >> Can the interface be used to align other things in the future? For
> >> > example,
> >> > >> align read speed, I have
> >> > >> seen users requesting global rate limits. This feature may also
> need
> >> an
> >> > >> interface like this.
> >> > >> If we don't plan to extend this interface to support aligning other
> >> > things, I
> >> > >> suggest explicitly declaring
> >> > >> the purpose of the methods, such as `alignWatermarksForSplits`
> >> instead
> >> > of
> >> > >> `alignSplits`.
> >> > >>
> >> > >> This is a good point. Naming wise, it would usually be more
> >> extensible
> >> > to
> >> > >> just describe what the method actually does, instead of assuming
> the
> >> > >> purpose of doing this. For example, in this case,
> >> pauseOrResumeSplits()
> >> > >> would be more extensible because this can be used for any kind of
> >> flow
> >> > >> control, be it watermark alignment or simple rate limiting.
> >> > >>
> >> > >> 4. Interface or Method.
> >> > >>
> >> > >> I don't have a strong opinion on this. I think they have their own
> >> > >> advantages.
> >> > >> In Flink SQL, we heavily use Interfaces for extending abilities
> >> > >> (SupportsXxxx) for TableSource/TableSink,
> >> > >> and I prefer Interfaces rather than methods in this case. When you
> >> have
> >> > a
> >> > >> bunch of abilities and each ability
> >> > >> has more than one method, Interfaces can help to organize them and
> >> > >> make it clear to users which methods they need to implement to
> >> > >> provide an ability.
> >> > >>
> >> > >> I am OK with decorative interfaces if this is a general design
> >> pattern
> >> > in
> >> > >> the other components in Flink. But it looks like the current API
> >> > proposal
> >> > >> is not symmetric.
> >> > >>
> >> > >> The current proposal is essentially "MySourceReader implements
> >> > >> SourceReader, WithSplitsAlignment", along with "MySplitReader
> >> implements
> >> > >> AlignedSplitsReader".
> >> > >>
> >> > >> Should we make the API symmetric? I'd consider any one of the
> >> following
> >> > as
> >> > >> symmetric.
> >> > >>
> >> > >> a) "MySourceReader implements SourceReader, WithSplitsAlignment",
> >> along
> >> > >> with "MySplitReader implements SplitReader, WithSplitsAlignment",
> or
> >> > >> b) "MySourceReader implements AlignedSourceReader" and
> "MySplitReader
> >> > >> implements AlignedSplitReader", or
> >> > >> c) "MySourceReader implements SourceReader" and "MySplitReader
> >> > implements
> >> > >> SplitReader".
> >> > >>
> >> > >> I think the latest proposal according to Dawid would be:
> >> > >> d) "MySourceReader implements SourceReader" and "MySplitReader
> >> > implements
> >> > >> AlignedSplitReader".
> >> > >> I am fine with this API, although personally speaking I think it is
> >> > simpler
> >> > >> to just add a new method to the split reader with default impl.
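
For readers who prefer code over prose, the four flavors a) to d) can be sketched as Java type declarations. This is only an illustration: the interfaces here are empty stand-ins that reuse the names from the discussion, and the helper visibleInType is a hypothetical addition showing which flavors let the framework see alignment support from the type alone.

```java
final class InterfaceFlavorsSketch {
    // Empty stand-ins for the two public interfaces; not the real Flink types.
    interface SourceReader {}
    interface SplitReader {}

    // a) one shared decorative interface next to both base interfaces
    interface WithSplitsAlignment {}
    static final class SourceReaderA implements SourceReader, WithSplitsAlignment {}
    static final class SplitReaderA implements SplitReader, WithSplitsAlignment {}

    // b) specializing sub-interfaces on both sides
    interface AlignedSourceReader extends SourceReader {}
    interface AlignedSplitReader extends SplitReader {}
    static final class SourceReaderB implements AlignedSourceReader {}
    static final class SplitReaderB implements AlignedSplitReader {}

    // c) only the base interfaces; the new method would be a default method on them
    static final class SourceReaderC implements SourceReader {}
    static final class SplitReaderC implements SplitReader {}

    // d) plain SourceReader combined with a specialized SplitReader
    static final class SourceReaderD implements SourceReader {}
    static final class SplitReaderD implements AlignedSplitReader {}

    /** Hypothetical helper: true if the type alone reveals alignment support. */
    static boolean visibleInType(Object reader) {
        return reader instanceof WithSplitsAlignment
                || reader instanceof AlignedSourceReader
                || reader instanceof AlignedSplitReader;
    }

    public static void main(String[] args) {
        System.out.println("a) source reader: " + visibleInType(new SourceReaderA()));
        System.out.println("b) source reader: " + visibleInType(new SourceReaderB()));
        System.out.println("c) source reader: " + visibleInType(new SourceReaderC()));
        System.out.println("d) source reader: " + visibleInType(new SourceReaderD()));
        System.out.println("d) split reader:  " + visibleInType(new SplitReaderD()));
    }
}
```

In flavor c) nothing is visible in the type, which is where a separate capability method such as "supportsXXX" would come in; flavor d) is visible on the split reader side only, which is the asymmetry discussed above.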
> >> > >>
> >> > >> @Dawid Wysakowicz <dw...@apache.org>, thanks for the reply.
> >> > >>
> >> > >> Having said that, as I don't have a preference and I agree most of
> >> the
> >> > >>
> >> > >> sources will support the alignment I am fine following your
> >> suggestion
> >> > to
> >> > >> have the SourceReader extending from WithWatermarksSplitsAlignment,
> >> but
> >> > >> would put the "supportsXXX" there, not in the Source to keep the
> two
> >> > >> methods together.
> >> > >>
> >> > >> One benefit of having the "supportsXXX" in Source is that this
> allows
> >> > some
> >> > >> compile time check. For example, if a user enabled watermark
> >> alignment
> >> > >> while it is not supported by the Source, an exception can be thrown
> >> at
> >> > >> compile time. It seems in general useful. That said, I agree that
> API
> >> > >> cleanliness wise it is better to put the two methods together.
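
The early check described here could look roughly like the following Java sketch. Everything in it is an assumption for illustration: SourceLike, supportsPausingSplits and validate are hypothetical names, and whether the framework should fail or only warn in this situation is exactly what is being debated in this thread.

```java
final class CapabilityCheckSketch {
    /** Hypothetical stand-in for the Source interface with a capability method. */
    interface SourceLike {
        default boolean supportsPausingSplits() {
            return false;
        }
    }

    /** A source that keeps the default and therefore reports no support. */
    static final class PlainSource implements SourceLike {}

    /** A source that has implemented the pausing logic and says so. */
    static final class PausableSource implements SourceLike {
        @Override
        public boolean supportsPausingSplits() {
            return true;
        }
    }

    /** Hypothetical check run while the job is built, before any reader exists. */
    static void validate(SourceLike source, boolean splitAlignmentEnabled) {
        if (splitAlignmentEnabled && !source.supportsPausingSplits()) {
            throw new IllegalStateException(
                    "Split alignment is enabled but the source does not support pausing splits");
        }
    }

    public static void main(String[] args) {
        validate(new PausableSource(), true); // passes
        validate(new PlainSource(), false);   // passes, alignment is not requested
        try {
            validate(new PlainSource(), true);
        } catch (IllegalStateException e) {
            System.out.println("rejected early: " + e.getMessage());
        }
    }
}
```

Because the check only needs the Source object, it can run without creating a SourceReader first.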
> >> > >>
> >> > >> Thanks,
> >> > >>
> >> > >> Jiangjie (Becket) Qin
> >> > >>
> >> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <im...@gmail.com> wrote:
> >> > >>
> >> > >>
> >> > >> Thanks, Dawid, for the reminder on FLIP-182. Sorry, I did miss it.
> >> > >> I don't have other concerns then.
> >> > >>
> >> > >> Best,
> >> > >> Jark
> >> > >>
> >> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz <dw...@apache.org>
> >> > >> wrote:
> >> > >>
> >> > >>
> >> > >> @Jark:
> >> > >>
> >> > >> 1. Will the framework always align with watermarks when the source
> >> > >> implements the interface?
> >> > >> I'm afraid not every case needs watermark alignment even if Kafka
> >> > >> implements the interface,
> >> > >> and this will affect the throughput somehow. I agree with Becket
> >> > >> we may need a
> >> > >> `supportSplitsAlignment()` method for users to configure the source
> >> to
> >> > >> enable/disable the alignment.
> >> > >>
> >> > >> 2. How does the framework calculate maxDesiredWatermark?
> >> > >> I think the algorithm of maxDesiredWatermark will greatly affect
> >> > >>
> >> > >> throughput
> >> > >>
> >> > >> if the reader is constantly
> >> > >>  switching between pause and resume. Can users configure the
> >> alignment
> >> > >> offset?
> >> > >>
> >> > >>
> >> > >> This is covered in the previous FLIP[1] which has been already
> >> > >>
> >> > >> implemented
> >> > >>
> >> > >> in 1.15. In short, it must be enabled with the watermark strategy
> >> which
> >> > >> also configures drift and update interval.
> >> > >>
> >> > >> If we don't plan to extend this interface to support aligning other
> >> things,
> >> > >>
> >> > >> I
> >> > >>
> >> > >> suggest explicitly declaring
> >> > >> the purpose of the methods, such as `alignWatermarksForSplits`
> >> instead
> >> > of
> >> > >> `alignSplits`.
> >> > >>
> >> > >>
> >> > >> Sure let's rename it.
> >> > >>
> >> > >> @Becket:
> >> > >>
> >> > >> I understand your point. On the other hand putting all methods,
> even
> >> > with
> >> > >> "supportsXXX" methods for enabling certain features, makes the
> entry
> >> > >> threshold for writing a new source higher. Instead of focusing on
> the
> >> > >>
> >> > >> basic
> >> > >>
> >> > >> and required properties of the Source, the person implementing a
> >> source
> >> > >> must bother with and need to figure out what all of the extra
> >> features
> >> > >>
> >> > >> are
> >> > >>
> >> > >> about and how to deal with them. It makes it also harder to
> organize
> >> > >> methods in coupled groups as Jark said.
> >> > >>
> >> > >> Having said that, as I don't have a preference and I agree most of
> >> the
> >> > >> sources will support the alignment I am fine following your
> >> suggestion
> >> > to
> >> > >> have the SourceReader extending from WithWatermarksSplitsAlignment,
> >> but
> >> > >> would put the "supportsXXX" there, not in the Source to keep the
> two
> >> > >> methods together.
> >> > >>
> >> > >> Lastly, I agree it is really unfortunate the "alignSplits" methods
> >> > differ
> >> > >> slightly for SourceReader and SplitReader. The reason for that is
> >> > >> SourceReaderBase deals only with SplitIds, whereas SplitReader
> needs
> >> the
> >> > >> actual splits to pause them. I found the discrepancy acceptable for
> >> the
> >> > >> sake of simplifying changes significantly, especially as they would
> >> > >>
> >> > >> highly
> >> > >>
> >> > >> likely impact performance as we would have to perform additional
> >> > lookups.
> >> > >> Moreover the SplitReader is a secondary interface.
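
The signature difference can be illustrated with a small Java sketch. It is not the PoC implementation: Split, SourceReaderLike and SplitReaderLike are hypothetical stand-ins, the parameter order (resume first, then pause) follows the "alignSplits(splitsToResume, splitsToPause)" wording used elsewhere in this thread, and placing the ID-to-split translation in the reader-level class is just one possible choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class AlignSignatureSketch {
    /** Hypothetical split type carrying only an ID. */
    static final class Split {
        final String id;

        Split(String id) {
            this.id = id;
        }
    }

    /** Split-level side: needs the actual split objects to pause or resume them. */
    static final class SplitReaderLike {
        final Set<String> paused = new HashSet<>();

        void alignSplits(Collection<Split> splitsToResume, Collection<Split> splitsToPause) {
            for (Split split : splitsToResume) {
                paused.remove(split.id);
            }
            for (Split split : splitsToPause) {
                paused.add(split.id);
            }
        }
    }

    /** Reader-level side: only knows split IDs and translates them once. */
    static final class SourceReaderLike {
        private final Map<String, Split> assigned = new HashMap<>();
        final SplitReaderLike splitReader = new SplitReaderLike();

        void addSplit(Split split) {
            assigned.put(split.id, split);
        }

        void alignSplits(Collection<String> idsToResume, Collection<String> idsToPause) {
            splitReader.alignSplits(resolve(idsToResume), resolve(idsToPause));
        }

        /** Maps IDs to assigned splits; unknown IDs are skipped. */
        private List<Split> resolve(Collection<String> ids) {
            List<Split> result = new ArrayList<>();
            for (String id : ids) {
                Split split = assigned.get(id);
                if (split != null) {
                    result.add(split);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        SourceReaderLike reader = new SourceReaderLike();
        reader.addSplit(new Split("s1"));
        reader.addSplit(new Split("s2"));
        reader.alignSplits(Collections.<String>emptyList(), Arrays.asList("s1"));
        System.out.println("paused after first call: " + reader.splitReader.paused);
        reader.alignSplits(Arrays.asList("s1"), Arrays.asList("s2"));
        System.out.println("paused after second call: " + reader.splitReader.paused);
    }
}
```

Forcing both levels onto one identical signature would mean either the reader level holding split objects or the split level looking up IDs again, which is the extra lookup cost mentioned above.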
> >> > >>
> >> > >> Best,
> >> > >>
> >> > >> Dawid
> >> > >>
> >> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
> >> > >>
> >> > >> On 24/04/2022 17:15, Jark Wu wrote:
> >> > >>
> >> > >> Thanks for the effort, Dawid and Sebastian!
> >> > >>
> >> > >> I just have some minor questions (maybe I missed something).
> >> > >>
> >> > >> 1. Will the framework always align with watermarks when the source
> >> > >> implements the interface?
> >> > >> I'm afraid not every case needs watermark alignment even if Kafka
> >> > >> implements the interface,
> >> > >> and this will affect the throughput somehow. I agree with Becket
> >> > >> we may need a
> >> > >> `supportSplitsAlignment()` method for users to configure the source
> >> to
> >> > >> enable/disable the alignment.
> >> > >>
> >> > >> 2. How does the framework calculate maxDesiredWatermark?
> >> > >> I think the algorithm of maxDesiredWatermark will greatly affect
> >> > >>
> >> > >> throughput
> >> > >>
> >> > >> if the reader is constantly
> >> > >>  switching between pause and resume. Can users configure the
> >> alignment
> >> > >> offset?
> >> > >>
> >> > >> 3. Interface/Method Name.
> >> > >> Can the interface be used to align other things in the future? For
> >> > >>
> >> > >> example,
> >> > >>
> >> > >> align read speed, I have
> >> > >> seen users requesting global rate limits. This feature may also
> need
> >> an
> >> > >> interface like this.
> >> > >> If we don't plan to extend this interface to support aligning other
> >> things,
> >> > >>
> >> > >> I
> >> > >>
> >> > >> suggest explicitly declaring
> >> > >> the purpose of the methods, such as `alignWatermarksForSplits`
> >> instead
> >> > of
> >> > >> `alignSplits`.
> >> > >>
> >> > >> 4. Interface or Method.
> >> > >> I don't have a strong opinion on this. I think they have their own
> >> > >> advantages.
> >> > >> In Flink SQL, we heavily use Interfaces for extending abilities
> >> > >> (SupportsXxxx) for TableSource/TableSink,
> >> > >> and I prefer Interfaces rather than methods in this case. When you
> >> have
> >> > a
> >> > >> bunch of abilities and each ability
> >> > >> has more than one method, Interfaces can help to organize them and
> >> > >> make it clear to users which methods they need to implement to
> >> > >> provide an ability.
> >> > >>
> >> > >>
> >> > >> Best,
> >> > >> Jark
> >> > >>
> >> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <be...@gmail.com> wrote:
> >> > >>
> >> > >> Hi Dawid,
> >> > >>
> >> > >> Thanks for the explanation. Apologies that I somehow misread a
> bunch
> >> of
> >> > >> "align" and thought they were "assign".
> >> > >>
> >> > >> Regarding 1, by default implementation, I was thinking of the
> default
> >> > >>
> >> > >> no-op
> >> > >>
> >> > >> implementation. I am a little worried about the proliferation of
> >> > >>
> >> > >> decorative
> >> > >>
> >> > >> interfaces. I think the most important thing about interfaces is
> that
> >> > >>
> >> > >> they
> >> > >>
> >> > >> are easy to understand. In this case, I prefer adding a new method to
> >> the
> >> > >> existing interface for the following reasons:
> >> > >>
> >> > >> a) I feel the biggest drawback of decorative interfaces is which interface
> >> > >> they can decorate and which combinations of multiple decorative interfaces
> >> > >> are valid. In the current FLIP, the withSplitsAlignment interface is only
> >> > >> applicable to the SourceReader which means it can't decorate any other
> >> > >> interface. From an interface design perspective, a natural question is why
> >> > >> not let "AlignedSplitReader" extend "withSplitsAlignment"? And it is also
> >> > >> natural to assume that a split reader implementing both SplitReader and
> >> > >> WithSplitAlignment would work, because a source reader implementing
> >> > >> SourceReader and withSplitsAlignment works. So why isn't there an interface
> >> > >> of AlignedSourceReader? In the future, if there is a new feature added
> >> > >> (e.g. sorted or pre-partitioned data aware), are we going to create another
> >> > >> interface of SplitReader such as SortedSplitReader or PrePartitionedAware?
> >> > >> Can they be combined? So I think the additional decorative interface like
> >> > >> withSplitsAlignment actually increases the understanding cost of users
> >> > >> because they have to know what decorative interfaces are there, which
> >> > >> interface they can decorate and which combinations of the decorative
> >> > >> interfaces are valid and which are not. Ideally we want to avoid that. To
> >> > >> be clear, I am not opposing having an interface of withSplitsAlignment, it
> >> > >> is completely OK to have it as an internal interface and let SourceReader
> >> > >> and SplitReader both extend it.
> >> > >>
> >> > >> b) Adding a new method to the SourceReader with a default implementation of
> >> > >> no-op would help avoid logic branching in the source logic, especially
> >> > >> given that we agree that the vast majority of the SourceReader
> >> > >> implementations, if not all, would just extend from the SourceReaderBase.
> >> > >> That means adding a new method to the interface would effectively give the
> >> > >> same user experience, but simpler.
> >> > >>
> >> > >> c) A related design principle that may be worth discussing is how do we let
> >> > >> the Source implementations tell Flink what capability is supported and what
> >> > >> is not. Personally speaking I feel the most intuitive place to me is in the
> >> > >> Source itself, because that is the entrance of the entire Source connector
> >> > >> logic.
> >> > >>
> >> > >> Based on the above thoughts, I am wondering if the following interface
> >> > >> would be easier to understand by the users.
> >> > >>
> >> > >> - Change "withSplitsAlignment" to internal interface, let both SourceReader
> >> > >> and SplitReader extend from it, with a default no-op implementation.
> >> > >> - Add a new method "boolean supportSplitsAlignment()" to the Source
> >> > >> interface, with a default implementation returning false. Sources that have
> >> > >> implemented the alignment logic can change this to return true, and
> >> > >> override the alignSplits() methods in the SourceReader / SplitReader if
> >> > >> needed.
> >> > >> - In the future, if a new optional feature is going to be added to the
> >> > >> Source, and that feature requires the awareness from Flink, we can add more
> >> > >> such methods to the Source.
> >> > >>
> >> > >> What do you think?
> >> > >>
> >> > >> Thanks,
> >> > >>
> >> > >> Jiangjie (Becket) Qin
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dw...@apache.org>
> >> > >>
> >> > >> wrote:
> >> > >>
> >> > >>
> >> > >> @Konstantin:
> >> > >>
> >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> >> > >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
> >> > >>
> >> > >> Correct, as far as I know though, those are the only sources which consume
> >> > >> concurrently from multiple splits and thus alignment applies.
> >> > >>
> >> > >> @Thomas:
> >> > >>
> >> > >> I wonder if "supporting" split alignment in SourceReaderBase and then doing
> >> > >> nothing if the split reader does not implement AlignedSplitReader could be
> >> > >> misleading? Perhaps WithSplitsAlignment can instead be added to the
> >> > >> specific source reader (i.e. KafkaSourceReader) to make it explicit that
> >> > >> the source actually supports it.
> >> > >>
> >> > >> I understand your concern. Hmm, I think we could actually do that. Given
> >> > >> the actual implementation of the SourceReaderBase#alignSplits is rather
> >> > >> short (just a forward to the corresponding method of SplitFetcher), we
> >> > >> could reimplement it in the actual source implementations. This solution
> >> > >> has a downside though. Authors of new sources would have to do two
> >> > >> things: extend from AlignedSplitReader and implement WithSplitsAlignment,
> >> > >> instead of just extending AlignedSplitReader. I would be fine with such a
> >> > >> tradeoff though. What do others think?
> >> > >>
> >> > >> @Steven:
> >> > >>
> >> > >> For this part from the motivation section, is it accurate? Let's assume one
> >> > >> source task consumes from 3 partitions and one of the partition is
> >> > >> significantly slower. In this situation, watermark for this source task
> >> > >> won't hold back as it is reading recent data from other two Kafka
> >> > >> partitions. As a result, it won't hold back the overall watermark. I
> >> > >> thought the problem is that we may have late data for this slow partition.
> >> > >>
> >> > >> It will hold back the watermark. Watermark of an operator is the minimum
> >> > >> of watermarks of all splits[1]
> >> > >>
> >> > >> I have another question about the restart. Say split alignment is
> >> > >> triggered. checkpoint is completed. job failed and restored from the last
> >> > >> checkpoint. because alignment decision is not checkpointed, initially
> >> > >> alignment won't be enforced until we get a cycle of watermark aggregation
> >> > >> and propagation, right? Not saying this corner is a problem. Just want to
> >> > >> understand it more.
> >> > >>
> >> > >> Your understanding is correct.
> >> > >>
> >> > >> @Becket:
> >> > >>
> >> > >> 1. I think watermark alignment is sort of a general use case, so should we
> >> > >> just add the related methods to SourceReader directly instead of
> >> > >> introducing the new interface of WithSplitAssignment? We can provide
> >> > >> default implementations, so backwards compatibility won't be an issue.
> >> > >>
> >> > >> I don't think we can provide a default implementation. How would we do
> >> > >> that? Would it be just a no-op? Is it better than having an opt-in
> >> > >> interface? The default implementation would have to be added exclusively in
> >> > >> a *Public* SourceReader interface. By the way notice SourceReaderBase
> >> > >> does extend from WithSplitsAlignment, so effectively all implementations do
> >> > >> handle the alignment case. To be honest I think it is impossible to
> >> > >> implement the SourceReader interface directly by end users.
> >> > >>
> >> > >> 2. As you mentioned, the SplitReader interface probably also needs
> >> some
> >> > >> change to support throttling at the split granularity. Can you add
> >> that
> >> > >> interface change into the public interface section as well?
> >> > >>
> >> > >> It has been added from the beginning. See *AlignedSplitReader.*
> >> > >>
> >> > >> 3. Nit, can we avoid using the method name assignSplits here, given that it
> >> > >> is not actually changing the split assignments? It seems something like
> >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
> >> > >>
> >> > >> The method's called *alignSplits*, not assign. Do you still prefer a
> >> > >> different name for that? Personally, I am open for suggestions here.
> >> > >>
> >> > >> Best,
> >> > >>
> >> > >> Dawid
> >> > >>
> >> > >> [1]
> >> > >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> >> > >>
> >> > >> On 22/04/2022 05:59, Becket Qin wrote:
> >> > >>
> >> > >> Thanks for driving the effort, Sebastian. I think the motivation makes a
> >> > >> lot of sense. Just a few suggestions / questions.
> >> > >>
> >> > >> 1. I think watermark alignment is sort of a general use case, so should we
> >> > >> just add the related methods to SourceReader directly instead of
> >> > >> introducing the new interface of WithSplitAssignment? We can provide
> >> > >> default implementations, so backwards compatibility won't be an issue.
> >> > >>
> >> > >> 2. As you mentioned, the SplitReader interface probably also needs some
> >> > >> change to support throttling at the split granularity. Can you add that
> >> > >> interface change into the public interface section as well?
> >> > >>
> >> > >> 3. Nit, can we avoid using the method name assignSplits here, given that it
> >> > >> is not actually changing the split assignments? It seems something like
> >> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
> >> > >>
> >> > >> Thanks,
> >> > >>
> >> > >> Jiangjie (Becket) Qin
> >> > >>
> >> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <st...@gmail.com> wrote:
> >> > >>
> >> > >> However, a single source operator may read data from multiple
> >> > >> splits/partitions, e.g., multiple Kafka partitions, such that even with
> >> > >> watermark alignment the source operator may need to buffer excessive amount
> >> > >> of data if one split emits data faster than another.
> >> > >>
> >> > >> For this part from the motivation section, is it accurate? Let's assume one
> >> > >> source task consumes from 3 partitions and one of the partition is
> >> > >> significantly slower. In this situation, watermark for this source task
> >> > >> won't hold back as it is reading recent data from other two Kafka
> >> > >> partitions. As a result, it won't hold back the overall watermark. I
> >> > >> thought the problem is that we may have late data for this slow partition.
> >> > >>
> >> > >> I have another question about the restart. Say split alignment is
> >> > >> triggered. checkpoint is completed. job failed and restored from the last
> >> > >> checkpoint. because alignment decision is not checkpointed, initially
> >> > >> alignment won't be enforced until we get a cycle of watermark aggregation
> >> > >> and propagation, right? Not saying this corner is a problem. Just want to
> >> > >> understand it more.
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <th...@apache.org> wrote:
> >> > >>
> >> > >> Thanks for working on this!
> >> > >>
> >> > >> I wonder if "supporting" split alignment in SourceReaderBase and then doing
> >> > >> nothing if the split reader does not implement AlignedSplitReader could be
> >> > >> misleading? Perhaps WithSplitsAlignment can instead be added to the
> >> > >> specific source reader (i.e. KafkaSourceReader) to make it explicit that
> >> > >> the source actually supports it.
> >> > >>
> >> > >> Thanks,
> >> > >> Thomas
> >> > >>
> >> > >>
> >> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <kn...@apache.org>
> >> > >>
> >> > >> wrote:
> >> > >>
> >> > >>
> >> > >> Hi Sebastian, Hi Dawid,
> >> > >>
> >> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> >> > >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
> >> > >>
> >> > >> +1 in general. I believe it is valuable to complete the watermark aligned
> >> > >> story with this FLIP.
> >> > >>
> >> > >> Cheers,
> >> > >>
> >> > >> Konstantin
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dw...@apache.org>
> >> > >>
> >> > >> wrote:
> >> > >>
> >> > >>
> >> > >> To be explicit, having worked on it, I support it ;) I think we can
> >> > >> start a vote thread soonish, as there are no concerns so far.
> >> > >>
> >> > >> Best,
> >> > >>
> >> > >> Dawid
> >> > >>
> >> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
> >> > >>
> >> > >> Dear Flink developers,
> >> > >>
> >> > >> I would like to open a discussion on FLIP 217 [1] for an extension of
> >> > >> Watermark Alignment to perform alignment also in SplitReaders. To do so,
> >> > >> SplitReaders must be able to suspend and resume reading from split sources
> >> > >> where the SourceOperator coordinates and controls suspend and resume. To
> >> > >> gather information about current watermarks of the SplitReaders, we extend
> >> > >> the internal WatermarkOutputMultiplexer and report watermarks to the
> >> > >> SourceOperator.
> >> > >>
> >> > >> There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised and
> >> > >> reworked by Dawid Wysakowicz (He did most of the work.) and me. The changes
> >> > >> are backwards compatible in a way that if affected components do not
> >> > >> support split alignment the behavior is as before.
> >> > >>
> >> > >> Best,
> >> > >> Sebastian
> >> > >>
> >> > >> [1]
> >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> >> > >>
> >> > >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
> >> > >>
> >> > >> --
> >> > >>
> >> > >> Konstantin Knauf
> >> > >> https://twitter.com/snntrable
> >> > >> https://github.com/knaufk
> >> > >>
> >> > >>
> >> >
> >>
> >
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Becket Qin <be...@gmail.com>.
Thanks for the reply and patient discussion, Piotr and Dawid.

Is there any reason to make pausing reading from a split an optional feature,
other than that it was not included in the original interface?

To be honest I am really worried about the complexity of the user story
here. Optional features like this have a high overhead. Imagine this
feature is optional, and a user enables watermark alignment and defines a
few watermark groups. Would it work? Hmm, that depends on whether the
involved Source has implemented this feature. If the Sources are well
documented, good luck. Otherwise end users may have to look into the code
of the Source to see whether the feature is supported, which is something
they shouldn't have to do.
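
To make the concern concrete, here is a rough sketch of what an optional
capability tends to look like from the framework side. All names in it
(Reader, SupportsPausing, tryPause) are made up for illustration and are not
the actual Flink interfaces; it only shows that with an opt-in decorative
interface the caller has to probe for support, and an unsupported reader
silently does nothing.

```java
import java.util.Collection;
import java.util.List;

public class OptionalFeatureSketch {

    /** Hypothetical base interface; a stand-in, not Flink's SourceReader. */
    public interface Reader {
        String name();
    }

    /** Hypothetical decorative interface that carries the optional capability. */
    public interface SupportsPausing {
        void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume);
    }

    /** A reader that does not opt in to the capability. */
    public static class PlainReader implements Reader {
        @Override
        public String name() {
            return "PlainReader";
        }
    }

    /** A reader that opts in by also implementing the decorative interface. */
    public static class PausingReader implements Reader, SupportsPausing {
        @Override
        public String name() {
            return "PausingReader";
        }

        @Override
        public void pauseOrResumeSplits(Collection<String> toPause, Collection<String> toResume) {
            // A real reader would stop polling the paused splits here.
        }
    }

    /** The caller has to probe for the capability; false means nothing was paused. */
    public static boolean tryPause(Reader reader, Collection<String> toPause) {
        if (reader instanceof SupportsPausing) {
            ((SupportsPausing) reader).pauseOrResumeSplits(toPause, List.of());
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("split-1");
        System.out.println(tryPause(new PlainReader(), splits));
        System.out.println(tryPause(new PausingReader(), splits));
    }
}
```

In this sketch nothing tells the end user that alignment was skipped for
PlainReader unless the caller adds its own logging or error handling.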

I think it would be way simpler and clearer to just let end users and Flink
assume all the connectors will implement this feature. After all the
watermark group is not optional to the end users. If in some rare cases,
the feature cannot be supported, a clear UnsupportedOperationException will
be thrown to tell users to explicitly remove this Source from the watermark
group. I don't think we should have a warning message here, as warnings tend
to be ignored in many cases. If we do this, we don't even need the supportXXX
method in the Source for this feature. In fact this is exactly how many
interfaces work today. For example, SplitEnumerator#addSplitsBack() is not
supported by Pravega source because it does not support partial failover.
In that case, it simply throws an exception to trigger a global recovery.

The reason we add a default implementation in this case would be just for the
sake of backwards compatibility so the old sources can still compile. Sure,
in the short term, this feature might not be supported by many existing
sources. That is OK, and it is quite visible to the source developers that
they did not override the default impl which throws an
UnsupportedOperationException.
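
As a minimal sketch of the above (the Reader interface and both
implementations are hypothetical stand-ins, not the real SourceReader, and
the method signature is only an assumption based on the pauseOrResumeSplits
name discussed in this thread): a default method keeps old readers
compiling, and a reader that never overrides it fails loudly as soon as
pausing is requested.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DefaultMethodSketch {

    /** Hypothetical, simplified reader interface; not Flink's actual SourceReader. */
    public interface Reader {
        String name();

        // Added later with a default body, so readers written against the
        // older version of the interface still compile unchanged.
        default void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume) {
            throw new UnsupportedOperationException(
                    name() + " does not support pausing or resuming splits; "
                            + "remove it from the watermark group.");
        }
    }

    /** An "old" reader that never overrides the new method. */
    public static class LegacyReader implements Reader {
        @Override
        public String name() {
            return "LegacyReader";
        }
    }

    /** A reader that implements the feature by overriding the default. */
    public static class PausableReader implements Reader {
        public final Set<String> paused = new HashSet<>();

        @Override
        public String name() {
            return "PausableReader";
        }

        @Override
        public void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume) {
            paused.addAll(splitsToPause);
            paused.removeAll(splitsToResume);
        }
    }

    public static void main(String[] args) {
        PausableReader pausable = new PausableReader();
        pausable.pauseOrResumeSplits(List.of("split-1"), List.of());
        System.out.println("paused: " + pausable.paused);

        try {
            new LegacyReader().pauseOrResumeSplits(List.of("split-1"), List.of());
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that the failure is explicit and carries an
actionable message, instead of alignment being skipped quietly.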

@Dawid,

>> the Java doc of the SupportXXX() method in the Source would be the single
>> source of truth regarding how to implement this feature.
>
> I also don't find it entirely true. Half of the classes are theoretically
> optional and are utility classes from the point of view how the interfaces
> are organized. Theoretically users do not need to use any of
> SourceReaderBase & SplitReader. Would be weird to list their methods in the
> Source interface.

I think the ultimate goal of java docs is to guide users to implement the
Source. If SourceReaderBase is the preferred way to implement a
SourceReader, it seems worth mentioning that. Even the Java language
documentation of interfaces lists the known implementations [1] so people can
leverage them. But for this particular case, if we make the feature
non-optional, we don't even need the supportXXX() method for now.

Thanks,

Jiangjie (Becket) Qin



On Wed, May 4, 2022 at 4:37 PM Dawid Wysakowicz <dw...@apache.org>
wrote:

> Hey Piotr and Becket,
>
> First of all, let me say I am happy with whichever option is agreed in the
> discussion.
>
> I wanted to clarify a few points from the discussion though:
>
> @Becket:
>
> The main argument for adding the methods to the SourceReader is that these
> methods are effectively NON-OPTIONAL to SourceReader impl, i.e. starting
> from this FLIP, all the SourceReaders impl are expected to support this
> method, although some old implementations may not have implemented this
> feature. I think we should distinguish the new features from the optional
> features. While the public decorative interface is a solution to the
> optional features. We should not use it for the features that are
> non-optional.
>
> I don't think that this feature is NON-OPTIONAL. Even though preferred, I
> still think it can be simply optional.
>
> the Java doc of the SupportXXX() method in the Source would be the single
> source of truth regarding how to implement this feature.
>
> I also don't find it entirely true. Half of the classes are theoretically
> optional and are utility classes from the point of view how the interfaces
> are organized. Theoretically users do not need to use any of
> SourceReaderBase & SplitReader. Would be weird to list their methods in the
> Source interface.
>
> @Piotr
>
> If we have all of the methods with default implementation in the base
> interface, the API doesn't give any clue to the user which set of methods
> are required to be implemented at the same time.
>
> I feel that no matter which option we choose this can not be solved
> entirely in either of the options, because of the point above and because
> the signature of SplitReader#pauseOrResumeSplits and
> SourceReader#pauseOrResumeSplits are slightly different (one identifies
> splits with splitId the other one passes the splits directly).
>
> Best,
>
> Dawid
> On 03/05/2022 14:30, Becket Qin wrote:
>
> Hi Piotr,
>
> Thanks for the comment.
>
> Just to clarify, I am not against the decorative interfaces, but I do
> think we should use them with caution. The main argument for adding the
> methods to the SourceReader is that these methods are
> effectively NON-OPTIONAL to SourceReader impl, i.e. starting from this
> FLIP, all the SourceReaders impl are expected to support this
> method, although some old implementations may not have implemented this
> feature. I think we should distinguish the new features from the optional
> features. While the public decorative interface is a solution to the
> optional features. We should not use it for the features that are
> non-optional.
>
> That said, this feature is optional for SplitReaders. Arguably we can have
> a decorative interface for that, but for simplicity and symmetry of the
> interface, personally I prefer just adding a new method.
>
> Regarding the advantages you mentioned about the decorative interfaces,
> they would make sense if:
> 1. The feature is optional.
> 2. There is only one decorative interface involved for a feature.
> Otherwise the argument that all the methods are grouped together will not
> stand.
>
> Compared with that, I think the current solution works fine in all cases,
> i.e. "having supportXXX() method in Source, and default methods /
> decorative interfaces in base interfaces.".
>
> The advantages are:
>> - clean and easy to implement base interface
>
> In the current approach, the Java doc of the SupportXXX() method in the
> Source would be the single source of truth regarding how to implement this
> feature. It lists the method that has to be implemented to support this
> feature, regardless of how many classes / interfaces are involved.
>
> When implementing the base interface, users do not need to implement a
> method with default implementation. If they are curious what the method is
> for, the java doc of that method simply points users to the SupportXXX()
> method in the Source. It seems not adding work to the users compared with
> decorative interfaces, but gives much better discoverability.
>
> - all of the methods from a single feature are grouped in a single
>> decorator interface, together with their dedicated java doc. It's also
>> easier to google search for help using the decorator name
>
> - if an optional feature requires two methods to be implemented at once,
>> decorator can guarantee that
>
> These two points are not true when multiple components and classes are
> involved collaboratively to provide a feature. In our case, we have both
> SourceReader and SplitReader involved. And there might be other interfaces
> on the JM side involved for some future features. So the relevant methods
> can actually be scattered over the places. That said, we may still use
> decorative interfaces for each component, if the feature is optional, given
> there is a single source of truth for the feature.
>
> Here I would strongly lean towards making life easier for new users,
>> lowering the entry barrier, at the (imo) slight expense for the power
>> users.
>
> I actually think the current approach is simpler, more extensible and more
> general for all the users. Can you articulate a bit more on which part you
> think makes users harder to understand?
>
>
> There is another benefit of the decorative interfaces which is not
> mentioned, but might be worth considering here. Usually the decorative
> interfaces give slightly better backwards compatibility than the new
> default method in the interfaces. That is when users are using a jar that
> was compiled with an older version of Flink which does not have the default
> method in the interfaces in question. A decorative interface may still
> provide backwards compatibility in that case, while default method impl
> cannot.
>
> I think in Flink we in general do not guarantee custom components compiled
> with an older version can run with a newer version of Flink. A recompile
> with a newer version would be required. That said, if we do care about
> this, we can just change the "supportXXX()" method in the Source interface
> to use decorative interfaces, and leave the other parts unchanged.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
>
> On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <pn...@apache.org>
> wrote:
>
>> Hi,
>>
>> Sorry for chipping in so late, but I was OoO for the last two weeks.
>> Regarding the interfaces, I would be actually against adding those methods
>> to the base interfaces for the reasons mentioned above. Clogging the base
>> interface for new users with tons of methods that they do not need, do not
>> understand and do not know what to do with them. Moreover, such decorative
>> interfaces are solving a problem if a feature requires two or more methods
>> to be implemented at the same time. If we have all of the methods with
>> default implementation in the base interface, the API doesn't give any clue
>> to the user which set of methods are required to be implemented at the same
>> time.
>>
>> > a) I feel the biggest drawback of decorative interfaces is which interface
>> > they can decorate and which combinations of multiple decorative interfaces
>> > are valid. (...)
>> > In the future, if there is a new feature added
>> > (e.g. sorted or pre-partitioned data aware), are we going to create another
>> > interface of SplitReader such as SortedSplitReader or PrePartitionedAware?
>> > Can they be combined? So I think the additional decorative interface like
>> > withSplitsAlignment actually increases the understanding cost of users
>> > because they have to know what decorative interfaces are there, which
>> > interface they can decorate and which combinations of the decorative
>> > interfaces are valid and which are not. Ideally we want to avoid that.
>>
>> I'm not sure if I understand how embedding default methods in the base
>> interface is solving the problem: what can be combined or not? If there are
>> two conflicting features, having decorative interfaces that can not be
>> mixed together actually makes much more sense to me rather than having them
>> all in one base class. How would you allow users to implement only one of
>> those two features?
>>
>> To reiterate on the issue. Yes, there are drawbacks:
>> - how can a user discover what decorators are there?
>> - how can a user know where the decorator can be applied to?
>>
>> However those are drawbacks for more power users, that can be mitigated by
>> the documentation. For example listing all of the decorators with
>> detailed explanation both in the docs and in the java docs. More
>> experienced users will be able to deal with those issues easier, as they
>> will already have some basic understanding of Flink. Also if user has a
>> problem that he wants to solve, he will google search a potential solution
>> to his problem anyway, and while doing that he is very likely to discover
>> the decorator that he needs anyway in the docs.
>>
>> The advantages are:
>> - clean and easy to implement base interface
>> - all of the methods from a single feature are grouped in a single
>> decorator interface, together with their dedicated java doc. It's also
>> easier to google search for help using the decorator name
>> - if an optional feature requires two methods to be implemented at once,
>> decorator can guarantee that
>>
>> Here I would strongly lean towards making life easier for new users,
>> lowering the entry barrier, at the (imo) slight expense for the power
>> users.
>>
>> Best,
>> Piotrek
>>
>>
>> wt., 26 kwi 2022 o 15:32 Becket Qin <be...@gmail.com> napisał(a):
>>
>> > Thanks for the reply Sebastian and Dawid,
>> >
>> > I think Sebastian has a good summary. This is a really helpful
>> > discussion.
>> >
>> > Thinking a bit more, I feel that it might still be better to add the
>> > supportsXXX() method in the Source rather than SourceReader.
>> >
>> > Generally speaking, what we are trying to do here is to let the Flink
>> > framework know what the Source is capable of. In this FLIP, it happens to
>> > be the capability that only involves SourceReader. But in the future, it is
>> > possible that another functionality involves both the SplitEnumerator and
>> > SourceReader. In that case, following the current approach, we should put
>> > the "supportsXXX()" method in both SplitEnumerator and SourceReader.
>> > Because if we only put this in the SourceReader, then the JM would have to
>> > create a SourceReader in order to know whether this feature is supported,
>> > which is a little ugly. But if we put the "supportsXXX()" method in the
>> > Source, we will break the "symmetric" design because this FLIP chose a
>> > different way.
>> >
>> > This is also why I think supportsXXX() method seems a good thing to have,
>> > because when there are a few interfaces / methods that are expected to be
>> > implemented at the same time in order to deliver a feature, it is always
>> > good to have a single source of truth to tell the framework what to do, so
>> > the framework can do consistent things in different parts.
>> >
>> > @Sebastian Mattheis <se...@ververica.com>
>> >
>> > Regarding interface flavor b), i.e. AlignedSourceReader +
>> > AlignedSplitReader, what I feel awkward about is that we are essentially
>> > expecting almost all the SourceReader implementations to extend
>> > SourceReaderBase, which effectively makes the SourceReader interface
>> > without the pausing support useless. So this indicates that public
>> > decorative interfaces (or sub-interfaces for the same purpose) only
>> > make sense if the original interface is also expected to be used.
>> > Otherwise, it seems to make more sense to add the method to the original
>> > interface itself.
>> >
>> > Cheers,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> >
>> >
>> >
>> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <
>> dwysakowicz@apache.org>
>> > wrote:
>> >
>> > > Thanks @Sebastian for the nice summary.
>> > >
>> > > I think most of your points aligned with the suggestions I made to the
>> > > FLIP, while you were writing your reply (I believe we hit enter nearly at
>> > > the same time ;) )
>> > >
>> > > Two points after we synced offline
>> > >
>> > > 1. I changed also the supportsWatermarksSplitAlignment to
>> > > supportsPausingSplits to express the general capability of pausing.
>> > >
>> > > 2. As for if we should add PausingSourceReader/PausingSplitReader (option
>> > > b) or if we should just add the methods (option c), I suggest to simply
>> > > add the two methods, as I felt this is the approach much preferred by
>> > > Becket, and others do not object to it. Unless there is opposition, let's
>> > > go with option c.
>> > >
>> > > Best,
>> > >
>> > > Dawid
>> > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
>> > >
>> > > Hi folks,
>> > >
>> > > Sorry for being a bit silent. Many thanks for all the input and
>> > > suggestions. As I'm a bit new, I needed some time to catch up and structure
>> > > (for myself) the discussion and I wanted to find a way to structure the
>> > > conclusions. (Also because I had the feeling that some concerns got lost in
>> > > the discussion.) This is my attempt and please correct me if something is
>> > > wrong or misunderstood. I tried to collect and assemble the opinions,
>> > > suggestions, and conclusions (to the best of my knowledge):
>> > >
>> > > # Top A: Should split alignment (pause/resume behavior) be a general
>> > > capability?
>> > >
>> > > I personally don't see any reason not to have it a general capability
>> > > because for the alignSplit method it is actually independent of the
>> > > watermarks. If we agree here to have it a general capability, we should
>> > > also agree on the right wording. Does "alignSplits(splitsToResume,
>> > > splitsToPause)" refer to what is then actually meant? (I see it as okay. I
>> > > don't have any better idea whilst Arvid suggested "pauseOrResumeSplits".)
>> > >
>> > > # Top B: Should it be possible to enable/disable split alignment?
>> > >
>> > > I would personally not disable the split alignment on the source reader
>> > > side because if split alignment is used for some other use case (see A) it
>> > > could have nasty side effects on other/future use cases. Instead, I would
>> > > disable "watermark split alignment" where I think it should disable the
>> > > watermark-dependent trigger for split alignment.
>> > >
>> > > # Top C: Should we add a supportsX method?
>> > >
>> > > I find it difficult to define the scope of a supportsX method w.r.t.
>> to
>> > > the following questions: a) Where is it used? and b) What is the
>> expected
>> > > output? To b), it's not straight-forward to provide a meaningful
>> output,
>> > > e.g., if SourceReader supports split alignment but SplitReader does not.
>> This
>> > is
>> > > because with the current implementation, we can determine whether
>> split
>> > > alignment is fully supported only during runtime and specifically
>> > actually
>> > > only when calling alignSplits down the call hierarchy up to the actual
>> > > SplitReaders.
>> > >
>> > > Therefore, I would suggest either raising an error or a warning if the
>> > > alignment is called but not supported at some point. I know we should
>> > > carefully think about when this could be the case because we don't
>> want
>> > to
>> > > flood anybody with such warnings. However, warnings could be an
>> indicator
>> > > for the user that, for the watermark split alignment use case, split
>> reading is
>> > > imbalanced with the conclusion to either disable the trigger for
>> > watermark
>> > > split alignment (see Top B) or to use/implement a source and reader
>> that
>> > > fully supports split alignment.
>> > >
>> > > # Top D: How to design interfaces?
>> > >
>> > > Thanks for structuring the discussion with the various
>> possibilities
>> > > (a-d). From the discussion and emails, I would like to summarize the
>> > > following requirements:
>> > > - Interfaces should be consistent ("symmetric"), i.e., similar
>> semantics
>> > > should have similar interfaces with similar usage.
>> > > - Make explicit which implementations implement interfaces/support
>> > > behavior.
>> > > - Make clear what are default implementations and how to implement
>> > > interfaces with desired behavior.
>> > >
>> > > This is a simplified view of the relations between relevant classes of
>> > the
>> > > PoC implementation:
>> > >
>> > > SourceReader (Public) <|-- SourceReaderBase (Internal) <|-- .. <|--
>> > > MySourceReader
>> > >
>> > > MySourceReader <>-- SplitFetcherManager (Internal) <>-- SplitFetcher
>> > > (Internal) <>-- SplitReader (Public) <|-- MySplitReader
>> > >
>> > > (A <|-- B: B inherits from A; A <>-- B: A "has a" B)
>> > >
>> > > Note that SourceReaderBase and SplitFetcherManager implement most of
>> the
>> > > "logic" for split alignment just because we wanted to implement split
>> > > alignment and wanted it to be available as kind of a default. As a
>> > > consequence, we have a "default implementation" for SourceReader that
>> > > implements the actual logic for split alignment. For that reason, I
>> find
>> > it
>> > > very confusing to have a NOOP default implementation in the interface
>> for
>> > > the SourceReader. As a consequence, interface strategy c) is difficult
>> > > because this would require NOOP default implementations in the public
>> > > interfaces of SourceReader and SplitReader. This is the same for
>> strategy
>> > > d) because it would require NOOP default implementation in the
>> > > SourceReader. Further, as Dawid described, the method signatures of
>> alignSplit
>> > > for SourceReader and SplitReader differ and it would be extremely
>> > difficult
>> > > to make the signatures the same (with even potential performance
>> impact
>> > > because of additional lookups of split ids). Therefore, having a
>> > symmetric
>> > > decorative interface as of strategy a) is actually not possible and
>> > having
>> > > two decorative interfaces with different method signatures is
>> confusing.
>> > My
>> > > conclusion is that we are best with strategy b) which means to have
>> > > specializing sub-interfaces that inherit from the parent interface:
>> > > SourceReader <|-- AlignedSourceReader, SplitReader <|--
>> > AlignedSplitReader
>> > > With this option, I'm not 100% sure what the implications are and if
>> this
>> > > could get nasty. I would suggest that Dawid and I just try to
>> implement
>> > and
>> > > see if we like it. :)
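
As a rough illustration of strategy b), combined with the warning suggested under Top C, the sub-interfaces might look like the sketch below. The interfaces are simplified stand-ins rather than the real Flink types. The String split ids, the AlignmentDispatcher helper, and the warning text are assumptions made for this example and are not part of the FLIP.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Simplified stand-ins for the public base interfaces (not the real Flink types).
interface SourceReader {}

interface SplitReader<SplitT> {}

// Strategy b): specializing sub-interfaces that inherit from the parent interface.
// The source reader side works on split ids (assumed to be Strings here) ...
interface AlignedSourceReader extends SourceReader {
    void alignSplits(Collection<String> splitIdsToResume, Collection<String> splitIdsToPause);
}

// ... while the split reader side needs the actual splits to pause them.
interface AlignedSplitReader<SplitT> extends SplitReader<SplitT> {
    void alignSplits(Collection<SplitT> splitsToResume, Collection<SplitT> splitsToPause);
}

// Hypothetical helper: forwards the call if the reader opted in, otherwise
// records a warning instead of silently doing nothing (see Top C).
final class AlignmentDispatcher {
    final List<String> warnings = new ArrayList<>();

    <SplitT> boolean dispatch(
            SplitReader<SplitT> reader,
            Collection<SplitT> splitsToResume,
            Collection<SplitT> splitsToPause) {
        if (reader instanceof AlignedSplitReader) {
            ((AlignedSplitReader<SplitT>) reader).alignSplits(splitsToResume, splitsToPause);
            return true;
        }
        warnings.add("Split alignment requested but not supported by "
                + reader.getClass().getSimpleName());
        return false;
    }
}
```

With this shape the two alignSplits signatures are allowed to differ, and whether a reader supports alignment is visible from its declared type rather than from a NOOP default.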
>> > >
>> > > # Summary
>> > >
>> > > In conclusion, please let me know your perspectives. Please correct
>> me,
>> > if
>> > > something is wrong or if I misunderstood something. My perspective
>> would
>> > be:
>> > >
>> > > Top A: Yes
>> > > Top B: Yes (but disable watermark trigger for split alignment)
>> > > Top C: No
>> > > Top D: b)
>> > >
>> > > Best,
>> > > Sebastian
>> > >
>> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
>> > >
>> > >> @Arvid:
>> > >>
>> > >> While I also like Becket's capability approach, I fear that it
>> doesn't
>> > work
>> > >> for this particular use case: Sources can always be aligned
>> cross-task
>> > and
>> > >> this is just about intra-task alignment. So it's plausible to put
>> > sources
>> > >> into an alignment group even though they do not use any of the
>> presented
>> > >> API of FLIP-217. They should just issue a warning, if they handle
>> > multiple
>> > >> splits (see motivation section).
>> > >>
>> > >> Yes, but the "supportXXX" method would be for telling if it supports
>> > that
>> > >> intra-task alignment. Cross-task alignment would always be supported.
>> > >>
>> > >> I updated interfaces to what I believe to be closest to a consensus
>> > >> between all participants. Do you mind taking a look?
>> > >>
>> > >> @Sebastian Do you mind addressing the nits?
>> > >>
>> > >> Best,
>> > >>
>> > >> Dawid
>> > >>
>> > >> On 25/04/2022 13:39, Arvid Heise wrote:
>> > >>
>> > >> Thanks for pushing this effort.
>> > >>
>> > >> I'd actually be in favor of 1b). I fully agree that decorator
>> interfaces
>> > >> should be avoided but I'm also not a big fan of overloading the base
>> > >> interfaces (they are hard to implement as is). The usual feedback on
>> > >> Source-related interfaces is always that they are overwhelming and
>> too
>> > >> hard to implement. However, I'd also not oppose 1c) as scattered
>> > interfaces
>> > >> also have drawbacks. I'd just dislike 1a) and 1d).
>> > >> While I also like Becket's capability approach, I fear that it
>> doesn't
>> > work
>> > >> for this particular use case: Sources can always be aligned
>> cross-task
>> > and
>> > >> this is just about intra-task alignment. So it's plausible to put
>> > sources
>> > >> into an alignment group even though they do not use any of the
>> presented
>> > >> API of FLIP-217. They should just issue a warning, if they handle
>> > multiple
>> > >> splits (see motivation section).
>> > >>
>> > >> I think renaming alignSplits to facilitate future use cases makes
>> sense
>> > but
>> > >> then all interfaces (if 1c) is chosen) should be adjusted
>> accordingly.
>> > >> AlignedSourceReader could be PausingSourceReader and I'd go for
>> > >> pauseOrResumeSplits (Becket's proposal afaik). We could also split it
>> > into
>> > >> pauseSplit and resumeSplit. While pauseOrResumeSplits may allow
>> Sources
>> > to
>> > >> just use 1 instead of 2 library calls (as written in the Javadoc),
>> both
>> > >> Kafka and Pulsar can't use it and I'm not sure if there is a system
>> that
>> > >> can.
>> > >>
>> > >> Some nits for the FLIP:
>> > >> - Please replace "stop" with "pause".
>> > >> - Not sure if it's worth it in the capability section: Sources that
>> > adopt
>> > >> this interface cannot be used in earlier versions. So it feels like
>> we
>> > are
>> > >> only forward compatible (old sources can be used after the change);
>> but
>> > I
>> > >> guess this holds for any API addition.
>> > >> - You might want to add what happens when all splits are paused.
>> > >> - You may want to describe how the 3 flavors of SourceReaderBase
>> > interact
>> > >> with the interface.
>> > >> - I'm not sure if it makes sense to include Kafka and Pulsar in the
>> > FLIP.
>> > >> For me, this is rather immediate follow-up work. (could be in the
>> same
>> > >> umbrella ticket)
>> > >>
>> > >> Best,
>> > >>
>> > >> Arvid
>> > >>
>> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
>> > >>
>> > >>
>> > >> a) "MySourceReader implements SourceReader, WithSplitsAlignment",
>> along
>> > >> with "MySplitReader implements SplitReader, WithSplitsAlignment", or
>> > >> b) "MySourceReader implements AlignedSourceReader" and "MySplitReader
>> > >> implements AlignedSplitReader", or
>> > >> c) "MySourceReader implements SourceReader" and "MySplitReader
>> > implements
>> > >> SplitReader".
>> > >>
>> > >> I think the latest proposal according to Dawid would be:
>> > >> d) "MySourceReader implements SourceReader" and "MySplitReader
>> > implements
>> > >> AlignedSplitReader".
>> > >> I am fine with this API, although personally speaking I think it is
>> > simpler
>> > >> to just add a new method to the split reader with default impl.
>> > >>
>> > >>
>> > >> I think that is a good idea to have it aligned as much as possible.
>> I'd
>> > be
>> > >> +1 for your option c). We can merge AlignedSplitReader with
>> > SplitReader. We
>> > >> will update the FLIP shortly.
>> > >>
>> > >> Best,
>> > >>
>> > >> Dawid
>> > >>
>> > >> On 25/04/2022 12:43, Becket Qin wrote:
>> > >>
>> > >> Thanks for the comment, Jark.
>> > >>
>> > >> 3. Interface/Method Name.
>> > >>
>> > >> Can the interface be used to align other things in the future? For
>> > example,
>> > >> align read speed, I have
>> > >> seen users requesting global rate limits. This feature may also need
>> an
>> > >> interface like this.
>> > >> If we don't plan to extend this interface to support aligning other
>> > things, I
>> > >> suggest explicitly declaring
>> > >> the purpose of the methods, such as `alignWatermarksForSplits`
>> instead
>> > of
>> > >> `alignSplits`.
>> > >>
>> > >> This is a good point. Naming wise, it would usually be more
>> extensible
>> > to
>> > >> just describe what the method actually does, instead of assuming the
>> > >> purpose of doing this. For example, in this case,
>> pauseOrResumeSplits()
>> > >> would be more extensible because this can be used for any kind of
>> flow
>> > >> control, be it watermark alignment or simple rate limiting.
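
To make the naming argument concrete, here is a minimal sketch of a reader whose method describes what it does (pause or resume splits) rather than why it is called. PausableReader and FlowControlDemo are hypothetical names, split ids are assumed to be Strings, and the rate limiter is only hinted at in comments.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical reader that only knows how to pause and resume splits by id.
final class PausableReader {
    private final Set<String> paused = new TreeSet<>();

    // Named after what it does, not after why the caller does it.
    void pauseOrResumeSplits(Collection<String> splitsToPause, Collection<String> splitsToResume) {
        paused.addAll(splitsToPause);
        paused.removeAll(splitsToResume);
    }

    // Returns a copy so callers cannot modify the internal state.
    Set<String> pausedSplits() {
        return new TreeSet<>(paused);
    }
}

final class FlowControlDemo {
    public static void main(String[] args) {
        PausableReader reader = new PausableReader();
        // Watermark alignment: split "p1" ran too far ahead, so it is paused.
        reader.pauseOrResumeSplits(List.of("p1"), List.of());
        // Rate limiting: a (hypothetical) limiter pauses "p2" with the same call.
        reader.pauseOrResumeSplits(List.of("p2"), List.of());
        // Alignment has caught up, so "p1" is resumed.
        reader.pauseOrResumeSplits(List.of(), List.of("p1"));
        System.out.println(reader.pausedSplits()); // prints [p2]
    }
}
```

The sketch deliberately ignores how competing reasons for pausing the same split would interact; a real implementation would have to decide that.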
>> > >>
>> > >> 4. Interface or Method.
>> > >>
>> > >> I don't have a strong opinion on this. I think they have their own
>> > >> advantages.
>> > >> In Flink SQL, we heavily use Interfaces for extending abilities
>> > >> (SupportsXxxx) for TableSource/TableSink,
>> > >> and I prefer Interfaces rather than methods in this case. When you
>> have
>> > a
>> > >> bunch of abilities and each ability
>> > >> has more than one method, Interfaces can help to organize them and make
>> > >> it clear to users which methods
>> > >> they need to implement when they want to have an ability.
>> > >>
>> > >> I am OK with decorative interfaces if this is a general design
>> pattern
>> > in
>> > >> the other components in Flink. But it looks like the current API
>> > proposal
>> > >> is not symmetric.
>> > >>
>> > >> The current proposal is essentially "MySourceReader implements
>> > >> SourceReader, WithSplitsAlignment", along with "MySplitReader
>> implements
>> > >> AlignedSplitsReader".
>> > >>
>> > >> Should we make the API symmetric? I'd consider any one of the
>> following
>> > as
>> > >> symmetric.
>> > >>
>> > >> a) "MySourceReader implements SourceReader, WithSplitsAlignment",
>> along
>> > >> with "MySplitReader implements SplitReader, WithSplitsAlignment", or
>> > >> b) "MySourceReader implements AlignedSourceReader" and "MySplitReader
>> > >> implements AlignedSplitReader", or
>> > >> c) "MySourceReader implements SourceReader" and "MySplitReader
>> > implements
>> > >> SplitReader".
>> > >>
>> > >> I think the latest proposal according to Dawid would be:
>> > >> d) "MySourceReader implements SourceReader" and "MySplitReader
>> > implements
>> > >> AlignedSplitReader".
>> > >> I am fine with this API, although personally speaking I think it is
>> > simpler
>> > >> to just add a new method to the split reader with default impl.
>> > >>
>> > >> @Dawid Wysakowicz <dwysakowicz@apache.org>, thanks for the reply.
>> > >>
>> > >> Having said that, as I don't have a preference and I agree most of
>> the
>> > >>
>> > >> sources will support the alignment I am fine following your
>> suggestion
>> > to
>> > >> have the SourceReader extending from WithWatermarksSplitsAlignment,
>> but
>> > >> would put the "supportsXXX" there, not in the Source to keep the two
>> > >> methods together.
>> > >>
>> > >> One benefit of having the "supportsXXX" in Source is that this allows
>> > some
>> > >> compile time check. For example, if a user enabled watermark
>> alignment
>> > >> while it is not supported by the Source, an exception can be thrown
>> at
>> > >> compile time. It seems in general useful. That said, I agree that API
>> > >> cleanliness wise it is better to put the two methods together.
>> > >>
>> > >> Thanks,
>> > >>
>> > >> Jiangjie (Becket) Qin
>> > >>
>> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <imjark@gmail.com> wrote:
>> > >>
>> > >>
>> > >> Thanks, Dawid, for the reminder on FLIP-182. Sorry, I did miss it.
>> > >> I don't have other concerns then.
>> > >>
>> > >> Best,
>> > >> Jark
>> > >>
>> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
>> > >>
>> > >>
>> > >> @Jark:
>> > >>
>> > >> 1. Will the framework always align with watermarks when the source
>> > >> implements the interface?
>> > >> I'm afraid not every case needs watermark alignment even if Kafka
>> > >> implements the interface,
>> > >> and this will affect the throughput somehow. I agree with Becket
>> > >> we may need a
>> > >> `supportSplitsAlignment()` method for users to configure the source
>> to
>> > >> enable/disable the alignment.
>> > >>
>> > >> 2. How does the framework calculate maxDesiredWatermark?
>> > >> I think the algorithm of maxDesiredWatermark will greatly affect
>> > >>
>> > >> throughput
>> > >>
>> > >> if the reader is constantly
>> > >>  switching between pause and resume. Can users configure the
>> alignment
>> > >> offset?
>> > >>
>> > >>
>> > >> This is covered in the previous FLIP[1] which has been already
>> > >>
>> > >> implemented
>> > >>
>> > >> in 1.15. In short, it must be enabled with the watermark strategy
>> which
>> > >> also configures drift and update interval.
>> > >>
>> > >> If we don't plan to extend this interface to support aligning other
>> things,
>> > >>
>> > >> I
>> > >>
>> > >> suggest explicitly declaring
>> > >> the purpose of the methods, such as `alignWatermarksForSplits`
>> instead
>> > of
>> > >> `alignSplits`.
>> > >>
>> > >>
>> > >> Sure let's rename it.
>> > >>
>> > >> @Becket:
>> > >>
>> > >> I understand your point. On the other hand putting all methods, even
>> > with
>> > >> "supportsXXX" methods for enabling certain features, makes the entry
>> > >> threshold for writing a new source higher. Instead of focusing on the
>> > >>
>> > >> basic
>> > >>
>> > >> and required properties of the Source, the person implementing a
>> source
>> > >> must bother with and need to figure out what all of the extra
>> features
>> > >>
>> > >> are
>> > >>
>> > >> about and how to deal with them. It makes it also harder to organize
>> > >> methods in coupled groups as Jark said.
>> > >>
>> > >> Having said that, as I don't have a preference and I agree most of
>> the
>> > >> sources will support the alignment I am fine following your
>> suggestion
>> > to
>> > >> have the SourceReader extending from WithWatermarksSplitsAlignment,
>> but
>> > >> would put the "supportsXXX" there, not in the Source to keep the two
>> > >> methods together.
>> > >>
>> > >> Lastly, I agree it is really unfortunate the "alignSplits" methods
>> > differ
>> > >> slightly for SourceReader and SplitReader. The reason for that is
>> > >> SourceReaderBase deals only with SplitIds, whereas SplitReader needs
>> the
>> > >> actual splits to pause them. I found the discrepancy acceptable for
>> the
>> > >> sake of simplifying changes significantly, especially as they would
>> > >>
>> > >> highly
>> > >>
>> > >> likely impact performance as we would have to perform additional
>> > lookups.
>> > >> Moreover the SplitReader is a secondary interface.
>> > >>
>> > >> Best,
>> > >>
>> > >> Dawid
>> > >>
>> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
>> > >>
>> > >> On 24/04/2022 17:15, Jark Wu wrote:
>> > >>
>> > >> Thanks for the effort, Dawid and Sebastian!
>> > >>
>> > >> I just have some minor questions (maybe I missed something).
>> > >>
>> > >> 1. Will the framework always align with watermarks when the source
>> > >> implements the interface?
>> > >> I'm afraid not every case needs watermark alignment even if Kafka
>> > >> implements the interface,
>> > >> and this will affect the throughput somehow. I agree with Becket
>> > >> we may need a
>> > >> `supportSplitsAlignment()` method for users to configure the source
>> to
>> > >> enable/disable the alignment.
>> > >>
>> > >> 2. How does the framework calculate maxDesiredWatermark?
>> > >> I think the algorithm of maxDesiredWatermark will greatly affect
>> > >>
>> > >> throughput
>> > >>
>> > >> if the reader is constantly
>> > >>  switching between pause and resume. Can users configure the
>> alignment
>> > >> offset?
>> > >>
>> > >> 3. Interface/Method Name.
>> > >> Can the interface be used to align other things in the future? For
>> > >>
>> > >> example,
>> > >>
>> > >> align read speed, I have
>> > >> seen users requesting global rate limits. This feature may also need
>> an
>> > >> interface like this.
>> > >> If we don't plan to extend this interface to support aligning other
>> things,
>> > >>
>> > >> I
>> > >>
>> > >> suggest explicitly declaring
>> > >> the purpose of the methods, such as `alignWatermarksForSplits`
>> instead
>> > of
>> > >> `alignSplits`.
>> > >>
>> > >> 4. Interface or Method.
>> > >> I don't have a strong opinion on this. I think they have their own
>> > >> advantages.
>> > >> In Flink SQL, we heavily use Interfaces for extending abilities
>> > >> (SupportsXxxx) for TableSource/TableSink,
>> > >> and I prefer Interfaces rather than methods in this case. When you
>> have
>> > a
>> > >> bunch of abilities and each ability
>> > >> has more than one method, Interfaces can help to organize them and make
>> > >> it clear to users which methods
>> > >> they need to implement when they want to have an ability.
>> > >>
>> > >>
>> > >> Best,
>> > >> Jark
>> > >>
>> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <becket.qin@gmail.com> wrote:
>> > >>
>> > >> Hi Dawid,
>> > >>
>> > >> Thanks for the explanation. Apologies that I somehow misread a bunch
>> of
>> > >> "align" and thought they were "assign".
>> > >>
>> > >> Regarding 1, by default implementation, I was thinking of the default
>> > >>
>> > >> no-op
>> > >>
>> > >> implementation. I am a little worried about the proliferation of
>> > >>
>> > >> decorative
>> > >>
>> > >> interfaces. I think the most important thing about interfaces is that
>> > >>
>> > >> they
>> > >>
>> > >> are easy to understand. In this case, I prefer adding new method to
>> the
>> > >> existing interface for the following reasons:
>> > >>
>> > >> a) I feel the biggest drawback of decorative interfaces is which
>> > >>
>> > >> interface
>> > >>
>> > >> they can decorate and which combinations of multiple decorative
>> > >>
>> > >> interfaces
>> > >>
>> > >> are valid. In the current FLIP, the withSplitsAlignment interface is
>> > only
>> > >> applicable to the SourceReader which means it can't decorate any
>> other
>> > >> interface. From an interface design perspective, a natural question
>> is
>> > >>
>> > >> why
>> > >>
>> > >> not let "AlignedSplitReader" extend "withSplitsAlignment"? And it is
>> > also
>> > >> natural to assume that a split reader implementing both SplitReader
>> and
>> > >> WithSplitAlignment would work, because a source reader implementing
>> > >> SourceReader and withSplitsAlignment works. So why isn't there an
>> > >>
>> > >> interface
>> > >>
>> > >> of AlignedSourceReader? In the future, if there is a new feature
>> added
>> > >> (e.g. sorted or pre-partitioned data aware), are we going to create
>> > >>
>> > >> another
>> > >>
>> > >> interface of SplitReader such as SortedSplitReader or
>> > >>
>> > >> PrePartitionedAware?
>> > >>
>> > >> Can they be combined? So I think the additional decorative interface
>> > like
>> > >> withSplitsAlignment actually increases the understanding cost of
>> users
>> > >> because they have to know what decorative interfaces are there, which
>> > >> interface they can decorate and which combinations of the decorative
>> > >> interfaces are valid and which are not. Ideally we want to avoid
>> that.
>> > To
>> > >> be clear, I am not opposing having an interface of
>> withSplitsAlignment,
>> > >>
>> > >> it
>> > >>
>> > >> is completely OK to have it as an internal interface and let
>> > SourceReader
>> > >> and SplitReader both extend it.
>> > >>
>> > >> b) Adding a new method to the SourceReader with a default
>> implementation
>> > >>
>> > >> of
>> > >>
>> > >> no-op would help avoid logic branching in the source logic,
>> especially
>> > >> given that we agree that the vast majority of the SourceReader
>> > >> implementations, if not all, would just extend from the
>> > SourceReaderBase.
>> > >> That means adding a new method to the interface would effectively
>> give
>> > >>
>> > >> the
>> > >>
>> > >> same user experience, but simpler.
>> > >>
>> > >> c) A related design principle that may be worth discussing is how do
>> we
>> > >>
>> > >> let
>> > >>
>> > >> the Source implementations tell Flink what capability is supported
>> and
>> > >>
>> > >> what
>> > >>
>> > >> is not. Personally speaking I feel the most intuitive place to me is
>> in
>> > >>
>> > >> the
>> > >>
>> > >> Source itself, because that is the entrance of the entire Source
>> > >>
>> > >> connector
>> > >>
>> > >> logic.
>> > >>
>> > >> Based on the above thoughts, I am wondering if the following
>> interface
>> > >> would be easier to understand by the users.
>> > >>
>> > >> - Change "withSplitsAlignment" to internal interface, let both
>> > >>
>> > >> SourceReader
>> > >>
>> > >> and SplitReader extend from it, with a default no-op implementation.
>> > >> - Add a new method "boolean supportSplitsAlignment()" to the Source
>> > >> interface, with a default implementation returning false. Sources
>> that
>> > >>
>> > >> have
>> > >>
>> > >> implemented the alignment logic can change this to return true, and
>> > >> override the alignSplits() methods in the SourceReader / SplitReader
>> if
>> > >> needed.
>> > >> - In the future, if a new optional feature is going to be added to
>> the
>> > >> Source, and that feature requires the awareness from Flink, we can
>> add
>> > >>
>> > >> more
>> > >>
>> > >> such methods to the Source.
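
The three points above could be sketched roughly as follows. These are simplified stand-ins, not the real Flink interfaces: split ids are assumed to be Strings, and AlignmentValidator together with its error message is a hypothetical illustration of where such a capability flag might be checked.

```java
import java.util.Collection;

// Simplified stand-ins, not the real Flink interfaces.
interface WithSplitsAlignment {
    // Default no-op, so existing implementations keep compiling and behaving as before.
    default void alignSplits(Collection<String> splitsToResume, Collection<String> splitsToPause) {}
}

interface SourceReader extends WithSplitsAlignment {}

interface SplitReader extends WithSplitsAlignment {}

interface Source {
    SourceReader createReader();

    // Sources that implement the alignment logic override this to return true.
    default boolean supportSplitsAlignment() {
        return false;
    }
}

// Hypothetical check that could run before the job starts.
final class AlignmentValidator {
    static void validate(Source source, boolean alignmentEnabled) {
        if (alignmentEnabled && !source.supportSplitsAlignment()) {
            throw new IllegalStateException(
                    "Watermark alignment is enabled, but the source does not support split alignment");
        }
    }
}
```

The flag lives on the Source because that is the entry point Flink sees first, so a mismatch can be reported before any reader is created.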
>> > >>
>> > >> What do you think?
>> > >>
>> > >> Thanks,
>> > >>
>> > >> Jiangjie (Becket) Qin
>> > >>
>> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dwysakowicz@apache.org> wrote:
>> > >>
>> > >>
>> > >> @Konstantin:
>> > >>
>> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the
>> stop &
>> > >> resume behavior) will be implemented for Kafka and Pulsar only,
>> correct?
>> > >>
>> > >> Correct, as far as I know though, those are the only sources which
>> > >>
>> > >> consume
>> > >>
>> > >> concurrently from multiple splits and thus alignment applies.
>> > >>
>> > >> @Thomas:
>> > >>
>> > >> I wonder if "supporting" split alignment in SourceReaderBase and then
>> > >>
>> > >> doing
>> > >>
>> > >> nothing if the split reader does not implement AlignedSplitReader
>> could
>> > >>
>> > >> be
>> > >>
>> > >> misleading? Perhaps WithSplitsAlignment can instead be added to the
>> > >> specific source reader (i.e. KafkaSourceReader) to make it explicit
>> that
>> > >> the source actually supports it.
>> > >>
>> > >> I understand your concern. Hmm, I think we could actually do that.
>> Given
>> > >> the actual implementation of the SourceReaderBase#alignSplits is
>> rather
>> > >> short (just a forward to the corresponding method of SplitFetcher),
>> we
>> > >> could reimplement it in the actual source implementations. This
>> solution
>> > >> has a downside, though. Authors of new sources would have to do two
>> > >> things: extend from AlignedSplitReader and implement
>> > >> WithSplitsAlignment, instead of just extending AlignedSplitReader. I
>> > >> would be fine with such a tradeoff though. What do others think?
>> > >>
>> > >> @Steven:
>> > >>
>> > >> For this part from the motivation section, is it accurate? Let's
>> assume
>> > >>
>> > >> one
>> > >>
>> > >> source task consumes from 3 partitions and one of the partitions is
>> > >> significantly slower. In this situation, watermark for this source
>> task
>> > >> won't hold back as it is reading recent data from other two Kafka
>> > >> partitions. As a result, it won't hold back the overall watermark. I
>> > >> thought the problem is that we may have late data for this slow
>> > >>
>> > >> partition.
>> > >>
>> > >> It will hold back the watermark. Watermark of an operator is the
>> minimum
>> > >> of watermarks of all splits[1]
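
A small sketch of that rule, using Steven's example of three partitions with one slow one. The class name SplitWatermarks, the String split ids, and the pause rule (pause every split whose watermark is above maxDesiredWatermark) are assumptions for illustration only.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SplitWatermarks {
    // The operator's watermark is the minimum over the watermarks of its splits.
    // Long.MIN_VALUE stands for "no watermark yet" when there are no splits.
    static long operatorWatermark(Map<String, Long> watermarkPerSplit) {
        return watermarkPerSplit.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(Long.MIN_VALUE);
    }

    // Assumed pause rule: a split is paused once its watermark exceeds maxDesiredWatermark.
    static Set<String> splitsToPause(Map<String, Long> watermarkPerSplit, long maxDesiredWatermark) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Long> entry : watermarkPerSplit.entrySet()) {
            if (entry.getValue() > maxDesiredWatermark) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

For watermarks p0=1000, p1=5000 and p2=5200, the operator watermark is 1000, so the slow partition p0 does hold it back. With a maxDesiredWatermark of 3000, the two fast partitions p1 and p2 would be the ones to pause.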
>> > >>
>> > >> I have another question about the restart. Say split alignment is
>> > >> triggered. checkpoint is completed. job failed and restored from the
>> > last
>> > >> checkpoint. because alignment decision is not checkpointed, initially
>> > >> alignment won't be enforced until we get a cycle of watermark
>> > aggregation
>> > >> and propagation, right? Not saying this corner is a problem. Just
>> want
>> > to
>> > >> understand it more.
>> > >>
>> > >> Your understanding is correct.
>> > >>
>> > >> @Becket:
>> > >>
>> > >> 1. I think watermark alignment is sort of a general use case, so
>> should
>> > >>
>> > >> we
>> > >>
>> > >> just add the related methods to SourceReader directly instead of
>> > >> introducing the new interface of WithSplitAssignment? We can provide
>> > >> default implementations, so backwards compatibility won't be an
>> issue.
>> > >>
>> > >> I don't think we can provide a default implementation. How would we
>> do
>> > >> that? Would it be just a no-op? Is it better than having an opt-in
>> > >> interface? The default implementation would have to be added
>> exclusively
>> > >>
>> > >> in
>> > >>
>> > >> a *Public* SourceReader interface. By the way notice SourceReaderBase
>> > >> does extend from WithSplitsAlignment, so effectively all
>> implementations
>> > >>
>> > >> do
>> > >>
>> > >> handle the alignment case. To be honest I think it is impossible to
>> > >> implement the SourceReader interface directly by end users.
>> > >>
>> > >> 2. As you mentioned, the SplitReader interface probably also needs
>> some
>> > >> change to support throttling at the split granularity. Can you add
>> that
>> > >> interface change into the public interface section as well?
>> > >>
>> > >> It has been added from the beginning. See *AlignedSplitReader.*
>> > >>
>> > >> 3. Nit, can we avoid using the method name assignSplits here, given
>> that
>> > >>
>> > >> it
>> > >>
>> > >> is not actually changing the split assignments? It seems something
>> like
>> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
>> > >>
>> > >> The method's called *alignSplits*, not assign. Do you still prefer a
>> > >> different name for that? Personally, I am open for suggestions here.
>> > >>
>> > >> Best,
>> > >>
>> > >> Dawid
>> > >>
>> > >> [1]
>> > >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
>> > >>
>> > >> On 22/04/2022 05:59, Becket Qin wrote:
>> > >>
>> > >> Thanks for driving the effort, Sebastian. I think the motivation
>> makes a
>> > >> lot of sense. Just a few suggestions / questions.
>> > >>
>> > >> 1. I think watermark alignment is sort of a general use case, so
>> should
>> > >>
>> > >> we
>> > >>
>> > >> just add the related methods to SourceReader directly instead of
>> > >> introducing the new interface of WithSplitAssignment? We can provide
>> > >> default implementations, so backwards compatibility won't be an
>> issue.
>> > >>
>> > >> 2. As you mentioned, the SplitReader interface probably also needs
>> some
>> > >> change to support throttling at the split granularity. Can you add
>> that
>> > >> interface change into the public interface section as well?
>> > >>
>> > >> 3. Nit, can we avoid using the method name assignSplits here, given
>> that
>> > >>
>> > >> it
>> > >>
>> > >> is not actually changing the split assignments? It seems something
>> like
>> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
>> > >>
>> > >> Thanks,
>> > >>
>> > >> Jiangjie (Becket) Qin
>> > >>
>> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <stevenz3wu@gmail.com> wrote:
>> > >>
>> > >> However, a single source operator may read data from multiple
>> > >>
>> > >> splits/partitions, e.g., multiple Kafka partitions, such that even
>> with
>> > >> watermark alignment the source operator may need to buffer excessive
>> > >>
>> > >> amount
>> > >>
>> > >> of data if one split emits data faster than another.
>> > >>
>> > >> For this part from the motivation section, is it accurate? Let's
>> assume
>> > >>
>> > >> one
>> > >>
>> > >> source task consumes from 3 partitions and one of the partitions is
>> > >> significantly slower. In this situation, watermark for this source
>> task
>> > >> won't hold back as it is reading recent data from other two Kafka
>> > >> partitions. As a result, it won't hold back the overall watermark. I
>> > >> thought the problem is that we may have late data for this slow
>> > >>
>> > >> partition.
>> > >>
>> > >> I have another question about the restart. Say split alignment is
>> > >> triggered. checkpoint is completed. job failed and restored from the
>> > last
>> > >> checkpoint. because alignment decision is not checkpointed, initially
>> > >> alignment won't be enforced until we get a cycle of watermark
>> > aggregation
>> > >> and propagation, right? Not saying this corner is a problem. Just
>> want
>> > to
>> > >> understand it more.
>> > >>
>> > >>
>> > >>
>> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <thw@apache.org> wrote:
>> > >>
>> > >> Thanks for working on this!
>> > >>
>> > >> I wonder if "supporting" split alignment in SourceReaderBase and then
>> > >>
>> > >> doing
>> > >>
>> > >> nothing if the split reader does not implement AlignedSplitReader
>> could
>> > >>
>> > >> be
>> > >>
>> > >> misleading? Perhaps WithSplitsAlignment can instead be added to the
>> > >> specific source reader (i.e. KafkaSourceReader) to make it explicit
>> that
>> > >> the source actually supports it.
>> > >>
>> > >> Thanks,
>> > >> Thomas
>> > >>
>> > >>
>> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <knaufk@apache.org> wrote:
>> > >>
>> > >>
>> > >> Hi Sebastian, Hi Dawid,
>> > >>
>> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
>> > >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
>> > >>
>> > >> +1 in general. I believe it is valuable to complete the watermark aligned
>> > >> story with this FLIP.
>> > >>
>> > >> Cheers,
>> > >>
>> > >> Konstantin
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dw...@apache.org> wrote:
>> > >>
>> > >>
>> > >> To be explicit, having worked on it, I support it ;) I think we can
>> > >> start a vote thread soonish, as there are no concerns so far.
>> > >>
>> > >> Best,
>> > >>
>> > >> Dawid
>> > >>
>> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
>> > >>
>> > >> Dear Flink developers,
>> > >>
>> > >> I would like to open a discussion on FLIP 217 [1] for an extension of
>> > >> Watermark Alignment to perform alignment also in SplitReaders. To do so,
>> > >> SplitReaders must be able to suspend and resume reading from split sources
>> > >> where the SourceOperator coordinates and controls suspend and resume. To
>> > >> gather information about current watermarks of the SplitReaders, we extend
>> > >> the internal WatermarkOutputMultiplexer and report watermarks to the
>> > >> SourceOperator.
>> > >>
>> > >> There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised and
>> > >> reworked by Dawid Wysakowicz (He did most of the work.) and me. The changes
>> > >> are backwards compatible in a way that if affected components do not
>> > >> support split alignment the behavior is as before.
>> > >>
>> > >> Best,
>> > >> Sebastian
>> > >>
>> > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
>> > >>
>> > >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
>> > >>
>> > >> --
>> > >>
>> > >> Konstantin Knauf
>> > >> https://twitter.com/snntrable
>> > >> https://github.com/knaufk
>> > >>
>> > >>
>> >
>>
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Dawid Wysakowicz <dw...@apache.org>.
Hey Piotr and Becket,

First of all, let me say I am happy with whichever option is agreed on in
the discussion.

I wanted to clarify a few points from the discussion though:

@Becket:

    The main argument for adding the methods to the SourceReader is that
    these methods are effectively NON-OPTIONAL to SourceReader impl,
    i.e. starting from this FLIP, all the SourceReaders impl are
    expected to support this method, although some old implementations
    may not have implemented this feature. I think we should distinguish
    the new features from the optional features. While the public
    decorative interface is a solution to the optional features. We
    should not use it for the features that are non-optional.

I don't think that this feature is NON-OPTIONAL. Even though supporting it
is preferred, I still think it can simply be optional.

    the Java doc of the SupportXXX() method in the Source would be the
    single source of truth regarding how to implement this feature.

I also don't find this entirely true. Half of the classes are
theoretically optional: from the point of view of how the interfaces are
organized, they are utility classes. Theoretically, users do not need to
use SourceReaderBase or SplitReader at all. It would be weird to list
their methods in the Source interface.
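
To make the two API shapes under discussion concrete, here is a rough
sketch. All names below (ReaderA, WithPausingA, ReaderB, SourceB,
CapabilityChecks) are made up for illustration and are not the actual
Flink interfaces; the only point is how the framework could find out
whether pausing is supported in each shape.

```java
import java.util.Collection;

// Hypothetical stand-ins for illustration only; these are not Flink's interfaces.

// Shape 1: a decorative interface. Implementing the type advertises the capability.
interface ReaderA {}

interface WithPausingA {
    void pauseOrResumeSplits(Collection<String> idsToPause, Collection<String> idsToResume);
}

final class PausingReaderA implements ReaderA, WithPausingA {
    @Override
    public void pauseOrResumeSplits(Collection<String> idsToPause, Collection<String> idsToResume) {
        // A real reader would pause/resume its splits here.
    }
}

// Shape 2: a default method in the base interface, plus a capability flag on the source.
interface ReaderB {
    default void pauseOrResumeSplits(Collection<String> idsToPause, Collection<String> idsToResume) {
        throw new UnsupportedOperationException("pausing splits is not supported");
    }
}

interface SourceB {
    default boolean supportsPausingSplits() {
        return false;
    }
}

final class CapabilityChecks {
    // Shape 1: the check is a type test on the reader instance.
    static boolean viaDecorator(ReaderA reader) {
        return reader instanceof WithPausingA;
    }

    // Shape 2: the check is a method call on the source; no reader is needed.
    static boolean viaFlag(SourceB source) {
        return source.supportsPausingSplits();
    }
}
```

In the first shape the capability can only be detected once a reader
instance exists; in the second, the flag and the method live in different
places, so nothing but documentation ties them together.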

@Piotr

    If we have all of the methods with default implementation in the
    base interface, the API doesn't give any clue to the user which set
    of methods are required to be implemented at the same time.

I feel that this cannot be solved entirely no matter which option we
choose, because of the point above and because the signatures of
SplitReader#pauseOrResumeSplits and SourceReader#pauseOrResumeSplits are
slightly different (one identifies splits by splitId, the other one
passes the splits directly).
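
A minimal sketch of that signature mismatch, using simplified
hypothetical types (SplitSketch, SplitReaderSketch, SourceReaderSketch,
ReaderBaseSketch) rather than the real Flink classes. It assumes the
reader-level method receives split ids and the split-level method
receives the split objects, so some component in between has to
translate ids into splits:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; not the real Flink classes.
final class SplitSketch {
    final String id;

    SplitSketch(String id) {
        this.id = id;
    }
}

// Split-level variant: the reader is handed the split objects themselves.
interface SplitReaderSketch {
    void pauseOrResumeSplits(Collection<SplitSketch> toPause, Collection<SplitSketch> toResume);
}

// Reader-level variant: the caller only knows split ids.
interface SourceReaderSketch {
    void pauseOrResumeSplits(Collection<String> idsToPause, Collection<String> idsToResume);
}

// Plays the role of a base class that bridges the two signatures.
final class ReaderBaseSketch implements SourceReaderSketch {
    private final Map<String, SplitSketch> assigned = new HashMap<>();
    private final SplitReaderSketch splitReader;

    ReaderBaseSketch(SplitReaderSketch splitReader) {
        this.splitReader = splitReader;
    }

    void addSplit(SplitSketch split) {
        assigned.put(split.id, split);
    }

    @Override
    public void pauseOrResumeSplits(Collection<String> idsToPause, Collection<String> idsToResume) {
        splitReader.pauseOrResumeSplits(lookUp(idsToPause), lookUp(idsToResume));
    }

    // The id-to-split translation is the extra look-up that the differing signatures require.
    private List<SplitSketch> lookUp(Collection<String> ids) {
        List<SplitSketch> splits = new ArrayList<>();
        for (String id : ids) {
            SplitSketch split = assigned.get(id);
            if (split != null) {
                splits.add(split); // ids of unassigned splits are silently skipped in this sketch
            }
        }
        return splits;
    }
}
```

Because the parameter types differ, a single shared decorative interface
could not declare both methods without making one side do this
translation itself.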

Best,

Dawid

On 03/05/2022 14:30, Becket Qin wrote:
> Hi Piotr,
>
> Thanks for the comment.
>
> Just to clarify, I am not against the decorative interfaces, but I do 
> think we should use them with caution. The main argument for adding 
> the methods to the SourceReader is that these methods are 
> effectively NON-OPTIONAL to SourceReader impl, i.e. starting from this 
> FLIP, all the SourceReaders impl are expected to support this 
> method, although some old implementations may not have implemented 
> this feature. I think we should distinguish the new features from the 
> optional features. While the public decorative interface is a solution 
> to the optional features. We should not use it for the features that 
> are non-optional.
>
> That said, this feature is optional for SplitReaders. Arguably we can 
> have a decorative interface for that, but for simplicity and 
> symmetry of the interface, personally I prefer just adding a new method.
>
> Regarding the advantages you mentioned about the 
> decorative interfaces, they would make sense if:
> 1. The feature is optional.
> 2. There is only one decorative interface involved for a feature. 
> Otherwise the argument that all the methods are grouped together will 
> not stand.
>
> Compared with that, I think the current solution works fine in all 
> cases, i.e. "having supportXXX() method in Source, and default methods 
> / decorative interfaces in base interfaces.".
>
>     The advantages are:
>     - clean and easy to implement base interface
>
> In the current approach, the Java doc of the SupportXXX() method in 
> the Source would be the single source of truth regarding how to 
> implement this feature. It lists the method that has to be implemented 
> to support this feature, regardless of how many classes / interfaces 
> are involved.
>
> When implementing the base interface, users do not need to implement a 
> method with default implementation. If they are curious what the 
> method is for, the java doc of that method simply points users to the 
> SupportXXX() method in the Source. This does not seem to add work for
> users compared with decorative interfaces, but gives much better
> discoverability.
>
>     - all of the methods from a single feature are grouped in a single
>     decorator interface, together with their dedicated java doc. It's also
>     easier to google search for help using the decorator name
>
>     - if an optional feature requires two methods to be implemented at once,
>     decorator can guarantee that
>
> These two points are not true when multiple components and classes are 
> involved collaboratively to provide a feature. In our case, we have 
> both SourceReader and SplitReader involved. And there might be other 
> interfaces on the JM side involved for some future features. So the 
> relevant methods can actually be scattered over the places. That said, 
> we may still use decorative interfaces for each component, if the 
> feature is optional, given there is a single source of truth for the 
> feature.
>
>     Here I would strongly lean towards making life easier for new users,
>     lowering the entry barrier, at the (imo) slight expense for the
>     power users.
>
> I actually think the current approach is simpler, more extensible and
> more general for all the users. Can you elaborate a bit more on which
> part you think is harder for users to understand?
>
>
> There is another benefit of the decorative interfaces which is not 
> mentioned, but might be worth considering here. Usually the decorative 
> interfaces give slightly better backwards compatibility than the new 
> default method in the interfaces. That is when users are using a jar 
> that was compiled with an older version of Flink which does not have 
> the default method in the interfaces in question. A decorative 
> interface may still provide backwards compatibility in that case, 
> while default method impl cannot.
>
> I think in Flink we in general do not guarantee custom components 
> compiled with an older version can run with a newer version of Flink. 
> A recompile with a newer version would be required. That said, if we 
> do care about this, we can just change the "supportXXX()" method in 
> the Source interface to use decorative interfaces, and leave the other 
> parts unchanged.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
>
> On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <pn...@apache.org> 
> wrote:
>
>     Hi,
>
>     Sorry for chipping in so late, but I was OoO for the last two weeks.
>     Regarding the interfaces, I would be actually against adding those methods
>     to the base interfaces for the reasons mentioned above. Clogging the base
>     interface for new users with tons of methods that they do not need, do not
>     understand and do not know what to do with them. Moreover, such decorative
>     interfaces are solving a problem if a feature requires two or more methods
>     to be implemented at the same time. If we have all of the methods with
>     default implementation in the base interface, the API doesn't give any clue
>     to the user which set of methods are required to be implemented at the same
>     time.
>
>     > a) I feel the biggest drawback of decorative interfaces is which interface
>     > they can decorate and which combinations of multiple decorative interfaces
>     > are valid. (...)
>     > In the future, if there is a new feature added
>     > (e.g. sorted or pre-partitioned data aware), are we going to create another
>     > interface of SplitReader such as SortedSplitReader or PrePartitionedAware?
>     > Can they be combined? So I think the additional decorative interface like
>     > withSplitsAlignment actually increases the understanding cost of users
>     > because they have to know what decorative interfaces are there, which
>     > interface they can decorate and which combinations of the decorative
>     > interfaces are valid and which are not. Ideally we want to avoid that.
>
>     I'm not sure if I understand how embedding default methods in the base
>     interface is solving the problem: what can be combined or not? If there are
>     two conflicting features, having decorative interfaces that can not be
>     mixed together actually makes much more sense to me rather than having them
>     all in one base class. How would you allow users to implement only one of
>     those two features?
>
>     To reiterate on the issue. Yes, there are drawbacks:
>     - how can a user discover what decorators are there?
>     - how can a user know where the decorator can be applied to?
>
>     However those are drawbacks for more power users, that can be mitigated by
>     the documentation. For example listing all of the decorators with
>     detailed explanation both in the docs and in the java docs. More
>     experienced users will be able to deal with those issues easier, as they
>     will already have some basic understanding of Flink. Also if user has a
>     problem that he wants to solve, he will google search a potential solution
>     to his problem anyway, and while doing that he is very likely to discover
>     the decorator that he needs anyway in the docs.
>
>     The advantages are:
>     - clean and easy to implement base interface
>     - all of the methods from a single feature are grouped in a single
>     decorator interface, together with their dedicated java doc. It's also
>     easier to google search for help using the decorator name
>     - if an optional feature requires two methods to be implemented at once,
>     decorator can guarantee that
>
>     Here I would strongly lean towards making life easier for new users,
>     lowering the entry barrier, at the (imo) slight expense for the
>     power users.
>
>     Best,
>     Piotrek
>
>
>     On Tue, 26 Apr 2022 at 15:32, Becket Qin <be...@gmail.com> wrote:
>
>     > Thanks for the reply Sebastian and Dawid,
>     >
>     > I think Sebastian has a good summary. This is a really helpful
>     > discussion.
>     >
>     > Thinking a bit more, I feel that it might still be better to add the
>     > supportsXXX() method in the Source rather than SourceReader.
>     >
>     > Generally speaking, what we are trying to do here is to let the Flink
>     > framework know what the Source is capable of. In this FLIP, it happens to
>     > be the capability that only involves SourceReader. But in the future, it is
>     > possible that another functionality involves both the SplitEnumerator and
>     > SourceReader. In that case, following the current approach, we should put
>     > the "supportsXXX()" method in both SplitEnumerator and SourceReader.
>     > Because if we only put this in the SourceReader, then the JM would have to
>     > create a SourceReader in order to know whether this feature is supported,
>     > which is a little ugly. But if we put the "supportsXXX()" method in the
>     > Source, we will break the "symmetric" design because this FLIP chose a
>     > different way.
>     >
>     > This is also why I think supportsXXX() method seems a good thing to have,
>     > because when there are a few interfaces / methods that are expected to be
>     > implemented at the same time in order to deliver a feature, it is always
>     > good to have a single source of truth to tell the framework what to do, so
>     > the framework can do consistent things in different parts.
>     >
>     > @Sebastian Mattheis <se...@ververica.com>
>     >
>     > Regarding interface flavor b), i.e. AlignedSourceReader +
>     > AlignedSplitReader, what I feel awkward about is that we are essentially
>     > expecting almost all the SourceReader implementations to extend
>     > SourceReaderBase, which effectively makes the SourceReader interface
>     > without the pausing support useless. So this indicates that public
>     > decorative interfaces (or sub-interfaces for the same purpose) only
>     > make sense if the original interface is also expected to be used.
>     > Otherwise, it seems to make more sense to add the method to the original
>     > interface itself.
>     >
>     > Cheers,
>     >
>     > Jiangjie (Becket) Qin
>     >
>     >
>     >
>     >
>     > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz
>     <dw...@apache.org>
>     > wrote:
>     >
>     > > Thanks @Sebastian for the nice summary.
>     > >
>     > > I think most of your points aligned with the suggestions I made to the
>     > > FLIP, while you were writing your reply (I believe we hit enter nearly at
>     > > the same time ;) )
>     > >
>     > > Two points after we synced offline
>     > >
>     > > 1. I changed also the supportsWatermarksSplitAlignment to
>     > > supportsPausingSplits to express the general capability of pausing.
>     > >
>     > > 2. As for whether we should add PausingSourceReader/PausingSplitReader
>     > > (option b) or just add the methods (option c), I suggest simply adding
>     > > the two methods, as I felt this is the approach Becket much prefers and
>     > > others do not object to. Unless there is opposition, let's go with
>     > > option c.
>     > >
>     > > Best,
>     > >
>     > > Dawid
>     > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
>     > >
>     > > Hi folks,
>     > >
>     > > Sorry for being a bit silent. Many thanks for all the input and
>     > > suggestions. As I'm a bit new, I needed some time to catch up and
>     > structure
>     > > (for myself) the discussion and I wanted to find a way to
>     structure the
>     > > conclusions. (Also because I had the feeling that some
>     concerns got lost
>     > in
>     > > the discussion.) This is my attempt and please correct me if
>     something is
>     > > wrong or misunderstood. I tried to collect and assemble the
>     opinions,
>     > > suggestions, and conclusions (to the best of my knowledge):
>     > >
>     > > # Top A: Should split alignment (pause/resume behavior) be a
>     general
>     > > capability?
>     > >
>     > > I personally don't see any reason not to have it as a general capability
>     > > because for the alignSplit method it is actually independent
>     of the
>     > > watermarks. If we agree here to have it a general capability,
>     we should
>     > > also agree on the right wording. Does "alignSplits(splitsToResume,
>     > > splitsToPause)" refer to what is then actually meant? (I see
>     it as okay.
>     > I
>     > > don't have any better idea whilst Arvid suggested
>     "pauseOrResumeSplits".)
>     > >
>     > > # Top B: Should it be possible to enable/disable split alignment?
>     > >
>     > > I would personally not disable the split alignment on the
>     source reader
>     > > side because if split alignment is used for some other use
>     case (see A)
>     > it
>     > > could have nasty side effects on other/future use cases.
>     Instead, I would
>     > > disable "watermark split alignment" where I think it should
>     disable the
>     > > watermark-dependent trigger for split alignment.
>     > >
>     > > # Top C: Should we add a supportsX method?
>     > >
>     > > I find it difficult to define the scope of a supportsX method w.r.t.
>     > > the following questions: a) Where is it used? and b) What is
>     the expected
>     > > output? To b), it's not straight-forward to provide a
>     meaningful output,
>     > > e.g., if SourceReader supports split alignment but SplitReader
>     not. This
>     > is
>     > > because with the current implementation, we can determine
>     whether split
>     > > alignment is fully supported only during runtime and specifically
>     > actually
>     > > only when calling alignSplits down the call hierarchy up to
>     the actual
>     > > SplitReaders.
>     > >
>     > > Therefore, I would suggest to either raise an error or warning
>     if the
>     > > alignment is called but not supported at some point. I know we
>     should
>     > > carefully think about when this could be the case because we
>     don't want
>     > to
>     > > flood anybody with such warnings. However, warnings could be
>     an indicator
>     > > for the user that for watermark split alignment use case split
>     reading is
>     > > imbalanced with the conclusion to either disable the trigger for
>     > watermark
>     > > split alignment (see Top B) or to use/implement a source and
>     reader that
>     > > fully supports split alignment.
>     > >
>     > > # Top D: How to design interfaces?
>     > >
>     > > Thanks for structuring the discussion with the various possibilities
>     > > (a-d). From the discussion and emails, I would like to
>     summarize the
>     > > following requirements:
>     > > - Interfaces should be consistent ("symmetric"), i.e., similar
>     semantics
>     > > should have similar interfaces with similar usage.
>     > > - Make explicit which implementations implement interfaces/support
>     > > behavior.
>     > > - Make clear what are default implementations and how to implement
>     > > interfaces with desired behavior.
>     > >
>     > > This is a simplified view of the relations between relevant
>     classes of
>     > the
>     > > PoC implementation:
>     > >
>     > > SourceReader (Public) <|-- SourceReaderBase (Internal) <|-- ..
>     <|--
>     > > MySourceReader
>     > >
>     > > MySourceReader <>-- SplitFetcherManager (Internal) <>--
>     SplitFetcher
>     > > (Internal) <>-- SplitReader (Public) <|-- MySplitReader
>     > >
>     > > (A <|-- B: B inherits from A; A <>-- B: A "has a" B)
>     > >
>     > > Note that SourceReaderBase and SplitFetcherManager implement
>     most of the
>     > > "logic" for split alignment just because we wanted to
>     implement split
>     > > alignment and wanted it to be available as kind of a default. As a
>     > > consequence, we have a "default implementation" for
>     SourceReader that
>     > > implements the actual logic for split alignment. For that
>     reason, I find
>     > it
>     > > very confusing to have a NOOP default implementation in the
>     interface for
>     > > the SourceReader. As a consequence, interface strategy c) is
>     difficult
>     > > because this would require NOOP default implementations in the
>     public
>     > > interfaces of SourceReader and SplitReader. This is the same
>     for strategy
>     > > d) because it would require NOOP default implementation in the
>     > > SourceReader. Further, as Dawid described method signatures of
>     alignSplit
>     > > for SourceReader and SplitReader differ and it would be extremely
>     > difficult
>     > > to make the signatures the same (with even potential
>     performance impact
>     > > because of additional look-ups of split ids). Therefore, having a
>     > symmetric
>     > > decorative interface as of strategy a) is actually not
>     possible and
>     > having
>     > > two decorative interfaces with different method signatures is
>     confusing.
>     > My
>     > > conclusion is that we are best with strategy b) which means to
>     have
>     > > specializing sub-interfaces that inherit from the parent
>     interface:
>     > > SourceReader <|-- AlignedSourceReader, SplitReader <|--
>     > AlignedSplitReader
>     > > With this option, I'm not 100% sure what the implications are
>     and if this
>     > > could get nasty. I would suggest that Dawid and I just try to
>     implement
>     > and
>     > > see if we like it. :)
>     > >
>     > > # Summary
>     > >
>     > > In conclusion, please let me know your perspectives. Please
>     correct me,
>     > if
>     > > something is wrong or if I misunderstood something. My
>     perspective would
>     > be:
>     > >
>     > > Top A: Yes
>     > > Top B: Yes (but disable watermark trigger for split alignment)
>     > > Top C: No
>     > > Top D: b)
>     > >
>     > > Best,
>     > > Sebastian
>     > >
>     > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <dwysakowicz@apache.org>
>     > > wrote:
>     > >
>     > >> @Arvid:
>     > >>
>     > >> While I also like Becket's capability approach, I fear that
>     it doesn't
>     > work
>     > >> for this particular use case: Sources can always be aligned
>     cross-task
>     > and
>     > >> this is just about intra-task alignment. So it's plausible to put
>     > sources
>     > >> into an alignment group even though they do not use any of
>     the presented
>     > >> API of FLIP-217. They should just issue a warning, if they handle
>     > multiple
>     > >> splits (see motivation section).
>     > >>
>     > >> Yes, but the "supportXXX" method would be for telling if it
>     supports
>     > that
>     > >> intra-task alignment. Cross-task alignment would always be
>     supported.
>     > >>
>     > >> I updated interfaces to what I believe to be closest to a
>     consensus
>     > >> between all participants. Do you mind taking a look?
>     > >>
>     > >> @Sebastian Do you mind addressing the nits?
>     > >>
>     > >> Best,
>     > >>
>     > >> Dawid
>     > >>
>     > >> On 25/04/2022 13:39, Arvid Heise wrote:
>     > >>
>     > >> Thanks for pushing this effort.
>     > >>
>     > >> I'd actually be in favor of 1b). I fully agree that decorator
>     interfaces
>     > >> should be avoided but I'm also not a big fan of overloading
>     the base
>     > >> interfaces (they are hard to implement as is). The usual
>     feedback to
>     > >> Source-related interfaces are always that they are
>     overwhelming and too
>     > >> hard to implement. However, I'd also not oppose 1c) as scattered
>     > interfaces
>     > >> also have drawbacks. I'd just dislike 1a) and 1d).
>     > >> While I also like Becket's capability approach, I fear that
>     it doesn't
>     > work
>     > >> for this particular use case: Sources can always be aligned
>     cross-task
>     > and
>     > >> this is just about intra-task alignment. So it's plausible to put
>     > sources
>     > >> into an alignment group even though they do not use any of
>     the presented
>     > >> API of FLIP-217. They should just issue a warning, if they handle
>     > multiple
>     > >> splits (see motivation section).
>     > >>
>     > >> I think renaming alignSplits to facilitate future use cases
>     makes sense
>     > but
>     > >> then all interfaces (if 1c) is chosen) should be adjusted
>     accordingly.
>     > >> AlignedSourceReader could be PausingSourceReader and I'd go for
>     > >> pauseOrResumeSplits (Becket's proposal afaik). We could also
>     split it
>     > into
>     > >> pauseSplit and resumeSplit. While pauseOrResumeSplits may
>     allow Sources
>     > to
>     > >> just use 1 instead of 2 library calls (as written in the
>     Javadoc), both
>     > >> Kafka and Pulsar can't use it and I'm not sure if there is a
>     system that
>     > >> can.
>     > >>
>     > >> Some nit for the FLIP:
>     > >> - Please replace "stop" with "pause".
>     > >> - Not sure if it's worth it in the capability section:
>     Sources that
>     > adopt
>     > >> this interface cannot be used in earlier versions. So it
>     feels like we
>     > are
>     > >> only forward compatible (old sources can be used after the
>     change); but
>     > I
>     > >> guess this holds for any API addition.
>     > >> - You might want to add what happens when all splits are paused.
>     > >> - You may want to describe how the 3 flavors of SourceReaderBase
>     > interact
>     > >> with the interface.
>     > >> - I'm not sure if it makes sense to include Kafka and Pulsar
>     in the
>     > FLIP.
>     > >> For me, this is rather immediate follow-up work. (could be in
>     the same
>     > >> umbrella ticket)
>     > >>
>     > >> Best,
>     > >>
>     > >> Arvid
>     > >>
>     > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz <dw...@apache.org>
>     > >> wrote:
>     > >>
>     > >>
>     > >> a) "MySourceReader implements SourceReader,
>     WithSplitsAlignment", along
>     > >> with "MySplitReader implements SplitReader,
>     WithSplitsAlignment", or
>     > >> b) "MySourceReader implements AlignedSourceReader" and
>     "MySplitReader
>     > >> implements AlignedSplitReader", or
>     > >> c) "MySourceReader implements SourceReader" and "MySplitReader
>     > implements
>     > >> SplitReader".
>     > >>
>     > >> I think the latest proposal according to Dawid would be:
>     > >> d) "MySourceReader implements SourceReader" and "MySplitReader
>     > implements
>     > >> AlignedSplitReader".
>     > >> I am fine with this API, although personally speaking I think
>     it is
>     > simpler
>     > >> to just add a new method to the split reader with default impl.
>     > >>
>     > >>
>     > >> I think that is a good idea to have it aligned as much as
>     possible. I'd
>     > be
>     > >> +1 for your option c). We can merge AlignedSplitReader with
>     > SplitReader. We
>     > >> will update the FLIP shortly.
>     > >>
>     > >> Best,
>     > >>
>     > >> Dawid
>     > >>
>     > >> On 25/04/2022 12:43, Becket Qin wrote:
>     > >>
>     > >> Thanks for the comment, Jark.
>     > >>
>     > >> 3. Interface/Method Name.
>     > >>
>     > >> Can the interface be used to align other things in the
>     future? For
>     > example,
>     > >> align read speed, I have
>     > >> seen users requesting global rate limits. This feature may
>     also need an
>     > >> interface like this.
>     > >> If we don't plan to extend this interface to support align other
>     > things, I
>     > >> suggest explicitly declaring
>     > >> the purpose of the methods, such as
>     `alignWatermarksForSplits` instead
>     > of
>     > >> `alignSplits`.
>     > >>
>     > >> This is a good point. Naming wise, it would usually be more
>     extensible
>     > to
>     > >> just describe what the method actually does, instead of
>     assuming the
>     > >> purpose of doing this. For example, in this case,
>     pauseOrResumeSplits()
>     > >> would be more extensible because this can be used for any
>     kind of flow
>     > >> control, be it watermark alignment or simple rate limiting.
>     > >>
>     > >> 4. Interface or Method.
>     > >>
>     > >> I don't have a strong opinion on this. I think they have
>     their own
>     > >> advantages.
>     > >> In Flink SQL, we heavily use Interfaces for extending abilities
>     > >> (SupportsXxxx) for TableSource/TableSink,
>     > >> and I prefer Interfaces rather than methods in this case.
>     When you have
>     > a
>     > >> bunch of abilities and each ability
>     > >> has more than one method, Interfaces can help to organize
>     them and make
>     > >> users clear which methods
>     > >> need to implement when you want to have an ability.
>     > >>
>     > >> I am OK with decorative interfaces if this is a general
>     design pattern
>     > in
>     > >> the other components in Flink. But it looks like the current API
>     > proposal
>     > >> is not symmetric.
>     > >>
>     > >> The current proposal is essentially "MySourceReader implements
>     > >> SourceReader, WithSplitsAlignment", along with "MySplitReader
>     implements
>     > >> AlignedSplitsReader".
>     > >>
>     > >> Should we make the API symmetric? I'd consider any one of the
>     following
>     > as
>     > >> symmetric.
>     > >>
>     > >> a) "MySourceReader implements SourceReader,
>     WithSplitsAlignment", along
>     > >> with "MySplitReader implements SplitReader,
>     WithSplitsAlignment", or
>     > >> b) "MySourceReader implements AlignedSourceReader" and
>     "MySplitReader
>     > >> implements AlignedSplitReader", or
>     > >> c) "MySourceReader implements SourceReader" and "MySplitReader
>     > implements
>     > >> SplitReader".
>     > >>
>     > >> I think the latest proposal according to Dawid would be:
>     > >> d) "MySourceReader implements SourceReader" and "MySplitReader
>     > implements
>     > >> AlignedSplitReader".
>     > >> I am fine with this API, although personally speaking I think
>     it is
>     > simpler
>     > >> to just add a new method to the split reader with default impl.
>     > >>
>     > >> @Dawid Wysakowicz <dw...@apache.org>, thanks for the reply.
>     > >>
>     > >> Having said that, as I don't have a preference and I agree
>     most of the
>     > >>
>     > >> sources will support the alignment I am fine following your
>     suggestion
>     > to
>     > >> have the SourceReader extending from
>     WithWatermarksSplitsAlignment, but
>     > >> would put the "supportsXXX" there, not in the Source to keep
>     the two
>     > >> methods together.
>     > >>
>     > >> One benefit of having the "supportsXXX" in Source is that
>     this allows
>     > some
>     > >> compile time check. For example, if a user enabled watermark
>     alignment
>     > >> while it is not supported by the Source, an exception can be
>     thrown at
>     > >> compile time. It seems in general useful. That said, I agree
>     that API
>     > >> cleanliness wise it is better to put the two methods together.
>     > >>
>     > >> Thanks,
>     > >>
>     > >> Jiangjie (Becket) Qin
>     > >>
>     > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <im...@gmail.com> wrote:
>     > >>
>     > >>
>     > >> Thanks Dawid for the reminder on FLIP-182. Sorry, I did miss it.
>     > >> I don't have other concerns then.
>     > >>
>     > >> Best,
>     > >> Jark
>     > >>
>     > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz <dw...@apache.org> wrote:
>     > >>
>     > >>
>     > >> @Jark:
>     > >>
>     > >> 1. Will the framework always align with watermarks when the
>     source
>     > >> implements the interface?
>     > >> I'm afraid not every case needs watermark alignment even if Kafka
>     > >> implements the interface,
>     > >> and this will affect the throughput somehow. I agree with Becket
>     > >> we may need a
>     > >> `supportSplitsAlignment()` method for users to configure the
>     source to
>     > >> enable/disable the alignment.
>     > >>
>     > >> 2. How does the framework calculate maxDesiredWatermark?
>     > >> I think the algorithm of maxDesiredWatermark will greatly affect
>     > >>
>     > >> throughput
>     > >>
>     > >> if the reader is constantly
>     > >>  switching between pause and resume. Can users configure the
>     alignment
>     > >> offset?
>     > >>
>     > >>
>     > >> This is covered in the previous FLIP[1] which has been already
>     > >>
>     > >> implemented
>     > >>
>     > >> in 1.15. In short, it must be enabled with the watermark
>     strategy which
>     > >> also configures drift and update interval.
>     > >>
>     > >> If we don't plan to extend this interface to support align
>     other things,
>     > >>
>     > >> I
>     > >>
>     > >> suggest explicitly declaring
>     > >> the purpose of the methods, such as
>     `alignWatermarksForSplits` instead
>     > of
>     > >> `alignSplits`.
>     > >>
>     > >>
>     > >> Sure let's rename it.
>     > >>
>     > >> @Becket:
>     > >>
>     > >> I understand your point. On the other hand putting all
>     methods, even
>     > with
>     > >> "supportsXXX" methods for enabling certain features, makes
>     the entry
>     > >> threshold for writing a new source higher. Instead of
>     focusing on the
>     > >>
>     > >> basic
>     > >>
>     > >> and required properties of the Source, the person
>     implementing a source
>     > >> must bother with and need to figure out what all of the extra
>     features
>     > >>
>     > >> are
>     > >>
>     > >> about and how to deal with them. It makes it also harder to
>     organize
>     > >> methods in coupled groups as Jark said.
>     > >>
>     > >> Having said that, as I don't have a preference and I agree
>     most of the
>     > >> sources will support the alignment I am fine following your
>     suggestion
>     > to
>     > >> have the SourceReader extending from
>     WithWatermarksSplitsAlignment, but
>     > >> would put the "supportsXXX" there, not in the Source to keep
>     the two
>     > >> methods together.
>     > >>
>     > >> Lastly, I agree it is really unfortunate that the "alignSplits" methods
>     > >> differ slightly between SourceReader and SplitReader. The reason is that
>     > >> SourceReaderBase deals only with SplitIds, whereas SplitReader needs the
>     > >> actual splits to pause them. I found the discrepancy acceptable because
>     > >> it simplifies the changes significantly, especially as the alternative
>     > >> would very likely impact performance, since we would have to perform
>     > >> additional lookups. Moreover, the SplitReader is a secondary interface.
>     > >>
>     > >> Best,
>     > >>
>     > >> Dawid
>     > >>
>     > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
>     > >>
>     > >> On 24/04/2022 17:15, Jark Wu wrote:
>     > >>
>     > >> Thanks for the effort, Dawid and Sebastian!
>     > >>
>     > >> I just have some minor questions (maybe I missed something).
>     > >>
>     > >> 1. Will the framework always align with watermarks when the
>     source
>     > >> implements the interface?
>     > >> I'm afraid not every case needs watermark alignment even if Kafka
>     > >> implements the interface,
>     > >> and this will affect the throughput somehow. I agree with Becket
>     > >> we may need a
>     > >> `supportSplitsAlignment()` method for users to configure the
>     source to
>     > >> enable/disable the alignment.
>     > >>
>     > >> 2. How does the framework calculate maxDesiredWatermark?
>     > >> I think the algorithm of maxDesiredWatermark will greatly affect
>     > >>
>     > >> throughput
>     > >>
>     > >> if the reader is constantly
>     > >>  switching between pause and resume. Can users configure the
>     alignment
>     > >> offset?
>     > >>
>     > >> 3. Interface/Method Name.
>     > >> Can the interface be used to align other things in the
>     future? For
>     > >>
>     > >> example,
>     > >>
>     > >> align read speed, I have
>     > >> seen users requesting global rate limits. This feature may
>     also need an
>     > >> interface like this.
>     > >> If we don't plan to extend this interface to support align
>     other things,
>     > >>
>     > >> I
>     > >>
>     > >> suggest explicitly declaring
>     > >> the purpose of the methods, such as
>     `alignWatermarksForSplits` instead
>     > of
>     > >> `alignSplits`.
>     > >>
>     > >> 4. Interface or Method.
>     > >> I don't have a strong opinion on this. I think they have
>     their own
>     > >> advantages.
>     > >> In Flink SQL, we heavily use Interfaces for extending abilities
>     > >> (SupportsXxxx) for TableSource/TableSink,
>     > >> and I prefer Interfaces rather than methods in this case.
>     When you have
>     > a
>     > >> bunch of abilities and each ability
>     > >> has more than one method, Interfaces can help to organize
>     them and make
>     > >> users clear which methods
>     > >> need to implement when you want to have an ability.
>     > >>
>     > >>
>     > >> Best,
>     > >> Jark
>     > >>
>     > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <be...@gmail.com> wrote:
>     > >>
>     > >> Hi Dawid,
>     > >>
>     > >> Thanks for the explanation. Apologies that I somehow misread
>     a bunch of
>     > >> "align" and thought they were "assign".
>     > >>
>     > >> Regarding 1, by default implementation, I was thinking of the
>     default
>     > >>
>     > >> no-op
>     > >>
>     > >> implementation. I am a little worried about the proliferation of
>     > >>
>     > >> decorative
>     > >>
>     > >> interfaces. I think the most important thing about interfaces
>     is that
>     > >>
>     > >> they
>     > >>
>     > >> are easy to understand. In this case, I prefer adding new
>     method to the
>     > >> existing interface for the following reasons:
>     > >>
>     > >> a) I feel the biggest drawback of decorative interfaces is which
>     > >>
>     > >> interface
>     > >>
>     > >> they can decorate and which combinations of multiple decorative
>     > >>
>     > >> interfaces
>     > >>
>     > >> are valid. In the current FLIP, the withSplitsAlignment
>     interface is
>     > only
>     > >> applicable to the SourceReader which means it can't decorate
>     any other
>     > >> interface. From an interface design perspective, a natural
>     question is
>     > >>
>     > >> why
>     > >>
>     > >> not let "AlignedSplitReader" extend "withSplitsAlignment"?
>     And it is
>     > also
>     > >> natural to assume that a split reader implementing both
>     SplitReader and
>     > >> WithSplitAlignment would work, because a source reader
>     implementing
>     > >> SourceReader and withSplitsAlignment works. So why isn't there an
>     > >>
>     > >> interface
>     > >>
>     > >> of AlignedSourceReader? In the future, if there is a new
>     feature added
>     > >> (e.g. sorted or pre-partitioned data aware), are we going to
>     create
>     > >>
>     > >> another
>     > >>
>     > >> interface of SplitReader such as SortedSplitReader or
>     > >>
>     > >> PrePartitionedAware?
>     > >>
>     > >> Can they be combined? So I think the additional decorative
>     interface
>     > like
>     > >> withSplitsAlignment actually increases the understanding cost
>     of users
>     > >> because they have to know what decorative interfaces are
>     there, which
>     > >> interface they can decorate and which combinations of the
>     decorative
>     > >> interfaces are valid and which are not. Ideally we want to
>     avoid that.
>     > To
>     > >> be clear, I am not opposing having an interface of
>     withSplitsAlignment,
>     > >>
>     > >> it
>     > >>
>     > >> is completely OK to have it as an internal interface and let
>     > SourceReader
>     > >> and SplitReader both extend it.
>     > >>
>     > >> b) Adding a new method to the SourceReader with a default
>     implementation
>     > >>
>     > >> of
>     > >>
>     > >> no-op would help avoid logic branching in the source logic,
>     especially
>     > >> given that we agree that the vast majority of the SourceReader
>     > >> implementations, if not all, would just extend from the
>     > SourceReaderBase.
>     > >> That means adding a new method to the interface would
>     effectively give
>     > >>
>     > >> the
>     > >>
>     > >> same user experience, but simpler.
>     > >>
>     > >> c) A related design principle that may be worth discussing is
>     how do we
>     > >>
>     > >> let
>     > >>
>     > >> the Source implementations tell Flink what capability is
>     supported and
>     > >>
>     > >> what
>     > >>
>     > >> is not. Personally speaking I feel the most intuitive place
>     to me is in
>     > >>
>     > >> the
>     > >>
>     > >> Source itself, because that is the entrance of the entire Source
>     > >>
>     > >> connector
>     > >>
>     > >> logic.
>     > >>
>     > >> Based on the above thoughts, I am wondering if the following
>     interface
>     > >> would be easier to understand by the users.
>     > >>
>     > >> - Change "withSplitsAlignment" to internal interface, let both
>     > >>
>     > >> SourceReader
>     > >>
>     > >> and SplitReader extend from it, with a default no-op
>     implementation.
>     > >> - Add a new method "boolean supportSplitsAlignment()" to the
>     Source
>     > >> interface, with a default implementation returning false.
>     Sources that
>     > >>
>     > >> have
>     > >>
>     > >> implemented the alignment logic can change this to return
>     true, and
>     > >> override the alignSplits() methods in the SourceReader /
>     SplitReader if
>     > >> needed.
>     > >> - In the future, if a new optional feature is going to be
>     added to the
>     > >> Source, and that feature requires the awareness from Flink,
>     we can add
>     > >>
>     > >> more
>     > >>
>     > >> such methods to the Source.
>     > >>
>     > >> What do you think?
>     > >>
>     > >> Thanks,
>     > >>
>     > >> Jiangjie (Becket) Qin
>     > >>
>     > >>
>     > >>
>     > >>
>     > >>
>     > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dw...@apache.org> wrote:
>     > >>
>     > >>
>     > >> @Konstantin:
>     > >>
>     > >> As part of this FLIP, the `AlignedSplitReader` interface (aka
>     the stop &
>     > >> resume behavior) will be implemented for Kafka and Pulsar
>     only, correct?
>     > >>
>     > >> Correct, as far as I know though, those are the only sources
>     which
>     > >>
>     > >> consume
>     > >>
>     > >> concurrently from multiple splits and thus alignment applies.
>     > >>
>     > >> @Thomas:
>     > >>
>     > >> I wonder if "supporting" split alignment in SourceReaderBase
>     and then
>     > >>
>     > >> doing
>     > >>
>     > >> nothing if the split reader does not implement
>     AlignedSplitReader could
>     > >>
>     > >> be
>     > >>
>     > >> misleading? Perhaps WithSplitsAlignment can instead be added
>     to the
>     > >> specific source reader (i.e. KafkaSourceReader) to make it
>     explicit that
>     > >> the source actually supports it.
>     > >>
>     > >> I understand your concern. Hmm, I think we could actually do that. Given
>     > >> that the actual implementation of SourceReaderBase#alignSplits is rather
>     > >> short (just a forward to the corresponding method of SplitFetcher), we
>     > >> could reimplement it in the actual source implementations. This solution
>     > >> has a downside though. Authors of new sources would have to do two
>     > >> things: extend from AlignedSplitReader and implement WithSplitsAlignment,
>     > >> instead of just extending AlignedSplitReader. I would be fine with such a
>     > >> tradeoff though. What do others think?
>     > >>
>     > >> @Steven:
>     > >>
>     > >> For this part from the motivation section, is it accurate?
>     Let's assume
>     > >>
>     > >> one
>     > >>
>     > >> source task consumes from 3 partitions and one of the
>     partitions is
>     > >> significantly slower. In this situation, watermark for this
>     source task
>     > >> won't hold back as it is reading recent data from other two Kafka
>     > >> partitions. As a result, it won't hold back the overall
>     watermark. I
>     > >> thought the problem is that we may have late data for this slow
>     > >>
>     > >> partition.
>     > >>
>     > >> It will hold back the watermark. The watermark of an operator is the
>     > >> minimum of the watermarks of all its splits [1].
>     > >>
>     > >> I have another question about the restart. Say split alignment is
>     > >> triggered. checkpoint is completed. job failed and restored
>     from the
>     > last
>     > >> checkpoint. because alignment decision is not checkpointed,
>     initially
>     > >> alignment won't be enforced until we get a cycle of watermark
>     > aggregation
>     > >> and propagation, right? Not saying this corner is a problem.
>     Just want
>     > to
>     > >> understand it more.
>     > >>
>     > >> Your understanding is correct.
>     > >>
>     > >> @Becket:
>     > >>
>     > >> 1. I think watermark alignment is sort of a general use case,
>     so should
>     > >>
>     > >> we
>     > >>
>     > >> just add the related methods to SourceReader directly instead of
>     > >> introducing the new interface of WithSplitAssignment? We can
>     provide
>     > >> default implementations, so backwards compatibility won't be
>     an issue.
>     > >>
>     > >> I don't think we can provide a default implementation. How
>     would we do
>     > >> that? Would it be just a no-op? Is it better than having an
>     opt-in
>     > >> interface? The default implementation would have to be added
>     exclusively
>     > >>
>     > >> in
>     > >>
>     > >> a *Public* SourceReader interface. By the way notice
>     SourceReaderBase
>     > >> does extend from WithSplitsAlignment, so effectively all
>     implementations
>     > >>
>     > >> do
>     > >>
>     > >> handle the alignment case. To be honest I think it is
>     impossible to
>     > >> implement the SourceReader interface directly by end users.
>     > >>
>     > >> 2. As you mentioned, the SplitReader interface probably also
>     needs some
>     > >> change to support throttling at the split granularity. Can
>     you add that
>     > >> interface change into the public interface section as well?
>     > >>
>     > >> It has been added from the beginning. See *AlignedSplitReader.*
>     > >>
>     > >> 3. Nit, can we avoid using the method name assignSplits here,
>     given that
>     > >>
>     > >> it
>     > >>
>     > >> is not actually changing the split assignments? It seems
>     something like
>     > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more
>     accurate.
>     > >>
>     > >> The method's called *alignSplits*, not assign. Do you still
>     prefer a
>     > >> different name for that? Personally, I am open for
>     suggestions here.
>     > >>
>     > >> Best,
>     > >>
>     > >> Dawid
>     > >>
>     > >> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
>     > >>
>     > >> On 22/04/2022 05:59, Becket Qin wrote:
>     > >>
>     > >> Thanks for driving the effort, Sebastian. I think the motivation makes a
>     > >> lot of sense. Just a few suggestions / questions.
>     > >>
>     > >> 1. I think watermark alignment is sort of a general use case,
>     so should
>     > >>
>     > >> we
>     > >>
>     > >> just add the related methods to SourceReader directly instead of
>     > >> introducing the new interface of WithSplitAssignment? We can
>     provide
>     > >> default implementations, so backwards compatibility won't be
>     an issue.
>     > >>
>     > >> 2. As you mentioned, the SplitReader interface probably also
>     needs some
>     > >> change to support throttling at the split granularity. Can
>     you add that
>     > >> interface change into the public interface section as well?
>     > >>
>     > >> 3. Nit, can we avoid using the method name assignSplits here,
>     given that
>     > >>
>     > >> it
>     > >>
>     > >> is not actually changing the split assignments? It seems
>     something like
>     > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more
>     accurate.
>     > >>
>     > >> Thanks,
>     > >>
>     > >> Jiangjie (Becket) Qin
>     > >>
>     > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <st...@gmail.com> wrote:
>     > >>
>     > >> However, a single source operator may read data from multiple
>     > >>
>     > >> splits/partitions, e.g., multiple Kafka partitions, such that
>     even with
>     > >> watermark alignment the source operator may need to buffer
>     excessive
>     > >>
>     > >> amount
>     > >>
>     > >> of data if one split emits data faster than another.
>     > >>
>     > >> For this part from the motivation section, is it accurate?
>     Let's assume
>     > >>
>     > >> one
>     > >>
>     > >> source task consumes from 3 partitions and one of the
>     partitions is
>     > >> significantly slower. In this situation, watermark for this
>     source task
>     > >> won't hold back as it is reading recent data from other two Kafka
>     > >> partitions. As a result, it won't hold back the overall
>     watermark. I
>     > >> thought the problem is that we may have late data for this slow
>     > >>
>     > >> partition.
>     > >>
>     > >> I have another question about the restart. Say split alignment is
>     > >> triggered. checkpoint is completed. job failed and restored
>     from the
>     > last
>     > >> checkpoint. because alignment decision is not checkpointed,
>     initially
>     > >> alignment won't be enforced until we get a cycle of watermark
>     > aggregation
>     > >> and propagation, right? Not saying this corner is a problem.
>     Just want
>     > to
>     > >> understand it more.
>     > >>
>     > >>
>     > >>
>     > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <th...@apache.org> wrote:
>     > >>
>     > >> Thanks for working on this!
>     > >>
>     > >> I wonder if "supporting" split alignment in SourceReaderBase
>     and then
>     > >>
>     > >> doing
>     > >>
>     > >> nothing if the split reader does not implement
>     AlignedSplitReader could
>     > >>
>     > >> be
>     > >>
>     > >> misleading? Perhaps WithSplitsAlignment can instead be added
>     to the
>     > >> specific source reader (i.e. KafkaSourceReader) to make it
>     explicit that
>     > >> the source actually supports it.
>     > >>
>     > >> Thanks,
>     > >> Thomas
>     > >>
>     > >>
>     > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <kn...@apache.org> wrote:
>     > >>
>     > >>
>     > >> Hi Sebastian, Hi Dawid,
>     > >>
>     > >> As part of this FLIP, the `AlignedSplitReader` interface (aka
>     the stop
>     > >>
>     > >> &
>     > >>
>     > >> resume behavior) will be implemented for Kafka and Pulsar only,
>     > >>
>     > >> correct?
>     > >>
>     > >> +1 in general. I believe it is valuable to complete the watermark
>     > >>
>     > >> aligned
>     > >>
>     > >> story with this FLIP.
>     > >>
>     > >> Cheers,
>     > >>
>     > >> Konstantin
>     > >>
>     > >>
>     > >>
>     > >>
>     > >>
>     > >>
>     > >>
>     > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dw...@apache.org> wrote:
>     > >>
>     > >>
>     > >> To be explicit, having worked on it, I support it ;) I think
>     we can
>     > >> start a vote thread soonish, as there are no concerns so far.
>     > >>
>     > >> Best,
>     > >>
>     > >> Dawid
>     > >>
>     > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
>     > >>
>     > >> Dear Flink developers,
>     > >>
>     > >> I would like to open a discussion on FLIP-217 [1], an extension of
>     > >> Watermark Alignment that also performs alignment in SplitReaders. To do
>     > >> so, SplitReaders must be able to suspend and resume reading from split
>     > >> sources, with the SourceOperator coordinating and controlling suspend and
>     > >> resume. To gather information about the current watermarks of the
>     > >> SplitReaders, we extend the internal WatermarkOutputMultiplexer and
>     > >> report watermarks to the SourceOperator.
>     > >>
>     > >> There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised
>     > >> and reworked by Dawid Wysakowicz (he did most of the work) and me. The
>     > >> changes are backwards compatible in the sense that if affected components
>     > >> do not support split alignment, the behavior is as before.
>     > >>
>     > >> Best,
>     > >> Sebastian
>     > >>
>     > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
>     > >>
>     > >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
>     > >>
>     > >> --
>     > >>
>     > >> Konstantin Knauf
>     > >> https://twitter.com/snntrable
>     > >> https://github.com/knaufk
>     > >>
>     > >>
>     >
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Piotr Nowojski <pn...@apache.org>.
Hi Becket,

Is this feature really non-optional? If so, adding those methods with a
default implementation defeats that purpose. On the other hand, if a
Source doesn't support pausing splits, the system would still work. Arguably
watermark alignment would not, and maybe that would deserve logging a
warning if the user configured the alignment but is using sources that do
not support it. Also, adding non-default methods would break the Public
interface, so we need to make this feature optional.
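
To make the "log a warning" idea concrete, here is a minimal sketch of such a
check. All names in it (SplitReader as a one-method interface,
PausableSplitReader, checkAlignment) are hypothetical and only illustrate the
shape of the check; they are not the interfaces or signatures proposed in the
FLIP.

```java
import java.util.Optional;

public class AlignmentSupportCheck {

    /** Hypothetical base interface: the minimum every split reader implements. */
    interface SplitReader {
        String fetch();
    }

    /** Hypothetical opt-in interface for readers that can pause and resume splits. */
    interface PausableSplitReader extends SplitReader {
        void pause(String splitId);
        void resume(String splitId);
    }

    /** A reader that implements only the base interface. */
    static final class PlainReader implements SplitReader {
        @Override
        public String fetch() {
            return "record";
        }
    }

    /**
     * Returns a warning message when alignment is configured but the reader
     * cannot pause splits, and an empty Optional otherwise. Nothing fails
     * either way; the alignment is simply ineffective for that reader.
     */
    static Optional<String> checkAlignment(boolean alignmentConfigured, SplitReader reader) {
        if (alignmentConfigured && !(reader instanceof PausableSplitReader)) {
            return Optional.of("Watermark alignment is configured, but "
                    + reader.getClass().getSimpleName()
                    + " cannot pause splits; its splits will not be aligned.");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A real implementation would go through the logging framework.
        checkAlignment(true, new PlainReader()).ifPresent(System.err::println);
    }
}
```

The point of the sketch is only that the job keeps running in both cases and
the user gets a hint about the misconfiguration.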

> When implementing the base interface, users do not need to implement a
> method with default implementation.

So why should we spam users with such methods that they do not need?

> Can you articulate a bit more on which part you think makes users harder
> to understand?

Imagine you are an inexperienced user, dealing with the system for the first
time. You want to implement a source interface. You look at it, and in the two
worlds you see two different things:

1. (default methods) You see an interface with 50 methods, including various
"supportX" and "supportY" methods that look strange to you, and tens of
others. You don't understand any of this, because you are struggling with
even the basic Flink concepts. Somewhere among them are the only 5 methods
that you actually need to implement, but you are not sure which ones they
are. Remember that you have mostly no idea what you are doing.
2. (decorative interfaces) You see an interface with the 5 most important
methods. You still struggle to implement those, but at least you don't have
to wonder about tens of optional features at the beginning.
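
A rough sketch of the two worlds in code, using made-up names (BaseReader,
PausableReader, FatReader are hypothetical and much smaller than the real
interfaces). It also shows the grouping argument: with default methods an
implementation can override only half of a feature and still compile, while
the decorator forces both methods to be implemented together.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class InterfaceShapes {

    // World 2 (decorative interfaces): the base interface has only the essentials.
    interface BaseReader {
        String fetch();
    }

    // The optional feature lives in its own interface. Both methods are abstract,
    // so the compiler rejects an implementation that provides only one of them.
    interface PausableReader extends BaseReader {
        void pause(Collection<String> splitIds);
        void resume(Collection<String> splitIds);
    }

    // World 1 (default methods): the optional feature sits in the base interface.
    interface FatReader {
        String fetch();
        default boolean supportsPausing() { return false; }
        default void pause(Collection<String> splitIds) { }
        default void resume(Collection<String> splitIds) { }
    }

    // A newcomer's reader in world 2: one method to implement, nothing else in sight.
    static final class SimpleReader implements BaseReader {
        @Override
        public String fetch() { return "record"; }
    }

    // A reader in world 1 that claims support but forgot to override resume().
    // This compiles; the API gives no hint that pause and resume belong together.
    static final class HalfDoneReader implements FatReader {
        final List<String> paused = new ArrayList<>();

        @Override
        public String fetch() { return "record"; }

        @Override
        public boolean supportsPausing() { return true; }

        @Override
        public void pause(Collection<String> splitIds) { paused.addAll(splitIds); }
    }

    // In world 2 the framework discovers the feature with a type check.
    static boolean canPause(BaseReader reader) {
        return reader instanceof PausableReader;
    }

    public static void main(String[] args) {
        HalfDoneReader reader = new HalfDoneReader();
        reader.pause(List.of("split-1"));
        reader.resume(List.of("split-1")); // inherited no-op, the split stays paused
        System.out.println("still paused: " + reader.paused);
        System.out.println("SimpleReader can pause: " + canPause(new SimpleReader()));
    }
}
```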

> There is another benefit of the decorative interfaces which is not
> mentioned, but might be worth considering here. Usually the decorative
> interfaces give slightly better backwards compatibility than the new
> default method in the interfaces. (...)
> I think in Flink we in general do not guarantee custom components compiled
> with an older version can run with a newer version of Flink

Hmm, good point. Don't we actually want this kind of compatibility? There
were discussions about that [1]:

> Flink jobs or ecosystems like external connectors/formats built with
> older Flink version X like 1.14 can be running on a newer Flink version Y
> like 1.15 with no issue.

Best,
Piotrek

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-207%3A+Flink+backward+and+forward+compatibility

wt., 3 maj 2022 o 14:31 Becket Qin <be...@gmail.com> napisał(a):

> Hi Piotr,
>
> Thanks for the comment.
>
> Just to clarify, I am not against the decorative interfaces, but I do think
> we should use them with caution. The main argument for adding the methods
> to the SourceReader is that these methods are effectively NON-OPTIONAL to
> SourceReader impl, i.e. starting from this FLIP, all the SourceReaders impl
> are expected to support this method, although some old implementations may
> not have implemented this feature. I think we should distinguish the new
> features from the optional features. The public decorative interface is a
> solution for optional features; we should not use it for features that
> are non-optional.
>
> That said, this feature is optional for SplitReaders. Arguably we can have
> a decorative interface for that, but for simplicity and symmetry of the
> interface, personally I prefer just adding a new method.
>
> Regarding the advantages you mentioned about the decorative interfaces,
> they would make sense if:
> 1. The feature is optional.
> 2. There is only one decorative interface involved for a feature. Otherwise
> the argument that all the methods are grouped together will not stand.
>
> Compared with that, I think the current solution works fine in all cases,
> i.e. "having supportXXX() method in Source, and default methods /
> decorative interfaces in base interfaces.".
>
> The advantages are:
> > - clean and easy to implement base interface
>
> In the current approach, the Java doc of the SupportXXX() method in the
> Source would be the single source of truth regarding how to implement this
> feature. It lists the method that has to be implemented to support this
> feature, regardless of how many classes / interfaces are involved.
>
> When implementing the base interface, users do not need to implement a
> method with default implementation. If they are curious what the method is
> for, the java doc of that method simply points users to the SupportXXX()
> method in the Source. It seems not adding work to the users compared with
> decorative interfaces, but gives much better discoverability.
>
> - all of the methods from a single feature are grouped in a single
> > decorator interface, together with their dedicated java doc. It's also
> > easier to google search for help using the decorator name
>
> - if an optional feature requires two methods to be implemented at once,
> > decorator can guarantee that
>
> These two points are not true when multiple components and classes are
> involved collaboratively to provide a feature. In our case, we have both
> SourceReader and SplitReader involved. And there might be other interfaces
> on the JM side involved for some future features. So the relevant methods
> can actually be scattered over the places. That said, we may still use
> decorative interfaces for each component, if the feature is optional, given
> there is a single source of truth for the feature.
>
> Here I would strongly lean towards making life easier for new users,
> > lowering the entry barrier, at the (imo) slight expense for the power
> > users.
>
> I actually think the current approach is simpler, more extensible and more
> general for all the users. Can you articulate a bit more on which part you
> think makes users harder to understand?
>
>
> There is another benefit of the decorative interfaces which is not
> mentioned, but might be worth considering here. Usually the decorative
> interfaces give slightly better backwards compatibility than the new
> default method in the interfaces. That is when users are using a jar that
> was compiled with an older version of Flink which does not have the default
> method in the interfaces in question. A decorative interface may still
> provide backwards compatibility in that case, while default method impl
> cannot.
>
> I think in Flink we in general do not guarantee custom components compiled
> with an older version can run with a newer version of Flink. A recompile
> with a newer version would be required. That said, if we do care about
> this, we can just change the "supportXXX()" method in the Source interface
> to use decorative interfaces, and leave the other parts unchanged.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
>
> On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <pn...@apache.org>
> wrote:
>
> > Hi,
> >
> > Sorry for chipping in so late, but I was OoO for the last two weeks.
> > Regarding the interfaces, I would be actually against adding those
> methods
> > to the base interfaces for the reasons mentioned above. Clogging the base
> > interface for new users with tons of methods that they do not need, do
> not
> > understand and do not know what to do with them. Moreover, such
> decorative
> > interfaces are solving a problem if a feature requires two or more
> methods
> > to be implemented at the same time. If we have all of the methods with
> > default implementation in the base interface, the API doesn't give any
> clue
> > to the user which set of methods are required to be implemented at the
> same
> > time.
> >
> > > a) I feel the biggest drawback of decorative interfaces is which
> > interface
> > > they can decorate and which combinations of multiple decorative
> > interfaces
> > > are valid. (...)
> > > In the future, if there is a new feature added
> > > (e.g. sorted or pre-partitioned data aware), are we going to create
> > another
> > > interface of SplitReader such as SortedSplitReader or
> > PrePartitionedAware?
> > > Can they be combined? So I think the additional decorative interface
> like
> > > withSplitsAlignment actually increases the understanding cost of users
> > > because they have to know what decorative interfaces are there, which
> > > interface they can decorate and which combinations of the decorative
> > > interfaces are valid and which are not. Ideally we want to avoid that.
> >
> > I'm not sure if I understand how embedding default methods in the base
> > interface is solving the problem: what can be combined or not? If there
> are
> > two conflicting features, having decorative interfaces that can not be
> > mixed together actually makes much more sense to me rather than having
> them
> > all in one base class. How would you allow users to implement only one of
> > those two features?
> >
> > To reiterate on the issue. Yes, there are drawbacks:
> > - how can a user discover what decorators are there?
> > - how can a user know where the decorator can be applied to?
> >
> > However, those are drawbacks for power users that can be mitigated by
> > the documentation, for example by listing all of the decorators with
> > detailed explanations both in the docs and in the Javadocs. More
> > experienced users will be able to deal with those issues more easily, as
> > they will already have some basic understanding of Flink. Also, if a user
> > has a problem that he wants to solve, he will search for a potential
> > solution anyway, and while doing that he is very likely to discover the
> > decorator that he needs in the docs.
> >
> > The advantages are:
> > - clean and easy to implement base interface
> > - all of the methods from a single feature are grouped in a single
> > decorator interface, together with their dedicated java doc. It's also
> > easier to google search for help using the decorator name
> > - if an optional feature requires two methods to be implemented at once,
> > decorator can guarantee that
> >
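To make the "decorator can guarantee that" point concrete, here is a minimal Java sketch. All type names (BaseReaderSketch, PausableSketch, and so on) are hypothetical stand-ins, not the actual Flink interfaces; the only thing it illustrates is that an optional feature needing two methods can be grouped in one decorative interface, while the base interface stays small.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical, heavily simplified stand-in for the base reader interface.
interface BaseReaderSketch {
    String pollNext();
}

// Hypothetical decorative interface: both methods of the optional feature
// live here, so a class cannot implement one without the other.
interface PausableSketch {
    void pauseSplits(Collection<String> splitIds);

    void resumeSplits(Collection<String> splitIds);
}

// A simple source only has to deal with the small base interface.
final class BasicReaderSketch implements BaseReaderSketch {
    @Override
    public String pollNext() {
        return "record";
    }
}

// A more advanced source opts in by also implementing the decorator.
final class PausableReaderSketch implements BaseReaderSketch, PausableSketch {
    final List<String> paused = new ArrayList<>();

    @Override
    public String pollNext() {
        return "record";
    }

    @Override
    public void pauseSplits(Collection<String> splitIds) {
        paused.addAll(splitIds);
    }

    @Override
    public void resumeSplits(Collection<String> splitIds) {
        paused.removeAll(splitIds);
    }
}

final class DecoratorCheckSketch {
    // Framework-side capability check: a plain instanceof test.
    static boolean supportsPausing(BaseReaderSketch reader) {
        return reader instanceof PausableSketch;
    }
}
```

With default methods on the base interface instead, nothing would stop an implementation from overriding only pauseSplits and leaving resumeSplits as a no-op; with the decorator, the compiler rejects that.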
> > Here I would strongly lean towards making life easier for new users,
> > lowering the entry barrier, at the (imo) slight expense for the power
> > users.
> >
> > Best,
> > Piotrek
> >
> >
> > wt., 26 kwi 2022 o 15:32 Becket Qin <be...@gmail.com> napisał(a):
> >
> > > Thanks for the reply Sebastian and Dawid,
> > >
> > > I think Sebastian has a good summary. This is a really helpful
> > discussion.
> > >
> > > Thinking a bit more, I feel that it might still be better to add the
> > > supportsXXX() method in the Source rather than SourceReader.
> > >
> > > Generally speaking, what we are trying to do here is to let the Flink
> > > framework know what the Source is capable of. In this FLIP, it happens
> to
> > > be the capability that only involves SourceReader. But in the future,
> it
> > is
> > > possible that another functionality involves both the SplitEnumerator
> and
> > > SourceReader. In that case, following the current approach, we should
> put
> > > the "supportsXXX()" method in both SplitEnumerator and SourceReader.
> > > Because if we only put this in the SourceReader, then the JM would have
> > to
> > > create a SourceReader in order to know whether this feature is
> supported,
> > > which is a little ugly. But if we put the "supportsXXX()" method in the
> > > Source, we will break the "symmetric" design because this FLIP chose a
> > > different way.
> > >
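A rough Java sketch of the argument above, under the assumption that the capability flag lives on the Source. The names SourceStub, ReaderStub and supportsPausingSplits are hypothetical placeholders for the "supportsXXX()" idea, not the real Flink API. The point is only that job validation can consult the flag without ever constructing a reader.

```java
// Hypothetical reader stub; the counter lets us observe whether a reader
// was ever constructed.
final class ReaderStub {
    static int created = 0;

    ReaderStub() {
        created++;
    }
}

// Hypothetical, simplified Source with a capability method that defaults to
// "not supported".
interface SourceStub {
    ReaderStub createReader();

    default boolean supportsPausingSplits() {
        return false;
    }
}

final class LegacySourceSketch implements SourceStub {
    @Override
    public ReaderStub createReader() {
        return new ReaderStub();
    }
}

final class PausingSourceSketch implements SourceStub {
    @Override
    public ReaderStub createReader() {
        return new ReaderStub();
    }

    @Override
    public boolean supportsPausingSplits() {
        return true;
    }
}

final class JobValidationSketch {
    // Runs where the job is assembled: only the Source is consulted, so no
    // reader has to be created just to learn about the capability.
    static void validate(SourceStub source, boolean alignmentEnabled) {
        if (alignmentEnabled && !source.supportsPausingSplits()) {
            throw new IllegalStateException(
                    "Split alignment enabled, but the source cannot pause splits");
        }
    }
}
```

Whether such a check should fail hard or only warn is a separate question that is discussed elsewhere in this thread.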
> > > This is also why I think supportsXXX() method seems a good thing to
> have,
> > > because when there are a few interfaces / methods that are expected to
> be
> > > implemented at the same time in order to deliver a feature, it is
> always
> > > good to have a single source of truth to tell the framework what to do,
> > so
> > > the framework can do consistent things in different parts.
> > >
> > > @Sebastian Mattheis <se...@ververica.com>
> > >
> > > Regarding interface flavor b), i.e. AlignedSourceReader +
> > > AlignedSplitReader, what I feel awkward about is that we are
> essentially
> > > expecting almost all the SourceReader implementations to extend
> > > SourceReaderBase, which effectively makes the SourceReader interface
> > > without the pausing support useless. So this indicates that public
> > > decorative interfaces (or sub-interfaces for the same purpose) only
> > > make sense if the original interface is also expected to be used.
> > > Otherwise, it seems to make more sense to add the method to the original
> > > interface itself.
> > >
> > > Cheers,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > >
> > >
> > > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <dwysakowicz@apache.org>
> > > wrote:
> > >
> > > > Thanks @Sebastian for the nice summary.
> > > >
> > > > I think most of your points aligned with the suggestions I made to
> the
> > > > FLIP, while you were writing your reply (I believe we hit enter
> nearly
> > at
> > > > the same time ;) )
> > > >
> > > > Two points after we synced offline
> > > >
> > > > 1. I changed also the supportsWatermarksSplitAlignment to
> > > > supportsPausingSplits to express the general capability of pausing.
> > > >
> > > > 2. As for whether we should add PausingSourceReader/PausingSplitReader
> > > > (option b) or just add the methods (option c), I suggest we simply add
> > > > the two methods, as I felt this is the approach Becket much prefers and
> > > > to which others do not object. Unless there is opposition, let's go
> > > > with option c.
> > > >
> > > > Best,
> > > >
> > > > Dawid
> > > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
> > > >
> > > > Hi folks,
> > > >
> > > > Sorry for being a bit silent. Many thanks for all the input and
> > > > suggestions. As I'm a bit new, I needed some time to catch up and
> > > structure
> > > > (for myself) the discussion and I wanted to find a way to structure
> the
> > > > conclusions. (Also because I had the feeling that some concerns got
> > lost
> > > in
> > > > the discussion.) This is my attempt and please correct me if
> something
> > is
> > > > wrong or misunderstood. I tried to collect and assemble the opinions,
> > > > suggestions, and conclusions (to the best of my knowledge):
> > > >
> > > > # Top A: Should split alignment (pause/resume behavior) be a general
> > > > capability?
> > > >
> > > > I personally don't see any reason not to have it as a general capability
> > > > because for the alignSplit method it is actually independent of the
> > > > watermarks. If we agree here to have it a general capability, we
> should
> > > > also agree on the right wording. Does "alignSplits(splitsToResume,
> > > > splitsToPause)" refer to what is then actually meant? (I see it as
> > okay.
> > > I
> > > > don't have any better idea whilst Arvid suggested
> > "pauseOrResumeSplits".)
> > > >
> > > > # Top B: Should it be possible to enable/disable split alignment?
> > > >
> > > > I would personally not disable the split alignment on the source
> reader
> > > > side because if split alignment is used for some other use case (see
> A)
> > > it
> > > > could have nasty side effects on other/future use cases. Instead, I
> > would
> > > > disable "watermark split alignment" where I think it should disable
> the
> > > > watermark-dependent trigger for split alignment.
> > > >
> > > > # Top C: Should we add a supportsX method?
> > > >
> > > > I find it difficult to define the scope of a supportsX method w.r.t.
> > > > the following questions: a) Where is it used? and b) What is the
> > expected
> > > > output? To b), it's not straight-forward to provide a meaningful
> > output,
> > > > e.g., if SourceReader supports split alignment but SplitReader not.
> > This
> > > is
> > > > because with the current implementation, we can determine whether
> split
> > > > alignment is fully supported only during runtime and specifically
> > > actually
> > > > only when calling alignSplits down the call hierarchy up to the
> actual
> > > > SplitReaders.
> > > >
> > > > Therefore, I would suggest to either raise an error or warning if the
> > > > alignment is called but not supported at some point. I know we should
> > > > carefully think about when this could be the case because we don't
> want
> > > to
> > > > flood anybody with such warnings. However, warnings could be an
> > indicator
> > > > for the user that for watermark split alignment use case split
> reading
> > is
> > > > imbalanced with the conclusion to either disable the trigger for
> > > watermark
> > > > split alignment (see Top B) or to use/implement a source and reader
> > that
> > > > fully supports split alignment.
> > > >
> > > > # Top D: How to design interfaces?
> > > >
> > > > Thanks for structuring the discussion with the various possibilities
> > > > (a-d). From the discussion and emails, I would like to summarize the
> > > > following requirements:
> > > > - Interfaces should be consistent ("symmetric"), i.e., similar
> > semantics
> > > > should have similar interfaces with similar usage.
> > > > - Make explicit which implementations implement interfaces/support
> > > > behavior.
> > > > - Make clear what are default implementations and how to implement
> > > > interfaces with desired behavior.
> > > >
> > > > This is a simplified view of the relations between relevant classes
> of
> > > the
> > > > PoC implementation:
> > > >
> > > > SourceReader (Public) <|-- SourceReaderBase (Internal) <|-- .. <|--
> > > > MySourceReader
> > > >
> > > > MySourceReader <>-- SplitFetcherManager (Internal) <>-- SplitFetcher
> > > > (Internal) <>-- SplitReader (Public) <|-- MySplitReader
> > > >
> > > > (A <|-- B: B inherits from A; A <>-- B: A "has a" B)
> > > >
> > > > Note that SourceReaderBase and SplitFetcherManager implement most of
> > the
> > > > "logic" for split alignment just because we wanted to implement split
> > > > alignment and wanted it to be available as kind of a default. As a
> > > > consequence, we have a "default implementation" for SourceReader that
> > > > implements the actual logic for split alignment. For that reason, I
> > find
> > > it
> > > > very confusing to have a NOOP default implementation in the interface
> > for
> > > > the SourceReader. As a consequence, interface strategy c) is
> difficult
> > > > because this would require NOOP default implementations in the public
> > > > interfaces of SourceReader and SplitReader. This is the same for
> > strategy
> > > > d) because it would require NOOP default implementation in the
> > > > SourceReader. Further, as Dawid described method signatures of
> > alignSplit
> > > > for SourceReader and SplitReader differ and it would be extremely
> > > difficult
> > > > to make the signatures the same (with even potential performance
> impact
> > > > because of additional lookups of split ids). Therefore, having a
> > > symmetric
> > > > decorative interface as of strategy a) is actually not possible and
> > > having
> > > > two decorative interfaces with different method signatures is
> > confusing.
> > > My
> > > > conclusion is that we are best with strategy b) which means to have
> > > > specializing sub-interfaces that inherit from the parent interface:
> > > > SourceReader <|-- AlignedSourceReader, SplitReader <|--
> > > AlignedSplitReader
> > > > With this option, I'm not 100% sure what the implications are and if
> > this
> > > > could get nasty. I would suggest that Dawid and I just try to
> implement
> > > and
> > > > see if we like it. :)
> > > >
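For illustration, strategy b) could look roughly like the following Java sketch. Every name here is a hypothetical stand-in, and the parameter order is an assumption. The lookup is done directly in the source reader for brevity, whereas in the PoC described above it would sit further down the call chain. The sketch mainly shows why the two alignSplits signatures differ: the source reader side works with split ids, while the split reader side needs the actual split objects, so an id-to-split lookup sits in between.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, empty stand-ins for the two public base interfaces.
interface SourceReaderSketch {}

interface SplitReaderSketch<SplitT> {}

// Strategy b): specializing sub-interfaces. The source reader side only
// knows split ids ...
interface AlignedSourceReaderSketch extends SourceReaderSketch {
    void alignSplits(Collection<String> idsToPause, Collection<String> idsToResume);
}

// ... while the split reader side needs the actual split objects.
interface AlignedSplitReaderSketch<SplitT> extends SplitReaderSketch<SplitT> {
    void alignSplits(Collection<SplitT> splitsToPause, Collection<SplitT> splitsToResume);
}

final class FileSplitSketch {
    final String id;

    FileSplitSketch(String id) {
        this.id = id;
    }
}

// Split reader that simply records which split ids are currently paused.
final class RecordingSplitReaderSketch implements AlignedSplitReaderSketch<FileSplitSketch> {
    final List<String> pausedIds = new ArrayList<>();

    @Override
    public void alignSplits(
            Collection<FileSplitSketch> splitsToPause,
            Collection<FileSplitSketch> splitsToResume) {
        for (FileSplitSketch split : splitsToResume) {
            pausedIds.remove(split.id);
        }
        for (FileSplitSketch split : splitsToPause) {
            pausedIds.add(split.id);
        }
    }
}

// Source reader that bridges the two signatures via an id-to-split lookup.
final class BridgingSourceReaderSketch implements AlignedSourceReaderSketch {
    private final Map<String, FileSplitSketch> assigned = new HashMap<>();
    private final AlignedSplitReaderSketch<FileSplitSketch> splitReader;

    BridgingSourceReaderSketch(AlignedSplitReaderSketch<FileSplitSketch> splitReader) {
        this.splitReader = splitReader;
    }

    void addSplit(FileSplitSketch split) {
        assigned.put(split.id, split);
    }

    @Override
    public void alignSplits(Collection<String> idsToPause, Collection<String> idsToResume) {
        splitReader.alignSplits(lookup(idsToPause), lookup(idsToResume));
    }

    // Unknown ids are silently skipped in this sketch.
    private List<FileSplitSketch> lookup(Collection<String> ids) {
        List<FileSplitSketch> result = new ArrayList<>();
        for (String id : ids) {
            FileSplitSketch split = assigned.get(id);
            if (split != null) {
                result.add(split);
            }
        }
        return result;
    }
}
```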
> > > > # Summary
> > > >
> > > > In conclusion, please let me know your perspectives. Please correct
> me,
> > > if
> > > > something is wrong or if I misunderstood something. My perspective
> > would
> > > be:
> > > >
> > > > Top A: Yes
> > > > Top B: Yes (but disable watermark trigger for split alignment)
> > > > Top C: No
> > > > Top D: b)
> > > >
> > > > Best,
> > > > Sebastian
> > > >
> > > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <dwysakowicz@apache.org>
> > > > wrote:
> > > >
> > > >> @Arvid:
> > > >>
> > > >> While I also like Becket's capability approach, I fear that it
> doesn't
> > > work
> > > >> for this particular use case: Sources can always be aligned
> cross-task
> > > and
> > > >> this is just about intra-task alignment. So it's plausible to put
> > > sources
> > > >> into an alignment group even though they do not use any of the
> > presented
> > > >> API of FLIP-217. They should just issue a warning, if they handle
> > > multiple
> > > >> splits (see motivation section).
> > > >>
> > > >> Yes, but the "supportXXX" method would be for telling if it supports
> > > that
> > > >> intra-task alignment. Cross-task alignment would always be
> supported.
> > > >>
> > > >> I updated interfaces to what I believe to be closest to a consensus
> > > >> between all participants. Do you mind taking a look?
> > > >>
> > > >> @Sebastian Do you mind addressing the nits?
> > > >>
> > > >> Best,
> > > >>
> > > >> Dawid
> > > >>
> > > >> On 25/04/2022 13:39, Arvid Heise wrote:
> > > >>
> > > >> Thanks for pushing this effort.
> > > >>
> > > >> I'd actually be in favor of 1b). I fully agree that decorator
> > interfaces
> > > >> should be avoided but I'm also not a big fan of overloading the base
> > > >> interfaces (they are hard to implement as is). The usual feedback to
> > > >> Source-related interfaces are always that they are overwhelming and
> > too
> > > >> hard to implement. However, I'd also not oppose 1c) as scattered
> > > interfaces
> > > >> also have drawbacks. I'd just dislike 1a) and 1d).
> > > >> While I also like Becket's capability approach, I fear that it
> doesn't
> > > work
> > > >> for this particular use case: Sources can always be aligned
> cross-task
> > > and
> > > >> this is just about intra-task alignment. So it's plausible to put
> > > sources
> > > >> into an alignment group even though they do not use any of the
> > presented
> > > >> API of FLIP-217. They should just issue a warning, if they handle
> > > multiple
> > > >> splits (see motivation section).
> > > >>
> > > >> I think renaming alignSplits to facilitate future use cases makes
> > sense
> > > but
> > > >> then all interfaces (if 1c) is chosen) should be adjusted
> accordingly.
> > > >> AlignedSourceReader could be PausingSourceReader and I'd go for
> > > >> pauseOrResumeSplits (Becket's proposal afaik). We could also split
> it
> > > into
> > > >> pauseSplit and resumeSplit. While pauseOrResumeSplits may allow
> > Sources
> > > to
> > > >> just use 1 instead of 2 library calls (as written in the Javadoc),
> > both
> > > >> Kafka and Pulsar can't use it and I'm not sure if there is a system
> > that
> > > >> can.
> > > >>
> > > >> Some nit for the FLIP:
> > > >> - Please replace "stop" with "pause".
> > > >> - Not sure if it's worth it in the capability section: Sources that
> > > adopt
> > > >> this interface cannot be used in earlier versions. So it feels like
> we
> > > are
> > > >> only forward compatible (old sources can be used after the change);
> > but
> > > I
> > > >> guess this holds for any API addition.
> > > >> - You might want to add what happens when all splits are paused.
> > > >> - You may want to describe how the 3 flavors of SourceReaderBase
> > > interact
> > > >> with the interface.
> > > >> - I'm not sure if it makes sense to include Kafka and Pulsar in the
> > > FLIP.
> > > >> For me, this is rather immediate follow-up work. (could be in the
> same
> > > >> umbrella ticket)
> > > >>
> > > >> Best,
> > > >>
> > > >> Arvid
> > > >>
> > > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz <dwysakowicz@apache.org>
> > > >> wrote:
> > > >>
> > > >>
> > > >> a) "MySourceReader implements SourceReader, WithSplitsAlignment",
> > along
> > > >> with "MySplitReader implements SplitReader, WithSplitsAlignment", or
> > > >> b) "MySourceReader implements AlignedSourceReader" and
> "MySplitReader
> > > >> implements AlignedSplitReader", or
> > > >> c) "MySourceReader implements SourceReader" and "MySplitReader
> > > implements
> > > >> SplitReader".
> > > >>
> > > >> I think the latest proposal according to Dawid would be:
> > > >> d) "MySourceReader implements SourceReader" and "MySplitReader
> > > implements
> > > >> AlignedSplitReader".
> > > >> I am fine with this API, although personally speaking I think it is
> > > simpler
> > > >> to just add a new method to the split reader with default impl.
> > > >>
> > > >>
> > > >> I think that is a good idea to have it aligned as much as possible.
> > I'd
> > > be
> > > >> +1 for your option c). We can merge AlignedSplitReader with
> > > SplitReader. We
> > > >> will update the FLIP shortly.
> > > >>
> > > >> Best,
> > > >>
> > > >> Dawid
> > > >>
> > > >> On 25/04/2022 12:43, Becket Qin wrote:
> > > >>
> > > >> Thanks for the comment, Jark.
> > > >>
> > > >> 3. Interface/Method Name.
> > > >>
> > > >> Can the interface be used to align other things in the future? For
> > > example,
> > > >> align read speed, I have
> > > >> seen users requesting global rate limits. This feature may also need
> > an
> > > >> interface like this.
> > > >> If we don't plan to extend this interface to support align other
> > > things, I
> > > >> suggest explicitly declaring
> > > >> the purpose of the methods, such as `alignWatermarksForSplits`
> instead
> > > of
> > > >> `alignSplits`.
> > > >>
> > > >> This is a good point. Naming wise, it would usually be more
> extensible
> > > to
> > > >> just describe what the method actually does, instead of assuming the
> > > >> purpose of doing this. For example, in this case,
> > pauseOrResumeSplits()
> > > >> would be more extensible because this can be used for any kind of
> flow
> > > >> control, be it watermark alignment or simple rate limiting.
> > > >>
> > > >> 4. Interface or Method.
> > > >>
> > > >> I don't have a strong opinion on this. I think they have their own
> > > >> advantages.
> > > >> In Flink SQL, we heavily use Interfaces for extending abilities
> > > >> (SupportsXxxx) for TableSource/TableSink,
> > > >> and I prefer Interfaces rather than methods in this case. When you
> > have
> > > a
> > > >> bunch of abilities and each ability
> > > >> has more than one method, Interfaces can help to organize them and
> > make
> > > >> users clear which methods
> > > >> need to implement when you want to have an ability.
> > > >>
> > > >> I am OK with decorative interfaces if this is a general design
> pattern
> > > in
> > > >> the other components in Flink. But it looks like the current API
> > > proposal
> > > >> is not symmetric.
> > > >>
> > > >> The current proposal is essentially "MySourceReader implements
> > > >> SourceReader, WithSplitsAlignment", along with "MySplitReader
> > implements
> > > >> AlignedSplitsReader".
> > > >>
> > > >> Should we make the API symmetric? I'd consider any one of the
> > following
> > > as
> > > >> symmetric.
> > > >>
> > > >> a) "MySourceReader implements SourceReader, WithSplitsAlignment",
> > along
> > > >> with "MySplitReader implements SplitReader, WithSplitsAlignment", or
> > > >> b) "MySourceReader implements AlignedSourceReader" and
> "MySplitReader
> > > >> implements AlignedSplitReader", or
> > > >> c) "MySourceReader implements SourceReader" and "MySplitReader
> > > implements
> > > >> SplitReader".
> > > >>
> > > >> I think the latest proposal according to Dawid would be:
> > > >> d) "MySourceReader implements SourceReader" and "MySplitReader
> > > implements
> > > >> AlignedSplitReader".
> > > >> I am fine with this API, although personally speaking I think it is
> > > simpler
> > > >> to just add a new method to the split reader with default impl.
> > > >>
> > > >> @Dawid Wysakowicz <dwysakowicz@apache.org>, thanks for the reply.
> > > >>
> > > >> Having said that, as I don't have a preference and I agree most of
> the
> > > >>
> > > >> sources will support the alignment I am fine following your
> suggestion
> > > to
> > > >> have the SourceReader extending from WithWatermarksSplitsAlignment,
> > but
> > > >> would put the "supportsXXX" there, not in the Source to keep the two
> > > >> methods together.
> > > >>
> > > >> One benefit of having the "supportsXXX" in Source is that this
> allows
> > > some
> > > >> compile time check. For example, if a user enabled watermark
> alignment
> > > >> while it is not supported by the Source, an exception can be thrown
> at
> > > >> compile time. It seems in general useful. That said, I agree that
> API
> > > >> cleanliness wise it is better to put the two methods together.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jiangjie (Becket) Qin
> > > >>
> > > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <imjark@gmail.com> wrote:
> > > >>
> > > >>
> > > >> Thanks, Dawid, for the reminder on FLIP-182. Sorry I did miss it.
> > > >> I don't have other concerns then.
> > > >>
> > > >> Best,
> > > >> Jark
> > > >>
> > > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz <dwysakowicz@apache.org>
> > > >> wrote:
> > > >>
> > > >>
> > > >> @Jark:
> > > >>
> > > >> 1. Will the framework always align with watermarks when the source
> > > >> implements the interface?
> > > >> I'm afraid not every case needs watermark alignment even if Kafka
> > > >> implements the interface,
> > > >> and this will affect the throughput somehow. I agree with Becket
> > > >> we may need a
> > > >> `supportSplitsAlignment()` method for users to configure the source
> to
> > > >> enable/disable the alignment.
> > > >>
> > > >> 2. How does the framework calculate maxDesiredWatermark?
> > > >> I think the algorithm of maxDesiredWatermark will greatly affect
> > > >>
> > > >> throughput
> > > >>
> > > >> if the reader is constantly
> > > >>  switching between pause and resume. Can users configure the
> alignment
> > > >> offset?
> > > >>
> > > >>
> > > >> This is covered in the previous FLIP[1] which has been already
> > > >>
> > > >> implemented
> > > >>
> > > >> in 1.15. In short, it must be enabled with the watermark strategy
> > which
> > > >> also configures drift and update interval.
> > > >>
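As a rough illustration of how a drift setting could translate into pause decisions, here is a small Java sketch. It assumes, as a simplification of the FLIP-182 behaviour rather than a confirmed detail of it, that the maximum allowed watermark is the smallest watermark in the alignment group plus the configured drift, and that any split ahead of that bound gets paused. Class, method and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class AlignmentDecisionSketch {
    // Assumed rule: the bound is the slowest watermark in the group plus the
    // configured maximum drift (both in milliseconds).
    static long maxAllowedWatermark(long minWatermarkInGroup, long maxDriftMillis) {
        return minWatermarkInGroup + maxDriftMillis;
    }

    // Returns the ids of all splits whose watermark is ahead of the bound,
    // sorted by split id so the result is deterministic.
    static List<String> splitsToPause(Map<String, Long> splitWatermarks, long maxAllowed) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : new TreeMap<>(splitWatermarks).entrySet()) {
            if (entry.getValue() > maxAllowed) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

A small drift combined with a short update interval makes readers toggle between pause and resume more often, which is the throughput concern raised in the question; a larger drift trades tighter alignment for fewer switches.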
> > > >> If we don't plan to extend this interface to support align other
> > things,
> > > >>
> > > >> I
> > > >>
> > > >> suggest explicitly declaring
> > > >> the purpose of the methods, such as `alignWatermarksForSplits`
> instead
> > > of
> > > >> `alignSplits`.
> > > >>
> > > >>
> > > >> Sure let's rename it.
> > > >>
> > > >> @Becket:
> > > >>
> > > >> I understand your point. On the other hand putting all methods, even
> > > with
> > > >> "supportsXXX" methods for enabling certain features, makes the entry
> > > >> threshold for writing a new source higher. Instead of focusing on
> the
> > > >>
> > > >> basic
> > > >>
> > > >> and required properties of the Source, the person implementing a
> > source
> > > >> must bother with and need to figure out what all of the extra
> features
> > > >>
> > > >> are
> > > >>
> > > >> about and how to deal with them. It makes it also harder to organize
> > > >> methods in coupled groups as Jark said.
> > > >>
> > > >> Having said that, as I don't have a preference and I agree most of
> the
> > > >> sources will support the alignment I am fine following your
> suggestion
> > > to
> > > >> have the SourceReader extending from WithWatermarksSplitsAlignment,
> > but
> > > >> would put the "supportsXXX" there, not in the Source to keep the two
> > > >> methods together.
> > > >>
> > > >> Lastly, I agree it is really unfortunate the "alignSplits" methods
> > > differ
> > > >> slightly for SourceReader and SplitReader. The reason for that is
> > > >> SourceReaderBase deals only with SplitIds, whereas SplitReader needs
> > the
> > > >> actual splits to pause them. I found the discrepancy acceptable for
> > the
> > > >> sake of simplifying changes significantly, especially as they would
> > > >>
> > > >> highly
> > > >>
> > > >> likely impact performance as we would have to perform additional
> > > lookups.
> > > >> Moreover the SplitReader is a secondary interface.
> > > >>
> > > >> Best,
> > > >>
> > > >> Dawid
> > > >>
> > > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
> > > >>
> > > >> On 24/04/2022 17:15, Jark Wu wrote:
> > > >>
> > > >> Thanks for the effort, Dawid and Sebastian!
> > > >>
> > > >> I just have some minor questions (maybe I missed something).
> > > >>
> > > >> 1. Will the framework always align with watermarks when the source
> > > >> implements the interface?
> > > >> I'm afraid not every case needs watermark alignment even if Kafka
> > > >> implements the interface,
> > > >> and this will affect the throughput somehow. I agree with Becket
> > > >> we may need a
> > > >> `supportSplitsAlignment()` method for users to configure the source
> to
> > > >> enable/disable the alignment.
> > > >>
> > > >> 2. How does the framework calculate maxDesiredWatermark?
> > > >> I think the algorithm of maxDesiredWatermark will greatly affect
> > > >>
> > > >> throughput
> > > >>
> > > >> if the reader is constantly
> > > >>  switching between pause and resume. Can users configure the
> alignment
> > > >> offset?
> > > >>
> > > >> 3. Interface/Method Name.
> > > >> Can the interface be used to align other things in the future? For
> > > >>
> > > >> example,
> > > >>
> > > >> align read speed, I have
> > > >> seen users requesting global rate limits. This feature may also need
> > an
> > > >> interface like this.
> > > >> If we don't plan to extend this interface to support align other
> > things,
> > > >>
> > > >> I
> > > >>
> > > >> suggest explicitly declaring
> > > >> the purpose of the methods, such as `alignWatermarksForSplits`
> instead
> > > of
> > > >> `alignSplits`.
> > > >>
> > > >> 4. Interface or Method.
> > > >> I don't have a strong opinion on this. I think they have their own
> > > >> advantages.
> > > >> In Flink SQL, we heavily use Interfaces for extending abilities
> > > >> (SupportsXxxx) for TableSource/TableSink,
> > > >> and I prefer Interfaces rather than methods in this case. When you
> > have
> > > a
> > > >> bunch of abilities and each ability
> > > >> has more than one method, Interfaces can help to organize them and
> > make
> > > >> users clear which methods
> > > >> need to implement when you want to have an ability.
> > > >>
> > > >>
> > > >> Best,
> > > >> Jark
> > > >>
> > > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <becket.qin@gmail.com> wrote:
> > > >>
> > > >> Hi Dawid,
> > > >>
> > > >> Thanks for the explanation. Apologies that I somehow misread a bunch
> > of
> > > >> "align" and thought they were "assign".
> > > >>
> > > >> Regarding 1, by default implementation, I was thinking of the default
> > > >> no-op implementation. I am a little worried about the proliferation of
> > > >> decorative interfaces. I think the most important thing about
> > > >> interfaces is that they are easy to understand. In this case, I prefer
> > > >> adding new method to the existing interface for the following reasons:
> > > >>
> > > >> a) I feel the biggest drawback of decorative interfaces is which
> > > >> interface they can decorate and which combinations of multiple
> > > >> decorative interfaces are valid. In the current FLIP, the
> > > >> withSplitsAlignment interface is only applicable to the SourceReader
> > > >> which means it can't decorate any other interface. From an interface
> > > >> design perspective, a natural question is why not let
> > > >> "AlignedSplitReader" extend "withSplitsAlignment"? And it is also
> > > >> natural to assume that a split reader implementing both SplitReader and
> > > >> WithSplitAlignment would work, because a source reader implementing
> > > >> SourceReader and withSplitsAlignment works. So why isn't there an
> > > >> interface of AlignedSourceReader? In the future, if there is a new
> > > >> feature added (e.g. sorted or pre-partitioned data aware), are we going
> > > >> to create another interface of SplitReader such as SortedSplitReader or
> > > >> PrePartitionedAware? Can they be combined? So I think the additional
> > > >> decorative interface like withSplitsAlignment actually increases the
> > > >> understanding cost of users because they have to know what decorative
> > > >> interfaces are there, which interface they can decorate and which
> > > >> combinations of the decorative interfaces are valid and which are not.
> > > >> Ideally we want to avoid that. To be clear, I am not opposing having an
> > > >> interface of withSplitsAlignment, it is completely OK to have it as an
> > > >> internal interface and let SourceReader and SplitReader both extend it.
> > > >>
> > > >> b) Adding a new method to the SourceReader with a default
> > > >> implementation of no-op would help avoid logic branching in the source
> > > >> logic, especially given that we agree that the vast majority of the
> > > >> SourceReader implementations, if not all, would just extend from the
> > > >> SourceReaderBase. That means adding a new method to the interface would
> > > >> effectively give the same user experience, but simpler.
> > > >>
> > > >> c) A related design principle that may be worth discussing is how do we
> > > >> let the Source implementations tell Flink what capability is supported
> > > >> and what is not. Personally speaking I feel the most intuitive place to
> > > >> me is in the Source itself, because that is the entrance of the entire
> > > >> Source connector logic.
> > > >>
> > > >> Based on the above thoughts, I am wondering if the following interface
> > > >> would be easier to understand by the users.
> > > >>
> > > >> - Change "withSplitsAlignment" to internal interface, let both
> > > >> SourceReader and SplitReader extend from it, with a default no-op
> > > >> implementation.
> > > >> - Add a new method "boolean supportSplitsAlignment()" to the Source
> > > >> interface, with a default implementation returning false. Sources that
> > > >> have implemented the alignment logic can change this to return true,
> > > >> and override the alignSplits() methods in the SourceReader /
> > > >> SplitReader if needed.
> > > >> - In the future, if a new optional feature is going to be added to the
> > > >> Source, and that feature requires the awareness from Flink, we can add
> > > >> more such methods to the Source.
> > > >>
> > > >> What do you think?
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jiangjie (Becket) Qin
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dwysakowicz@apache.org>
> > > >> wrote:
> > > >>
> > > >>
> > > >> @Konstantin:
> > > >>
> > > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the
> > stop &
> > > >> resume behavior) will be implemented for Kafka and Pulsar only,
> > correct?
> > > >>
> > > >> Correct, as far as I know though, those are the only sources which
> > > >>
> > > >> consume
> > > >>
> > > >> concurrently from multiple splits and thus alignment applies.
> > > >>
> > > >> @Thomas:
> > > >>
> > > >> I wonder if "supporting" split alignment in SourceReaderBase and
> then
> > > >>
> > > >> doing
> > > >>
> > > >> nothing if the split reader does not implement AlignedSplitReader
> > could
> > > >>
> > > >> be
> > > >>
> > > >> misleading? Perhaps WithSplitsAlignment can instead be added to the
> > > >> specific source reader (i.e. KafkaSourceReader) to make it explicit
> > that
> > > >> the source actually supports it.
> > > >>
> > > >> I understand your concern. Hmm, I think we could actually do that. Given
> > > >> the actual implementation of the SourceReaderBase#alignSplits is rather
> > > >> short (just a forward to the corresponding method of SplitFetcher), we
> > > >> could reimplement it in the actual source implementations. This solution
> > > >> has a downside though. Authors of new sources would have to do two
> > > >> things: extend from AlignedSplitReader and implement
> > > >> WithSplitsAssignment, instead of just extending AlignedSplitReader. I
> > > >> would be fine with such a tradeoff though. What do others think?
> > > >>
> > > >> @Steven:
> > > >>
> > > >> For this part from the motivation section, is it accurate? Let's assume
> > > >> one source task consumes from 3 partitions and one of the partitions is
> > > >> significantly slower. In this situation, the watermark for this source
> > > >> task won't hold back as it is reading recent data from the other two
> > > >> Kafka partitions. As a result, it won't hold back the overall watermark.
> > > >> I thought the problem is that we may have late data for this slow
> > > >> partition.
> > > >>
> > > >> It will hold back the watermark. Watermark of an operator is the minimum
> > > >> of watermarks of all splits[1]
> > > >>
> > > >> I have another question about the restart. Say split alignment is
> > > >> triggered, a checkpoint is completed, and the job fails and is restored
> > > >> from the last checkpoint. Because the alignment decision is not
> > > >> checkpointed, initially alignment won't be enforced until we get a cycle
> > > >> of watermark aggregation and propagation, right? Not saying this corner
> > > >> case is a problem. Just want to understand it more.
> > > >>
> > > >> Your understanding is correct.
> > > >>
> > > >> @Becket:
> > > >>
> > > >> 1. I think watermark alignment is sort of a general use case, so should
> > > >> we just add the related methods to SourceReader directly instead of
> > > >> introducing the new interface of WithSplitAssignment? We can provide
> > > >> default implementations, so backwards compatibility won't be an issue.
> > > >>
> > > >> I don't think we can provide a default implementation. How would we do
> > > >> that? Would it be just a no-op? Is it better than having an opt-in
> > > >> interface? The default implementation would have to be added exclusively
> > > >> in a *Public* SourceReader interface. By the way notice SourceReaderBase
> > > >> does extend from WithSplitsAlignment, so effectively all implementations
> > > >> do handle the alignment case. To be honest I think it is impossible to
> > > >> implement the SourceReader interface directly by end users.
> > > >>
> > > >> 2. As you mentioned, the SplitReader interface probably also needs some
> > > >> change to support throttling at the split granularity. Can you add that
> > > >> interface change into the public interface section as well?
> > > >>
> > > >> It has been added from the beginning. See *AlignedSplitReader.*
> > > >>
> > > >> 3. Nit, can we avoid using the method name assignSplits here, given that
> > > >> it is not actually changing the split assignments? It seems something
> > > >> like pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
> > > >>
> > > >> The method's called *alignSplits*, not assign. Do you still prefer a
> > > >> different name for that? Personally, I am open for suggestions here.
> > > >>
> > > >> Best,
> > > >>
> > > >> Dawid
> > > >>
> > > >> [1]
> > > >> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> > > >>
> > > >> On 22/04/2022 05:59, Becket Qin wrote:
> > > >>
> > > >> Thanks for driving the effort, Sebastian. I think the motivation makes a
> > > >> lot of sense. Just a few suggestions / questions.
> > > >>
> > > >> 1. I think watermark alignment is sort of a general use case, so should
> > > >> we just add the related methods to SourceReader directly instead of
> > > >> introducing the new interface of WithSplitAssignment? We can provide
> > > >> default implementations, so backwards compatibility won't be an issue.
> > > >>
> > > >> 2. As you mentioned, the SplitReader interface probably also needs some
> > > >> change to support throttling at the split granularity. Can you add that
> > > >> interface change into the public interface section as well?
> > > >>
> > > >> 3. Nit, can we avoid using the method name assignSplits here, given that
> > > >> it is not actually changing the split assignments? It seems something
> > > >> like pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Jiangjie (Becket) Qin
> > > >>
> > > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <st...@gmail.com> wrote:
> > > >>
> > > >> However, a single source operator may read data from multiple
> > > >> splits/partitions, e.g., multiple Kafka partitions, such that even with
> > > >> watermark alignment the source operator may need to buffer excessive
> > > >> amount of data if one split emits data faster than another.
> > > >>
> > > >> For this part from the motivation section, is it accurate? Let's assume
> > > >> one source task consumes from 3 partitions and one of the partitions is
> > > >> significantly slower. In this situation, the watermark for this source
> > > >> task won't hold back as it is reading recent data from the other two
> > > >> Kafka partitions. As a result, it won't hold back the overall watermark.
> > > >> I thought the problem is that we may have late data for this slow
> > > >> partition.
> > > >>
> > > >> I have another question about the restart. Say split alignment is
> > > >> triggered, a checkpoint is completed, and the job fails and is restored
> > > >> from the last checkpoint. Because the alignment decision is not
> > > >> checkpointed, initially alignment won't be enforced until we get a cycle
> > > >> of watermark aggregation and propagation, right? Not saying this corner
> > > >> case is a problem. Just want to understand it more.
> > > >>
> > > >>
> > > >>
> > > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <th...@apache.org> wrote:
> > > >>
> > > >> Thanks for working on this!
> > > >>
> > > >> I wonder if "supporting" split alignment in SourceReaderBase and then
> > > >> doing nothing if the split reader does not implement AlignedSplitReader
> > > >> could be misleading? Perhaps WithSplitsAlignment can instead be added to
> > > >> the specific source reader (i.e. KafkaSourceReader) to make it explicit
> > > >> that the source actually supports it.
> > > >>
> > > >> Thanks,
> > > >> Thomas
> > > >>
> > > >>
> > > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <kn...@apache.org>
> > > >> wrote:
> > > >>
> > > >>
> > > >> Hi Sebastian, Hi Dawid,
> > > >>
> > > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> > > >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
> > > >>
> > > >> +1 in general. I believe it is valuable to complete the watermark
> > > >> aligned story with this FLIP.
> > > >>
> > > >> Cheers,
> > > >>
> > > >> Konstantin
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dw...@apache.org>
> > > >> wrote:
> > > >>
> > > >>
> > > >> To be explicit, having worked on it, I support it ;) I think we can
> > > >> start a vote thread soonish, as there are no concerns so far.
> > > >>
> > > >> Best,
> > > >>
> > > >> Dawid
> > > >>
> > > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
> > > >>
> > > >> Dear Flink developers,
> > > >>
> > > >> I would like to open a discussion on FLIP 217 [1] for an extension of
> > > >> Watermark Alignment to perform alignment also in SplitReaders. To do so,
> > > >> SplitReaders must be able to suspend and resume reading from split
> > > >> sources where the SourceOperator coordinates and controls suspend and
> > > >> resume. To gather information about current watermarks of the
> > > >> SplitReaders, we extend the internal WatermarkOutputMultiplexer and
> > > >> report watermarks to the SourceOperator.
> > > >>
> > > >> There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised
> > > >> and reworked by Dawid Wysakowicz (He did most of the work.) and me. The
> > > >> changes are backwards compatible in a way that if affected components do
> > > >> not support split alignment the behavior is as before.
> > > >>
> > > >> Best,
> > > >> Sebastian
> > > >>
> > > >> [1]
> > > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > > >>
> > > >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
> > > >>
> > > >> --
> > > >>
> > > >> Konstantin Knauf
> > > >> https://twitter.com/snntrable
> > > >> https://github.com/knaufk
> > > >>
> > > >>
> > >
> >
>

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

Posted by Becket Qin <be...@gmail.com>.
Hi Piotr,

Thanks for the comment.

Just to clarify, I am not against the decorative interfaces, but I do think
we should use them with caution. The main argument for adding the methods
to the SourceReader is that these methods are effectively NON-OPTIONAL for
SourceReader implementations, i.e. starting from this FLIP, all SourceReader
implementations are expected to support them, although some old
implementations may not have done so yet. I think we should distinguish
new features from optional features. A public decorative interface is a
solution for optional features; we should not use it for features that are
non-optional.

That said, this feature is optional for SplitReaders. Arguably we can have
a decorative interface for that, but for simplicity and symmetry of the
interface, personally I prefer just adding a new method.

Regarding the advantages you mentioned about the decorative interfaces,
they would make sense if:
1. The feature is optional.
2. There is only one decorative interface involved for a feature. Otherwise
the argument that all the methods are grouped together will not stand.

Compared with that, I think the current solution works fine in all cases,
i.e. "having a supportXXX() method in Source, and default methods /
decorative interfaces in base interfaces".

> The advantages are:
> - clean and easy to implement base interface

In the current approach, the Javadoc of the supportXXX() method in the
Source would be the single source of truth regarding how to implement this
feature. It lists the methods that have to be implemented to support this
feature, regardless of how many classes / interfaces are involved.

When implementing the base interface, users do not need to implement a
method that has a default implementation. If they are curious what the
method is for, the Javadoc of that method simply points users to the
supportXXX() method in the Source. This does not seem to add work for users
compared with decorative interfaces, but gives much better discoverability.
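
To make the shape of this approach concrete, here is a minimal sketch. The
interfaces below are simplified stand-ins written for this mail, not the
real Flink types, and the names supportsPausingSplits() and
pauseOrResumeSplits() are only assumptions taken from the naming discussed
in this thread.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class CapabilityFlagSketch {

    /** Simplified stand-in for a reader interface; NOT the real Flink SourceReader. */
    interface SourceReader {
        /**
         * Optional hook with a no-op default. Its contract would be documented on
         * Source#supportsPausingSplits(), the single source of truth.
         */
        default void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume) {
            // Intentionally empty: readers that predate the feature inherit this.
        }
    }

    /** Simplified stand-in for the Source; the flag name is illustrative. */
    interface Source {
        SourceReader createReader();

        default boolean supportsPausingSplits() {
            return false;
        }
    }

    /** A source written before the feature existed: it overrides nothing. */
    static final class LegacySource implements Source {
        @Override
        public SourceReader createReader() {
            return new SourceReader() {};
        }
    }

    /** A reader that opts in by overriding the default method. */
    static final class PausingReader implements SourceReader {
        final List<String> paused = new ArrayList<>();

        @Override
        public void pauseOrResumeSplits(
                Collection<String> splitsToPause, Collection<String> splitsToResume) {
            paused.addAll(splitsToPause);
            paused.removeAll(splitsToResume);
        }
    }

    /** A source that declares the capability in one place. */
    static final class PausingSource implements Source {
        @Override
        public SourceReader createReader() {
            return new PausingReader();
        }

        @Override
        public boolean supportsPausingSplits() {
            return true;
        }
    }

    /** Framework side: the decision needs only the Source, no reader instance. */
    static boolean shouldAlignSplits(Source source) {
        return source.supportsPausingSplits();
    }

    public static void main(String[] args) {
        System.out.println("legacy source aligned:  " + shouldAlignSplits(new LegacySource()));
        System.out.println("pausing source aligned: " + shouldAlignSplits(new PausingSource()));
    }
}
```

The point of the sketch is only that the framework can ask the Source once,
while old readers keep compiling because they inherit the no-op default.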

> - all of the methods from a single feature are grouped in a single
> decorator interface, together with their dedicated java doc. It's also
> easier to google search for help using the decorator name
>
> - if an optional feature requires two methods to be implemented at once,
> decorator can guarantee that

These two points are not true when multiple components and classes
collaborate to provide a feature. In our case, we have both
SourceReader and SplitReader involved. And there might be other interfaces
on the JM side involved for some future features. So the relevant methods
can actually be scattered all over the place. That said, we may still use
decorative interfaces for each component, if the feature is optional, given
there is a single source of truth for the feature.
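
As an illustration of how one feature ends up spread over two components,
here is a small sketch. All types are hypothetical stand-ins (not the Flink
classes), and the forwarding reader only mimics the idea, mentioned earlier
in the thread, that the reader level works with split ids while the split
reader works with split objects.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TwoComponentSketch {

    /** Hypothetical split type; real sources define their own. */
    static final class Split {
        final String id;

        Split(String id) {
            this.id = id;
        }
    }

    /** Split-level hook operating on split objects. Simplified, not the Flink interface. */
    interface SplitReader {
        default void pauseOrResumeSplits(
                Collection<Split> splitsToPause, Collection<Split> splitsToResume) {
            // no-op unless the split reader opts in
        }
    }

    /** Reader-level hook operating on split ids; it forwards to the split reader. */
    static final class ForwardingSourceReader {
        private final Map<String, Split> assigned = new HashMap<>();
        private final SplitReader splitReader;

        ForwardingSourceReader(SplitReader splitReader, Collection<Split> splits) {
            this.splitReader = splitReader;
            for (Split split : splits) {
                assigned.put(split.id, split);
            }
        }

        void pauseOrResumeSplits(Collection<String> idsToPause, Collection<String> idsToResume) {
            splitReader.pauseOrResumeSplits(lookUp(idsToPause), lookUp(idsToResume));
        }

        /** The id-to-split look-up that makes the two signatures differ. */
        private List<Split> lookUp(Collection<String> ids) {
            List<Split> result = new ArrayList<>();
            for (String id : ids) {
                Split split = assigned.get(id);
                if (split != null) {
                    result.add(split); // unknown ids are silently ignored in this sketch
                }
            }
            return result;
        }
    }

    /** Records which splits are paused, so the forwarding is observable. */
    static final class RecordingSplitReader implements SplitReader {
        final List<String> pausedIds = new ArrayList<>();

        @Override
        public void pauseOrResumeSplits(
                Collection<Split> splitsToPause, Collection<Split> splitsToResume) {
            for (Split split : splitsToPause) {
                pausedIds.add(split.id);
            }
            for (Split split : splitsToResume) {
                pausedIds.remove(split.id);
            }
        }
    }

    public static void main(String[] args) {
        List<Split> splits = new ArrayList<>();
        splits.add(new Split("p0"));
        splits.add(new Split("p1"));
        RecordingSplitReader splitReader = new RecordingSplitReader();
        ForwardingSourceReader reader = new ForwardingSourceReader(splitReader, splits);

        List<String> toPause = new ArrayList<>();
        toPause.add("p1");
        reader.pauseOrResumeSplits(toPause, new ArrayList<String>());
        System.out.println("paused: " + splitReader.pausedIds);
    }
}
```

Whichever style is chosen, an implementer has to touch both levels, which is
why a single place that documents the whole feature seems useful.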

> Here I would strongly lean towards making life easier for new users,
> lowering the entry barrier, at the (imo) slight expense for the power
> users.

I actually think the current approach is simpler, more extensible and more
general for all the users. Can you elaborate a bit more on which part you
think is harder for users to understand?


There is another benefit of the decorative interfaces which has not been
mentioned, but might be worth considering here. Usually the decorative
interfaces give slightly better backwards compatibility than new
default methods in the interfaces. This matters when users are using a jar
that was compiled with an older version of Flink which does not have the
default method in the interfaces in question. A decorative interface may
still provide backwards compatibility in that case, while a default method
implementation cannot.

I think in Flink we in general do not guarantee custom components compiled
with an older version can run with a newer version of Flink. A recompile
with a newer version would be required. That said, if we do care about
this, we can just change the "supportXXX()" method in the Source interface
to use decorative interfaces, and leave the other parts unchanged.
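
A rough sketch of that variant, again with made-up stand-in types rather
than the real Flink interfaces: the boolean method on the Source is replaced
by a marker interface (the name SupportsPausingSplits is hypothetical), and
the framework detects the capability with an instanceof check.

```java
final class MarkerInterfaceSketch {

    /** Simplified stand-in for the Source; NOT the real Flink interface. */
    interface Source {}

    /** Hypothetical marker that would replace a boolean supportXXX() method. */
    interface SupportsPausingSplits {}

    /** A source that does not declare the capability. */
    static final class OldSource implements Source {}

    /** A source that opts in by implementing the marker. */
    static final class NewSource implements Source, SupportsPausingSplits {}

    /** Framework side: capability detection without calling any new method. */
    static boolean supportsPausingSplits(Source source) {
        return source instanceof SupportsPausingSplits;
    }

    public static void main(String[] args) {
        System.out.println("old source: " + supportsPausingSplits(new OldSource()));
        System.out.println("new source: " + supportsPausingSplits(new NewSource()));
    }
}
```

The reader-side methods would stay as they are in this variant; only the way
the Source announces the capability changes.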

Thanks,

Jiangjie (Becket) Qin




On Tue, May 3, 2022 at 6:25 PM Piotr Nowojski <pn...@apache.org> wrote:

> Hi,
>
> Sorry for chipping in so late, but I was OoO for the last two weeks.
> Regarding the interfaces, I would be actually against adding those methods
> to the base interfaces for the reasons mentioned above. Clogging the base
> interface for new users with tons of methods that they do not need, do not
> understand and do not know what to do with them. Moreover, such decorative
> interfaces are solving a problem if a feature requires two or more methods
> to be implemented at the same time. If we have all of the methods with
> default implementation in the base interface, the API doesn't give any clue
> to the user which set of methods are required to be implemented at the same
> time.
>
> > a) I feel the biggest drawback of decorative interfaces is which interface
> > they can decorate and which combinations of multiple decorative interfaces
> > are valid. (...)
> > In the future, if there is a new feature added
> > (e.g. sorted or pre-partitioned data aware), are we going to create another
> > interface of SplitReader such as SortedSplitReader or PrePartitionedAware?
> > Can they be combined? So I think the additional decorative interface like
> > withSplitsAlignment actually increases the understanding cost of users
> > because they have to know what decorative interfaces are there, which
> > interface they can decorate and which combinations of the decorative
> > interfaces are valid and which are not. Ideally we want to avoid that.
>
> I'm not sure if I understand how embedding default methods in the base
> interface is solving the problem: what can be combined or not? If there are
> two conflicting features, having decorative interfaces that can not be
> mixed together actually makes much more sense to me rather than having them
> all in one base class. How would you allow users to implement only one of
> those two features?
>
> To reiterate on the issue. Yes, there are drawbacks:
> - how can a user discover what decorators are there?
> - how can a user know where the decorator can be applied to?
>
> However those are drawbacks for more power users, that can be mitigated by
> the documentation. For example listing all of the decorators with
> detailed explanation both in the docs and in the java docs. More
> experienced users will be able to deal with those issues easier, as they
> will already have some basic understanding of Flink. Also if user has a
> problem that he wants to solve, he will google search a potential solution
> to his problem anyway, and while doing that he is very likely to discover
> the decorator that he needs anyway in the docs.
>
> The advantages are:
> - clean and easy to implement base interface
> - all of the methods from a single feature are grouped in a single
> decorator interface, together with their dedicated java doc. It's also
> easier to google search for help using the decorator name
> - if an optional feature requires two methods to be implemented at once,
> decorator can guarantee that
>
> Here I would strongly lean towards making life easier for new users,
> lowering the entry barrier, at the (imo) slight expense for the power
> users.
>
> Best,
> Piotrek
>
>
> wt., 26 kwi 2022 o 15:32 Becket Qin <be...@gmail.com> napisał(a):
>
> > Thanks for the reply Sebastian and Dawid,
> >
> > I think Sebastian has a good summary. This is a really helpful discussion.
> >
> > Thinking a bit more, I feel that it might still be better to add the
> > supportsXXX() method in the Source rather than SourceReader.
> >
> > Generally speaking, what we are trying to do here is to let the Flink
> > framework know what the Source is capable of. In this FLIP, it happens to
> > be the capability that only involves SourceReader. But in the future, it is
> > possible that another functionality involves both the SplitEnumerator and
> > SourceReader. In that case, following the current approach, we should put
> > the "supportsXXX()" method in both SplitEnumerator and SourceReader.
> > Because if we only put this in the SourceReader, then the JM would have to
> > create a SourceReader in order to know whether this feature is supported,
> > which is a little ugly. But if we put the "supportsXXX()" method in the
> > Source, we will break the "symmetric" design because this FLIP chose a
> > different way.
> >
> > This is also why I think supportsXXX() method seems a good thing to have,
> > because when there are a few interfaces / methods that are expected to be
> > implemented at the same time in order to deliver a feature, it is always
> > good to have a single source of truth to tell the framework what to do, so
> > the framework can do consistent things in different parts.
> >
> > @Sebastian Mattheis <se...@ververica.com>
> >
> > Regarding interface flavor b), i.e. AlignedSourceReader +
> > AlignedSplitReader, what I feel awkward about is that we are essentially
> > expecting almost all the SourceReader implementations to extend
> > SourceReaderBase, which effectively makes the SourceReader interface
> > without the pausing support useless. So this indicates that public
> > decorative interfaces (or sub-interfaces for the same purpose) only
> > make sense if the original interface is also expected to be used.
> > Otherwise, it seems to make more sense to add the method to the original
> > interface itself.
> >
> > Cheers,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> >
> > On Tue, Apr 26, 2022 at 6:05 PM Dawid Wysakowicz <dwysakowicz@apache.org>
> > wrote:
> >
> > > Thanks @Sebastian for the nice summary.
> > >
> > > I think most of your points aligned with the suggestions I made to the
> > > FLIP, while you were writing your reply (I believe we hit enter nearly at
> > > the same time ;) )
> > >
> > > Two points after we synced offline
> > >
> > > 1. I also changed supportsWatermarksSplitAlignment to
> > > supportsPausingSplits to express the general capability of pausing.
> > >
> > > 2. As for whether we should add PausingSourceReader/PausingSplitReader
> > > (option b) or just add the methods (option c), I suggest simply adding
> > > the two methods, as I felt this is the approach much preferred by Becket,
> > > to which others do not object. Unless there is opposition, let's go with
> > > option c.
> > >
> > > Best,
> > >
> > > Dawid
> > > On 26/04/2022 10:06, Sebastian Mattheis wrote:
> > >
> > > Hi folks,
> > >
> > > Sorry for being a bit silent. Many thanks for all the input and
> > > suggestions. As I'm a bit new, I needed some time to catch up and
> > > structure (for myself) the discussion and I wanted to find a way to
> > > structure the conclusions. (Also because I had the feeling that some
> > > concerns got lost in the discussion.) This is my attempt and please
> > > correct me if something is wrong or misunderstood. I tried to collect and
> > > assemble the opinions, suggestions, and conclusions (to the best of my
> > > knowledge):
> > >
> > > # Top A: Should split alignment (pause/resume behavior) be a general
> > > capability?
> > >
> > > I personally don't see any reason not to make it a general capability
> > > because the alignSplits method is actually independent of the
> > > watermarks. If we agree here to make it a general capability, we should
> > > also agree on the right wording. Does "alignSplits(splitsToResume,
> > > splitsToPause)" refer to what is then actually meant? (I see it as okay.
> > > I don't have any better idea whilst Arvid suggested "pauseOrResumeSplits".)
> > >
> > > # Top B: Should it be possible to enable/disable split alignment?
> > >
> > > I would personally not disable the split alignment on the source reader
> > > side because if split alignment is used for some other use case (see A) it
> > > could have nasty side effects on other/future use cases. Instead, I would
> > > disable "watermark split alignment" where I think it should disable the
> > > watermark-dependent trigger for split alignment.
> > >
> > > # Top C: Should we add a supportsX method?
> > >
> > > I find it difficult to define the scope of a supportsX method w.r.t.
> > > the following questions: a) Where is it used? and b) What is the expected
> > > output? Regarding b), it's not straightforward to provide a meaningful
> > > output, e.g., if the SourceReader supports split alignment but the
> > > SplitReader does not. This is because, with the current implementation,
> > > we can determine whether split alignment is fully supported only at
> > > runtime, and specifically only when calling alignSplits down the call
> > > hierarchy to the actual SplitReaders.
> > >
> > > Therefore, I would suggest to either raise an error or warning if the
> > > alignment is called but not supported at some point. I know we should
> > > carefully think about when this could be the case because we don't want to
> > > flood anybody with such warnings. However, warnings could be an indicator
> > > for the user that, for the watermark split alignment use case, split
> > > reading is imbalanced, with the conclusion to either disable the trigger
> > > for watermark split alignment (see Top B) or to use/implement a source and
> > > reader that fully support split alignment.
> > >
> > > # Top D: How to design interfaces?
> > >
> > > Thanks for structuring the discussion with the various possibilities
> > > (a-d). From the discussion and emails, I would like to summarize the
> > > following requirements:
> > > - Interfaces should be consistent ("symmetric"), i.e., similar semantics
> > > should have similar interfaces with similar usage.
> > > - Make explicit which implementations implement interfaces/support
> > > behavior.
> > > - Make clear what are default implementations and how to implement
> > > interfaces with desired behavior.
> > >
> > > This is a simplified view of the relations between relevant classes of the
> > > PoC implementation:
> > >
> > > SourceReader (Public) <|-- SourceReaderBase (Internal) <|-- .. <|--
> > > MySourceReader
> > >
> > > MySourceReader <>-- SplitFetcherManager (Internal) <>-- SplitFetcher
> > > (Internal) <>-- SplitReader (Public) <|-- MySplitReader
> > >
> > > (A <|-- B: B inherits from A; A <>-- B: A "has a" B)
> > >
> > > Note that SourceReaderBase and SplitFetcherManager implement most of the
> > > "logic" for split alignment just because we wanted to implement split
> > > alignment and wanted it to be available as kind of a default. As a
> > > consequence, we have a "default implementation" for SourceReader that
> > > implements the actual logic for split alignment. For that reason, I find
> > > it very confusing to have a NOOP default implementation in the interface
> > > for the SourceReader. As a consequence, interface strategy c) is difficult
> > > because this would require NOOP default implementations in the public
> > > interfaces of SourceReader and SplitReader. This is the same for strategy
> > > d) because it would require a NOOP default implementation in the
> > > SourceReader. Further, as Dawid described, the method signatures of
> > > alignSplits for SourceReader and SplitReader differ and it would be
> > > extremely difficult to make the signatures the same (with even a potential
> > > performance impact because of additional look-ups of split ids).
> > > Therefore, having a symmetric decorative interface as in strategy a) is
> > > actually not possible, and having two decorative interfaces with different
> > > method signatures is confusing. My conclusion is that we are best off with
> > > strategy b), which means having specializing sub-interfaces that inherit
> > > from the parent interface:
> > > SourceReader <|-- AlignedSourceReader, SplitReader <|-- AlignedSplitReader
> > > With this option, I'm not 100% sure what the implications are and if this
> > > could get nasty. I would suggest that Dawid and I just try to implement
> > > it and see if we like it. :)
> > >
> > > # Summary
> > >
> > > In conclusion, please let me know your perspectives. Please correct me if
> > > something is wrong or if I misunderstood something. My perspective would
> > > be:
> > >
> > > Top A: Yes
> > > Top B: Yes (but disable watermark trigger for split alignment)
> > > Top C: No
> > > Top D: b)
> > >
> > > Best,
> > > Sebastian
> > >
> > > On Tue, Apr 26, 2022 at 9:55 AM Dawid Wysakowicz <dwysakowicz@apache.org>
> > > wrote:
> > >
> > >> @Arvid:
> > >>
> > >> While I also like Becket's capability approach, I fear that it doesn't
> > >> work for this particular use case: Sources can always be aligned
> > >> cross-task and this is just about intra-task alignment. So it's plausible
> > >> to put sources into an alignment group even though they do not use any of
> > >> the presented API of FLIP-217. They should just issue a warning, if they
> > >> handle multiple splits (see motivation section).
> > >>
> > >> Yes, but the "supportXXX" method would be for telling if it supports that
> > >> intra-task alignment. Cross-task alignment would always be supported.
> > >>
> > >> I updated interfaces to what I believe to be closest to a consensus
> > >> between all participants. Do you mind taking a look?
> > >>
> > >> @Sebastian Do you mind addressing the nits?
> > >>
> > >> Best,
> > >>
> > >> Dawid
> > >>
> > >> On 25/04/2022 13:39, Arvid Heise wrote:
> > >>
> > >> Thanks for pushing this effort.
> > >>
> > >> I'd actually be in favor of 1b). I fully agree that decorator interfaces
> > >> should be avoided but I'm also not a big fan of overloading the base
> > >> interfaces (they are hard to implement as is). The usual feedback on
> > >> Source-related interfaces is always that they are overwhelming and too
> > >> hard to implement. However, I'd also not oppose 1c) as scattered
> > >> interfaces also have drawbacks. I'd just dislike 1a) and 1d).
> > >> While I also like Becket's capability approach, I fear that it doesn't
> > >> work for this particular use case: Sources can always be aligned
> > >> cross-task and this is just about intra-task alignment. So it's plausible
> > >> to put sources into an alignment group even though they do not use any of
> > >> the presented API of FLIP-217. They should just issue a warning, if they
> > >> handle multiple splits (see motivation section).
> > >>
> > >> I think renaming alignSplits to facilitate future use cases makes sense
> > >> but then all interfaces (if 1c) is chosen) should be adjusted accordingly.
> > >> AlignedSourceReader could be PausingSourceReader and I'd go for
> > >> pauseOrResumeSplits (Becket's proposal afaik). We could also split it into
> > >> pauseSplit and resumeSplit. While pauseOrResumeSplits may allow Sources to
> > >> just use 1 instead of 2 library calls (as written in the Javadoc), both
> > >> Kafka and Pulsar can't use it and I'm not sure if there is a system that
> > >> can.
> > >>
> > >> Some nits for the FLIP:
> > >> - Please replace "stop" with "pause".
> > >> - Not sure if it's worth it in the capability section: Sources that adopt
> > >> this interface cannot be used in earlier versions. So it feels like we are
> > >> only forward compatible (old sources can be used after the change); but I
> > >> guess this holds for any API addition.
> > >> - You might want to add what happens when all splits are paused.
> > >> - You may want to describe how the 3 flavors of SourceReaderBase interact
> > >> with the interface.
> > >> - I'm not sure if it makes sense to include Kafka and Pulsar in the FLIP.
> > >> For me, this is rather immediate follow-up work. (could be in the same
> > >> umbrella ticket)
> > >>
> > >> Best,
> > >>
> > >> Arvid
> > >>
> > >> On Mon, Apr 25, 2022 at 12:52 PM Dawid Wysakowicz <dw...@apache.org>
> > >> wrote:
> > >>
> > >>
> > >> a) "MySourceReader implements SourceReader, WithSplitsAlignment", along
> > >> with "MySplitReader implements SplitReader, WithSplitsAlignment", or
> > >> b) "MySourceReader implements AlignedSourceReader" and "MySplitReader
> > >> implements AlignedSplitReader", or
> > >> c) "MySourceReader implements SourceReader" and "MySplitReader implements
> > >> SplitReader".
> > >>
> > >> I think the latest proposal according to Dawid would be:
> > >> d) "MySourceReader implements SourceReader" and "MySplitReader implements
> > >> AlignedSplitReader".
> > >> I am fine with this API, although personally speaking I think it is
> > >> simpler to just add a new method to the split reader with default impl.
> > >>
> > >>
> > >> I think that is a good idea to have it aligned as much as possible. I'd be
> > >> +1 for your option c). We can merge AlignedSplitReader with SplitReader.
> > >> We will update the FLIP shortly.
> > >>
> > >> Best,
> > >>
> > >> Dawid
> > >>
> > >> On 25/04/2022 12:43, Becket Qin wrote:
> > >>
> > >> Thanks for the comment, Jark.
> > >>
> > >> 3. Interface/Method Name.
> > >>
> > >> Can the interface be used to align other things in the future? For
> > >> example, align read speed, I have
> > >> seen users requesting global rate limits. This feature may also need an
> > >> interface like this.
> > >> If we don't plan to extend this interface to support aligning other
> > >> things, I suggest explicitly declaring
> > >> the purpose of the methods, such as `alignWatermarksForSplits` instead of
> > >> `alignSplits`.
> > >>
> > >> This is a good point. Naming wise, it would usually be more extensible to
> > >> just describe what the method actually does, instead of assuming the
> > >> purpose of doing this. For example, in this case, pauseOrResumeSplits()
> > >> would be more extensible because this can be used for any kind of flow
> > >> control, be it watermark alignment or simple rate limiting.
> > >>
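To make the naming argument concrete, here is a minimal sketch of how a generically named pause/resume decision could serve watermark alignment. The class `SplitFlowControl`, its fields, and the rule "pause a split whose watermark exceeds `maxDesiredWatermark`" are assumptions for illustration only; they are not taken from the FLIP or from Flink's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Flink: decides which splits to pause or resume.
class SplitFlowControl {
    final List<String> toPause = new ArrayList<>();
    final List<String> toResume = new ArrayList<>();

    // Assumed rule: a split that is ahead of maxDesiredWatermark gets paused,
    // every other split is (re)activated.
    static SplitFlowControl decide(Map<String, Long> splitWatermarks, long maxDesiredWatermark) {
        SplitFlowControl decision = new SplitFlowControl();
        // TreeMap gives a deterministic iteration order by split id.
        for (Map.Entry<String, Long> e : new TreeMap<>(splitWatermarks).entrySet()) {
            if (e.getValue() > maxDesiredWatermark) {
                decision.toPause.add(e.getKey());
            } else {
                decision.toResume.add(e.getKey());
            }
        }
        return decision;
    }

    public static void main(String[] args) {
        Map<String, Long> watermarks = Map.of("split-a", 5_000L, "split-b", 12_000L);
        SplitFlowControl d = decide(watermarks, 10_000L);
        System.out.println("pause=" + d.toPause + " resume=" + d.toResume);
    }
}
```

The same two lists could equally be filled by a rate limiter, which is the extensibility point made above: the method name describes the action, not the reason for it.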
> > >> 4. Interface or Method.
> > >>
> > >> I don't have a strong opinion on this. I think they have their own
> > >> advantages.
> > >> In Flink SQL, we heavily use Interfaces for extending abilities
> > >> (SupportsXxxx) for TableSource/TableSink,
> > >> and I prefer Interfaces rather than methods in this case. When you
> have
> > a
> > >> bunch of abilities and each ability
> > >> has more than one method, Interfaces can help to organize them and
> make
> > >> users clear which methods
> > >> need to implement when you want to have an ability.
> > >>
> > >> I am OK with decorative interfaces if this is a general design pattern
> > in
> > >> the other components in Flink. But it looks like the current API
> > proposal
> > >> is not symmetric.
> > >>
> > >> The current proposal is essentially "MySourceReader implements
> > >> SourceReader, WithSplitsAlignment", along with "MySplitReader
> implements
> > >> AlignedSplitsReader".
> > >>
> > >> Should we make the API symmetric? I'd consider any one of the
> following
> > as
> > >> symmetric.
> > >>
> > >> a) "MySourceReader implements SourceReader, WithSplitsAlignment",
> along
> > >> with "MySplitReader implements SplitReader, WithSplitsAlignment", or
> > >> b) "MySourceReader implements AlignedSourceReader" and "MySplitReader
> > >> implements AlignedSplitReader", or
> > >> c) "MySourceReader implements SourceReader" and "MySplitReader
> > implements
> > >> SplitReader".
> > >>
> > >> I think the latest proposal according to Dawid would be:
> > >> d) "MySourceReader implements SourceReader" and "MySplitReader
> > implements
> > >> AlignedSplitReader".
> > >> I am fine with this API, although personally speaking I think it is
> > simpler
> > >> to just add a new method to the split reader with default impl.
> > >>
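For readers comparing the options, the decorative-interface style (options a, b and d) can be sketched as follows. The interfaces below are simplified stand-ins that only borrow the names `SplitReader` and `WithSplitsAlignment` from the thread; the method signatures, `PlainReader`, `AlignedReader` and `Framework` are assumptions, not Flink's actual API.

```java
// Simplified stand-in for the base interface (assumed shape, not Flink's).
interface SplitReader {
    String fetch();
}

// Simplified stand-in for the opt-in ("decorative") interface.
interface WithSplitsAlignment {
    void alignSplits(long maxDesiredWatermark);
}

// A reader that does not opt in.
class PlainReader implements SplitReader {
    @Override
    public String fetch() {
        return "record";
    }
}

// A reader that opts in by implementing both interfaces.
class AlignedReader implements SplitReader, WithSplitsAlignment {
    long lastLimit = Long.MIN_VALUE;

    @Override
    public String fetch() {
        return "record";
    }

    @Override
    public void alignSplits(long maxDesiredWatermark) {
        lastLimit = maxDesiredWatermark; // remember the most recent limit
    }
}

// Hypothetical framework side: it has to branch on the reader's type.
class Framework {
    static boolean tryAlign(SplitReader reader, long maxDesiredWatermark) {
        if (reader instanceof WithSplitsAlignment) {
            ((WithSplitsAlignment) reader).alignSplits(maxDesiredWatermark);
            return true;
        }
        return false; // reader did not opt in, nothing happens
    }

    public static void main(String[] args) {
        System.out.println(tryAlign(new AlignedReader(), 42L));
        System.out.println(tryAlign(new PlainReader(), 42L));
    }
}
```

The `instanceof` branch is the cost of this style. Option c) avoids it by putting a default method on the base interface.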
> > >> @Dawid Wysakowicz <dwysakowicz@apache.org>, thanks for the reply.
> > >>
> > >> Having said that, as I don't have a preference and I agree most of the
> > >>
> > >> sources will support the alignment I am fine following your suggestion
> > to
> > >> have the SourceReader extending from WithWatermarksSplitsAlignment,
> but
> > >> would put the "supportsXXX" there, not in the Source to keep the two
> > >> methods together.
> > >>
> > >> One benefit of having the "supportsXXX" in Source is that this allows
> > some
> > >> compile time check. For example, if a user enabled watermark alignment
> > >> while it is not supported by the Source, an exception can be thrown at
> > >> compile time. It seems in general useful. That said, I agree that API
> > >> cleanliness wise it is better to put the two methods together.
> > >>
> > >> Thanks,
> > >>
> > >> Jiangjie (Becket) Qin
> > >>
> > >> On Mon, Apr 25, 2022 at 5:56 PM Jark Wu <imjark@gmail.com> wrote:
> > >>
> > >>
> > >> Thanks, Dawid, for the reminder on FLIP-182. Sorry, I did miss it.
> > >> I don't have other concerns then.
> > >>
> > >> Best,
> > >> Jark
> > >>
> > >> On Mon, 25 Apr 2022 at 15:40, Dawid Wysakowicz <dwysakowicz@apache.org>
> > >> wrote:
> > >>
> > >>
> > >> @Jark:
> > >>
> > >> 1. Will the framework always align with watermarks when the source
> > >> implements the interface?
> > >> I'm afraid not every case needs watermark alignment even if Kafka
> > >> implements the interface,
> > >> and this will affect the throughput somehow. I agree with Becket
> > >> we may need a
> > >> `supportSplitsAlignment()` method for users to configure the source to
> > >> enable/disable the alignment.
> > >>
> > >> 2. How does the framework calculate maxDesiredWatermark?
> > >> I think the algorithm of maxDesiredWatermark will greatly affect
> > >>
> > >> throughput
> > >>
> > >> if the reader is constantly
> > >>  switching between pause and resume. Can users configure the alignment
> > >> offset?
> > >>
> > >>
> > >> This is covered in the previous FLIP[1] which has been already
> > >>
> > >> implemented
> > >>
> > >> in 1.15. In short, it must be enabled with the watermark strategy
> which
> > >> also configures drift and update interval.
> > >>
> > >> If we don't plan to extend this interface to support align other
> things,
> > >>
> > >> I
> > >>
> > >> suggest explicitly declaring
> > >> the purpose of the methods, such as `alignWatermarksForSplits` instead
> > of
> > >> `alignSplits`.
> > >>
> > >>
> > >> Sure let's rename it.
> > >>
> > >> @Becket:
> > >>
> > >> I understand your point. On the other hand putting all methods, even
> > with
> > >> "supportsXXX" methods for enabling certain features, makes the entry
> > >> threshold for writing a new source higher. Instead of focusing on the
> > >>
> > >> basic
> > >>
> > >> and required properties of the Source, the person implementing a
> source
> > >> must bother with and need to figure out what all of the extra features
> > >>
> > >> are
> > >>
> > >> about and how to deal with them. It makes it also harder to organize
> > >> methods in coupled groups as Jark said.
> > >>
> > >> Having said that, as I don't have a preference and I agree most of the
> > >> sources will support the alignment I am fine following your suggestion
> > to
> > >> have the SourceReader extending from WithWatermarksSplitsAlignment,
> but
> > >> would put the "supportsXXX" there, not in the Source to keep the two
> > >> methods together.
> > >>
> > >> Lastly, I agree it is really unfortunate that the "alignSplits" methods
> > >> differ slightly for SourceReader and SplitReader. The reason for that is
> > >> that SourceReaderBase deals only with SplitIds, whereas SplitReader needs
> > >> the actual splits to pause them. I found the discrepancy acceptable for
> > >> the sake of simplifying the changes significantly, especially as the
> > >> alternative would highly likely impact performance, since we would have
> > >> to perform additional lookups. Moreover, the SplitReader is a secondary
> > >> interface.
> > >>
> > >> Best,
> > >>
> > >> Dawid
> > >>
> > >> [1] https://cwiki.apache.org/confluence/x/hQYBCw
> > >>
> > >> On 24/04/2022 17:15, Jark Wu wrote:
> > >>
> > >> Thanks for the effort, Dawid and Sebastian!
> > >>
> > >> I just have some minor questions (maybe I missed something).
> > >>
> > >> 1. Will the framework always align with watermarks when the source
> > >> implements the interface?
> > >> I'm afraid not every case needs watermark alignment even if Kafka
> > >> implements the interface,
> > >> and this will affect the throughput somehow. I agree with Becket
> > >> we may need a
> > >> `supportSplitsAlignment()` method for users to configure the source to
> > >> enable/disable the alignment.
> > >>
> > >> 2. How does the framework calculate maxDesiredWatermark?
> > >> I think the algorithm of maxDesiredWatermark will greatly affect
> > >>
> > >> throughput
> > >>
> > >> if the reader is constantly
> > >>  switching between pause and resume. Can users configure the alignment
> > >> offset?
> > >>
> > >> 3. Interface/Method Name.
> > >> Can the interface be used to align other things in the future? For
> > >>
> > >> example,
> > >>
> > >> align read speed, I have
> > >> seen users requesting global rate limits. This feature may also need
> an
> > >> interface like this.
> > >> If we don't plan to extend this interface to support align other
> things,
> > >>
> > >> I
> > >>
> > >> suggest explicitly declaring
> > >> the purpose of the methods, such as `alignWatermarksForSplits` instead
> > of
> > >> `alignSplits`.
> > >>
> > >> 4. Interface or Method.
> > >> I don't have a strong opinion on this. I think they have their own
> > >> advantages.
> > >> In Flink SQL, we heavily use Interfaces for extending abilities
> > >> (SupportsXxxx) for TableSource/TableSink,
> > >> and I prefer Interfaces rather than methods in this case. When you
> have
> > a
> > >> bunch of abilities and each ability
> > >> has more than one method, Interfaces can help to organize them and
> make
> > >> users clear which methods
> > >> need to implement when you want to have an ability.
> > >>
> > >>
> > >> Best,
> > >> Jark
> > >>
> > >> On Sun, 24 Apr 2022 at 18:13, Becket Qin <becket.qin@gmail.com> wrote:
> > >>
> > >> Hi Dawid,
> > >>
> > >> Thanks for the explanation. Apologies that I somehow misread a bunch
> of
> > >> "align" and thought they were "assign".
> > >>
> > >> Regarding 1, by default implementation, I was thinking of the default
> > >>
> > >> no-op
> > >>
> > >> implementation. I am a little worried about the proliferation of
> > >>
> > >> decorative
> > >>
> > >> interfaces. I think the most important thing about interfaces is that
> > >>
> > >> they
> > >>
> > >> are easy to understand. In this case, I prefer adding new method to
> the
> > >> existing interface for the following reasons:
> > >>
> > >> a) I feel the biggest drawback of decorative interfaces is which
> > >>
> > >> interface
> > >>
> > >> they can decorate and which combinations of multiple decorative
> > >>
> > >> interfaces
> > >>
> > >> are valid. In the current FLIP, the withSplitsAlignment interface is
> > only
> > >> applicable to the SourceReader which means it can't decorate any other
> > >> interface. From an interface design perspective, a natural question is
> > >>
> > >> why
> > >>
> > >> not let "AlignedSplitReader" extend "withSplitsAlignment"? And it is
> > also
> > >> natural to assume that a split reader implementing both SplitReader
> and
> > >> WithSplitAlignment would work, because a source reader implementing
> > >> SourceReader and withSplitsAlignment works. So why isn't there an
> > >>
> > >> interface
> > >>
> > >> of AlignedSourceReader? In the future, if there is a new feature added
> > >> (e.g. sorted or pre-partitioned data aware), are we going to create
> > >>
> > >> another
> > >>
> > >> interface of SplitReader such as SortedSplitReader or
> > >>
> > >> PrePartitionedAware?
> > >>
> > >> Can they be combined? So I think the additional decorative interface
> > like
> > >> withSplitsAlignment actually increases the understanding cost of users
> > >> because they have to know what decorative interfaces are there, which
> > >> interface they can decorate and which combinations of the decorative
> > >> interfaces are valid and which are not. Ideally we want to avoid that.
> > To
> > >> be clear, I am not opposing having an interface of
> withSplitsAlignment,
> > >>
> > >> it
> > >>
> > >> is completely OK to have it as an internal interface and let
> > SourceReader
> > >> and SplitReader both extend it.
> > >>
> > >> b) Adding a new method to the SourceReader with a default
> implementation
> > >>
> > >> of
> > >>
> > >> no-op would help avoid logic branching in the source logic, especially
> > >> given that we agree that the vast majority of the SourceReader
> > >> implementations, if not all, would just extend from the
> > SourceReaderBase.
> > >> That means adding a new method to the interface would effectively give
> > >>
> > >> the
> > >>
> > >> same user experience, but simpler.
> > >>
> > >> c) A related design principle that may be worth discussing is how do
> we
> > >>
> > >> let
> > >>
> > >> the Source implementations tell Flink what capability is supported and
> > >>
> > >> what
> > >>
> > >> is not. Personally speaking I feel the most intuitive place to me is
> in
> > >>
> > >> the
> > >>
> > >> Source itself, because that is the entrance of the entire Source
> > >>
> > >> connector
> > >>
> > >> logic.
> > >>
> > >> Based on the above thoughts, I am wondering if the following interface
> > >> would be easier to understand by the users.
> > >>
> > >> - Change "withSplitsAlignment" to internal interface, let both
> > >>
> > >> SourceReader
> > >>
> > >> and SplitReader extend from it, with a default no-op implementation.
> > >> - Add a new method "boolean supportSplitsAlignment()" to the Source
> > >> interface, with a default implementation returning false. Sources that
> > >>
> > >> have
> > >>
> > >> implemented the alignment logic can change this to return true, and
> > >> override the alignSplits() methods in the SourceReader / SplitReader
> if
> > >> needed.
> > >> - In the future, if a new optional feature is going to be added to the
> > >> Source, and that feature requires the awareness from Flink, we can add
> > >>
> > >> more
> > >>
> > >> such methods to the Source.
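The proposal in the three bullets above can be sketched roughly as follows. `Source` and `SourceReader` here are simplified stand-ins, not Flink's real interfaces; `supportSplitsAlignment()` and `alignSplits()` follow the names used in this thread, while the `alignSplits` signature, `LegacySource`, `AligningSource` and `Validator` are assumptions made for illustration.

```java
// Simplified stand-in for the Source interface with a capability flag.
interface Source {
    // Default: the source does not support split alignment.
    default boolean supportSplitsAlignment() {
        return false;
    }
}

// Simplified stand-in for the reader with a default no-op alignment method.
interface SourceReader {
    default void alignSplits(long maxDesiredWatermark) {
        // no-op unless a source overrides it
    }
}

// An existing source that is unaware of alignment keeps compiling unchanged.
class LegacySource implements Source {}

// A source that has implemented the alignment logic flips the flag.
class AligningSource implements Source {
    @Override
    public boolean supportSplitsAlignment() {
        return true;
    }
}

// Hypothetical check at job compile time: fail fast if alignment is
// requested but the source does not support it.
class Validator {
    static void validate(Source source, boolean alignmentEnabled) {
        if (alignmentEnabled && !source.supportSplitsAlignment()) {
            throw new IllegalStateException("Source does not support split alignment");
        }
    }

    public static void main(String[] args) {
        validate(new AligningSource(), true); // passes
        validate(new LegacySource(), false);  // passes, alignment not requested
        System.out.println("validation passed");
    }
}
```

Under these assumptions, existing sources need no changes, and the framework never has to branch on the reader's type.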
> > >>
> > >> What do you think?
> > >>
> > >> Thanks,
> > >>
> > >> Jiangjie (Becket) Qin
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Fri, Apr 22, 2022 at 4:05 PM Dawid Wysakowicz <dwysakowicz@apache.org>
> > >> wrote:
> > >>
> > >>
> > >> @Konstantin:
> > >>
> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the
> stop &
> > >> resume behavior) will be implemented for Kafka and Pulsar only,
> correct?
> > >>
> > >> Correct, as far as I know though, those are the only sources which
> > >>
> > >> consume
> > >>
> > >> concurrently from multiple splits and thus alignment applies.
> > >>
> > >> @Thomas:
> > >>
> > >> I wonder if "supporting" split alignment in SourceReaderBase and then
> > >>
> > >> doing
> > >>
> > >> nothing if the split reader does not implement AlignedSplitReader
> could
> > >>
> > >> be
> > >>
> > >> misleading? Perhaps WithSplitsAlignment can instead be added to the
> > >> specific source reader (i.e. KafkaSourceReader) to make it explicit
> that
> > >> the source actually supports it.
> > >>
> > >> I understand your concern. Hmm, I think we could actually do that. Given
> > >> that the actual implementation of SourceReaderBase#alignSplits is rather
> > >> short (just a forward to the corresponding method of SplitFetcher), we
> > >> could reimplement it in the actual source implementations. This solution
> > >> has a downside, though. Authors of new sources would have to do two
> > >> things: extend from AlignedSplitReader and implement
> > >> WithSplitsAlignment, instead of just extending AlignedSplitReader. I
> > >> would be fine with such a tradeoff, though. What do others think?
> > >>
> > >> @Steven:
> > >>
> > >> For this part from the motivation section, is it accurate? Let's
> assume
> > >>
> > >> one
> > >>
> > >> source task consumes from 3 partitions and one of the partition is
> > >> significantly slower. In this situation, watermark for this source
> task
> > >> won't hold back as it is reading recent data from other two Kafka
> > >> partitions. As a result, it won't hold back the overall watermark. I
> > >> thought the problem is that we may have late data for this slow
> > >>
> > >> partition.
> > >>
> > >> It will hold back the watermark. Watermark of an operator is the
> minimum
> > >> of watermarks of all splits[1]
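A small worked example of the statement above, with made-up numbers: if the watermark of a source task is the minimum over its splits, a single slow partition determines the result. `SplitWatermarks` and `combinedWatermark` are hypothetical names, not Flink classes.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Flink.
class SplitWatermarks {
    // Combined watermark = minimum over all per-split watermarks.
    // Returns Long.MAX_VALUE for an empty collection (a choice made for this sketch).
    static long combinedWatermark(Collection<Long> splitWatermarks) {
        long min = Long.MAX_VALUE;
        for (long wm : splitWatermarks) {
            min = Math.min(min, wm);
        }
        return min;
    }

    public static void main(String[] args) {
        Map<String, Long> perSplit = new LinkedHashMap<>();
        perSplit.put("partition-0", 10_000L);
        perSplit.put("partition-1", 9_500L);
        perSplit.put("partition-2", 2_000L); // the slow partition
        // The slow partition holds the task's watermark back.
        System.out.println(combinedWatermark(perSplit.values())); // prints 2000
    }
}
```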
> > >>
> > >> I have another question about the restart. Say split alignment is
> > >> triggered. checkpoint is completed. job failed and restored from the
> > last
> > >> checkpoint. because alignment decision is not checkpointed, initially
> > >> alignment won't be enforced until we get a cycle of watermark
> > aggregation
> > >> and propagation, right? Not saying this corner is a problem. Just want
> > to
> > >> understand it more.
> > >>
> > >> Your understanding is correct.
> > >>
> > >> @Becket:
> > >>
> > >> 1. I think watermark alignment is sort of a general use case, so
> should
> > >>
> > >> we
> > >>
> > >> just add the related methods to SourceReader directly instead of
> > >> introducing the new interface of WithSplitAssignment? We can provide
> > >> default implementations, so backwards compatibility won't be an issue.
> > >>
> > >> I don't think we can provide a default implementation. How would we do
> > >> that? Would it be just a no-op? Is it better than having an opt-in
> > >> interface? The default implementation would have to be added
> exclusively
> > >>
> > >> in
> > >>
> > >> a *Public* SourceReader interface. By the way notice SourceReaderBase
> > >> does extend from WithSplitsAlignment, so effectively all
> implementations
> > >>
> > >> do
> > >>
> > >> handle the alignment case. To be honest I think it is impossible to
> > >> implement the SourceReader interface directly by end users.
> > >>
> > >> 2. As you mentioned, the SplitReader interface probably also needs
> some
> > >> change to support throttling at the split granularity. Can you add
> that
> > >> interface change into the public interface section as well?
> > >>
> > >> It has been added from the beginning. See *AlignedSplitReader.*
> > >>
> > >> 3. Nit, can we avoid using the method name assignSplits here, given
> that
> > >>
> > >> it
> > >>
> > >> is not actually changing the split assignments? It seems something
> like
> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
> > >>
> > >> The method's called *alignSplits*, not assign. Do you still prefer a
> > >> different name for that? Personally, I am open for suggestions here.
> > >>
> > >> Best,
> > >>
> > >> Dawid
> > >>
> > >> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/sources/#watermark-generation
> > >>
> > >> On 22/04/2022 05:59, Becket Qin wrote:
> > >>
> > >> Thanks for driving the effort, Sebastian. I think the motivation makes a
> > >> lot of sense. Just a few suggestions / questions.
> > >>
> > >> 1. I think watermark alignment is sort of a general use case, so
> should
> > >>
> > >> we
> > >>
> > >> just add the related methods to SourceReader directly instead of
> > >> introducing the new interface of WithSplitAssignment? We can provide
> > >> default implementations, so backwards compatibility won't be an issue.
> > >>
> > >> 2. As you mentioned, the SplitReader interface probably also needs
> some
> > >> change to support throttling at the split granularity. Can you add
> that
> > >> interface change into the public interface section as well?
> > >>
> > >> 3. Nit, can we avoid using the method name assignSplits here, given
> that
> > >>
> > >> it
> > >>
> > >> is not actually changing the split assignments? It seems something
> like
> > >> pauseOrResumeSplits(), or adjustSplitsThrottling() is more accurate.
> > >>
> > >> Thanks,
> > >>
> > >> Jiangjie (Becket) Qin
> > >>
> > >> On Thu, Apr 21, 2022 at 11:39 PM Steven Wu <stevenz3wu@gmail.com> wrote:
> > >>
> > >> However, a single source operator may read data from multiple
> > >> splits/partitions, e.g., multiple Kafka partitions, such that even with
> > >> watermark alignment the source operator may need to buffer excessive
> > >> amount of data if one split emits data faster than another.
> > >>
> > >> For this part from the motivation section, is it accurate? Let's assume
> > >> one source task consumes from 3 partitions and one of the partitions is
> > >> significantly slower. In this situation, the watermark for this source
> > >> task won't hold back, as it is reading recent data from the other two
> > >> Kafka partitions. As a result, it won't hold back the overall watermark.
> > >> I thought the problem is that we may have late data for this slow
> > >> partition.
> > >>
> > >> I have another question about the restart. Say split alignment is
> > >> triggered. checkpoint is completed. job failed and restored from the
> > last
> > >> checkpoint. because alignment decision is not checkpointed, initially
> > >> alignment won't be enforced until we get a cycle of watermark
> > aggregation
> > >> and propagation, right? Not saying this corner is a problem. Just want
> > to
> > >> understand it more.
> > >>
> > >>
> > >>
> > >> On Thu, Apr 21, 2022 at 8:20 AM Thomas Weise <thw@apache.org> wrote:
> > >>
> > >> Thanks for working on this!
> > >>
> > >> I wonder if "supporting" split alignment in SourceReaderBase and then
> > >> doing nothing if the split reader does not implement AlignedSplitReader
> > >> could be misleading? Perhaps WithSplitsAlignment can instead be added to
> > >> the specific source reader (i.e. KafkaSourceReader) to make it explicit
> > >> that the source actually supports it.
> > >>
> > >> Thanks,
> > >> Thomas
> > >>
> > >>
> > >> On Thu, Apr 21, 2022 at 4:57 AM Konstantin Knauf <knaufk@apache.org>
> > >> wrote:
> > >>
> > >>
> > >> Hi Sebastian, Hi Dawid,
> > >>
> > >> As part of this FLIP, the `AlignedSplitReader` interface (aka the stop &
> > >> resume behavior) will be implemented for Kafka and Pulsar only, correct?
> > >>
> > >> +1 in general. I believe it is valuable to complete the watermark
> > >> aligned story with this FLIP.
> > >>
> > >> Cheers,
> > >>
> > >> Konstantin
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Thu, Apr 21, 2022 at 12:36 PM Dawid Wysakowicz <dwysakowicz@apache.org>
> > >> wrote:
> > >>
> > >>
> > >> To be explicit, having worked on it, I support it ;) I think we can
> > >> start a vote thread soonish, as there are no concerns so far.
> > >>
> > >> Best,
> > >>
> > >> Dawid
> > >>
> > >> On 13/04/2022 11:27, Sebastian Mattheis wrote:
> > >>
> > >> Dear Flink developers,
> > >>
> > >> I would like to open a discussion on FLIP 217 [1] for an extension of
> > >> Watermark Alignment to perform alignment also in SplitReaders. To do so,
> > >> SplitReaders must be able to suspend and resume reading from split
> > >> sources where the SourceOperator coordinates and controls suspend and
> > >> resume. To gather information about current watermarks of the
> > >> SplitReaders, we extend the internal WatermarkOutputMultiplexer and
> > >> report watermarks to the SourceOperator.
> > >>
> > >> There is a PoC for this FLIP [2], prototyped by Arvid Heise and revised
> > >> and reworked by Dawid Wysakowicz (He did most of the work.) and me. The
> > >> changes are backwards compatible in a way that if affected components do
> > >> not support split alignment the behavior is as before.
> > >>
> > >> Best,
> > >> Sebastian
> > >>
> > >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-217+Support+watermark+alignment+of+source+splits
> > >>
> > >> [2] https://github.com/dawidwys/flink/tree/aligned-splits
> > >>
> > >> --
> > >>
> > >> Konstantin Knauf
> > >> https://twitter.com/snntrable
> > >> https://github.com/knaufk
> > >>
> > >>
> >
>