Posted to dev@flink.apache.org by Yangze Guo <ka...@gmail.com> on 2021/01/07 03:07:39 UTC

[DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Hi, there,

We would like to start a discussion thread on "FLIP-156: Runtime
Interfaces for Fine-Grained Resource Requirements"[1], where we
propose Slot Sharing Group (SSG) based runtime interfaces for
specifying fine-grained resource requirements.

In this FLIP, we:
- Expound the user story of fine-grained resource management.
- Propose runtime interfaces for specifying SSG-based resource requirements.
- Discuss the pros and cons of the three potential granularities for
specifying the resource requirements (operator, task, and slot sharing
group) and explain why we choose the slot sharing group.

Please find more details in the FLIP wiki document [1]. Looking
forward to your feedback.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements

Best,
Yangze Guo

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Yangze Guo <ka...@gmail.com>.

Thanks for your feedback.

@Till
> the only option for a scheduler which does not support slot sharing groups is to say that every operator in this slot sharing group needs a slot with the same resources as the whole group.
At the moment, all the implementations of the scheduler respect the
slot sharing group. Regarding your example, in that case the user can
simply put the two operators into two separate slot sharing groups
with 100 MB each.
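
For illustration, the placement argument in this example can be checked
with a small sketch. It is only a toy model and not Flink's scheduler:
`can_place` is a hypothetical helper, every operator is assumed to have
parallelism 1, and memory is the only resource considered.

```python
from itertools import permutations


def can_place(requests_mb, slots_mb):
    """Return True if every slot request fits into a distinct slot.

    Brute force over all assignments; fine for a tiny example like this.
    """
    if len(requests_mb) > len(slots_mb):
        return False
    return any(
        all(req <= slot for req, slot in zip(requests_mb, chosen))
        for chosen in permutations(slots_mb, len(requests_mb))
    )


# Cluster from the example: 2 TMs with one 100 MB slot each.
slots = [100, 100]

# op_1 and op_2 (100 MB each) in one SSG: a single 200 MB slot request.
one_group = [100 + 100]
# op_1 and op_2 in separate SSGs: two 100 MB slot requests.
two_groups = [100, 100]

print(can_place(one_group, slots))   # False
print(can_place(two_groups, slots))  # True
```

With one group the job cannot be placed, with two groups it can, which
matches the outcome of specifying the resources per operator.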

> If all operators have their resources properly specified, then slot sharing is no longer needed.
I agree with that as well. However, specifying resource requirements
for each operator is impractical for complex jobs that contain tens or
even hundreds of operators. It is also hard to choose a default value
for operator resource requirements.

The SSG-based approach makes the user's configuration more flexible.
In many cases, users only know or care about the resource requirements
of some subgraphs. Forcing them to provide more information harms
usability. If an expert user knows more fine-grained resource
requirements, operator-granularity resource requirements can still be
realized by arranging the slot sharing groups accordingly.

@Chesnay
> Will declaring them on slot sharing groups not also waste resources if the parallelism of operators within that group are different?
Yes, we list it as one of the cons of the SSG-based approach. In that
case, the user needs to separate operators with different parallelisms
into different SSGs. However, compared to the benefits we list, we
tend to treat it as a trade-off between usability and resource
utilization that is left for the user to decide. All in all,
fine-grained resource management is meant for expert users who want to
further optimize resource utilization, so such an extra effort might
be worth it.
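
A rough sketch of the waste in question, under simplifying assumptions
that are not taken from the FLIP: every slot of an SSG is sized to hold
one subtask of each operator in the group, the group needs as many slots
as its highest parallelism, and memory is the only resource. The helper
`ssg_allocation` and the numbers are made up for illustration.

```python
def ssg_allocation(operators):
    """operators: list of (memory_mb_per_subtask, parallelism) sharing one SSG.

    Returns (allocated_mb, used_mb) under the assumptions stated above.
    """
    # Each slot must fit one subtask of every operator in the group.
    slot_mb = sum(mem for mem, _ in operators)
    # The group needs as many slots as its highest parallelism.
    num_slots = max(par for _, par in operators)
    allocated_mb = slot_mb * num_slots
    used_mb = sum(mem * par for mem, par in operators)
    return allocated_mb, used_mb


light, heavy = (100, 4), (300, 1)

# Both operators in one SSG: 4 slots of 400 MB, but only 700 MB are used.
print(ssg_allocation([light, heavy]))  # (1600, 700)

# One SSG per operator: nothing is allocated beyond what is used.
print([ssg_allocation([light]), ssg_allocation([heavy])])
# [(400, 400), (300, 300)]
```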

> It also seems like quite a hassle for users having to recalculate the resource requirements if they change the slot sharing.
If an expert user knows the exact resource requirements of each
operator, they can just place each operator in its own slot sharing
group. If they want some of them placed in the same slot, they just
need to sum up the resource requirements of those operators. There is
no need to maintain the resource requirements of a set of re-usable
operators.
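
As a minimal sketch of "summing up", assuming the operators of a group
run concurrently in the same slot: `Profile` and `ssg_requirement` are
hypothetical names used for illustration only and are not the interfaces
proposed in the FLIP.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Profile:
    """Hypothetical resource profile; not Flink's actual class."""
    cpu_cores: float = 0.0
    heap_mb: int = 0
    managed_mb: int = 0

    def __add__(self, other):
        return Profile(
            self.cpu_cores + other.cpu_cores,
            self.heap_mb + other.heap_mb,
            self.managed_mb + other.managed_mb,
        )


def ssg_requirement(operator_profiles):
    """Sum per-operator profiles into one requirement for the shared slot."""
    return reduce(lambda a, b: a + b, operator_profiles, Profile())


source = Profile(cpu_cores=0.5, heap_mb=128)
window = Profile(cpu_cores=1.0, heap_mb=256, managed_mb=512)

print(ssg_requirement([source, window]))
# Profile(cpu_cores=1.5, heap_mb=384, managed_mb=512)
```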

> My main worry is that if we wire the runtime to work on SSGs it's gonna be difficult to implement more fine-grained approaches.
One of the important reasons we chose the SSG-based approach is that
the slot is the basic unit of resource management in Flink’s runtime.
- Runtime interfaces should only require the minimum set of
information needed. Operator-level resource requirements will be
converted to slot-level requirements.
- So far, the end-user interfaces for specifying resource requirements
are still under discussion. The runtime interfaces should only require
the minimum set of information needed for resource management.



Best,
Yangze Guo



On Thu, Jan 7, 2021 at 10:00 PM Chesnay Schepler <ch...@apache.org> wrote:
>
> Will declaring them on slot sharing groups not also waste resources if
> the parallelism of operators within that group are different?
>
> It also seems like quite a hassle for users having to recalculate the
> resource requirements if they change the slot sharing.
> I'd think that it's not really workable for users that create a set of
> re-usable operators which are mixed and matched in their applications;
> managing the resource requirements in such a setting would be a
> nightmare, and in the end would require operator-level requirements anyway.
> In that sense, I'm not even sure whether it really increases usability.
>
> My main worry is that if we wire the runtime to work on SSGs it's
> gonna be difficult to implement more fine-grained approaches, which
> would not be the case if, for the runtime, they are always defined on an
> operator-level.
>
> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > Thanks for drafting this FLIP and starting this discussion Yangze.
> >
> > I like that defining resource requirements on a slot sharing group makes
> > the overall setup easier and improves usability of resource requirements.
> >
> > What I do not like about it is that it changes slot sharing groups from
> > being a scheduling hint to something which needs to be supported in order
> > to support fine grained resource requirements. So far, the idea of slot
> > sharing groups was that it tells the system that a set of operators can be
> > deployed in the same slot. But the system still had the freedom to say that
> > it would rather place these tasks in different slots if it wanted. If we
> > now specify resource requirements per slot sharing group, then the
> > only option for a scheduler which does not support slot sharing groups is
> > to say that every operator in this slot sharing group needs a slot with the
> > same resources as the whole group.
> >
> > So for example, if we have a job consisting of two operators op_1 and op_2
> > where each op needs 100 MB of memory, we would then say that the slot
> > sharing group needs 200 MB of memory to run. If we have a cluster with 2
> > TMs with one slot of 100 MB each, then the system cannot run this job. If
> > the resources were specified on an operator level, then the system could
> > still make the decision to deploy op_1 to TM_1 and op_2 to TM_2.
> >
> > Originally, one of the primary goals of slot sharing groups was to make it
> > easier for the user to reason about how many slots a job needs independent
> > of the actual number of operators in the job. Interestingly, if all
> > operators have their resources properly specified, then slot sharing is no
> > longer needed because Flink could slice off the appropriately sized slots
> > for every Task individually. What matters is whether the whole cluster has
> > enough resources to run all tasks or not.
> >
> > Cheers,
> > Till
> >
> > On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <ka...@gmail.com> wrote:
> >
> >> Hi, there,
> >>
> >> We would like to start a discussion thread on "FLIP-156: Runtime
> >> Interfaces for Fine-Grained Resource Requirements"[1], where we
> >> propose Slot Sharing Group (SSG) based runtime interfaces for
> >> specifying fine-grained resource requirements.
> >>
> >> In this FLIP:
> >> - Expound the user story of fine-grained resource management.
> >> - Propose runtime interfaces for specifying SSG-based resource
> >> requirements.
> >> - Discuss the pros and cons of the three potential granularities for
> >> specifying the resource requirements (op, task and slot sharing group)
> >> and explain why we choose the slot sharing group.
> >>
> >> Please find more details in the FLIP wiki document [1]. Looking
> >> forward to your feedback.
> >>
> >> [1]
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> >>
> >> Best,
> >> Yangze Guo
> >>
>

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Yangze Guo <ka...@gmail.com>.

@Till

Also +1 to treating the SSG resource requirements as a hint instead of
a restriction. We can treat it as a follow-up effort and, as a first
step, make this semantic clear in the JavaDocs.
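
To make the "hint" idea concrete, here is a small sketch of the two
derivation strategies Till mentions in the quoted message below. The
function names are hypothetical and memory is the only resource
modelled; nothing like this is implemented as part of the FLIP.

```python
def same_as_group(group_mb, tasks):
    """Strategy 1: every task gets the resources of the whole SSG."""
    return {task: group_mb for task in tasks}


def split_equally(group_mb, tasks):
    """Strategy 2: the SSG resources are divided evenly among the tasks."""
    share = group_mb / len(tasks)
    return {task: share for task in tasks}


tasks = ["op_1", "op_2"]
print(same_as_group(200, tasks))  # {'op_1': 200, 'op_2': 200}
print(split_equally(200, tasks))  # {'op_1': 100.0, 'op_2': 100.0}
```

The first strategy is safe but over-provisions, while the second keeps
the total unchanged but may under-provision a heavy task.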

Best,
Yangze Guo

On Thu, Jan 21, 2021 at 10:00 AM Xintong Song <to...@gmail.com> wrote:
>
> I think this makes sense.
>
> The semantics of an SSG is that operators in the group *can* be scheduled
> together in a slot, which is not a *must*. Specifying resources for SSGs
> should not change that semantics. If the need arises to schedule the
> operators into different slots, it makes sense for the runtime to
> derive the finer-grained resource requirements, if not provided.
>
> We may not need to implement this at the moment since currently SSGs are
> always respected, but we should make that semantics explicit in the
> JavaDocs for the interfaces and in the user documentation when the user
> APIs are exposed.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Thu, Jan 21, 2021 at 1:55 AM Till Rohrmann <tr...@apache.org> wrote:
>
> > Maybe a different minor idea: Would it be possible to treat the SSG
> > resource requirements as a hint for the runtime similar to how slot sharing
> > groups are designed at the moment? Meaning that we don't give the guarantee
> > that Flink will always deploy this set of tasks together no matter what
> > comes. If, for example, the runtime can derive by some means the resource
> > requirements for each task based on the requirements for the SSG, this
> > could be possible. One easy strategy would be to give every task the same
> > resources as the whole slot sharing group. Another one could be
> > distributing the resources equally among the tasks. This does not even have
> > to be implemented but we would give ourselves the freedom to change
> > scheduling if need should arise.
> >
> > Cheers,
> > Till
> >
> > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <ka...@gmail.com> wrote:
> >
> > > Thanks for the responses, Till and Xintong.
> > >
> > > I second Xintong's comment that the SSG-based runtime interface will
> > > give us the flexibility to achieve an op/task-based approach. That's
> > > one of the most important reasons for our design choice.
> > >
> > > My two cents regarding the default operator resource:
> > > - It might be good for the scenario of DataStream jobs.
> > >    ** For light-weight operators, the accumulated configuration error
> > > will not be significant. Then, the resource a task uses is
> > > proportional to the number of operators it contains.
> > >    ** For heavy operators like join and window, or operators using
> > > external resources, users will turn to the fine-grained resource
> > > configuration.
> > > - It can increase the stability of standalone clusters where the
> > > registered task executors are heterogeneous (with different default
> > > slot resources).
> > > - It might not be good for SQL users. The operators that a SQL query is
> > > translated to are a black box to the user. We also do not guarantee
> > > cross-version consistency of the translation so far.
> > >
> > > I think it can be treated as follow-up work once fine-grained
> > > resource management is end-to-end ready.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > >
> > > On Wed, Jan 20, 2021 at 11:16 AM Xintong Song <to...@gmail.com>
> > > wrote:
> > > >
> > > > Thanks for the feedback, Till.
> > > >
> > > > ## I feel that what you proposed (operator-based + default value)
> > > > might be subsumed by the SSG-based approach.
> > > > Thinking of op_1 -> op_2, there are the following 4 cases, categorized
> > > > by whether the resource requirements are known to the users.
> > > >
> > > >    1. *Both known.* As previously mentioned, there's no reason to put
> > > >    multiple operators whose individual resource requirements are
> > > >    already known into the same group in fine-grained resource
> > > >    management. And if op_1 and op_2 are in different groups, there
> > > >    should be no problem switching the data exchange mode from
> > > >    pipelined to blocking. This is equivalent to specifying operator
> > > >    resource requirements in your proposal.
> > > >    2. *op_1 known, op_2 unknown.* Similar to 1), except that op_2 is
> > > >    in an SSG whose resource is not specified and thus would have the
> > > >    default slot resource. This is equivalent to having default
> > > >    operator resources in your proposal.
> > > >    3. *Both unknown.* The user can either set op_1 and op_2 to the
> > > >    same SSG or separate SSGs.
> > > >       - If op_1 and op_2 are in the same SSG, it will be equivalent to
> > > >       the coarse-grained resource management, where op_1 and op_2
> > > >       share a default size slot no matter which data exchange mode is
> > > >       used.
> > > >       - If op_1 and op_2 are in different SSGs, then each of them will
> > > >       use a default size slot. This is equivalent to setting them with
> > > >       default operator resources in your proposal.
> > > >    4. *Total (pipelined) or max (blocking) of op_1 and op_2 is known.*
> > > >       - It is possible that the user learns the total / max resource
> > > >       requirement from executing and monitoring the job, while not
> > > >       being aware of individual operator requirements.
> > > >       - I believe this is the case your proposal does not cover. And
> > > >       TBH, this is probably how most users learn the resource
> > > >       requirements, according to my experience.
> > > >       - In this case, the user might need to specify different
> > > >       resources if he wants to switch the execution mode, which should
> > > >       not be worse than not being able to use fine-grained resource
> > > >       management.
> > > >
> > > >
> > > > ## An additional idea inspired by your proposal.
> > > > We may provide multiple options for deciding resources for SSGs whose
> > > > requirement is not specified, if needed.
> > > >
> > > >    - Default slot resource (current design)
> > > >    - Default operator resource times number of operators (equivalent to
> > > >    your proposal)
> > > >
> > > >
> > > > ## Exposing internal runtime strategies
> > > > Theoretically, yes. Since they are tied to the SSGs, the resource
> > > > requirements might be affected if the way SSGs are handled internally
> > > > changes in the future. Practically, I do not concretely see at the
> > > > moment what kind of changes we may want in the future that might
> > > > conflict with this FLIP proposal, as the question of switching the
> > > > data exchange mode is answered above. I'd suggest not giving up the
> > > > user friendliness we may gain now for future problems that may or may
> > > > not exist.
> > > >
> > > > Moreover, the SSG-based approach has the flexibility to achieve
> > > > behavior equivalent to the operator-based approach, if we set each
> > > > operator (or task) to a separate SSG. We can even provide a shortcut
> > > > option to automatically do that for users, if needed.
> > > >
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann <tr...@apache.org>
> > > > wrote:
> > > >
> > > > > Thanks for the responses Xintong and Stephan,
> > > > >
> > > > > I agree that being able to define the resource requirements for a
> > > > > group of operators is more user friendly. However, my concern is
> > > > > that we are exposing thereby internal runtime strategies which might
> > > > > limit our flexibility to execute a given job. Moreover, the
> > > > > semantics of configuring resource requirements for SSGs could break
> > > > > if switching from streaming to batch execution. If one defines the
> > > > > resource requirements for op_1 -> op_2 which run in pipelined mode
> > > > > when using the streaming execution, then how do we interpret these
> > > > > requirements when op_1 -> op_2 are executed with a blocking data
> > > > > exchange in batch execution mode? Consequently, I am still leaning
> > > > > towards Stephan's proposal to set the resource requirements per
> > > > > operator.
> > > > >
> > > > > Maybe the following proposal makes the configuration easier: If the
> > > > > user wants to use fine-grained resource requirements, then she needs
> > > > > to specify the default size which is used for operators which have
> > > > > no explicit resource annotation. If this holds true, then every
> > > > > operator would have a resource requirement and the system can try to
> > > > > execute the operators in the best possible manner w/o being
> > > > > constrained by how the user set the SSG requirements.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Tue, Jan 19, 2021 at 9:09 AM Xintong Song <to...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks for the feedback, Stephan.
> > > > > >
> > > > > > Actually, your proposal has also come to my mind at some point. And
> > > > > > I have some concerns about it.
> > > > > >
> > > > > >
> > > > > > 1. It does not give users the same control as the SSG-based
> > approach.
> > > > > >
> > > > > >
> > > > > > While neither approach requires specifying resources for each
> > > > > > operator, the SSG-based approach supports the semantic that "some
> > > > > > operators together use this much resource" while the
> > > > > > operator-based approach doesn't.
> > > > > >
> > > > > >
> > > > > > Think of a long pipeline with m operators (o_1, o_2, ..., o_m), and
> > > > > > at some point there's an agg o_n (1 < n < m) which significantly
> > > > > > reduces the data amount. One can separate the pipeline into 2
> > > > > > groups SSG_1 (o_1, ..., o_n) and SSG_2 (o_n+1, ..., o_m), so that
> > > > > > configuring much higher parallelisms for operators in SSG_1 than
> > > > > > for operators in SSG_2 won't lead to too much wasting of resources.
> > > > > > If the two SSGs end up needing different resources, with the
> > > > > > SSG-based approach one can directly specify resources for the two
> > > > > > groups. However, with the operator-based approach, the user will
> > > > > > have to specify resources for each operator in one of the two
> > > > > > groups, and tune the default slot resource via configurations to
> > > > > > fit the other group.
> > > > > >
> > > > > >
> > > > > > 2. It increases the chance of breaking operator chains.
> > > > > >
> > > > > >
> > > > > > Setting chainable operators into different slot sharing groups will
> > > > > > prevent them from being chained. In the current implementation,
> > > > > > downstream operators, if the SSG is not explicitly specified, will
> > > > > > be set to the same group as the chainable upstream operators
> > > > > > (unless there are multiple upstream operators in different groups),
> > > > > > to reduce the chance of breaking chains.
> > > > > >
> > > > > >
> > > > > > Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4, deciding
> > > > > > SSGs based on whether resource is specified we will easily get
> > > > > > groups like (o_1, o_3) & (o_2, o_4), where none of the operators
> > > > > > can be chained. This is also possible for the SSG-based approach,
> > > > > > but I believe the chance is much smaller because there's no strong
> > > > > > reason for users to specify the groups with alternate operators
> > > > > > like that. We are more likely to get groups like (o_1, o_2) &
> > > > > > (o_3, o_4), where the chain breaks only between o_2 and o_3.
> > > > > >
> > > > > >
> > > > > > 3. It complicates the system by having two different mechanisms for
> > > > > > sharing managed memory in a slot.
> > > > > >
> > > > > >
> > > > > > - In FLIP-141, we introduced the intra-slot managed memory sharing
> > > > > > mechanism, where managed memory is first distributed according to
> > > > > > the consumer type, then further distributed across operators of
> > > > > > that consumer type.
> > > > > >
> > > > > > - With the operator-based approach, the managed memory size
> > > > > > specified for an operator should account for all the consumer types
> > > > > > of that operator. That means the managed memory is first
> > > > > > distributed across operators, then distributed to different
> > > > > > consumer types of each operator.
> > > > > >
> > > > > >
> > > > > > Unfortunately, the different order of the two calculation steps can
> > > > > > lead to different results. To be specific, the semantic of the
> > > > > > configuration option `consumer-weights` changes (within a slot vs.
> > > > > > within an operator).
> > > > > >
> > > > > >
> > > > > >
> > > > > > To sum up things:
> > > > > >
> > > > > > While (3) might be a bit more implementation related, I think (1)
> > > > > > and (2) somehow suggest that the price for the proposed approach to
> > > > > > avoid specifying resources for every operator is that it's not as
> > > > > > independent from operator chaining and slot sharing as the
> > > > > > operator-based approach discussed in the FLIP.
> > > > > >
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen <se...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks a lot, Yangze and Xintong for this FLIP.
> > > > > > >
> > > > > > > I want to say, first of all, that this is super well written. And
> > > > > > > the points that the FLIP makes about how to expose the
> > > > > > > configuration to users is exactly the right thing to figure out
> > > > > > > first.
> > > > > > > So good job here!
> > > > > > >
> > > > > > > About how to let users specify the resource profiles. If I can
> > > > > > > sum the FLIP and previous discussion up in my own words, the
> > > > > > > problem is the following:
> > > > > > >
> > > > > > > > Operator-level specification is the simplest and cleanest
> > > > > > > > approach, because it avoids mixing operator configuration
> > > > > > > > (resource) and scheduling. No matter what other parameters
> > > > > > > > change (chaining, slot sharing, switching pipelined and
> > > > > > > > blocking shuffles), the resource profiles stay the same.
> > > > > > > > But it would require that a user specifies resources on all
> > > > > > > > operators, which makes it hard to use. That's why the FLIP
> > > > > > > > suggests going with specifying resources on a Sharing-Group.
> > > > > > >
> > > > > > >
> > > > > > > I think both thoughts are important, so can we find a solution
> > > > > > > where the Resource Profiles are specified on an Operator, but we
> > > > > > > still avoid that we need to specify a resource profile on every
> > > > > > > operator?
> > > > > > >
> > > > > > > What do you think about something like the following:
> > > > > > >   - Resource Profiles are specified on an operator level.
> > > > > > >   - Not all operators need profiles
> > > > > > >   - All Operators without a Resource Profile end up in the default
> > > > > > > slot sharing group with a default profile (will get a default
> > > > > > > slot).
> > > > > > >   - All Operators with a Resource Profile will go into another
> > > > > > > slot sharing group (the resource-specified-group).
> > > > > > >   - Users can define different slot sharing groups for operators
> > > > > > > like they do now, with the exception that you cannot mix
> > > > > > > operators that have a resource profile and operators that have no
> > > > > > > resource profile.
> > > > > > >   - The default case where no operator has a resource profile is
> > > > > > > just a special case of this model
> > > > > > >   - The chaining logic sums up the profiles per operator, like it
> > > > > > > does now, and the scheduler sums up the profiles of the tasks
> > > > > > > that it schedules together.
> > > > > > >
> > > > > > >
> > > > > > > There is another question about reactive scaling raised in the
> > > > > > > FLIP. I need to think a bit about that. That is indeed a bit more
> > > > > > > tricky once we have slots of different sizes.
> > > > > > > It is not clear then which of the different slot requests the
> > > > > > > ResourceManager should fulfill when new resources (TMs) show up,
> > > > > > > or how the JobManager redistributes the slot resources when
> > > > > > > resources (TMs) disappear.
> > > > > > > This question is pretty orthogonal, though, to the "how to
> > > > > > > specify the resources".
> > > > > > >
> > > > > > >
> > > > > > > Best,
> > > > > > > Stephan
> > > > > > >
> > > > > > > On Fri, Jan 8, 2021 at 5:14 AM Xintong Song <tonysong820@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for drafting the FLIP and driving the discussion, Yangze.
> > > > > > > > And thanks for the feedback, Till and Chesnay.
> > > > > > > >
> > > > > > > > @Till,
> > > > > > > >
> > > > > > > > I agree that specifying requirements for SSGs means that SSGs
> > > > > > > > need to be supported in fine-grained resource management,
> > > > > > > > otherwise each operator might use as many resources as the whole
> > > > > > > > group. However, I cannot think of a strong reason for not
> > > > > > > > supporting SSGs in fine-grained resource management.
> > > > > > > >
> > > > > > > >
> > > > > > > > > Interestingly, if all operators have their resources properly
> > > > > > > > > specified, then slot sharing is no longer needed because Flink
> > > > > > > > > could slice off the appropriately sized slots for every Task
> > > > > > > > > individually.
> > > > > > > > >
> > > > > > > >
> > > > > > > > > So for example, if we have a job consisting of two operators
> > > > > > > > > op_1 and op_2 where each op needs 100 MB of memory, we would
> > > > > > > > > then say that the slot sharing group needs 200 MB of memory to
> > > > > > > > > run. If we have a cluster with 2 TMs with one slot of 100 MB
> > > > > > > > > each, then the system cannot run this job. If the resources
> > > > > > > > > were specified on an operator level, then the system could
> > > > > > > > > still make the decision to deploy op_1 to TM_1 and op_2 to
> > > > > > > > > TM_2.
> > > > > > > >
> > > > > > > >
> > > > > > > > Couldn't agree more that if all operators' requirements are
> > > > > > > > properly specified, slot sharing should be no longer needed. I
> > > > > > > > think this exactly disproves the example. If we already know
> > > > > > > > op_1 and op_2 each needs 100 MB of memory, why would we put them
> > > > > > > > in the same group? If they are in separate groups, with the
> > > > > > > > proposed approach the system can freely deploy them to either a
> > > > > > > > 200 MB TM or two 100 MB TMs.
> > > > > > > >
> > > > > > > > Moreover, the precondition for not needing slot sharing is
> > > > > > > > having resource requirements properly specified for all
> > > > > > > > operators. This is not always possible, and usually requires
> > > > > > > > tremendous efforts. One of the benefits of SSG-based
> > > > > > > > requirements is that it allows the user to freely decide the
> > > > > > > > granularity, and thus the effort they want to spend. I would
> > > > > > > > consider an SSG in fine-grained resource management as a group
> > > > > > > > of operators that the user would like to specify the total
> > > > > > > > resource for. There can be only one group in the job, 2~3 groups
> > > > > > > > dividing the job into a few major parts, or as many groups as
> > > > > > > > the number of tasks/operators, depending on how fine-grained the
> > > > > > > > user is able to specify the resources.
> > > > > > > >
> > > > > > > > Having to support SSGs might be a constraint. But given that all
> > > > > > > > the current scheduler implementations already support SSGs, I
> > > > > > > > tend to think of that as an acceptable price for the above
> > > > > > > > discussed usability and flexibility.
> > > > > > > >
> > > > > > > > @Chesnay
> > > > > > > >
> > > > > > > > > Will declaring them on slot sharing groups not also waste
> > > > > > > > > resources if the parallelism of operators within that group are
> > > > > > > > > different?
> > > > > > > > >
> > > > > > > > Yes. It's a trade-off between usability and resource
> > > > > > > > utilization. To avoid such wasting, the user can define more
> > > > > > > > groups, so that each group contains fewer operators and the
> > > > > > > > chance of having operators with different parallelism will be
> > > > > > > > reduced. The price is to have more resource requirements to
> > > > > > > > specify.
> > > > > > > >
> > > > > > > > > It also seems like quite a hassle for users having to
> > > > > > > > > recalculate the resource requirements if they change the slot
> > > > > > > > > sharing.
> > > > > > > > > I'd think that it's not really workable for users that create a
> > > > > > > > > set of re-usable operators which are mixed and matched in their
> > > > > > > > > applications; managing the resource requirements in such a
> > > > > > > > > setting would be a nightmare, and in the end would require
> > > > > > > > > operator-level requirements anyway.
> > > > > > > > > In that sense, I'm not even sure whether it really increases
> > > > > > > > > usability.
> > > > > > > > >
> > > > > > > >
> > > > > > > >    - As mentioned in my reply to Till's comment, there's no
> > > > > > > >    reason to put multiple operators whose individual resource
> > > > > > > >    requirements are already known into the same group in
> > > > > > > >    fine-grained resource management.
> > > > > > > >    - Even if an operator implementation is reused for multiple
> > > > > > > >    applications, it does not guarantee the same resource
> > > > > > > >    requirements. During our years of practice at Alibaba, with
> > > > > > > >    per-operator requirements specified for Blink's fine-grained
> > > > > > > >    resource management, very few users (including our
> > > > > > > >    specialists who are dedicated to supporting Blink users) are
> > > > > > > >    experienced enough to accurately predict/estimate the
> > > > > > > >    operator resource requirements. Most people rely on the
> > > > > > > >    execution-time metrics (throughput, delay, cpu load, memory
> > > > > > > >    usage, GC pressure, etc.) to improve the specification.
> > > > > > > >
> > > > > > > > To sum up:
> > > > > > > > If the user is capable of providing proper resource requirements
> > > > > > > > for every operator, that's definitely a good thing and we would
> > > > > > > > not need to rely on the SSGs. However, that shouldn't be a
> > > > > > > > *must* for the fine-grained resource management to work. For
> > > > > > > > those users who are capable and do not like having to set each
> > > > > > > > operator to a separate SSG, I would be ok to have both SSG-based
> > > > > > > > and operator-based runtime interfaces and to only fall back to
> > > > > > > > the SSG requirements when the operator requirements are not
> > > > > > > > specified. However, as the first step, I think we should
> > > > > > > > prioritise the use cases where users are not that experienced.
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > > On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
> > > chesnay@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Will declaring them on slot sharing groups not also waste
> > > resources
> > > > > > if
> > > > > > > > > the parallelism of operators within that group are different?
> > > > > > > > >
> > > > > > > > > It also seems like quite a hassle for users having to
> > > recalculate
> > > > > the
> > > > > > > > > resource requirements if they change the slot sharing.
> > > > > > > > > I'd think that it's not really workable for users that create
> > > a set
> > > > > > of
> > > > > > > > > re-usable operators which are mixed and matched in their
> > > > > > applications;
> > > > > > > > > managing the resources requirements in such a setting would
> > be
> > > a
> > > > > > > > > nightmare, and in the end would require operator-level
> > > requirements
> > > > > > any
> > > > > > > > > way.
> > > > > > > > > In that sense, I'm not even sure whether it really increases
> > > > > > usability.
> > > > > > > > >
> > > > > > > > > My main worry is that it if we wire the runtime to work on
> > SSGs
> > > > > it's
> > > > > > > > > gonna be difficult to implement more fine-grained approaches,
> > > which
> > > > > > > > > would not be the case if, for the runtime, they are always
> > > defined
> > > > > on
> > > > > > > an
> > > > > > > > > operator-level.
> > > > > > > > >
> > > > > > > > > On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > > > > > > > > > Thanks for drafting this FLIP and starting this discussion
> > > > > Yangze.
> > > > > > > > > >
> > > > > > > > > > I like that defining resource requirements on a slot
> > sharing
> > > > > group
> > > > > > > > makes
> > > > > > > > > > the overall setup easier and improves usability of resource
> > > > > > > > requirements.
> > > > > > > > > >
> > > > > > > > > > What I do not like about it is that it changes slot sharing
> > > > > groups
> > > > > > > from
> > > > > > > > > > being a scheduling hint to something which needs to be
> > > supported
> > > > > in
> > > > > > > > order
> > > > > > > > > > to support fine grained resource requirements. So far, the
> > > idea
> > > > > of
> > > > > > > slot
> > > > > > > > > > sharing groups was that it tells the system that a set of
> > > > > operators
> > > > > > > can
> > > > > > > > > be
> > > > > > > > > > deployed in the same slot. But the system still had the
> > > freedom
> > > > > to
> > > > > > > say
> > > > > > > > > that
> > > > > > > > > > it would rather place these tasks in different slots if it
> > > > > wanted.
> > > > > > If
> > > > > > > > we
> > > > > > > > > > now specify resource requirements on a per slot sharing
> > > group,
> > > > > then
> > > > > > > the
> > > > > > > > > > only option for a scheduler which does not support slot
> > > sharing
> > > > > > > groups
> > > > > > > > is
> > > > > > > > > > to say that every operator in this slot sharing group
> > needs a
> > > > > slot
> > > > > > > with
> > > > > > > > > the
> > > > > > > > > > same resources as the whole group.
> > > > > > > > > >
> > > > > > > > > > So for example, if we have a job consisting of two operator
> > > op_1
> > > > > > and
> > > > > > > > op_2
> > > > > > > > > > where each op needs 100 MB of memory, we would then say
> > that
> > > the
> > > > > > slot
> > > > > > > > > > sharing group needs 200 MB of memory to run. If we have a
> > > cluster
> > > > > > > with
> > > > > > > > 2
> > > > > > > > > > TMs with one slot of 100 MB each, then the system cannot
> > run
> > > this
> > > > > > > job.
> > > > > > > > If
> > > > > > > > > > the resources were specified on an operator level, then the
> > > > > system
> > > > > > > > could
> > > > > > > > > > still make the decision to deploy op_1 to TM_1 and op_2 to
> > > TM_2.
> > > > > > > > > >
> > > > > > > > > > Originally, one of the primary goals of slot sharing groups
> > > was
> > > > > to
> > > > > > > make
> > > > > > > > > it
> > > > > > > > > > easier for the user to reason about how many slots a job
> > > needs
> > > > > > > > > independent
> > > > > > > > > > of the actual number of operators in the job.
> > Interestingly,
> > > if
> > > > > all
> > > > > > > > > > operators have their resources properly specified, then
> > slot
> > > > > > sharing
> > > > > > > is
> > > > > > > > > no
> > > > > > > > > > longer needed because Flink could slice off the
> > appropriately
> > > > > sized
> > > > > > > > slots
> > > > > > > > > > for every Task individually. What matters is whether the
> > > whole
> > > > > > > cluster
> > > > > > > > > has
> > > > > > > > > > enough resources to run all tasks or not.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > > Till
> > > > > > > > > >
> > > > > > > > > > On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
> > > karmagyz@gmail.com>
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> Hi, there,
> > > > > > > > > >>
> > > > > > > > > >> We would like to start a discussion thread on "FLIP-156:
> > > Runtime
> > > > > > > > > >> Interfaces for Fine-Grained Resource Requirements"[1],
> > > where we
> > > > > > > > > >> propose Slot Sharing Group (SSG) based runtime interfaces
> > > for
> > > > > > > > > >> specifying fine-grained resource requirements.
> > > > > > > > > >>
> > > > > > > > > >> In this FLIP:
> > > > > > > > > >> - Expound the user story of fine-grained resource
> > > management.
> > > > > > > > > >> - Propose runtime interfaces for specifying SSG-based
> > > resource
> > > > > > > > > >> requirements.
> > > > > > > > > >> - Discuss the pros and cons of the three potential
> > > granularities
> > > > > > for
> > > > > > > > > >> specifying the resource requirements (op, task and slot
> > > sharing
> > > > > > > group)
> > > > > > > > > >> and explain why we choose the slot sharing group.
> > > > > > > > > >>
> > > > > > > > > >> Please find more details in the FLIP wiki document [1].
> > > Looking
> > > > > > > > > >> forward to your feedback.
> > > > > > > > > >>
> > > > > > > > > >> [1]
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > > > > > > > >>
> > > > > > > > > >> Best,
> > > > > > > > > >> Yangze Guo
> > > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Xintong Song <to...@gmail.com>.

I think this makes sense.

The semantics of an SSG are that operators in the group *can* be scheduled
together in a slot; it is not a *must*. Specifying resources for SSGs
should not change those semantics. If the need arises to schedule the
operators into different slots, it makes sense for the runtime to derive
the finer-grained resource requirements when they are not provided.

We may not need to implement this at the moment, since SSGs are currently
always respected, but we should make these semantics explicit in the JavaDocs
for the interfaces and in the user documentation when the user APIs are exposed.
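
To make this concrete, the two derivation strategies Till mentions in the
quoted mail below (give every task the group's full resources, or split the
group's resources equally) could look roughly like the sketch that follows.
It is only an illustration under assumptions: `SsgResourceDeriver`,
`ResourceSpec` and both method names are hypothetical and not existing Flink
classes, and a resource requirement is reduced to CPU cores plus memory in MB.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these types exist in Flink. */
final class SsgResourceDeriver {

    /** Simplified requirement: CPU cores and memory in MB (an assumption). */
    static final class ResourceSpec {
        final double cpuCores;
        final int memoryMb;

        ResourceSpec(double cpuCores, int memoryMb) {
            this.cpuCores = cpuCores;
            this.memoryMb = memoryMb;
        }
    }

    /** Strategy 1: every task gets the same resources as the whole group. */
    static List<ResourceSpec> sameAsGroup(ResourceSpec group, int numTasks) {
        checkTaskCount(numTasks);
        List<ResourceSpec> result = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            result.add(new ResourceSpec(group.cpuCores, group.memoryMb));
        }
        return result;
    }

    /** Strategy 2: the group's resources are split equally among the tasks. */
    static List<ResourceSpec> splitEqually(ResourceSpec group, int numTasks) {
        checkTaskCount(numTasks);
        List<ResourceSpec> result = new ArrayList<>();
        int baseMemory = group.memoryMb / numTasks;
        int remainder = group.memoryMb % numTasks;
        for (int i = 0; i < numTasks; i++) {
            // Leftover MBs go to the first tasks so the total is preserved.
            int memory = baseMemory + (i < remainder ? 1 : 0);
            result.add(new ResourceSpec(group.cpuCores / numTasks, memory));
        }
        return result;
    }

    private static void checkTaskCount(int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
    }
}
```

Applied to the op_1 / op_2 example from earlier in the thread, splitting a
200 MB group across two tasks yields two 100 MB requirements, which would fit
the two 100 MB slots; the first strategy is simpler but over-reserves.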

Thank you~

Xintong Song



On Thu, Jan 21, 2021 at 1:55 AM Till Rohrmann <tr...@apache.org> wrote:

> Maybe a different minor idea: Would it be possible to treat the SSG
> resource requirements as a hint for the runtime similar to how slot sharing
> groups are designed at the moment? Meaning that we don't give the guarantee
> that Flink will always deploy this set of tasks together no matter what
> comes. If, for example, the runtime can derive by some means the resource
> requirements for each task based on the requirements for the SSG, this
> could be possible. One easy strategy would be to give every task the same
> resources as the whole slot sharing group. Another one could be
> distributing the resources equally among the tasks. This does not even have
> to be implemented but we would give ourselves the freedom to change
> scheduling if need should arise.
>
> Cheers,
> Till
>
> On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <ka...@gmail.com> wrote:
>
> > Thanks for the responses, Till and Xintong.
> >
> > I second Xintong's comment that SSG-based runtime interface will give
> > us the flexibility to achieve op/task-based approach. That's one of
> > the most important reasons for our design choice.
> >
> > My two cents regarding the default operator resource:
> > - It might be good for the scenario of DataStream jobs.
> >    ** For light-weight operators, the accumulated configuration error
> > will not be significant. The resources a task uses are then
> > proportional to the number of operators it contains.
> >    ** For heavy operators like join and window, or operators using
> > external resources, users will turn to the fine-grained resource
> > configuration.
> > - It can increase stability for standalone clusters where the registered
> > task executors are heterogeneous (with different default slot
> > resources).
> > - It might not be good for SQL users. The operators that a SQL query is
> > translated into are a black box to the user. We also do not guarantee
> > cross-version consistency of that translation so far.
> >
> > I think it can be treated as a follow-up work when the fine-grained
> > resource management is end-to-end ready.
> >
> > Best,
> > Yangze Guo
> >
> >
> > On Wed, Jan 20, 2021 at 11:16 AM Xintong Song <to...@gmail.com>
> > wrote:
> > >
> > > Thanks for the feedback, Till.
> > >
> > > ## I feel that what you proposed (operator-based + default value)
> > > might be subsumed by the SSG-based approach.
> > > Thinking of op_1 -> op_2, there are the following 4 cases, categorized
> > > by whether the resource requirements are known to the users.
> > >
> > >    1. *Both known.* As previously mentioned, there's no reason to put
> > >    multiple operators whose individual resource requirements are
> > >    already known into the same group in fine-grained resource
> > >    management. And if op_1 and op_2 are in different groups, there
> > >    should be no problem switching data exchange mode from pipelined to
> > >    blocking. This is equivalent to specifying operator resource
> > >    requirements in your proposal.
> > >    2. *op_1 known, op_2 unknown.* Similar to 1), except that op_2 is in
> > >    a SSG whose resource is not specified thus would have the default
> > >    slot resource. This is equivalent to having default operator
> > >    resources in your proposal.
> > >    3. *Both unknown.* The user can either set op_1 and op_2 to the same
> > >    SSG or separate SSGs.
> > >       - If op_1 and op_2 are in the same SSG, it will be equivalent to
> > >       the coarse-grained resource management, where op_1 and op_2 share
> > >       a default size slot no matter which data exchange mode is used.
> > >       - If op_1 and op_2 are in different SSGs, then each of them will
> > >       use a default size slot. This is equivalent to setting them with
> > >       default operator resources in your proposal.
> > >    4. *Total (pipeline) or max (blocking) of op_1 and op_2 is known.*
> > >       - It is possible that the user learns the total / max resource
> > >       requirement from executing and monitoring the job, while not
> > >       being aware of individual operator requirements.
> > >       - I believe this is the case your proposal does not cover. And
> > >       TBH, this is probably how most users learn the resource
> > >       requirements, according to my experiences.
> > >       - In this case, the user might need to specify different
> > >       resources if he wants to switch the execution mode, which should
> > >       not be worse than not being able to use fine-grained resource
> > >       management.
> > >
> > >
> > > ## An additional idea inspired by your proposal.
> > > We may provide multiple options for deciding resources for SSGs whose
> > > requirement is not specified, if needed.
> > >
> > >    - Default slot resource (current design)
> > >    - Default operator resource times number of operators (equivalent to
> > >    your proposal)
> > >
> > >
> > > ## Exposing internal runtime strategies
> > > Theoretically, yes. Being tied to the SSGs, the resource requirements
> > > might be affected if how SSGs are internally handled changes in future.
> > > Practically, I do not concretely see at the moment what kind of changes
> > > we may want in future that might conflict with this FLIP proposal, given
> > > how the question of switching data exchange mode is answered above. I'd
> > > suggest not giving up the user friendliness we may gain now for future
> > > problems that may or may not exist.
> > >
> > > Moreover, the SSG-based approach has the flexibility to achieve the
> > > equivalent behavior as the operator-based approach, if we set each
> > > operator (or task) to a separate SSG. We can even provide a shortcut
> > > option to automatically do that for users, if needed.
> > >
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann <tr...@apache.org>
> > wrote:
> > >
> > > > Thanks for the responses Xintong and Stephan,
> > > >
> > > > I agree that being able to define the resource requirements for a
> > > > group of operators is more user friendly. However, my concern is that
> > > > we are exposing thereby internal runtime strategies which might limit
> > > > our flexibility to execute a given job. Moreover, the semantics of
> > > > configuring resource requirements for SSGs could break if switching
> > > > from streaming to batch execution. If one defines the resource
> > > > requirements for op_1 -> op_2 which run in pipelined mode when using
> > > > the streaming execution, then how do we interpret these requirements
> > > > when op_1 -> op_2 are executed with a blocking data exchange in batch
> > > > execution mode? Consequently, I am still leaning towards Stephan's
> > > > proposal to set the resource requirements per operator.
> > > >
> > > > Maybe the following proposal makes the configuration easier: If the
> > > > user wants to use fine-grained resource requirements, then she needs
> > > > to specify the default size which is used for operators which have no
> > > > explicit resource annotation. If this holds true, then every operator
> > > > would have a resource requirement and the system can try to execute
> > > > the operators in the best possible manner w/o being constrained by how
> > > > the user set the SSG requirements.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Tue, Jan 19, 2021 at 9:09 AM Xintong Song <to...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thanks for the feedback, Stephan.
> > > > >
> > > > > Actually, your proposal has also come to my mind at some point. And
> > > > > I have some concerns about it.
> > > > >
> > > > >
> > > > > 1. It does not give users the same control as the SSG-based
> > > > > approach.
> > > > >
> > > > >
> > > > > While both approaches do not require specifying for each operator,
> > > > > SSG-based approach supports the semantic that "some operators
> > > > > together use this much resource" while the operator-based approach
> > > > > doesn't.
> > > > >
> > > > >
> > > > > Think of a long pipeline with m operators (o_1, o_2, ..., o_m), and
> > > > > at some point there's an agg o_n (1 < n < m) which significantly
> > > > > reduces the data amount. One can separate the pipeline into 2 groups
> > > > > SSG_1 (o_1, ..., o_n) and SSG_2 (o_n+1, ... o_m), so that
> > > > > configuring much higher parallelisms for operators in SSG_1 than for
> > > > > operators in SSG_2 won't lead to too much wasting of resources. If
> > > > > the two SSGs end up needing different resources, with the SSG-based
> > > > > approach one can directly specify resources for the two groups.
> > > > > However, with the operator-based approach, the user will have to
> > > > > specify resources for each operator in one of the two groups, and
> > > > > tune the default slot resource via configurations to fit the other
> > > > > group.
> > > > >
> > > > >
> > > > > 2. It increases the chance of breaking operator chains.
> > > > >
> > > > >
> > > > > Setting chainable operators into different slot sharing groups will
> > > > > prevent them from being chained. In the current implementation,
> > > > > downstream operators, if their SSG is not explicitly specified, will
> > > > > be set to the same group as the chainable upstream operators (unless
> > > > > there are multiple upstream operators in different groups), to
> > > > > reduce the chance of breaking chains.
> > > > >
> > > > >
> > > > > Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4, deciding
> > > > > SSGs based on whether resource is specified we will easily get
> > > > > groups like (o_1, o_3) & (o_2, o_4), where none of the operators can
> > > > > be chained. This is also possible for the SSG-based approach, but I
> > > > > believe the chance is much smaller because there's no strong reason
> > > > > for users to specify the groups with alternate operators like that.
> > > > > We are more likely to get groups like (o_1, o_2) & (o_3, o_4), where
> > > > > the chain breaks only between o_2 and o_3.
> > > > >
> > > > >
> > > > > 3. It complicates the system by having two different mechanisms for
> > > > > sharing managed memory in a slot.
> > > > >
> > > > >
> > > > > - In FLIP-141, we introduced the intra-slot managed memory sharing
> > > > > mechanism, where managed memory is first distributed according to
> > > > > the consumer type, then further distributed across operators of that
> > > > > consumer type.
> > > > >
> > > > > - With the operator-based approach, managed memory size specified
> > > > > for an operator should account for all the consumer types of that
> > > > > operator. That means the managed memory is first distributed across
> > > > > operators, then distributed to different consumer types of each
> > > > > operator.
> > > > >
> > > > >
> > > > > Unfortunately, the different order of the two calculation steps can
> > > > > lead to different results. To be specific, the semantic of the
> > > > > configuration option `consumer-weights` changed (within a slot vs.
> > > > > within an operator).
> > > > >
> > > > >
> > > > >
> > > > > To sum up things:
> > > > >
> > > > > While (3) might be a bit more implementation related, I think (1)
> > > > > and (2) somehow suggest that, the price for the proposed approach to
> > > > > avoid specifying resource for every operator is that it's not as
> > > > > independent from operator chaining and slot sharing as the
> > > > > operator-based approach discussed in the FLIP.
> > > > >
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen <se...@apache.org>
> > wrote:
> > > > >
> > > > > > Thanks a lot, Yangze and Xintong for this FLIP.
> > > > > >
> > > > > > I want to say, first of all, that this is super well written. And
> > > > > > the points that the FLIP makes about how to expose the
> > > > > > configuration to users is exactly the right thing to figure out
> > > > > > first.
> > > > > > So good job here!
> > > > > >
> > > > > > About how to let users specify the resource profiles. If I can
> sum
> > the
> > > > > FLIP
> > > > > > and previous discussion up in my own words, the problem is the
> > > > following:
> > > > > >
> > > > > > Operator-level specification is the simplest and cleanest
> approach,
> > > > > because
> > > > > > > it avoids mixing operator configuration (resource) and
> > scheduling. No
> > > > > > > matter what other parameters change (chaining, slot sharing,
> > > > switching
> > > > > > > pipelined and blocking shuffles), the resource profiles stay
> the
> > > > same.
> > > > > > > But it would require that a user specifies resources on all
> > > > operators,
> > > > > > > which makes it hard to use. That's why the FLIP suggests going
> > with
> > > > > > > specifying resources on a Sharing-Group.
> > > > > >
> > > > > >
> > > > > > I think both thoughts are important, so can we find a solution
> > where
> > > > the
> > > > > > Resource Profiles are specified on an Operator, but we still
> avoid
> > that
> > > > > we
> > > > > > need to specify a resource profile on every operator?
> > > > > >
> > > > > > What do you think about something like the following:
> > > > > >   - Resource Profiles are specified on an operator level.
> > > > > >   - Not all operators need profiles
> > > > > >   - All Operators without a Resource Profile end up in the default
> > > > > > slot sharing group with a default profile (will get a default
> > > > > > slot).
> > > > > >   - All Operators with a Resource Profile will go into another slot
> > > > > > sharing group (the resource-specified-group).
> > > > > >   - Users can define different slot sharing groups for operators
> > > > > > like they do now, with the exception that you cannot mix operators
> > > > > > that have a resource profile and operators that have no resource
> > > > > > profile.
> > > > > >   - The default case where no operator has a resource profile is
> > > > > > just a special case of this model
> > > > > >   - The chaining logic sums up the profiles per operator, like it
> > > > > > does now, and the scheduler sums up the profiles of the tasks that
> > > > > > it schedules together.
> > > > > >
> > > > > >
> > > > > > There is another question about reactive scaling raised in the
> > > > > > FLIP. I need to think a bit about that. That is indeed a bit more
> > > > > > tricky once we have slots of different sizes.
> > > > > > It is not clear then which of the different slot requests the
> > > > > > ResourceManager should fulfill when new resources (TMs) show up,
> > > > > > or how the JobManager redistributes the slot resources when
> > > > > > resources (TMs) disappear.
> > > > > > This question is pretty orthogonal, though, to the "how to specify
> > > > > > the resources".
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Stephan
> > > > > >
> > > > > > On Fri, Jan 8, 2021 at 5:14 AM Xintong Song <
> tonysong820@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Thanks for drafting the FLIP and driving the discussion,
> Yangze.
> > > > > > > And Thanks for the feedback, Till and Chesnay.
> > > > > > >
> > > > > > > @Till,
> > > > > > >
> > > > > > > I agree that specifying requirements for SSGs means that SSGs
> > need to
> > > > > be
> > > > > > > supported in fine-grained resource management, otherwise each
> > > > operator
> > > > > > > might use as many resources as the whole group. However, I
> cannot
> > > > think
> > > > > > of
> > > > > > > a strong reason for not supporting SSGs in fine-grained
> resource
> > > > > > > management.
> > > > > > >
> > > > > > >
> > > > > > > > Interestingly, if all operators have their resources properly
> > > > > > specified,
> > > > > > > > then slot sharing is no longer needed because Flink could
> > slice off
> > > > > the
> > > > > > > > appropriately sized slots for every Task individually.
> > > > > > > >
> > > > > > >
> > > > > > > So for example, if we have a job consisting of two operator
> op_1
> > and
> > > > > op_2
> > > > > > > > where each op needs 100 MB of memory, we would then say that
> > the
> > > > slot
> > > > > > > > sharing group needs 200 MB of memory to run. If we have a
> > cluster
> > > > > with
> > > > > > 2
> > > > > > > > TMs with one slot of 100 MB each, then the system cannot run
> > this
> > > > > job.
> > > > > > If
> > > > > > > > the resources were specified on an operator level, then the
> > system
> > > > > > could
> > > > > > > > still make the decision to deploy op_1 to TM_1 and op_2 to
> > TM_2.
> > > > > > >
> > > > > > >
> > > > > > > Couldn't agree more that if all operators' requirements are
> > > > > > > properly specified, slot sharing should be no longer needed. I
> > > > > > > think this exactly disproves the example. If we already know
> > > > > > > op_1 and op_2 each needs 100 MB of memory, why would we put them
> > > > > > > in the same group? If they are in separate groups, with the
> > > > > > > proposed approach the system can freely deploy them to either a
> > > > > > > 200 MB TM or two 100 MB TMs.
> > > > > > >
> > > > > > > Moreover, the precondition for not needing slot sharing is
> having
> > > > > > resource
> > > > > > > requirements properly specified for all operators. This is not
> > always
> > > > > > > possible, and usually requires tremendous efforts. One of the
> > > > benefits
> > > > > > for
> > > > > > > SSG-based requirements is that it allows the user to freely
> > decide
> > > > the
> > > > > > > granularity, thus efforts they want to pay. I would consider
> SSG
> > in
> > > > > > > fine-grained resource management as a group of operators that
> the
> > > > user
> > > > > > > would like to specify the total resource for. There can be only
> > one
> > > > > group
> > > > > > > in the job, 2~3 groups dividing the job into a few major parts,
> > or as
> > > > > > many
> > > > > > > groups as the number of tasks/operators, depending on how
> > > > fine-grained
> > > > > > the
> > > > > > > user is able to specify the resources.
> > > > > > >
> > > > > > > Having to support SSGs might be a constraint. But given that
> all
> > the
> > > > > > > current scheduler implementations already support SSGs, I tend
> to
> > > > think
> > > > > > > that as an acceptable price for the above discussed usability
> and
> > > > > > > flexibility.
> > > > > > >
> > > > > > > @Chesnay
> > > > > > >
> > > > > > > Will declaring them on slot sharing groups not also waste
> > resources
> > > > if
> > > > > > the
> > > > > > > > parallelism of operators within that group are different?
> > > > > > > >
> > > > > > > Yes. It's a trade-off between usability and resource
> > > > > > > utilization. To avoid such waste, the user can define more
> > > > > > > groups, so that each group contains fewer operators and the
> > > > > > > chance of having operators with different parallelism will be
> > > > > > > reduced. The price is to have more resource requirements to
> > > > > > > specify.
> > > > > > >
> > > > > > > It also seems like quite a hassle for users having to
> > recalculate the
> > > > > > > > resource requirements if they change the slot sharing.
> > > > > > > > I'd think that it's not really workable for users that create
> > a set
> > > > > of
> > > > > > > > re-usable operators which are mixed and matched in their
> > > > > applications;
> > > > > > > > managing the resources requirements in such a setting would
> be
> > a
> > > > > > > > nightmare, and in the end would require operator-level
> > requirements
> > > > > any
> > > > > > > > way.
> > > > > > > > In that sense, I'm not even sure whether it really increases
> > > > > usability.
> > > > > > > >
> > > > > > >
> > > > > > >    - As mentioned in my reply to Till's comment, there's no reason
> > > > > > >    to put multiple operators whose individual resource
> > > > > > >    requirements are already known into the same group in
> > > > > > >    fine-grained resource management.
> > > > > > >    - Even if an operator implementation is reused for multiple
> > > > > > >    applications, it does not guarantee the same resource
> > > > > > >    requirements. During our years of practice at Alibaba, with
> > > > > > >    per-operator requirements specified for Blink's fine-grained
> > > > > > >    resource management, very few users (including our specialists
> > > > > > >    who are dedicated to supporting Blink users) are experienced
> > > > > > >    enough to accurately predict/estimate the operator resource
> > > > > > >    requirements. Most people rely on the execution-time metrics
> > > > > > >    (throughput, delay, cpu load, memory usage, GC pressure, etc.)
> > > > > > >    to improve the specification.
> > > > > > >
> > > > > > > To sum up:
> > > > > > > If the user is capable of providing proper resource requirements
> > > > > > > for every operator, that's definitely a good thing and we would
> > > > > > > not need to rely on the SSGs. However, that shouldn't be a *must*
> > > > > > > for the fine-grained resource management to work. For those
> > > > > > > users who are capable and do not like having to set each
> > > > > > > operator to a separate SSG, I would be ok to have both SSG-based
> > > > > > > and operator-based runtime interfaces and to only fall back to
> > > > > > > the SSG requirements when the operator requirements are not
> > > > > > > specified. However, as the first step, I think we should
> > > > > > > prioritise the use cases where users are not that experienced.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > > On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
> > chesnay@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Will declaring them on slot sharing groups not also waste
> > resources
> > > > > if
> > > > > > > > the parallelism of operators within that group are different?
> > > > > > > >
> > > > > > > > It also seems like quite a hassle for users having to
> > recalculate
> > > > the
> > > > > > > > resource requirements if they change the slot sharing.
> > > > > > > > I'd think that it's not really workable for users that create
> > a set
> > > > > of
> > > > > > > > re-usable operators which are mixed and matched in their
> > > > > applications;
> > > > > > > > managing the resources requirements in such a setting would
> be
> > a
> > > > > > > > nightmare, and in the end would require operator-level
> > requirements
> > > > > any
> > > > > > > > way.
> > > > > > > > In that sense, I'm not even sure whether it really increases
> > > > > usability.
> > > > > > > >
> > > > > > > > My main worry is that if we wire the runtime to work on SSGs
> > > > > > > > it's gonna be difficult to implement more fine-grained
> > > > > > > > approaches, which would not be the case if, for the runtime,
> > > > > > > > they are always defined on an operator-level.
> > > > > > > >
> > > > > > > > On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > > > > > > > > Thanks for drafting this FLIP and starting this discussion
> > > > Yangze.
> > > > > > > > >
> > > > > > > > > I like that defining resource requirements on a slot
> sharing
> > > > group
> > > > > > > makes
> > > > > > > > > the overall setup easier and improves usability of resource
> > > > > > > requirements.
> > > > > > > > >
> > > > > > > > > What I do not like about it is that it changes slot sharing
> > > > groups
> > > > > > from
> > > > > > > > > being a scheduling hint to something which needs to be
> > supported
> > > > in
> > > > > > > order
> > > > > > > > > to support fine grained resource requirements. So far, the
> > idea
> > > > of
> > > > > > slot
> > > > > > > > > sharing groups was that it tells the system that a set of
> > > > operators
> > > > > > can
> > > > > > > > be
> > > > > > > > > deployed in the same slot. But the system still had the
> > freedom
> > > > to
> > > > > > say
> > > > > > > > that
> > > > > > > > > it would rather place these tasks in different slots if it
> > > > wanted.
> > > > > If
> > > > > > > we
> > > > > > > > > now specify resource requirements on a per slot sharing
> > group,
> > > > then
> > > > > > the
> > > > > > > > > only option for a scheduler which does not support slot
> > sharing
> > > > > > groups
> > > > > > > is
> > > > > > > > > to say that every operator in this slot sharing group
> needs a
> > > > slot
> > > > > > with
> > > > > > > > the
> > > > > > > > > same resources as the whole group.
> > > > > > > > >
> > > > > > > > > > So for example, if we have a job consisting of two operators
> > op_1
> > > > > and
> > > > > > > op_2
> > > > > > > > > where each op needs 100 MB of memory, we would then say
> that
> > the
> > > > > slot
> > > > > > > > > sharing group needs 200 MB of memory to run. If we have a
> > cluster
> > > > > > with
> > > > > > > 2
> > > > > > > > > TMs with one slot of 100 MB each, then the system cannot
> run
> > this
> > > > > > job.
> > > > > > > If
> > > > > > > > > the resources were specified on an operator level, then the
> > > > system
> > > > > > > could
> > > > > > > > > still make the decision to deploy op_1 to TM_1 and op_2 to
> > TM_2.
> > > > > > > > >
> > > > > > > > > Originally, one of the primary goals of slot sharing groups
> > was
> > > > to
> > > > > > make
> > > > > > > > it
> > > > > > > > > easier for the user to reason about how many slots a job
> > needs
> > > > > > > > independent
> > > > > > > > > of the actual number of operators in the job.
> Interestingly,
> > if
> > > > all
> > > > > > > > > operators have their resources properly specified, then
> slot
> > > > > sharing
> > > > > > is
> > > > > > > > no
> > > > > > > > > longer needed because Flink could slice off the
> appropriately
> > > > sized
> > > > > > > slots
> > > > > > > > > for every Task individually. What matters is whether the
> > whole
> > > > > > cluster
> > > > > > > > has
> > > > > > > > > enough resources to run all tasks or not.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Till
> > > > > > > > >
> > > > > > > > > On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
> > karmagyz@gmail.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Hi, there,
> > > > > > > > >>
> > > > > > > > >> We would like to start a discussion thread on "FLIP-156:
> > Runtime
> > > > > > > > >> Interfaces for Fine-Grained Resource Requirements"[1],
> > where we
> > > > > > > > >> propose Slot Sharing Group (SSG) based runtime interfaces
> > for
> > > > > > > > >> specifying fine-grained resource requirements.
> > > > > > > > >>
> > > > > > > > >> In this FLIP:
> > > > > > > > >> - Expound the user story of fine-grained resource
> > management.
> > > > > > > > >> - Propose runtime interfaces for specifying SSG-based
> > resource
> > > > > > > > >> requirements.
> > > > > > > > >> - Discuss the pros and cons of the three potential
> > granularities
> > > > > for
> > > > > > > > >> specifying the resource requirements (op, task and slot
> > sharing
> > > > > > group)
> > > > > > > > >> and explain why we choose the slot sharing group.
> > > > > > > > >>
> > > > > > > > >> Please find more details in the FLIP wiki document [1].
> > Looking
> > > > > > > > >> forward to your feedback.
> > > > > > > > >>
> > > > > > > > >> [1]
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > > > > > > >>
> > > > > > > > >> Best,
> > > > > > > > >> Yangze Guo
> > > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Xintong Song <to...@gmail.com>.

I second Till's concern about implicitly interpreting zero resource
requirements for unspecified operators.

I'm not against allowing both specifying SSG requirements as shortcuts and
further refining operator requirements as needed.

Combining this with Till's idea, we could do the following.
- Prefer using operator requirements if they are available for all
operators in an SSG; otherwise fall back to the SSG requirements, or to the
default slot resource if those are not specified either.
- In cases where SSGs are not strictly respected and finer-grained
requirements are needed, derive them automatically if not provided.
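
To make the two rules above a bit more concrete, here is a rough sketch in
Python. It is only an illustration under assumptions: `Resource`,
`resolve_ssg_resource` and `derive_equal_share` are hypothetical names and
are not part of Flink or of the FLIP, and the equal split is just one of the
derivation strategies mentioned earlier in this thread.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource spec for illustration; not a Flink class."""
    cpu: float
    memory_mb: int

    def __add__(self, other: "Resource") -> "Resource":
        return Resource(self.cpu + other.cpu, self.memory_mb + other.memory_mb)


def resolve_ssg_resource(
    operators: List[str],
    op_requirements: Dict[str, Resource],
    ssg_requirement: Optional[Resource],
    default_slot: Resource,
) -> Resource:
    """Pick the slot resource for one slot sharing group."""
    specs = [op_requirements.get(op) for op in operators]
    # Operator requirements are used only if *all* operators have one, so an
    # unspecified operator is never implicitly treated as needing zero.
    if specs and all(spec is not None for spec in specs):
        total = specs[0]
        for spec in specs[1:]:
            total = total + spec
        return total
    # Otherwise fall back to the SSG requirement, then the default slot.
    if ssg_requirement is not None:
        return ssg_requirement
    return default_slot


def derive_equal_share(ssg_resource: Resource, operators: List[str]) -> Dict[str, Resource]:
    """Derive per-operator requirements by splitting the SSG resource equally.

    Memory uses integer division, so a remainder of a few MB may be dropped.
    """
    n = len(operators)
    share = Resource(ssg_resource.cpu / n, ssg_resource.memory_mb // n)
    return {op: share for op in operators}
```

With op_1 and op_2 each specified at 100 MB, the group resolves to 200 MB;
if only op_1 is specified, the SSG requirement (or the default slot) is used
instead of silently counting op_2 as zero.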

I'm leaning towards introducing the SSG interfaces as the first step, and
introducing the operator interfaces and the derivation logic as
future improvements.

Thank you~

Xintong Song



On Thu, Jan 21, 2021 at 4:45 PM Till Rohrmann <tr...@apache.org> wrote:

> If I understand you correctly Chesnay, then you want to decouple the
> resource requirement specification from the slot sharing group assignment.
> Hence, per default all operators would be in the same slot sharing group.
> If there is no operator with a resource specification, then the system
> would allocate a default slot for it. If there is at least one operator,
> then the system would sum up all the specified resources and allocate a
> slot of this size. This effectively means that all unspecified operators
> will implicitly have a zero resource requirement. Did I understand your
> idea correctly?
>
> I am wondering whether this wouldn't lead to a surprising behaviour for the
> user. If the user specifies the resource requirements for a single
> operator, then he probably will assume that the other operators will get
> the default share of resources and not nothing.
>
> Cheers,
> Till
>
> On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <ch...@apache.org>
> wrote:
>
> > Is there even a functional difference between specifying the
> > requirements for an SSG vs specifying the same requirements on a single
> > operator within that group (ideally a colocation group to avoid this
> > whole hint business)?
> >
> > Wouldn't we get the best of both worlds in the latter case?
> >
> > Users can take shortcuts to define shared requirements,
> > but refine them further as needed on a per-operator basis,
> > without changing semantics of slotsharing groups
> > nor the runtime being locked into SSG-based requirements.
> >
> > (And before anyone argues what happens if slotsharing groups change or
> > whatnot, that's a plain API issue that we could surely solve. (A plain
> > iteration over slotsharing groups and therein contained operators would
> > suffice)).
> >
> > On 1/20/2021 6:48 PM, Till Rohrmann wrote:
> > > Maybe a different minor idea: Would it be possible to treat the SSG
> > > resource requirements as a hint for the runtime similar to how slot
> > sharing
> > > groups are designed at the moment? Meaning that we don't give the
> > guarantee
> > > that Flink will always deploy this set of tasks together no matter what
> > > comes. If, for example, the runtime can derive by some means the
> resource
> > > requirements for each task based on the requirements for the SSG, this
> > > could be possible. One easy strategy would be to give every task the
> same
> > > resources as the whole slot sharing group. Another one could be
> > > distributing the resources equally among the tasks. This does not even
> > have
> > > to be implemented but we would give ourselves the freedom to change
> > > scheduling if need should arise.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <ka...@gmail.com> wrote:
> > >
> > >> Thanks for the responses, Till and Xintong.
> > >>
> > >> I second Xintong's comment that SSG-based runtime interface will give
> > >> us the flexibility to achieve op/task-based approach. That's one of
> > >> the most important reasons for our design choice.
> > >>
> > >> Some cents regarding the default operator resource:
> > >> - It might be good for the scenario of DataStream jobs.
> > >>     ** For light-weight operators, the accumulative configuration
> error
> > >> will not be significant. Then, the resource of a task used is
> > >> proportional to the number of operators it contains.
> > >>     ** For heavy operators like join and window or operators using the
> > >> external resources, user will turn to the fine-grained resource
> > >> configuration.
> > >> - It can increase the stability for the standalone cluster where task
> > >> executors registered are heterogeneous(with different default slot
> > >> resources).
> > > >> - It might not be good for SQL users. The operators that SQL will be
> > > >> translated to are a black box to the user. We also do not guarantee
> > > >> cross-version consistency of the transformation so far.
> > >>
> > >> I think it can be treated as a follow-up work when the fine-grained
> > >> resource management is end-to-end ready.
> > >>
> > >> Best,
> > >> Yangze Guo
> > >>
> > >>
> > >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song <to...@gmail.com>
> > >> wrote:
> > >>> Thanks for the feedback, Till.
> > >>>
> > >>> ## I feel that what you proposed (operator-based + default value)
> might
> > >> be
> > >>> subsumed by the SSG-based approach.
> > >>> Thinking of op_1 -> op_2, there are the following 4 cases,
> categorized
> > by
> > >>> whether the resource requirements are known to the users.
> > >>>
> > >>>     1. *Both known.* As previously mentioned, there's no reason to
> put
> > >>>     multiple operators whose individual resource requirements are
> > already
> > >> known
> > >>>     into the same group in fine-grained resource management. And if
> > op_1
> > >> and
> > >>>     op_2 are in different groups, there should be no problem
> switching
> > >> data
> > >>>     exchange mode from pipelined to blocking. This is equivalent to
> > >> specifying
> > >>>     operator resource requirements in your proposal.
> > >>>     2. *op_1 known, op_2 unknown.* Similar to 1), except that op_2 is
> > in a
> > >>>     SSG whose resource is not specified thus would have the default
> > slot
> > >>>     resource. This is equivalent to having default operator resources
> > in
> > >> your
> > >>>     proposal.
> > >>>     3. *Both unknown*. The user can either set op_1 and op_2 to the
> > same
> > >> SSG
> > >>>     or separate SSGs.
> > >>>        - If op_1 and op_2 are in the same SSG, it will be equivalent
> to
> > >> the
> > >>>        coarse-grained resource management, where op_1 and op_2 share
> a
> > >> default
> > >>>        size slot no matter which data exchange mode is used.
> > >>>        - If op_1 and op_2 are in different SSGs, then each of them
> will
> > >> use
> > >>>        a default size slot. This is equivalent to setting them with
> > >> default
> > >>>        operator resources in your proposal.
> > >>>     4. *Total (pipeline) or max (blocking) of op_1 and op_2 is
> known.*
> > >>>        - It is possible that the user learns the total / max resource
> > >>>        requirement from executing and monitoring the job, while not
> > >>> being aware of
> > >>>        individual operator requirements.
> > >>>        - I believe this is the case your proposal does not cover. And
> > TBH,
> > >>>        this is probably how most users learn the resource
> requirements,
> > >>> according
> > >>>        to my experiences.
> > >>>        - In this case, the user might need to specify different
> > resources
> > >> if
> > >>>        he wants to switch the execution mode, which should not be
> worse
> > >> than not
> > >>>        being able to use fine-grained resource management.
> > >>>
> > >>>
> > >>> ## An additional idea inspired by your proposal.
> > >>> We may provide multiple options for deciding resources for SSGs whose
> > >>> requirement is not specified, if needed.
> > >>>
> > >>>     - Default slot resource (current design)
> > >>>     - Default operator resource times number of operators (equivalent
> > to
> > >>>     your proposal)
> > >>>
> > >>>
> > >>> ## Exposing internal runtime strategies
> > >>> Theoretically, yes. Tying to the SSGs, the resource requirements
> might
> > be
> > >>> affected if how SSGs are internally handled changes in future.
> > >> Practically,
> > >>> I do not concretely see at the moment what kind of changes we may
> want
> > in
> > >>> future that might conflict with this FLIP proposal, as the question
> of
> > >>> switching data exchange mode answered above. I'd suggest to not give
> up
> > >> the
> > >>> user friendliness we may gain now for the future problems that may or
> > may
> > >>> not exist.
> > >>>
> > >>> Moreover, the SSG-based approach has the flexibility to achieve the
> > >>> equivalent behavior as the operator-based approach, if we set each
> > >> operator
> > >>> (or task) to a separate SSG. We can even provide a shortcut option to
> > >>> automatically do that for users, if needed.
> > >>>
> > >>>
> > >>> Thank you~
> > >>>
> > >>> Xintong Song
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann <trohrmann@apache.org
> >
> > >> wrote:
> > >>>> Thanks for the responses Xintong and Stephan,
> > >>>>
> > >>>> I agree that being able to define the resource requirements for a
> > >> group of
> > >>>> operators is more user friendly. However, my concern is that we are
> > >>>> exposing thereby internal runtime strategies which might limit our
> > >>>> flexibility to execute a given job. Moreover, the semantics of
> > >> configuring
> > >>>> resource requirements for SSGs could break if switching from
> streaming
> > >> to
> > >>>> batch execution. If one defines the resource requirements for op_1
> ->
> > >> op_2
> > >>>> which run in pipelined mode when using the streaming execution, then
> > >> how do
> > >>>> we interpret these requirements when op_1 -> op_2 are executed with
> a
> > >>>> blocking data exchange in batch execution mode? Consequently, I am
> > >> still
> > >>>> leaning towards Stephan's proposal to set the resource requirements
> > per
> > >>>> operator.
> > >>>>
> > >>>> Maybe the following proposal makes the configuration easier: If the
> > >> user
> > >>>> wants to use fine-grained resource requirements, then she needs to
> > >> specify
> > >>>> the default size which is used for operators which have no explicit
> > >>>> resource annotation. If this holds true, then every operator would
> > >> have a
> > >>>> resource requirement and the system can try to execute the operators
> > >> in the
> > >>>> best possible manner w/o being constrained by how the user set the
> SSG
> > >>>> requirements.
> > >>>>
> > >>>> Cheers,
> > >>>> Till
> > >>>>
> > >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song <tonysong820@gmail.com
> >
> > >>>> wrote:
> > >>>>
> > >>>>> Thanks for the feedback, Stephan.
> > >>>>>
> > >>>>> Actually, your proposal has also come to my mind at some point.
> And I
> > >>>> have
> > >>>>> some concerns about it.
> > >>>>>
> > >>>>>
> > >>>>> 1. It does not give users the same control as the SSG-based
> approach.
> > >>>>>
> > >>>>>
> > >>>>> While both approaches do not require specifying for each operator,
> > >>>>> SSG-based approach supports the semantic that "some operators
> > >> together
> > >>>> use
> > >>>>> this much resource" while the operator-based approach doesn't.
> > >>>>>
> > >>>>>
> > >>>>> Think of a long pipeline with m operators (o_1, o_2, ..., o_m), and
> > >> at
> > >>>> some
> > >>>>> point there's an agg o_n (1 < n < m) which significantly reduces
> the
> > >> data
> > >>>>> amount. One can separate the pipeline into 2 groups SSG_1 (o_1,
> ...,
> > >> o_n)
> > >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much higher
> > >> parallelisms
> > >>>>> for operators in SSG_1 than for operators in SSG_2 won't lead to
> too
> > >> much
> > >>>>> wasting of resources. If the two SSGs end up needing different
> > >> resources,
> > >>>>> with the SSG-based approach one can directly specify resources for
> > >> the
> > >>>> two
> > >>>>> groups. However, with the operator-based approach, the user will
> > >> have to
> > >>>>> specify resources for each operator in one of the two groups, and
> > >> tune
> > >>>> the
> > >>>>> default slot resource via configurations to fit the other group.
> > >>>>>
> > >>>>>
> > >>>>> 2. It increases the chance of breaking operator chains.
> > >>>>>
> > >>>>>
> > > >>>>> Setting chainable operators into different slot sharing groups
> will
> > >>>>> prevent them from being chained. In the current implementation,
> > >>>> downstream
> > >>>>> operators, if SSG not explicitly specified, will be set to the same
> > >> group
> > >>>>> as the chainable upstream operators (unless multiple upstream
> > >> operators
> > >>>> in
> > >>>>> different groups), to reduce the chance of breaking chains.
> > >>>>>
> > >>>>>
> > > >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4, deciding
> > >> SSGs
> > >>>>> based on whether resource is specified we will easily get groups
> like
> > >>>> (o_1,
> > >>>>> o_3) & (o_2, o_4), where none of the operators can be chained. This
> > >> is
> > >>>> also
> > >>>>> possible for the SSG-based approach, but I believe the chance is
> much
> > >>>>> smaller because there's no strong reason for users to specify the
> > >> groups
> > >>>>> with alternate operators like that. We are more likely to get
> groups
> > >> like
> > >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks only between o_2
> and
> > >> o_3.
> > >>>>>
> > >>>>> 3. It complicates the system by having two different mechanisms for
> > >>>> sharing
> > > >>>>> managed memory in a slot.
> > >>>>>
> > >>>>>
> > >>>>> - In FLIP-141, we introduced the intra-slot managed memory sharing
> > >>>>> mechanism, where managed memory is first distributed according to
> the
> > >>>>> consumer type, then further distributed across operators of that
> > >> consumer
> > >>>>> type.
> > >>>>>
> > >>>>> - With the operator-based approach, managed memory size specified
> > >> for an
> > >>>>> operator should account for all the consumer types of that
> operator.
> > >> That
> > >>>>> means the managed memory is first distributed across operators,
> then
> > >>>>> distributed to different consumer types of each operator.
> > >>>>>
> > >>>>>
> > >>>>> Unfortunately, the different order of the two calculation steps can
> > >> lead
> > >>>> to
> > >>>>> different results. To be specific, the semantic of the
> configuration
> > >>>> option
> > >>>>> `consumer-weights` changed (within a slot vs. within an operator).
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> To sum up things:
> > >>>>>
> > >>>>> While (3) might be a bit more implementation related, I think (1)
> > >> and (2)
> > >>>>> somehow suggest that, the price for the proposed approach to avoid
> > >>>>> specifying resource for every operator is that it's not as
> > >> independent
> > >>>> from
> > >>>>> operator chaining and slot sharing as the operator-based approach
> > >>>> discussed
> > >>>>> in the FLIP.
> > >>>>>
> > >>>>>
> > >>>>> Thank you~
> > >>>>>
> > >>>>> Xintong Song
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen <se...@apache.org>
> > >> wrote:
> > >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
> > >>>>>>
> > >>>>>> I want to say, first of all, that this is super well written. And
> > >> the
> > >>>>>> points that the FLIP makes about how to expose the configuration
> to
> > >>>> users
> > >>>>>> is exactly the right thing to figure out first.
> > >>>>>> So good job here!
> > >>>>>>
> > >>>>>> About how to let users specify the resource profiles. If I can sum
> > >> the
> > >>>>> FLIP
> > >>>>>> and previous discussion up in my own words, the problem is the
> > >>>> following:
> > >>>>>> Operator-level specification is the simplest and cleanest
> approach,
> > >>>>> because
> > >>>>>>> it avoids mixing operator configuration (resource) and
> > >> scheduling. No
> > >>>>>>> matter what other parameters change (chaining, slot sharing,
> > >>>> switching
> > >>>>>>> pipelined and blocking shuffles), the resource profiles stay the
> > >>>> same.
> > >>>>>>> But it would require that a user specifies resources on all
> > >>>> operators,
> > >>>>>>> which makes it hard to use. That's why the FLIP suggests going
> > >> with
> > >>>>>>> specifying resources on a Sharing-Group.
> > >>>>>>
> > >>>>>> I think both thoughts are important, so can we find a solution
> > >> where
> > >>>> the
> > >>>>>> Resource Profiles are specified on an Operator, but we still avoid
> > >> that
> > >>>>> we
> > >>>>>> need to specify a resource profile on every operator?
> > >>>>>>
> > >>>>>> What do you think about something like the following:
> > >>>>>>    - Resource Profiles are specified on an operator level.
> > >>>>>>    - Not all operators need profiles
> > > >>>>>>    - All Operators without a Resource Profile end up in the
> > >> default
> > >>>> slot
> > >>>>>> sharing group with a default profile (will get a default slot).
> > >>>>>>    - All Operators with a Resource Profile will go into another
> slot
> > >>>>> sharing
> > >>>>>> group (the resource-specified-group).
> > >>>>>>    - Users can define different slot sharing groups for operators
> > >> like
> > >>>>> they
> > >>>>>> do now, with the exception that you cannot mix operators that have
> > >> a
> > >>>>>> resource profile and operators that have no resource profile.
> > >>>>>>    - The default case where no operator has a resource profile is
> > >> just a
> > >>>>>> special case of this model
> > >>>>>>    - The chaining logic sums up the profiles per operator, like it
> > >> does
> > >>>>> now,
> > >>>>>> and the scheduler sums up the profiles of the tasks that it
> > >> schedules
> > >>>>>> together.
> > >>>>>>
> > >>>>>>
> > >>>>>> There is another question about reactive scaling raised in the
> > >> FLIP. I
> > >>>>> need
> > >>>>>> to think a bit about that. That is indeed a bit more tricky once
> we
> > >>>> have
> > >>>>>> slots of different sizes.
> > >>>>>> It is not clear then which of the different slot requests the
> > >>>>>> ResourceManager should fulfill when new resources (TMs) show up,
> > >> or how
> > >>>>> the
> > >>>>>> JobManager redistributes the slots resources when resources (TMs)
> > >>>>> disappear
> > >>>>>> This question is pretty orthogonal, though, to the "how to specify
> > >> the
> > >>>>>> resources".
> > >>>>>>
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Stephan
> > >>>>>>
> > >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song <
> tonysong820@gmail.com
> > >>>>> wrote:
> > >>>>>>> Thanks for drafting the FLIP and driving the discussion, Yangze.
> > >>>>>>> And Thanks for the feedback, Till and Chesnay.
> > >>>>>>>
> > >>>>>>> @Till,
> > >>>>>>>
> > >>>>>>> I agree that specifying requirements for SSGs means that SSGs
> > >> need to
> > >>>>> be
> > >>>>>>> supported in fine-grained resource management, otherwise each
> > >>>> operator
> > >>>>>>> might use as many resources as the whole group. However, I cannot
> > >>>> think
> > >>>>>> of
> > >>>>>>> a strong reason for not supporting SSGs in fine-grained resource
> > >>>>>>> management.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> Interestingly, if all operators have their resources properly
> > >>>>>> specified,
> > >>>>>>>> then slot sharing is no longer needed because Flink could
> > >> slice off
> > >>>>> the
> > >>>>>>>> appropriately sized slots for every Task individually.
> > >>>>>>>>
> > > >>>>>>> So for example, if we have a job consisting of two operators op_1
> > >> and
> > >>>>> op_2
> > >>>>>>>> where each op needs 100 MB of memory, we would then say that
> > >> the
> > >>>> slot
> > >>>>>>>> sharing group needs 200 MB of memory to run. If we have a
> > >> cluster
> > >>>>> with
> > >>>>>> 2
> > >>>>>>>> TMs with one slot of 100 MB each, then the system cannot run
> > >> this
> > >>>>> job.
> > >>>>>> If
> > >>>>>>>> the resources were specified on an operator level, then the
> > >> system
> > >>>>>> could
> > >>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to
> > >> TM_2.
> > >>>>>>>
> > >>>>>>> Couldn't agree more that if all operators' requirements are
> > >> properly
> > >>>>>>> specified, slot sharing should be no longer needed. I think this
> > >>>>> exactly
> > >>>>>>> disproves the example. If we already know op_1 and op_2 each
> > >> needs
> > >>>> 100
> > >>>>> MB
> > >>>>>>> of memory, why would we put them in the same group? If they are
> > >> in
> > >>>>>> separate
> > >>>>>>> groups, with the proposed approach the system can freely deploy
> > >> them
> > >>>> to
> > >>>>>>> either a 200 MB TM or two 100 MB TMs.
> > >>>>>>>
> > >>>>>>> Moreover, the precondition for not needing slot sharing is having
> > >>>>>> resource
> > >>>>>>> requirements properly specified for all operators. This is not
> > >> always
> > >>>>>>> possible, and usually requires tremendous efforts. One of the
> > >>>> benefits
> > >>>>>> for
> > >>>>>>> SSG-based requirements is that it allows the user to freely
> > >> decide
> > >>>> the
> > >>>>>>> granularity, thus efforts they want to pay. I would consider SSG
> > >> in
> > >>>>>>> fine-grained resource management as a group of operators that the
> > >>>> user
> > >>>>>>> would like to specify the total resource for. There can be only
> > >> one
> > >>>>> group
> > >>>>>>> in the job, 2~3 groups dividing the job into a few major parts,
> > >> or as
> > >>>>>> many
> > >>>>>>> groups as the number of tasks/operators, depending on how
> > >>>> fine-grained
> > >>>>>> the
> > >>>>>>> user is able to specify the resources.
> > >>>>>>>
> > >>>>>>> Having to support SSGs might be a constraint. But given that all
> > >> the
> > >>>>>>> current scheduler implementations already support SSGs, I tend to
> > >>>> think
> > >>>>>>> that as an acceptable price for the above discussed usability and
> > >>>>>>> flexibility.
> > >>>>>>>
> > >>>>>>> @Chesnay
> > >>>>>>>
> > >>>>>>> Will declaring them on slot sharing groups not also waste
> > >> resources
> > >>>> if
> > >>>>>> the
> > >>>>>>>> parallelism of operators within that group are different?
> > >>>>>>>>
> > >>>>>>> Yes. It's a trade-off between usability and resource
> > >> utilization. To
> > >>>>>> avoid
> > >>>>>>> such wasting, the user can define more groups, so that each group
> > >>>>>> contains
> > > >>>>>>> fewer operators and the chance of having operators with different
> > >>>>>>> parallelism will be reduced. The price is to have more resource
> > >>>>>>> requirements to specify.
> > >>>>>>>
> > >>>>>>> It also seems like quite a hassle for users having to
> > >> recalculate the
> > >>>>>>>> resource requirements if they change the slot sharing.
> > >>>>>>>> I'd think that it's not really workable for users that create
> > >> a set
> > >>>>> of
> > >>>>>>>> re-usable operators which are mixed and matched in their
> > >>>>> applications;
> > >>>>>>>> managing the resources requirements in such a setting would be
> > >> a
> > >>>>>>>> nightmare, and in the end would require operator-level
> > >> requirements
> > >>>>> any
> > >>>>>>>> way.
> > >>>>>>>> In that sense, I'm not even sure whether it really increases
> > >>>>> usability.
> > >>>>>>>     - As mentioned in my reply to Till's comment, there's no
> > >> reason to
> > >>>>> put
> > >>>>>>>     multiple operators whose individual resource requirements are
> > >>>>> already
> > >>>>>>> known
> > >>>>>>>     into the same group in fine-grained resource management.
> > > >>>>>>>     - Even if an operator implementation is reused for multiple
> > >>>>> applications,
> > >>>>>>>     it does not guarantee the same resource requirements. During
> > >> our
> > >>>>> years
> > >>>>>>> of
> > >>>>>>>     practices in Alibaba, with per-operator requirements
> > >> specified for
> > >>>>>>> Blink's
> > >>>>>>>     fine-grained resource management, very few users (including
> > >> our
> > >>>>>>> specialists
> > >>>>>>>     who are dedicated to supporting Blink users) are as
> > >> experienced as
> > >>>>> to
> > >>>>>>>     accurately predict/estimate the operator resource
> > >> requirements.
> > >>>> Most
> > >>>>>>> people
> > >>>>>>>     rely on the execution-time metrics (throughput, delay, cpu
> > >> load,
> > >>>>>> memory
> > >>>>>>>     usage, GC pressure, etc.) to improve the specification.
> > >>>>>>>
> > >>>>>>> To sum up:
> > >>>>>>> If the user is capable of providing proper resource requirements
> > >> for
> > >>>>>> every
> > >>>>>>> operator, that's definitely a good thing and we would not need to
> > >>>> rely
> > >>>>> on
> > >>>>>>> the SSGs. However, that shouldn't be a *must* for the
> > >> fine-grained
> > >>>>>> resource
> > >>>>>>> management to work. For those users who are capable and do not
> > >> like
> > >>>>>> having
> > >>>>>>> to set each operator to a separate SSG, I would be ok to have
> > >> both
> > >>>>>>> SSG-based and operator-based runtime interfaces and to only
> > >> fallback
> > >>>> to
> > >>>>>> the
> > >>>>>>> SSG requirements when the operator requirements are not
> > >> specified.
> > >>>>>> However,
> > >>>>>>> as the first step, I think we should prioritise the use cases
> > >> where
> > >>>>> users
> > >>>>>>> are not that experienced.
> > >>>>>>>
> > >>>>>>> Thank you~
> > >>>>>>>
> > >>>>>>> Xintong Song
> > >>>>>>>
> > >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
> > >> chesnay@apache.org>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Will declaring them on slot sharing groups not also waste
> > >> resources
> > >>>>> if
> > >>>>>>>> the parallelism of operators within that group are different?
> > >>>>>>>>
> > >>>>>>>> It also seems like quite a hassle for users having to
> > >> recalculate
> > >>>> the
> > >>>>>>>> resource requirements if they change the slot sharing.
> > >>>>>>>> I'd think that it's not really workable for users that create
> > >> a set
> > >>>>> of
> > >>>>>>>> re-usable operators which are mixed and matched in their
> > >>>>> applications;
> > >>>>>>>> managing the resources requirements in such a setting would be
> > >> a
> > >>>>>>>> nightmare, and in the end would require operator-level
> > >> requirements
> > >>>>> any
> > >>>>>>>> way.
> > >>>>>>>> In that sense, I'm not even sure whether it really increases
> > >>>>> usability.
> > > >>>>>>>> My main worry is that if we wire the runtime to work on SSGs
> > >>>> it's
> > >>>>>>>> gonna be difficult to implement more fine-grained approaches,
> > >> which
> > >>>>>>>> would not be the case if, for the runtime, they are always
> > >> defined
> > >>>> on
> > >>>>>> an
> > >>>>>>>> operator-level.
> > >>>>>>>>
> > >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > >>>>>>>>> Thanks for drafting this FLIP and starting this discussion
> > >>>> Yangze.
> > >>>>>>>>> I like that defining resource requirements on a slot sharing
> > >>>> group
> > >>>>>>> makes
> > >>>>>>>>> the overall setup easier and improves usability of resource
> > >>>>>>> requirements.
> > >>>>>>>>> What I do not like about it is that it changes slot sharing
> > >>>> groups
> > >>>>>> from
> > >>>>>>>>> being a scheduling hint to something which needs to be
> > >> supported
> > >>>> in
> > >>>>>>> order
> > >>>>>>>>> to support fine grained resource requirements. So far, the
> > >> idea
> > >>>> of
> > >>>>>> slot
> > >>>>>>>>> sharing groups was that it tells the system that a set of
> > >>>> operators
> > >>>>>> can
> > >>>>>>>> be
> > >>>>>>>>> deployed in the same slot. But the system still had the
> > >> freedom
> > >>>> to
> > >>>>>> say
> > >>>>>>>> that
> > >>>>>>>>> it would rather place these tasks in different slots if it
> > >>>> wanted.
> > >>>>> If
> > >>>>>>> we
> > >>>>>>>>> now specify resource requirements on a per slot sharing
> > >> group,
> > >>>> then
> > >>>>>> the
> > >>>>>>>>> only option for a scheduler which does not support slot
> > >> sharing
> > >>>>>> groups
> > >>>>>>> is
> > >>>>>>>>> to say that every operator in this slot sharing group needs a
> > >>>> slot
> > >>>>>> with
> > >>>>>>>> the
> > >>>>>>>>> same resources as the whole group.
> > >>>>>>>>>
> > >>>>>>>>> So for example, if we have a job consisting of two operators
> > >> op_1
> > >>>>> and
> > >>>>>>> op_2
> > >>>>>>>>> where each op needs 100 MB of memory, we would then say that
> > >> the
> > >>>>> slot
> > >>>>>>>>> sharing group needs 200 MB of memory to run. If we have a
> > >> cluster
> > >>>>>> with
> > >>>>>>> 2
> > >>>>>>>>> TMs with one slot of 100 MB each, then the system cannot run
> > >> this
> > >>>>>> job.
> > >>>>>>> If
> > >>>>>>>>> the resources were specified on an operator level, then the
> > >>>> system
> > >>>>>>> could
> > >>>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to
> > >> TM_2.
> > >>>>>>>>> Originally, one of the primary goals of slot sharing groups
> > >> was
> > >>>> to
> > >>>>>> make
> > >>>>>>>> it
> > >>>>>>>>> easier for the user to reason about how many slots a job
> > >> needs
> > >>>>>>>> independent
> > >>>>>>>>> of the actual number of operators in the job. Interestingly,
> > >> if
> > >>>> all
> > >>>>>>>>> operators have their resources properly specified, then slot
> > >>>>> sharing
> > >>>>>> is
> > >>>>>>>> no
> > >>>>>>>>> longer needed because Flink could slice off the appropriately
> > >>>> sized
> > >>>>>>> slots
> > >>>>>>>>> for every Task individually. What matters is whether the
> > >> whole
> > >>>>>> cluster
> > >>>>>>>> has
> > >>>>>>>>> enough resources to run all tasks or not.
> > >>>>>>>>>
> > >>>>>>>>> Cheers,
> > >>>>>>>>> Till
> > >>>>>>>>>
> > >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
> > >> karmagyz@gmail.com>
> > >>>>>> wrote:
> > >>>>>>>>>> Hi, there,
> > >>>>>>>>>>
> > >>>>>>>>>> We would like to start a discussion thread on "FLIP-156:
> > >> Runtime
> > >>>>>>>>>> Interfaces for Fine-Grained Resource Requirements"[1],
> > >> where we
> > >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime interfaces
> > >> for
> > >>>>>>>>>> specifying fine-grained resource requirements.
> > >>>>>>>>>>
> > >>>>>>>>>> In this FLIP:
> > >>>>>>>>>> - Expound the user story of fine-grained resource
> > >> management.
> > >>>>>>>>>> - Propose runtime interfaces for specifying SSG-based
> > >> resource
> > >>>>>>>>>> requirements.
> > >>>>>>>>>> - Discuss the pros and cons of the three potential
> > >> granularities
> > >>>>> for
> > >>>>>>>>>> specifying the resource requirements (op, task and slot
> > >> sharing
> > >>>>>> group)
> > >>>>>>>>>> and explain why we choose the slot sharing group.
> > >>>>>>>>>>
> > >>>>>>>>>> Please find more details in the FLIP wiki document [1].
> > >> Looking
> > >>>>>>>>>> forward to your feedback.
> > >>>>>>>>>>
> > >>>>>>>>>> [1]
> > >>>>>>>>>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Yangze Guo
> > >>>>>>>>>>
> > >>>>>>>>
> >
> >
>

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Kezhu Wang <ke...@gmail.com>.

Hi Xintong,

Thanks for the background!

I understand the impracticality of operator-level specifications and the
value of group-level specifications. I am just not that confident about
“Coupling between operator chaining / slot sharing”; it seems to me that it
requires more knowledge than “Expose operator chaining”.

Best,
Kezhu Wang
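
To make the group-level versus operator-level trade-off concrete, here is a minimal sketch (Python, purely illustrative, not Flink code) of the two-operator example from Till's first reply in this thread: op_1 and op_2 need 100 MB each, and the cluster has two TaskManagers with one 100 MB slot each. The function names and the one-operator-per-slot placement rule are assumptions made for this illustration.

```python
def fits_as_group(op_mem_mb, slot_mem_mb):
    """SSG-level view: the whole group must fit into a single slot."""
    group_mb = sum(op_mem_mb)
    return any(slot >= group_mb for slot in slot_mem_mb)


def fits_per_operator(op_mem_mb, slot_mem_mb):
    """Operator-level view: every operator may be placed in its own slot."""
    free = sorted(slot_mem_mb, reverse=True)
    for need in sorted(op_mem_mb, reverse=True):
        # The largest remaining slot is the best chance for the largest need.
        if not free or free[0] < need:
            return False
        free.pop(0)
    return True


ops = [100, 100]    # op_1 and op_2, in MB
slots = [100, 100]  # one slot on TM_1 and one on TM_2, in MB

print(fits_as_group(ops, slots))      # False: no single slot offers 200 MB
print(fits_per_operator(ops, slots))  # True: op_1 on TM_1, op_2 on TM_2
```

The sketch only covers memory and ignores parallelism, chaining and every other resource dimension. It is meant to show why a scheduler that does not respect slot sharing has more freedom when requirements are known per operator.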

On Thu, Feb 4, 2021 at 13:22 Xintong Song <to...@gmail.com> wrote:

> Hi Kezhu,
>
> Let me share some background first.
>
>    - We at Alibaba have been using fine-grained resource management for
>    many years, with Blink (an internal version of Flink).
>    - We have been trying to contribute this feature to Apache Flink for
>    many years. However, we haven't succeeded, for various reasons.
>       - Years ago, I believe there were not many users running Flink in
>       production at a very large scale, and thus less demand for the
>       feature.
>       - The feature in Blink is quite specific to our internal use cases
>       and scenarios. We have not made it general enough to cover the
>       community's common use cases.
>       - Divergences between the Flink and Blink code bases.
>    - Blink used operator-level resource interfaces. Based on our years of
>    production experience, we believe that specifying operator-level
>    resources is neither necessary nor easy to use. This is why we propose
>    group-level interfaces.
>
> Back to your questions.
>
> > I saw the discussion to keep slot sharing as a hint, but in reality, will
> > SSG jobs be expected to fail or run slowly if the scheduler does not
> > respect it? A slot with 20GB memory is different from two 1GB default
> > sized slots. So, we actually depend on the scheduler
> > version/implementation/de facto behavior if we claim it is a hint.
> >
>
> SSG-based resource requirements are considered hints because the SSG itself
> is a hint. There's no guarantee that operators of an SSG will always be
> scheduled together. I think you raise a good point: if SSGs are not
> respected, is it preferable to fail the job or to interpret the resource of
> an actual slot? We could provide a configuration option and leave that
> decision to the users. However, that is a design choice we only need to
> make when there is indeed a need for not respecting the SSGs.
>
> > Do you mean code-path or production environment ? If it is code-path,
> > could you please point out where the story breaks ?
> >
> > From the discussion and history, could I consider FLIP-156 a
> > redirection more than an inheritance/enhancement of the current
> > half-baked/ancient implementation ?
> >
> >
>
> If you try to set the operator resources, you would find that it won't work
> at the moment. There are several things not ready.
>
>    - Interfaces for setting operator resources are never really exposed to
>    users.
>    - The resource manager never allocates slots with the requested
>    resources.
>    - Managed memory size specified for operators will not be respected,
>    because managed memory is shared within a slot with a different
>    approach.
>
> While the first 2 points are more about the feature not yet being ready,
> the last point is closely tied to specifying operator-level resources.
>
> To sum up, we do not want to support specifying operator-level resources in
> the first step, for the following reasons.
>
>    - It's not likely needed, due to poor usability compared to the
>    SSG-based approach.
>    - It introduces the complexity to deal with the managed memory sharing.
>    - It introduces the complexity to deal with combining resource
>    requirements from two different levels.
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Feb 3, 2021 at 7:50 PM Kezhu Wang <ke...@gmail.com> wrote:
>
> > Hi Till,
> >
> > Based on what I understood, if not wrong, the door is not closed after
> > SSG resource specifying. So, hope it could be useful in potential future
> > improvement.
> >
> > Best,
> > Kezhu Wang
> >
> >
> > On February 3, 2021 at 18:07:21, Till Rohrmann (trohrmann@apache.org)
> > wrote:
> >
> > Thanks for sharing your thoughts Kezhu. I like your ideas of how
> > per-operator and SSG requirements can be combined. I've also thought
> > about defining a default resource profile for all tasks which have no
> > resources configured. That way all operators would have resources
> > assigned if the user chooses to use this feature.
> >
> > As Yangze and Xintong have said, we have decided to first only support
> > specifying resources for SSGs as this seems more user friendly. Based on
> > the feedback for this feature one potential development direction might
> > be to allow the resource specification on per-operator basis. Here we
> > could pick up your ideas.
> >
> > Cheers,
> > Till
> >
> > On Wed, Feb 3, 2021 at 7:31 AM Xintong Song <to...@gmail.com> wrote:
> >
> > > Thanks for your feedback, Kezhu.
> > >
> > > > I think Flink *runtime* already has an ideal granularity for resource
> > > > management 'task'. If there is a slot shared by multiple tasks, that
> > > > slot's resource requirement is simple sum of all its logical slots.
> > > > So basically, this is no resource requirement for SlotSharingGroup
> > > > in runtime until now, right ?
> > >
> > > That is a half-baked implementation, coming from the previous attempts
> > > (years ago) trying to deliver the fine-grained resource management
> > > feature, and never really put into use.
> > >
> > > > From the FLIP and discussion, I assume that SSG resource specifying
> > > > will override operator level resource specifying if both are
> > > > specified ?
> > > >
> > > Actually, I think we should use the finer-grained resources (i.e.
> > > operator level) if both are specified. And more importantly, that is
> > > based on the assumption that we do need two different levels of
> > > interfaces.
> > >
> > > > So, I wonder whether we could interpret SSG resource specifying as an
> > > > "add" but not a "set" on resource requirement ?
> > > >
> > > IIUC, this is the core idea behind your proposal. I think it provides
> > > an interesting idea of how we combine operator level and SSG level
> > > resources, *if we allow configuring resources at both levels*.
> > > However, I'm not sure whether configuring resources on the operator
> > > level is indeed needed. Therefore, as a first step, this FLIP proposes
> > > to only introduce the SSG-level interfaces. As listed in the future
> > > plan, we would consider allowing operator level resource configuration
> > > later if we do see a need for it. At that time, we definitely should
> > > discuss what to do if resources are configured at both levels.
> > >
> > > * Could SSG express negative resource requirement ?
> > > >
> > > No.
> > >
> > > > Is there concrete bar for partial resource configured not function ?
> > > > I saw it will fail job submission in Dispatcher.submitJob.
> > > >
> > > With the SSG-based approach, this should no longer be needed. The
> > > constraint was introduced because we can neither properly define what
> > > is the resource of a task chained from an operator with specified
> > > resource and another with unspecified resource, nor for a slot shared
> > > by a task with specified resource and another with unspecified
> > > resource. With the SSG-based approach, we no longer have those
> > > problems.
> > >
> > > > An option(cluster/job level) to force slot sharing in scheduler ?
> > > > This could be useful in case of migration from FLIP-156 to future
> > > > approach.
> > > >
> > > I think this is exactly what we are trying to avoid, requiring the
> > > scheduler to enforce slot sharing.
> > >
> > > > An option(cluster) to ignore resource specifying(allow resource
> > > > specified job to run on open box environment) for no production
> > > > usage ?
> > > >
> > > That's possible. Actually, we are planning to introduce an option for
> > > activating the fine-grained resource management, for development
> > > purposes. We might consider keeping that option after the feature is
> > > completed, to allow disabling the feature without having to touch the
> > > job code.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Wed, Feb 3, 2021 at 1:28 PM Kezhu Wang <ke...@gmail.com> wrote:
> > >
> > > > Hi all, sorry for join discussion even after voting started.
> > > >
> > > > I want to share my thoughts on this after reading above discussions.
> > > >
> > > > I think Flink *runtime* already has an ideal granularity for resource
> > > > management 'task'. If there is
> > > > a slot shared by multiple tasks, that slot's resource requirement is
> > > simple
> > > > sum of all its logical
> > > > slots. So basically, this is no resource requirement for
> > SlotSharingGroup
> > > > in runtime until now,
> > > > right ?
> > > >
> > > > As in discussion, we already agree upon that: "If all operators have
> > > their
> > > > resources properly
> > > > specified, then slot sharing is no longer needed. "
> > > >
> > > > So seems to me, naturally in mind path, what we would discuss is
> that:
> > > how
> > > > to bridge impractical
> > > > operator level resource specifying to runtime task level resource
> > > > requirement ? This is actually a
> > > > pure api thing as Chesnay has pointed out.
> > > >
> > > > But FLIP-156 brings another direction on table: how about using SSG
> for
> > > > both api and runtime
> > > > resource specifying ?
> > > >
> > > > > From the FLIP and discussion, I assume that SSG resource specifying
> > will
> > > > override operator level
> > > > resource specifying if both are specified ?
> > > >
> > > > So, I wonder whether we could interpret SSG resource specifying as an
> > > "add"
> > > > but not an "set" on
> > > > resource requirement ?
> > > >
> > > > The semantics is that SSG resource specifying adds additional
> resource
> > to
> > > > shared slot to express
> > > > > concerns on possible high throughput and resource requirement for
> tasks
> > in
> > > > one physical slot.
> > > >
> > > > The result is that if scheduler indeed respect slot sharing,
> allocated
> > > slot
> > > > will gain extra resource
> > > > specified for that SSG.
> > > >
> > > > I think one of coding barrier from "add" approach is
> > ResourceSpec.UNKNOWN
> > > > which didn't support
> > > > 'merge' operation. I tend to use ResourceSpec.ZERO as default, task
> > > > executor should be aware of
> > > > this.
> > > >
> > > > @Chesnay
> > > > > My main worry is that it if we wire the runtime to work on SSGs
> it's
> > > > > gonna be difficult to implement more fine-grained approaches, which
> > > > > would not be the case if, for the runtime, they are always defined
> on
> > > an
> > > > > operator-level.
> > > >
> > > > An "add" operation should be less invasive and enforce low barrier
> for
> > > > > future fine-grained
> > > > approaches.
> > > >
> > > > @Stephan
> > > > > - Users can define different slot sharing groups for operators like
> > > > they
> > > > > do now, with the exception that you cannot mix operators that have
> a
> > > > > resource profile and operators that have no resource profile.
> > > >
> > > > @Till
> > > > > This effectively means that all unspecified operators
> > > > > will implicitly have a zero resource requirement.
> > > > > I am wondering whether this wouldn't lead to a surprising behaviour
> > for
> > > > the
> > > > > user. If the user specifies the resource requirements for a single
> > > > > operator, then he probably will assume that the other operators
> will
> > > get
> > > > > the default share of resources and not nothing.
> > > >
> > > > I think it is inherent due to fact that we could not defining
> > > > ResourceSpec.ONE, eg. resource
> > > > requirement for exact one default slot, with concrete numbers ? I
> tend
> > to
> > > > squash out unspecified one
> > > > if there are operators in chaining with explicit resource specifying.
> > > > Otherwise, the protocol tends
> > > > to verbose as say "give me this much resource and a default". I think
> > if
> > > we
> > > > > have explicit resource
> > > > specifying for partial operators, it is just saying "I don't care
> other
> > > > operators that much, just
> > > > get them places to run". It is most likely be cases there are
> stateless
> > > > > filter/map or other less
> > > > resource consuming operators. If there is indeed a problem, I think
> > > clients
> > > > can specify a global
> > > > default(or other level default in future). In job graph generating
> > phase,
> > > > we could take that default
> > > > into account for unspecified operators.
> > > >
> > > > @FLIP-156
> > > > > > Expose operator chaining. (Cons of task level resource specifying)
> > > >
> > > > Is it inherent for all group level resource specifying ? They will
> > either
> > > > break chaining or obey it,
> > > > > or even could not work with it.
> > > >
> > > > To sum up above, my suggestions are:
> > > >
> > > > In api side:
> > > > * StreamExecutionEnvironment: A global default(ResourceSpec.ZERO if
> > > > unspecified).
> > > > * Operator: ResourceSpec.ZERO(unspecified) as default.
> > > > * Task: sum of requirements from specified operators + global
> > default(if
> > > > there are any unspecified operators)
> > > > * SSG: additional resource to physical slot.
> > > >
> > > > In runtime side:
> > > > * Task: ResourceSpec.Task or ResourceSpec.ZERO
> > > > * SSG: ResourceSpec.SSG or ResourceSpec.ZERO
> > > >
> > > > Physical slot gets sum up resources from logical slots and SSG, if it
> > > gets
> > > > ResourceSpec.ZERO, it is
> > > > just a default sized slot.
> > > >
> > > > > In short, turn SSG resource specifying into "add" and drop
> > > > ResourceSpec.UNKNOWN.
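
A minimal sketch of the "add" semantics summarized above, assuming a simplified two-field resource spec. Spec, ZERO and physical_slot_spec are hypothetical names chosen for this illustration and do not mirror Flink's ResourceSpec API; the default slot size is likewise an assumed value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Simplified stand-in for a resource spec (hypothetical, not Flink's)."""
    cpu: float = 0.0
    mem_mb: int = 0

    def merge(self, other):
        # "add" semantics: requirements accumulate instead of overriding.
        return Spec(self.cpu + other.cpu, self.mem_mb + other.mem_mb)


ZERO = Spec()  # plays the role of "unspecified" instead of an UNKNOWN marker


def physical_slot_spec(task_specs, ssg_extra=ZERO, default_slot=Spec(1.0, 1024)):
    """Sum the task requirements plus the extra resources of the SSG.

    A total of ZERO means nothing was specified, so the slot falls back to
    the (assumed) default slot size.
    """
    total = ssg_extra
    for spec in task_specs:
        total = total.merge(spec)
    return default_slot if total == ZERO else total


# One task with explicit resources, one unspecified, plus an SSG-level extra.
print(physical_slot_spec([Spec(0.5, 100), ZERO], ssg_extra=Spec(0.5, 50)))
# Nothing specified anywhere: a default-sized slot.
print(physical_slot_spec([ZERO, ZERO]))
```

Because ZERO is the identity for merge, unspecified tasks contribute nothing to the total, which is the behaviour Till described as potentially surprising for users.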
> > > >
> > > >
> > > > Questions/Issues:
> > > > * Could SSG express negative resource requirement ?
> > > > * Is there concrete bar for partial resource configured not function
> ?
> > I
> > > > saw it will fail job submission in Dispatcher.submitJob.
> > > > * An option(cluster/job level) to force slot sharing in scheduler ?
> > This
> > > > could be useful in case of migration from FLIP-156 to future
> approach.
> > > > * An option(cluster) to ignore resource specifying(allow resource
> > > specified
> > > > job to run on open box environment) for no production usage ?
> > > >
> > > >
> > > >
> > > > On February 1, 2021 at 11:54:10, Yangze Guo (karmagyz@gmail.com)
> > wrote:
> > > >
> > > > Thanks for reply, Till and Xintong!
> > > >
> > > > > I updated the FLIP, including:
> > > > - Edit the JavaDoc of the proposed
> > > > StreamGraphGenerator#setSlotSharingGroupResource.
> > > > - Add "Future Plan" section, which contains the potential follow-up
> > > > issues and the limitations to be documented when fine-grained
> resource
> > > > management is exposed to users.
> > > >
> > > > I'll start a vote in another thread.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Fri, Jan 29, 2021 at 10:07 PM Till Rohrmann <trohrmann@apache.org
> >
> > > > wrote:
> > > > >
> > > > > Thanks for summarizing the discussion, Yangze. I agree that setting
> > > > > resource requirements per operator is not very user friendly.
> > > Moreover, I
> > > > > couldn't come up with a different proposal which would be as easy
> to
> > > use
> > > > > and wouldn't expose internal scheduling details. In fact, following
> > > this
> > > > > argument then we shouldn't have exposed the slot sharing groups in
> > the
> > > > > first place.
> > > > >
> > > > > What is important for the user is that we properly document the
> > > > limitations
> > > > > and constraints the fine grained resource specification has. For
> > > example,
> > > > > we should explain how optimizations like chaining are affected by
> it
> > > and
> > > > > how different execution modes (batch vs. streaming) affect the
> > > execution
> > > > of
> > > > > operators which have specified resources. These things shouldn't
> > become
> > > > > part of the contract of this feature and are more caused by
> internal
> > > > > implementation details but it will be important to understand these
> > > > things
> > > > > properly in order to use this feature effectively.
> > > > >
> > > > > Hence, +1 for starting the vote for this FLIP.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Tue, Jan 26, 2021 at 4:37 AM Xintong Song <
> tonysong820@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Thanks for the summary, Yangze.
> > > > > >
> > > > > > The changes and follow-up issues LGTM. Let's wait for responses
> > from
> > > > the
> > > > > > others before starting a vote.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Jan 26, 2021 at 11:08 AM Yangze Guo <ka...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > Thanks everyone for the lively discussion. I'd like to try to
> > > > > > > summarize the current convergence in the discussion. Please let
> > me
> > > > > > > know if I got things wrong or missed something crucial here.
> > > > > > >
> > > > > > > Change of this FLIP:
> > > > > > > - Treat the SSG resource requirements as a hint instead of a
> > > > > > > > restriction for the runtime. That should be explicitly explained
> > > > > > > > in the JavaDocs.
> > > > > > >
> > > > > > > Potential follow-up issues if needed:
> > > > > > > - Provide operator-level resource configuration interface.
> > > > > > > - Provide multiple options for deciding resources for SSGs
> whose
> > > > > > > requirement is not specified:
> > > > > > > ** Default slot resource.
> > > > > > > ** Default operator resource times number of operators.
> > > > > > >
> > > > > > > If there are no other issues, I'll update the FLIP accordingly
> > and
> > > > > > > start a vote thread. Thanks all for the valuable feedback
> again.
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jan 22, 2021 at 11:30 AM Xintong Song <
> > > tonysong820@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > FGRuntimeInterface.png
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <
> > > > tonysong820@gmail.com>
> > > >
> > > > > > > wrote:
> > > > > > > >>
> > > > > > > >> I think Chesnay's proposal could actually work. IIUC, the
> > > keypoint
> > > > is
> > > > > > > to derive operator requirements from SSG requirements on the
> API
> > > > side, so
> > > > > > > that the runtime only deals with operator requirements. It's
> > > > debatable
> > > > > > how
> > > > > > > the deriving should be done though. E.g., an alternative could
> be
> > > to
> > > > > > evenly
> > > > > > > divide the SSG requirement into requirements of operators in
> the
> > > > group.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> However, I'm not entirely sure which option is more desired.
> > > > > > > Illustrating my understanding in the following figure, in which
> > on
> > > > the
> > > > > > top
> > > > > > > is Chesnay's proposal and on the bottom is the SSG-based
> proposal
> > > in
> > > > this
> > > > > > > FLIP.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> I think the major difference between the two approaches is
> > where
> > > > > > > deriving operator requirements from SSG requirements happens.
> > > > > > > >>
> > > > > > > >> - Chesnay's proposal simplifies the runtime logic and the
> > > > interface to
> > > > > > > expose, at the price of moving more complexity (i.e. the
> > deriving)
> > > to
> > > > the
> > > > > > > API side. The question is, where do we prefer to keep the
> > > complexity?
> > > > I'm
> > > > > > > slightly leaning towards having a thin API and keep the
> > complexity
> > > in
> > > > > > > runtime if possible.
> > > > > > > >>
> > > > > > > >> - Notice that the dash line arrows represent optional steps
> > that
> > > > are
> > > > > > > needed only for schedulers that do not respect SSGs, which we
> > don't
> > > > have
> > > > > > at
> > > > > > > the moment. If we only look at the solid line arrows, then the
> > > > SSG-based
> > > > > > > approach is much simpler, without needing to derive and
> aggregate
> > > the
> > > > > > > requirements back and forth. I'm not sure about complicating
> the
> > > > current
> > > > > > > design only for the potential future needs.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> Thank you~
> > > > > > > >>
> > > > > > > >> Xintong Song
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <
> > > > chesnay@apache.org>
> > > > > > > wrote:
> > > > > > > >>>
> > > > > > > >>> You're raising a good point, but I think I can rectify that
> > > with
> > > > a
> > > > > > > minor
> > > > > > > >>> adjustment.
> > > > > > > >>>
> > > > > > > >>> Default requirements are whatever the default requirements
> > are,
> > > > > > setting
> > > > > > > >>> the requirements for one operator has no effect on other
> > > > operators.
> > > > > > > >>>
> > > > > > > >>> With these rules, and some API enhancements, the following
> > > mockup
> > > > > > would
> > > > > > > >>> replicate the SSG-based behavior:
> > > > > > > >>>
> > > > > > > > >>> Map<SlotSharingGroupId, Requirements> requirements = ...
> > > > > > > > >>> for slotSharingGroup in env.getSlotSharingGroups() {
> > > > > > > > >>>     vertices = slotSharingGroup.getVertices()
> > > > > > > > >>>     vertices.first().setRequirements(
> > > > > > > > >>>         requirements.get(slotSharingGroup.getID()))
> > > > > > > > >>>     vertices.remaining().setRequirements(ZERO)
> > > > > > > > >>> }
> > > > > > > >>>
> > > > > > > >>> We could even allow setting requirements on
> > slotsharing-groups
> > > > > > > >>> colocation-groups and internally translate them
> accordingly.
> > > > > > > >>> I can't help but feel this is a plain API issue.
> > > > > > > >>>
> > > > > > > >>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
> > > > > > > >>> > If I understand you correctly Chesnay, then you want to
> > > > decouple
> > > > > > the
> > > > > > > >>> > resource requirement specification from the slot sharing
> > > group
> > > > > > > >>> > assignment. Hence, per default all operators would be in
> > the
> > > > same
> > > > > > > slot
> > > > > > > >>> > sharing group. If there is no operator with a resource
> > > > > > specification,
> > > > > > > >>> > then the system would allocate a default slot for it. If
> > > there
> > > > is
> > > > > > at
> > > > > > > >>> > least one operator, then the system would sum up all the
> > > > specified
> > > > > > > >>> > resources and allocate a slot of this size. This
> > effectively
> > > > means
> > > > > > > >>> > that all unspecified operators will implicitly have a
> zero
> > > > resource
> > > > > > > >>> > requirement. Did I understand your idea correctly?
> > > > > > > >>> >
> > > > > > > >>> > I am wondering whether this wouldn't lead to a surprising
> > > > behaviour
> > > > > > > >>> > for the user. If the user specifies the resource
> > requirements
> > > > for a
> > > > > > > >>> > single operator, then he probably will assume that the
> > other
> > > > > > > operators
> > > > > > > >>> > will get the default share of resources and not nothing.
> > > > > > > >>> >
> > > > > > > >>> > Cheers,
> > > > > > > >>> > Till
> > > > > > > >>> >
> > > > > > > >>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <
> > > > > > chesnay@apache.org
> > > > > > > >>> > <ma...@apache.org>> wrote:
> > > > > > > >>> >
> > > > > > > >>> > Is there even a functional difference between specifying
> > the
> > > > > > > >>> > requirements for an SSG vs specifying the same
> requirements
> > > on
> > > > > > a
> > > > > > > >>> > single
> > > > > > > >>> > operator within that group (ideally a colocation group to
> > > avoid
> > > > > > > this
> > > > > > > >>> > whole hint business)?
> > > > > > > >>> >
> > > > > > > >>> > Wouldn't we get the best of both worlds in the latter
> case?
> > > > > > > >>> >
> > > > > > > >>> > Users can take shortcuts to define shared requirements,
> > > > > > > >>> > but refine them further as needed on a per-operator
> basis,
> > > > > > > >>> > without changing semantics of slotsharing groups
> > > > > > > >>> > nor the runtime being locked into SSG-based requirements.
> > > > > > > >>> >
> > > > > > > >>> > (And before anyone argues what happens if slotsharing
> > groups
> > > > > > > >>> > change or
> > > > > > > >>> > whatnot, that's a plain API issue that we could surely
> > solve.
> > > > > > (A
> > > > > > > >>> > plain
> > > > > > > >>> > iteration over slotsharing groups and therein contained
> > > > > > operators
> > > > > > > >>> > would
> > > > > > > >>> > suffice)).
> > > > > > > >>> >
> > > > > > > >>> > On 1/20/2021 6:48 PM, Till Rohrmann wrote:
> > > > > > > >>> > > Maybe a different minor idea: Would it be possible to
> > treat
> > > > > > > the SSG
> > > > > > > >>> > > resource requirements as a hint for the runtime similar
> > to
> > > > > > how
> > > > > > > >>> > slot sharing
> > > > > > > >>> > > groups are designed at the moment? Meaning that we
> don't
> > > give
> > > > > > > >>> > the guarantee
> > > > > > > >>> > > that Flink will always deploy this set of tasks
> together
> > no
> > > > > > > >>> > matter what
> > > > > > > >>> > > comes. If, for example, the runtime can derive by some
> > > means
> > > > > > > the
> > > > > > > >>> > resource
> > > > > > > >>> > > requirements for each task based on the requirements
> for
> > > the
> > > > > > > >>> > SSG, this
> > > > > > > >>> > > could be possible. One easy strategy would be to give
> > every
> > > > > > > task
> > > > > > > >>> > the same
> > > > > > > >>> > > resources as the whole slot sharing group. Another one
> > > could
> > > > > > be
> > > > > > > >>> > > distributing the resources equally among the tasks.
> This
> > > does
> > > > > > > >>> > not even have
> > > > > > > >>> > > to be implemented but we would give ourselves the
> freedom
> > > to
> > > > > > > change
> > > > > > > >>> > > scheduling if need should arise.
> > > > > > > >>> > >
> > > > > > > >>> > > Cheers,
> > > > > > > >>> > > Till
> > > > > > > >>> > >
> > > > > > > >>> > > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <
> > > > > > karmagyz@gmail.com
> > > > > > > >>> > <ma...@gmail.com>> wrote:
> > > > > > > >>> > >
> > > > > > > >>> > >> Thanks for the responses, Till and Xintong.
> > > > > > > >>> > >>
> > > > > > > >>> > >> I second Xintong's comment that SSG-based runtime
> > > interface
> > > > > > > >>> > will give
> > > > > > > >>> > >> us the flexibility to achieve op/task-based approach.
> > > That's
> > > > > > > one of
> > > > > > > >>> > >> the most important reasons for our design choice.
> > > > > > > >>> > >>
> > > > > > > >>> > >> Some cents regarding the default operator resource:
> > > > > > > >>> > >> - It might be good for the scenario of DataStream
> jobs.
> > > > > > > >>> > >> ** For light-weight operators, the accumulative
> > > > > > > >>> > configuration error
> > > > > > > >>> > >> will not be significant. Then, the resource of a task
> > used
> > > > > > is
> > > > > > > >>> > >> proportional to the number of operators it contains.
> > > > > > > >>> > >> ** For heavy operators like join and window or
> operators
> > > > > > > >>> > using the
> > > > > > > >>> > >> external resources, user will turn to the fine-grained
> > > > > > > resource
> > > > > > > >>> > >> configuration.
> > > > > > > >>> > >> - It can increase the stability for the standalone
> > cluster
> > > > > > > >>> > where task
> > > > > > > >>> > >> executors registered are heterogeneous(with different
> > > > > > default
> > > > > > > slot
> > > > > > > >>> > >> resources).
> > > > > > > >>> > >> - It might not be good for SQL users. The operators
> that
> > > SQL
> > > > > > > >>> > will be
> > > > > > > >>> > >> transferred to is a black box to the user. We also do
> > not
> > > > > > > guarantee
> > > > > > > >>> > >> the cross-version of consistency of the transformation
> > so
> > > > > > far.
> > > > > > > >>> > >>
> > > > > > > >>> > >> I think it can be treated as a follow-up work when the
> > > > > > > fine-grained
> > > > > > > >>> > >> resource management is end-to-end ready.
> > > > > > > >>> > >>
> > > > > > > >>> > >> Best,
> > > > > > > >>> > >> Yangze Guo
> > > > > > > >>> > >>
> > > > > > > >>> > >>
> > > > > > > >>> > >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song
> > > > > > > >>> > <tonysong820@gmail.com <ma...@gmail.com>>
> > > > > > > >>> > >> wrote:
> > > > > > > >>> > >>> Thanks for the feedback, Till.
> > > > > > > >>> > >>>
> > > > > > > >>> > >>> ## I feel that what you proposed (operator-based +
> > > default
> > > > > > > >>> > value) might
> > > > > > > >>> > >> be
> > > > > > > >>> > >>> subsumed by the SSG-based approach.
> > > > > > > >>> > >>> Thinking of op_1 -> op_2, there are the following 4
> > > cases,
> > > > > > > >>> > categorized by
> > > > > > > >>> > >>> whether the resource requirements are known to the
> > users.
> > > > > > > >>> > >>>
> > > > > > > >>> > >>> 1. *Both known.* As previously mentioned, there's no
> > > > > > > >>> > reason to put
> > > > > > > >>> > >>> multiple operators whose individual resource
> > > > > > requirements
> > > > > > > >>> > are already
> > > > > > > >>> > >> known
> > > > > > > >>> > >>> into the same group in fine-grained resource
> > > > > > management.
> > > > > > > >>> > And if op_1
> > > > > > > >>> > >> and
> > > > > > > >>> > >>> op_2 are in different groups, there should be no
> > > > > > problem
> > > > > > > >>> > switching
> > > > > > > >>> > >> data
> > > > > > > >>> > >>> exchange mode from pipelined to blocking. This is
> > > > > > > >>> > equivalent to
> > > > > > > >>> > >> specifying
> > > > > > > >>> > >>> operator resource requirements in your proposal.
> > > > > > > >>> > >>> 2. *op_1 known, op_2 unknown.* Similar to 1), except
> > > > > > that
> > > > > > > >>> > op_2 is in a
> > > > > > > >>> > >>> SSG whose resource is not specified thus would have
> the
> > > > > > > >>> > default slot
> > > > > > > >>> > >>> resource. This is equivalent to having default
> operator
> > > > > > > >>> > resources in
> > > > > > > >>> > >> your
> > > > > > > >>> > >>> proposal.
> > > > > > > >>> > >>> 3. *Both unknown*. The user can either set op_1 and
> > > > > > op_2
> > > > > > > >>> > to the same
> > > > > > > >>> > >> SSG
> > > > > > > >>> > >>> or separate SSGs.
> > > > > > > >>> > >>> - If op_1 and op_2 are in the same SSG, it will be
> > > > > > > >>> > equivalent to
> > > > > > > >>> > >> the
> > > > > > > >>> > >>> coarse-grained resource management, where op_1 and
> > > > > > > op_2
> > > > > > > >>> > share a
> > > > > > > >>> > >> default
> > > > > > > >>> > >>> size slot no matter which data exchange mode is
> > > > > > used.
> > > > > > > >>> > >>> - If op_1 and op_2 are in different SSGs, then each
> > > > > > of
> > > > > > > >>> > them will
> > > > > > > >>> > >> use
> > > > > > > >>> > >>> a default size slot. This is equivalent to setting
> > > > > > > them
> > > > > > > >>> > with
> > > > > > > >>> > >> default
> > > > > > > >>> > >>> operator resources in your proposal.
> > > > > > > >>> > >>> 4. *Total (pipeline) or max (blocking) of op_1 and
> op_2
> > > > > > > is
> > > > > > > >>> > known.*
> > > > > > > >>> > >>> - It is possible that the user learns the total /
> > > > > > max
> > > > > > > >>> > resource
> > > > > > > >>> > >>> requirement from executing and monitoring the job,
> > > > > > > >>> > while not
> > > > > > > >>> > >>> being aware of
> > > > > > > >>> > >>> individual operator requirements.
> > > > > > > >>> > >>> - I believe this is the case your proposal does not
> > > > > > > >>> > cover. And TBH,
> > > > > > > >>> > >>> this is probably how most users learn the resource
> > > > > > > >>> > requirements,
> > > > > > > >>> > >>> according
> > > > > > > >>> > >>> to my experiences.
> > > > > > > >>> > >>> - In this case, the user might need to specify
> > > > > > > >>> > different resources
> > > > > > > >>> > >> if
> > > > > > > >>> > >>> he wants to switch the execution mode, which should
> > > > > > > not
> > > > > > > >>> > be worse
> > > > > > > >>> > >> than not
> > > > > > > >>> > >>> being able to use fine-grained resource management.
> > > > > > > >>> > >>>
> > > > > > > >>> > >>>
> > > > > > > >>> > >>> ## An additional idea inspired by your proposal.
> > > > > > > >>> > >>> We may provide multiple options for deciding
> resources
> > > for
> > > > > > > >>> > SSGs whose
> > > > > > > >>> > >>> requirement is not specified, if needed.
> > > > > > > >>> > >>>
> > > > > > > >>> > >>> - Default slot resource (current design)
> > > > > > > >>> > >>> - Default operator resource times number of operators
> > > > > > > >>> > (equivalent to
> > > > > > > >>> > >>> your proposal)
> > > > > > > >>> > >>>
> > > > > > > >>> > >>>
> > > > > > > >>> > >>> ## Exposing internal runtime strategies
> > > > > > > >>> > >>> Theoretically, yes. Tying to the SSGs, the resource
> > > > > > > >>> > requirements might be
> > > > > > > >>> > >>> affected if how SSGs are internally handled changes
> in
> > > > > > > future.
> > > > > > > >>> > >> Practically,
> > > > > > > >>> > >>> I do not concretely see at the moment what kind of
> > > changes
> > > > > > we
> > > > > > > >>> > may want in
> > > > > > > >>> > >>> future that might conflict with this FLIP proposal,
> as
> > > the
> > > > > > > >>> > question of
> > > > > > > >>> > >>> switching data exchange mode answered above. I'd
> > suggest
> > > to
> > > > > > > >>> > not give up
> > > > > > > >>> > >> the
> > > > > > > >>> > >>> user friendliness we may gain now for the future
> > problems
> > > > > > > that
> > > > > > > >>> > may or may
> > > > > > > >>> > >>> not exist.
> > > > > > > >>> > >>>
> > > > > > > >>> > >>> Moreover, the SSG-based approach has the flexibility
> to
> > > > > > > >>> > achieve the
> > > > > > > >>> > >>> equivalent behavior as the operator-based approach,
> if
> > we
> > > > > > > set each
> > > > > > > >>> > >> operator
> > > > > > > >>> > >>> (or task) to a separate SSG. We can even provide a
> > > shortcut
> > > > > > > >>> > option to
> > > > > > > >>> > >>> automatically do that for users, if needed.
> > > > > > > >>> > >>>
> > > > > > > >>> > >>>
> > > > > > > >>> > >>> Thank you~
> > > > > > > >>> > >>>
> > > > > > > >>> > >>> Xintong Song
> > > > > > > >>> > >>>
> > > > > > > >>> > >>>
> > > > > > > >>> > >>>
> > > > > > > >>> > >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann
> > > > > > > >>> > <trohrmann@apache.org <ma...@apache.org>>
> > > > > > > >>> > >> wrote:
> > > > > > > >>> > >>>> Thanks for the responses Xintong and Stephan,
> > > > > > > >>> > >>>>
> > > > > > > >>> > >>>> I agree that being able to define the resource
> > > > > > requirements
> > > > > > > for a
> > > > > > > >>> > >> group of
> > > > > > > >>> > >>>> operators is more user friendly. However, my concern
> > is
> > > > > > that
> > > > > > > >>> > we are
> > > > > > > >>> > >>>> exposing thereby internal runtime strategies which
> > might
> > > > > > > >>> > limit our
> > > > > > > >>> > >>>> flexibility to execute a given job. Moreover, the
> > > > > > semantics
> > > > > > > of
> > > > > > > >>> > >> configuring
> > > > > > > >>> > >>>> resource requirements for SSGs could break if
> > switching
> > > > > > from
> > > > > > > >>> > streaming
> > > > > > > >>> > >> to
> > > > > > > >>> > >>>> batch execution. If one defines the resource
> > > requirements
> > > > > > > for
> > > > > > > >>> > op_1 ->
> > > > > > > >>> > >> op_2
> > > > > > > >>> > >>>> which run in pipelined mode when using the streaming
> > > > > > > >>> > execution, then
> > > > > > > >>> > >> how do
> > > > > > > >>> > >>>> we interpret these requirements when op_1 -> op_2
> are
> > > > > > > >>> > executed with a
> > > > > > > >>> > >>>> blocking data exchange in batch execution mode?
> > > > > > > Consequently,
> > > > > > > >>> > I am
> > > > > > > >>> > >> still
> > > > > > > >>> > >>>> leaning towards Stephan's proposal to set the
> resource
> > > > > > > >>> > requirements per
> > > > > > > >>> > >>>> operator.
> > > > > > > >>> > >>>>
> > > > > > > >>> > >>>> Maybe the following proposal makes the configuration
> > > > > > easier:
> > > > > > > >>> > If the
> > > > > > > >>> > >> user
> > > > > > > >>> > >>>> wants to use fine-grained resource requirements,
> then
> > > she
> > > > > > > >>> > needs to
> > > > > > > >>> > >> specify
> > > > > > > >>> > >>>> the default size which is used for operators which
> > have
> > > no
> > > > > > > >>> > explicit
> > > > > > > >>> > >>>> resource annotation. If this holds true, then every
> > > > > > operator
> > > > > > > >>> > would
> > > > > > > >>> > >> have a
> > > > > > > >>> > >>>> resource requirement and the system can try to
> execute
> > > the
> > > > > > > >>> > operators
> > > > > > > >>> > >> in the
> > > > > > > >>> > >>>> best possible manner w/o being constrained by how
> the
> > > user
> > > > > > > >>> > set the SSG
> > > > > > > >>> > >>>> requirements.
> > > > > > > >>> > >>>>
> > > > > > > >>> > >>>> Cheers,
> > > > > > > >>> > >>>> Till
> > > > > > > >>> > >>>>
> > > > > > > >>> > >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song
> > > > > > > >>> > <tonysong820@gmail.com <ma...@gmail.com>>
> > > > > > > >>> > >>>> wrote:
> > > > > > > >>> > >>>>
> > > > > > > >>> > >>>>> Thanks for the feedback, Stephan.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> Actually, your proposal has also come to my mind at
> > > some
> > > > > > > >>> > point. And I
> > > > > > > >>> > >>>> have
> > > > > > > >>> > >>>>> some concerns about it.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> 1. It does not give users the same control as the
> > > > > > SSG-based
> > > > > > > >>> > approach.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> While both approaches do not require specifying for
> > > each
> > > > > > > >>> > operator,
> > > > > > > >>> > >>>>> SSG-based approach supports the semantic that "some
> > > > > > > operators
> > > > > > > >>> > >> together
> > > > > > > >>> > >>>> use
> > > > > > > >>> > >>>>> this much resource" while the operator-based
> approach
> > > > > > > doesn't.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> Think of a long pipeline with m operators (o_1,
> o_2,
> > > ...,
> > > > > > > >>> > o_m), and
> > > > > > > >>> > >> at
> > > > > > > >>> > >>>> some
> > > > > > > >>> > >>>>> point there's an agg o_n (1 < n < m) which
> > > significantly
> > > > > > > >>> > reduces the
> > > > > > > >>> > >> data
> > > > > > > >>> > >>>>> amount. One can separate the pipeline into 2 groups
> > > SSG_1
> > > > > > > >>> > (o_1, ...,
> > > > > > > >>> > >> o_n)
> > > > > > > >>> > >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring
> much
> > > > > > higher
> > > > > > > >>> > >> parallelisms
> > > > > > > >>> > >>>>> for operators in SSG_1 than for operators in SSG_2
> > > won't
> > > > > > > >>> > lead to too
> > > > > > > >>> > >> much
> > > > > > > >>> > >>>>> wasting of resources. If the two SSGs end up
> needing
> > > > > > > different
> > > > > > > >>> > >> resources,
> > > > > > > >>> > >>>>> with the SSG-based approach one can directly
> specify
> > > > > > > >>> > resources for
> > > > > > > >>> > >> the
> > > > > > > >>> > >>>> two
> > > > > > > >>> > >>>>> groups. However, with the operator-based approach,
> > the
> > > > > > > user will
> > > > > > > >>> > >> have to
> > > > > > > >>> > >>>>> specify resources for each operator in one of the
> two
> > > > > > > >>> > groups, and
> > > > > > > >>> > >> tune
> > > > > > > >>> > >>>> the
> > > > > > > >>> > >>>>> default slot resource via configurations to fit the
> > > other
> > > > > > > group.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> 2. It increases the chance of breaking operator
> > chains.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > > >>> > >>>>> Setting chainable operators into different slot
> > > sharing
> > > > > > > >>> > groups will
> > > > > > > >>> > >>>>> prevent them from being chained. In the current
> > > > > > > implementation,
> > > > > > > >>> > >>>> downstream
> > > > > > > >>> > >>>>> operators, if SSG not explicitly specified, will be
> > set
> > > > > > to
> > > > > > > >>> > the same
> > > > > > > >>> > >> group
> > > > > > > >>> > >>>>> as the chainable upstream operators (unless
> multiple
> > > > > > > upstream
> > > > > > > >>> > >> operators
> > > > > > > >>> > >>>> in
> > > > > > > >>> > >>>>> different groups), to reduce the chance of breaking
> > > > > > chains.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3
> ->
> > > o_4,
> > > > > > > >>> > deciding
> > > > > > > >>> > >> SSGs
> > > > > > > >>> > >>>>> based on whether resource is specified we will
> easily
> > > get
> > > > > > > >>> > groups like
> > > > > > > >>> > >>>> (o_1,
> > > > > > > >>> > >>>>> o_3) & (o_2, o_4), where none of the operators can
> be
> > > > > > > >>> > chained. This
> > > > > > > >>> > >> is
> > > > > > > >>> > >>>> also
> > > > > > > >>> > >>>>> possible for the SSG-based approach, but I believe
> > the
> > > > > > > >>> > chance is much
> > > > > > > >>> > >>>>> smaller because there's no strong reason for users
> to
> > > > > > > >>> > specify the
> > > > > > > >>> > >> groups
> > > > > > > >>> > >>>>> with alternate operators like that. We are more
> > likely
> > > to
> > > > > > > >>> > get groups
> > > > > > > >>> > >> like
> > > > > > > >>> > >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks
> only
> > > > > > > between
> > > > > > > >>> > o_2 and
> > > > > > > >>> > >> o_3.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> 3. It complicates the system by having two
> different
> > > > > > > >>> > mechanisms for
> > > > > > > >>> > >>>> sharing
> > > > > > > >>> > >>>>> managed memory in a slot.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> - In FLIP-141, we introduced the intra-slot managed
> > > > > > memory
> > > > > > > >>> > sharing
> > > > > > > >>> > >>>>> mechanism, where managed memory is first
> distributed
> > > > > > > >>> > according to the
> > > > > > > >>> > >>>>> consumer type, then further distributed across
> > > operators
> > > > > > > of that
> > > > > > > >>> > >> consumer
> > > > > > > >>> > >>>>> type.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> - With the operator-based approach, managed memory
> > size
> > > > > > > >>> > specified
> > > > > > > >>> > >> for an
> > > > > > > >>> > >>>>> operator should account for all the consumer types
> of
> > > > > > that
> > > > > > > >>> > operator.
> > > > > > > >>> > >> That
> > > > > > > >>> > >>>>> means the managed memory is first distributed
> across
> > > > > > > >>> > operators, then
> > > > > > > >>> > >>>>> distributed to different consumer types of each
> > > operator.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> Unfortunately, the different order of the two
> > > calculation
> > > > > > > >>> > steps can
> > > > > > > >>> > >> lead
> > > > > > > >>> > >>>> to
> > > > > > > >>> > >>>>> different results. To be specific, the semantic of
> > the
> > > > > > > >>> > configuration
> > > > > > > >>> > >>>> option
> > > > > > > >>> > >>>>> `consumer-weights` changed (within a slot vs.
> within
> > an
> > > > > > > >>> > operator).
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> To sum up things:
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> While (3) might be a bit more implementation
> related,
> > I
> > > > > > > >>> > think (1)
> > > > > > > >>> > >> and (2)
> > > > > > > >>> > >>>>> somehow suggest that, the price for the proposed
> > > approach
> > > > > > > to
> > > > > > > >>> > avoid
> > > > > > > >>> > >>>>> specifying resource for every operator is that it's
> > not
> > > > > > as
> > > > > > > >>> > >> independent
> > > > > > > >>> > >>>> from
> > > > > > > >>> > >>>>> operator chaining and slot sharing as the
> > > operator-based
> > > > > > > >>> > approach
> > > > > > > >>> > >>>> discussed
> > > > > > > >>> > >>>>> in the FLIP.
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> Thank you~
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> Xintong Song
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen
> > > > > > > >>> > <sewen@apache.org <ma...@apache.org>>
> > > > > > > >>> > >> wrote:
> > > > > > > >>> > >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
> > > > > > > >>> > >>>>>>
> > > > > > > >>> > >>>>>> I want to say, first of all, that this is super
> well
> > > > > > > >>> > written. And
> > > > > > > >>> > >> the
> > > > > > > >>> > >>>>>> points that the FLIP makes about how to expose the
> > > > > > > >>> > configuration to
> > > > > > > >>> > >>>> users
> > > > > > > >>> > >>>>>> is exactly the right thing to figure out first.
> > > > > > > >>> > >>>>>> So good job here!
> > > > > > > >>> > >>>>>>
> > > > > > > >>> > >>>>>> About how to let users specify the resource
> > profiles.
> > > > > > If I
> > > > > > > >>> > can sum
> > > > > > > >>> > >> the
> > > > > > > >>> > >>>>> FLIP
> > > > > > > >>> > >>>>>> and previous discussion up in my own words, the
> > > problem
> > > > > > > is the
> > > > > > > >>> > >>>> following:
> > > > > > > >>> > >>>>>> Operator-level specification is the simplest and
> > > > > > cleanest
> > > > > > > >>> > approach,
> > > > > > > >>> > >>>>> because
> > > > > > > >>> > >>>>>>> it avoids mixing operator configuration
> (resource)
> > > and
> > > > > > > >>> > >> scheduling. No
> > > > > > > >>> > >>>>>>> matter what other parameters change (chaining,
> slot
> > > > > > > sharing,
> > > > > > > >>> > >>>> switching
> > > > > > > >>> > >>>>>>> pipelined and blocking shuffles), the resource
> > > profiles
> > > > > > > >>> > stay the
> > > > > > > >>> > >>>> same.
> > > > > > > >>> > >>>>>>> But it would require that a user specifies
> > resources
> > > on
> > > > > > > all
> > > > > > > >>> > >>>> operators,
> > > > > > > >>> > >>>>>>> which makes it hard to use. That's why the FLIP
> > > > > > suggests
> > > > > > > going
> > > > > > > >>> > >> with
> > > > > > > >>> > >>>>>>> specifying resources on a Sharing-Group.
> > > > > > > >>> > >>>>>>
> > > > > > > >>> > >>>>>> I think both thoughts are important, so can we
> find
> > a
> > > > > > > solution
> > > > > > > >>> > >> where
> > > > > > > >>> > >>>> the
> > > > > > > >>> > >>>>>> Resource Profiles are specified on an Operator,
> but
> > we
> > > > > > > >>> > still avoid
> > > > > > > >>> > >> that
> > > > > > > >>> > >>>>> we
> > > > > > > >>> > >>>>>> need to specify a resource profile on every
> > operator?
> > > > > > > >>> > >>>>>>
> > > > > > > >>> > >>>>>> What do you think about something like the
> > following:
> > > > > > > >>> > >>>>>> - Resource Profiles are specified on an operator
> > > > > > level.
> > > > > > > >>> > >>>>>> - Not all operators need profiles
> > > > > > > > >>> > >>>>>> - All Operators without a Resource Profile end
> up
> > > > > > in
> > > > > > > the
> > > > > > > >>> > >> default
> > > > > > > >>> > >>>> slot
> > > > > > > >>> > >>>>>> sharing group with a default profile (will get a
> > > default
> > > > > > > slot).
> > > > > > > >>> > >>>>>> - All Operators with a Resource Profile will go
> into
> > > > > > > >>> > another slot
> > > > > > > >>> > >>>>> sharing
> > > > > > > >>> > >>>>>> group (the resource-specified-group).
> > > > > > > >>> > >>>>>> - Users can define different slot sharing groups
> for
> > > > > > > >>> > operators
> > > > > > > >>> > >> like
> > > > > > > >>> > >>>>> they
> > > > > > > >>> > >>>>>> do now, with the exception that you cannot mix
> > > operators
> > > > > > > >>> > that have
> > > > > > > >>> > >> a
> > > > > > > >>> > >>>>>> resource profile and operators that have no
> resource
> > > > > > > profile.
> > > > > > > >>> > >>>>>> - The default case where no operator has a
> resource
> > > > > > > >>> > profile is
> > > > > > > >>> > >> just a
> > > > > > > >>> > >>>>>> special case of this model
> > > > > > > >>> > >>>>>> - The chaining logic sums up the profiles per
> > > > > > operator,
> > > > > > > >>> > like it
> > > > > > > >>> > >> does
> > > > > > > >>> > >>>>> now,
> > > > > > > >>> > >>>>>> and the scheduler sums up the profiles of the
> tasks
> > > that
> > > > > > > it
> > > > > > > >>> > >> schedules
> > > > > > > >>> > >>>>>> together.
> > > > > > > >>> > >>>>>>
> > > > > > > >>> > >>>>>>
> > > > > > > >>> > >>>>>> There is another question about reactive scaling
> > > raised
> > > > > > > in the
> > > > > > > >>> > >> FLIP. I
> > > > > > > >>> > >>>>> need
> > > > > > > >>> > >>>>>> to think a bit about that. That is indeed a bit
> more
> > > > > > > tricky
> > > > > > > >>> > once we
> > > > > > > >>> > >>>> have
> > > > > > > >>> > >>>>>> slots of different sizes.
> > > > > > > >>> > >>>>>> It is not clear then which of the different slot
> > > > > > requests
> > > > > > > the
> > > > > > > >>> > >>>>>> ResourceManager should fulfill when new resources
> > > (TMs)
> > > > > > > >>> > show up,
> > > > > > > >>> > >> or how
> > > > > > > >>> > >>>>> the
> > > > > > > >>> > >>>>>> JobManager redistributes the slots resources when
> > > > > > > resources
> > > > > > > >>> > (TMs)
> > > > > > > > >>> > >>>>> disappear.
> > > > > > > >>> > >>>>>> This question is pretty orthogonal, though, to the
> > > "how
> > > > > > to
> > > > > > > >>> > specify
> > > > > > > >>> > >> the
> > > > > > > >>> > >>>>>> resources".
> > > > > > > >>> > >>>>>>
> > > > > > > >>> > >>>>>>
> > > > > > > >>> > >>>>>> Best,
> > > > > > > >>> > >>>>>> Stephan
> > > > > > > >>> > >>>>>>
> > > > > > > >>> > >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song
> > > > > > > >>> > <tonysong820@gmail.com <ma...@gmail.com>
> > > > > > > >>> > >>>>> wrote:
> > > > > > > >>> > >>>>>>> Thanks for drafting the FLIP and driving the
> > > > > > discussion,
> > > > > > > >>> > Yangze.
> > > > > > > >>> > >>>>>>> And Thanks for the feedback, Till and Chesnay.
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>> @Till,
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>> I agree that specifying requirements for SSGs
> means
> > > > > > that
> > > > > > > SSGs
> > > > > > > >>> > >> need to
> > > > > > > >>> > >>>>> be
> > > > > > > >>> > >>>>>>> supported in fine-grained resource management,
> > > > > > otherwise
> > > > > > > each
> > > > > > > >>> > >>>> operator
> > > > > > > >>> > >>>>>>> might use as many resources as the whole group.
> > > > > > However,
> > > > > > > I
> > > > > > > >>> > cannot
> > > > > > > >>> > >>>> think
> > > > > > > >>> > >>>>>> of
> > > > > > > >>> > >>>>>>> a strong reason for not supporting SSGs in
> > > fine-grained
> > > > > > > >>> > resource
> > > > > > > >>> > >>>>>>> management.
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>>> Interestingly, if all operators have their
> > resources
> > > > > > > properly
> > > > > > > >>> > >>>>>> specified,
> > > > > > > >>> > >>>>>>>> then slot sharing is no longer needed because
> > Flink
> > > > > > > could
> > > > > > > >>> > >> slice off
> > > > > > > >>> > >>>>> the
> > > > > > > >>> > >>>>>>>> appropriately sized slots for every Task
> > > individually.
> > > > > > > >>> > >>>>>>>>
> > > > > > > >>> > >>>>>>> So for example, if we have a job consisting of
> two
> > > > > > > >>> > operator op_1
> > > > > > > >>> > >> and
> > > > > > > >>> > >>>>> op_2
> > > > > > > >>> > >>>>>>>> where each op needs 100 MB of memory, we would
> > then
> > > > > > say
> > > > > > > that
> > > > > > > >>> > >> the
> > > > > > > >>> > >>>> slot
> > > > > > > >>> > >>>>>>>> sharing group needs 200 MB of memory to run. If
> we
> > > > > > have
> > > > > > > a
> > > > > > > >>> > >> cluster
> > > > > > > >>> > >>>>> with
> > > > > > > >>> > >>>>>> 2
> > > > > > > >>> > >>>>>>>> TMs with one slot of 100 MB each, then the
> system
> > > > > > > cannot run
> > > > > > > >>> > >> this
> > > > > > > >>> > >>>>> job.
> > > > > > > >>> > >>>>>> If
> > > > > > > >>> > >>>>>>>> the resources were specified on an operator
> level,
> > > > > > then
> > > > > > > the
> > > > > > > >>> > >> system
> > > > > > > >>> > >>>>>> could
> > > > > > > >>> > >>>>>>>> still make the decision to deploy op_1 to TM_1
> and
> > > > > > op_2
> > > > > > > to
> > > > > > > >>> > >> TM_2.
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>> Couldn't agree more that if all operators'
> > > requirements
> > > > > > > are
> > > > > > > >>> > >> properly
> > > > > > > >>> > >>>>>>> specified, slot sharing should be no longer
> needed.
> > I
> > > > > > > >>> > think this
> > > > > > > >>> > >>>>> exactly
> > > > > > > >>> > >>>>>>> disproves the example. If we already know op_1
> and
> > > op_2
> > > > > > > each
> > > > > > > >>> > >> needs
> > > > > > > >>> > >>>> 100
> > > > > > > >>> > >>>>> MB
> > > > > > > >>> > >>>>>>> of memory, why would we put them in the same
> group?
> > > If
> > > > > > > >>> > they are
> > > > > > > >>> > >> in
> > > > > > > >>> > >>>>>> separate
> > > > > > > >>> > >>>>>>> groups, with the proposed approach the system can
> > > > > > freely
> > > > > > > >>> > deploy
> > > > > > > >>> > >> them
> > > > > > > >>> > >>>> to
> > > > > > > >>> > >>>>>>> either a 200 MB TM or two 100 MB TMs.
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>> Moreover, the precondition for not needing slot
> > > sharing
> > > > > > > is
> > > > > > > >>> > having
> > > > > > > >>> > >>>>>> resource
> > > > > > > >>> > >>>>>>> requirements properly specified for all
> operators.
> > > This
> > > > > > > is not
> > > > > > > >>> > >> always
> > > > > > > >>> > >>>>>>> possible, and usually requires tremendous
> efforts.
> > > One
> > > > > > > of the
> > > > > > > >>> > >>>> benefits
> > > > > > > >>> > >>>>>> for
> > > > > > > >>> > >>>>>>> SSG-based requirements is that it allows the user
> > to
> > > > > > > freely
> > > > > > > >>> > >> decide
> > > > > > > >>> > >>>> the
> > > > > > > >>> > >>>>>>> granularity, thus efforts they want to pay. I
> would
> > > > > > > >>> > consider SSG
> > > > > > > >>> > >> in
> > > > > > > >>> > >>>>>>> fine-grained resource management as a group of
> > > > > > operators
> > > > > > > >>> > that the
> > > > > > > >>> > >>>> user
> > > > > > > >>> > >>>>>>> would like to specify the total resource for.
> There
> > > can
> > > > > > > be
> > > > > > > >>> > only
> > > > > > > >>> > >> one
> > > > > > > >>> > >>>>> group
> > > > > > > >>> > >>>>>>> in the job, 2~3 groups dividing the job into a
> few
> > > > > > major
> > > > > > > >>> > parts,
> > > > > > > >>> > >> or as
> > > > > > > >>> > >>>>>> many
> > > > > > > >>> > >>>>>>> groups as the number of tasks/operators,
> depending
> > on
> > > > > > how
> > > > > > > >>> > >>>> fine-grained
> > > > > > > >>> > >>>>>> the
> > > > > > > >>> > >>>>>>> user is able to specify the resources.
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>> Having to support SSGs might be a constraint. But
> > > given
> > > > > > > >>> > that all
> > > > > > > >>> > >> the
> > > > > > > >>> > >>>>>>> current scheduler implementations already support
> > > > > > SSGs, I
> > > > > > > >>> > tend to
> > > > > > > >>> > >>>> think
> > > > > > > >>> > >>>>>>> that as an acceptable price for the above
> discussed
> > > > > > > >>> > usability and
> > > > > > > >>> > >>>>>>> flexibility.
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>> @Chesnay
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>> Will declaring them on slot sharing groups not
> also
> > > > > > waste
> > > > > > > >>> > >> resources
> > > > > > > >>> > >>>> if
> > > > > > > >>> > >>>>>> the
> > > > > > > >>> > >>>>>>>> parallelism of operators within that group are
> > > > > > > different?
> > > > > > > >>> > >>>>>>>>
> > > > > > > >>> > >>>>>>> Yes. It's a trade-off between usability and
> > resource
> > > > > > > >>> > >> utilization. To
> > > > > > > >>> > >>>>>> avoid
> > > > > > > >>> > >>>>>>> such wasting, the user can define more groups, so
> > > that
> > > > > > > >>> > each group
> > > > > > > >>> > >>>>>> contains
> > > > > > > > >>> > >>>>>>> fewer operators and the chance of having operators
> > > with
> > > > > > > >>> > different
> > > > > > > >>> > >>>>>>> parallelism will be reduced. The price is to have
> > > more
> > > > > > > >>> > resource
> > > > > > > >>> > >>>>>>> requirements to specify.
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>> It also seems like quite a hassle for users
> having
> > to
> > > > > > > >>> > >> recalculate the
> > > > > > > >>> > >>>>>>>> resource requirements if they change the slot
> > > sharing.
> > > > > > > >>> > >>>>>>>> I'd think that it's not really workable for
> users
> > > that
> > > > > > > create
> > > > > > > >>> > >> a set
> > > > > > > >>> > >>>>> of
> > > > > > > >>> > >>>>>>>> re-usable operators which are mixed and matched
> in
> > > > > > their
> > > > > > > >>> > >>>>> applications;
> > > > > > > >>> > >>>>>>>> managing the resources requirements in such a
> > > setting
> > > > > > > >>> > would be
> > > > > > > >>> > >> a
> > > > > > > >>> > >>>>>>>> nightmare, and in the end would require
> > > operator-level
> > > > > > > >>> > >> requirements
> > > > > > > >>> > >>>>> any
> > > > > > > >>> > >>>>>>>> way.
> > > > > > > >>> > >>>>>>>> In that sense, I'm not even sure whether it
> really
> > > > > > > increases
> > > > > > > >>> > >>>>> usability.
> > > > > > > >>> > >>>>>>> - As mentioned in my reply to Till's comment,
> > > > > > > there's no
> > > > > > > >>> > >> reason to
> > > > > > > >>> > >>>>> put
> > > > > > > >>> > >>>>>>> multiple operators whose individual resource
> > > > > > > >>> > requirements are
> > > > > > > >>> > >>>>> already
> > > > > > > >>> > >>>>>>> known
> > > > > > > >>> > >>>>>>> into the same group in fine-grained resource
> > > > > > > management.
> > > > > > > > >>> > >>>>>>> - Even if an operator implementation is reused for
> > > > > > > multiple
> > > > > > > >>> > >>>>> applications,
> > > > > > > >>> > >>>>>>> it does not guarantee the same resource
> > > > > > requirements.
> > > > > > > >>> > During
> > > > > > > >>> > >> our
> > > > > > > >>> > >>>>> years
> > > > > > > >>> > >>>>>>> of
> > > > > > > >>> > >>>>>>> practices in Alibaba, with per-operator
> > > > > > requirements
> > > > > > > >>> > >> specified for
> > > > > > > >>> > >>>>>>> Blink's
> > > > > > > >>> > >>>>>>> fine-grained resource management, very few users
> > > > > > > >>> > (including
> > > > > > > >>> > >> our
> > > > > > > >>> > >>>>>>> specialists
> > > > > > > > >>> > >>>>>>> who are dedicated to supporting Blink users) are
> > > > > > > > >>> > >>>>>>> experienced enough to
> > > > > > > > >>> > >>>>>>> accurately predict/estimate the operator resource
> > > > > > > >>> > >> requirements.
> > > > > > > >>> > >>>> Most
> > > > > > > >>> > >>>>>>> people
> > > > > > > >>> > >>>>>>> rely on the execution-time metrics (throughput,
> > > > > > > delay, cpu
> > > > > > > >>> > >> load,
> > > > > > > >>> > >>>>>> memory
> > > > > > > >>> > >>>>>>> usage, GC pressure, etc.) to improve the
> > > > > > > specification.
> > > > > > > >>> > >>>>>>>
> > > > > > > >>> > >>>>>>> To sum up:
> > > > > > > >>> > >>>>>>> If the user is capable of providing proper
> resource
> > > > > > > >>> > requirements
> > > > > > > >>> > >> for
> > > > > > > >>> > >>>>>> every
> > > > > > > >>> > >>>>>>> operator, that's definitely a good thing and we
> > would
> > > > > > not
> > > > > > > >>> > need to
> > > > > > > >>> > >>>> rely
> > > > > > > >>> > >>>>> on
> > > > > > > >>> > >>>>>>> the SSGs. However, that shouldn't be a *must* for
> > the
> > > > > > > >>> > >> fine-grained
> > > > > > > >>> > >>>>>> resource
> > > > > > > >>> > >>>>>>> management to work. For those users who are
> capable
> > > and
> > > > > > > do not
> > > > > > > >>> > >> like
> > > > > > > >>> > >>>>>> having
> > > > > > > >>> > >>>>>>> to set each operator to a separate SSG, I would
> be
> > ok
> > > > > > to
> > > > > > > have
> > > > > > > >>> > >> both
> > > > > > > >>> > >>>>>>> SSG-based and operator-based runtime

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Xintong Song <to...@gmail.com>.

Hi Kezhu,

Let me share some background first.

   - We at Alibaba have been using fine-grained resource management for
   many years, with Blink (an internal version of Flink).
   - We have been trying to contribute this feature to Apache Flink for
   many years. However, we haven't succeeded, for various reasons.
      - Years ago, I believe there were not many users running Flink in
      production at a very large scale, and thus less demand for the
      feature.
      - The feature in Blink is quite specific to our internal use cases
      and scenarios. We have not made it general enough to cover the
      community's common use cases.
      - The Flink and Blink code bases have diverged.
   - Blink used operator-level resource interfaces. According to our years
   of production experience, we believe that specifying operator-level
   resources is neither necessary nor easy to use. This is why we propose
   group-level interfaces.

Back to your questions.

> I saw the discussion to keep slot sharing as a hint, but in reality, will
> SSG jobs expect to fail or run slowly if scheduler does not respect it ?
> A slot with 20GB memory is different from two 1GB default sized slots.
> So, we actually depend on scheduler version/implementation/de-facto if
> we claim it is a hint.
>

SSG-based resource requirements are considered hints because the SSG itself
is a hint. There's no guarantee that the operators of an SSG will always be
scheduled together. I think you raise a good point: if SSGs are not
respected, is it preferable to fail the job or to interpret the resource of
an actual slot? It's possible that we provide a configuration option and
leave that decision to the users. However, that is a design choice we only
need to make once there is indeed a need for not respecting the SSGs.
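
To make the trade-off concrete, here is a minimal Python sketch of what such
a configuration option could choose between. It is not Flink code: the names
`Resource`, `FallbackPolicy` and `resolve_slot_requests` are hypothetical, and
the two "derive" strategies (give every task the whole group's resources, or
split them evenly) are simply the ones Till suggested in this thread.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource profile; not Flink's ResourceSpec."""
    cpu_cores: float
    memory_mb: int


class FallbackPolicy(Enum):
    FAIL = "fail"                # refuse to run if the SSG cannot be respected
    WHOLE_GROUP = "whole-group"  # every task gets the resources of the whole SSG
    EVEN_SPLIT = "even-split"    # SSG resources are divided evenly among tasks


def resolve_slot_requests(ssg_resource, num_tasks, ssg_respected, policy):
    """Return the list of slot resources to request for one SSG."""
    if num_tasks < 1:
        raise ValueError("an SSG must contain at least one task")
    if ssg_respected:
        # Normal case: all tasks share a single slot of the specified size.
        return [ssg_resource]
    if policy is FallbackPolicy.FAIL:
        raise RuntimeError("scheduler does not respect slot sharing groups")
    if policy is FallbackPolicy.WHOLE_GROUP:
        return [ssg_resource] * num_tasks
    # EVEN_SPLIT: one slot per task, each with an equal share.
    share = Resource(ssg_resource.cpu_cores / num_tasks,
                     ssg_resource.memory_mb // num_tasks)
    return [share] * num_tasks
```

With the whole-group strategy the job over-provisions (N slots of the full
size), while the even split may under-provision a heavy task; this is why the
choice would probably have to be left to the user.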

> Do you mean code-path or production environment ? If it is code-path, could
> you please point out where the story breaks ?
>
> From the discussion and history, could I consider FLIP-156 a redirection
> more than an inheritance/enhancement of the current half-baked/ancient
> implementation ?
>

If you try to set the operator resources, you would find that it won't work
at the moment. There are several things not ready.

   - Interfaces for setting operator resources are never really exposed to
   users.
   - The resource manager never allocates slots with the requested
   resources.
   - Managed memory size specified for operators will not be respected,
   because managed memory is shared within a slot with a different approach.

While the first two points mostly reflect that the feature is not yet ready,
the last point is inherent to specifying operator-level resources.

To sum up, we do not want to support specifying operator-level resources in
the first step, for the following reasons.

   - It's not likely needed, due to poor usability compared to the
   SSG-based approach.
   - It introduces the complexity of dealing with managed memory sharing.
   - It introduces the complexity of combining resource requirements from
   two different levels.
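
As a rough illustration of the last point, the Python sketch below shows how
the size of one shared slot would differ depending on how the two levels are
combined. It is a hypothetical helper, not Flink API; the three rules mirror
options raised in this thread (SSG values overriding operator values,
finer-grained operator values taking precedence, and the "add" semantics
proposed by Kezhu), and the numbers are made up.

```python
def slot_memory_mb(operator_mb, ssg_mb, policy):
    """Memory for one shared slot, given per-operator and SSG-level values.

    operator_mb: list of per-operator requirements (empty if none specified)
    ssg_mb: SSG-level requirement, or None if not specified
    policy: "ssg-overrides", "operators-win", or "add"
    """
    operators_total = sum(operator_mb)
    if ssg_mb is None:
        # Only operator-level values exist: nothing to combine.
        return operators_total
    if not operator_mb:
        # Only the SSG-level value exists: nothing to combine.
        return ssg_mb
    if policy == "ssg-overrides":
        return ssg_mb
    if policy == "operators-win":
        return operators_total
    if policy == "add":
        return operators_total + ssg_mb
    raise ValueError(f"unknown policy: {policy}")


# Same job, three different slot sizes depending on the chosen rule.
sizes = {p: slot_memory_mb([100, 200], 500, p)
         for p in ("ssg-overrides", "operators-win", "add")}
```

Whichever rule is picked has to be documented and kept stable, which is part
of the complexity that an SSG-only first step avoids.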


Thank you~

Xintong Song



On Wed, Feb 3, 2021 at 7:50 PM Kezhu Wang <ke...@gmail.com> wrote:

> Hi Till,
>
> Based on what I understood, if not wrong, the door is not closed after SSG
> resource specifying. So, hope it could be useful in potential future
> improvement.
>
> Best,
> Kezhu Wang
>
>
> On February 3, 2021 at 18:07:21, Till Rohrmann (trohrmann@apache.org)
> wrote:
>
> Thanks for sharing your thoughts Kezhu. I like your ideas of how
> per-operator and SSG requirements can be combined. I've also thought about
> defining a default resource profile for all tasks which have no resources
> configured. That way all operators would have resources assigned if the
> user chooses to use this feature.
>
> As Yangze and Xintong have said, we have decided to first only support
> specifying resources for SSGs as this seems more user friendly. Based on
> the feedback for this feature one potential development direction might be
> to allow the resource specification on per-operator basis. Here we could
> pick up your ideas.
>
> Cheers,
> Till
>
> On Wed, Feb 3, 2021 at 7:31 AM Xintong Song <to...@gmail.com> wrote:
>
> > Thanks for your feedback, Kezhu.
> >
> > > I think Flink *runtime* already has an ideal granularity for resource
> > > management 'task'. If there is a slot shared by multiple tasks, that
> > > slot's resource requirement is simple sum of all its logical slots.
> > > So basically, this is no resource requirement for SlotSharingGroup
> > > in runtime until now, right ?
> >
> > That is a half-baked implementation, coming from the previous attempts
> > (years ago) to deliver the fine-grained resource management feature,
> > and it was never really put into use.
> >
> > > From the FLIP and discussion, I assume that SSG resource specifying will
> > > override operator level resource specifying if both are specified ?
> > >
> > Actually, I think we should use the finer-grained resources (i.e.
> > operator level) if both are specified. And more importantly, that is
> > based on the assumption that we do need two different levels of
> > interfaces.
> >
> > > So, I wonder whether we could interpret SSG resource specifying as an
> > > "add" but not an "set" on resource requirement ?
> > >
> > IIUC, this is the core idea behind your proposal. I think it provides an
> > interesting idea of how we combine operator level and SSG level
> > resources, *if we allow configuring resources at both levels*. However,
> > I'm not sure whether configuring resources on the operator level is
> > indeed needed. Therefore, as a first step, this FLIP proposes to only
> > introduce the SSG-level interfaces. As listed in the future plan, we
> > would consider allowing operator level resource configuration later if
> > we do see a need for it. At that time, we definitely should discuss what
> > to do if resources are configured at both levels.
> >
> > > * Could SSG express negative resource requirement ?
> > >
> > No.
> >
> > > Is there concrete bar for partial resource configured not function ? I
> > > saw it will fail job submission in Dispatcher.submitJob.
> > >
> > With the SSG-based approach, this should no longer be needed. The
> > constraint was introduced because we can neither properly define what is
> > the resource of a task chained from an operator with specified resource
> > and another with unspecified resource, nor for a slot shared by a task
> > with specified resource and another with unspecified resource. With the
> > SSG-based approach, we no longer have those problems.
> >
> > > An option(cluster/job level) to force slot sharing in scheduler ? This
> > > could be useful in case of migration from FLIP-156 to future approach.
> > >
> > I think this is exactly what we are trying to avoid, requiring the
> > scheduler to enforce slot sharing.
> >
> > > An option(cluster) to ignore resource specifying(allow resource specified
> > > job to run on open box environment) for no production usage ?
> > >
> > That's possible. Actually, we are planning to introduce an option for
> > activating the fine-grained resource management, for development
> > purposes. We might consider keeping that option after the feature is
> > completed, to allow disabling the feature without having to touch the
> > job code.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Wed, Feb 3, 2021 at 1:28 PM Kezhu Wang <ke...@gmail.com> wrote:
> >
> > > Hi all, sorry for joining the discussion even after voting started.
> > >
> > > I want to share my thoughts on this after reading above discussions.
> > >
> > > I think Flink *runtime* already has an ideal granularity for resource
> > > management 'task'. If there is
> > > a slot shared by multiple tasks, that slot's resource requirement is
> > simple
> > > sum of all its logical
> > > slots. So basically, this is no resource requirement for
> SlotSharingGroup
> > > in runtime until now,
> > > right ?
> > >
> > > As in discussion, we already agree upon that: "If all operators have
> > their
> > > resources properly
> > > specified, then slot sharing is no longer needed. "
> > >
> > > So seems to me, naturally in mind path, what we would discuss is that:
> > how
> > > to bridge impractical
> > > operator level resource specifying to runtime task level resource
> > > requirement ? This is actually a
> > > pure api thing as Chesnay has pointed out.
> > >
> > > But FLIP-156 brings another direction on table: how about using SSG for
> > > both api and runtime
> > > resource specifying ?
> > >
> > > From the FLIP and discussion, I assume that SSG resource specifying
> > > will override operator level resource specifying if both are
> > > specified ?
> > >
> > > So, I wonder whether we could interpret SSG resource specifying as an
> > "add"
> > > but not an "set" on
> > > resource requirement ?
> > >
> > > The semantics is that SSG resource specifying adds additional resource
> > > to shared slot to express concerns on possible high throughput and
> > > resource requirement for tasks in one physical slot.
> > >
> > > The result is that if scheduler indeed respect slot sharing, allocated
> > slot
> > > will gain extra resource
> > > specified for that SSG.
> > >
> > > I think one of coding barrier from "add" approach is
> ResourceSpec.UNKNOWN
> > > which didn't support
> > > 'merge' operation. I tend to use ResourceSpec.ZERO as default, task
> > > executor should be aware of
> > > this.
> > >
> > > @Chesnay
> > > > My main worry is that it if we wire the runtime to work on SSGs it's
> > > > gonna be difficult to implement more fine-grained approaches, which
> > > > would not be the case if, for the runtime, they are always defined on
> > an
> > > > operator-level.
> > >
> > > An "add" operation should be less invasive and enforce low barrier for
> > > future fine-grained approaches.
> > >
> > > @Stephan
> > > > - Users can define different slot sharing groups for operators like
> > > they
> > > > do now, with the exception that you cannot mix operators that have a
> > > > resource profile and operators that have no resource profile.
> > >
> > > @Till
> > > > This effectively means that all unspecified operators
> > > > will implicitly have a zero resource requirement.
> > > > I am wondering whether this wouldn't lead to a surprising behaviour
> for
> > > the
> > > > user. If the user specifies the resource requirements for a single
> > > > operator, then he probably will assume that the other operators will
> > get
> > > > the default share of resources and not nothing.
> > >
> > > I think it is inherent due to fact that we could not defining
> > > ResourceSpec.ONE, eg. resource
> > > requirement for exact one default slot, with concrete numbers ? I tend
> to
> > > squash out unspecified one
> > > if there are operators in chaining with explicit resource specifying.
> > > Otherwise, the protocol tends
> > > to verbose as say "give me this much resource and a default". I think
> if
> > we
> > > have explict resource
> > > specifying for partial operators, it is just saying "I don't care other
> > > operators that much, just
> > > get them places to run". It is most likely be cases there are stateless
> > > fliter/map or other less
> > > resource consuming operators. If there is indeed a problem, I think
> > clients
> > > can specify a global
> > > default(or other level default in future). In job graph generating
> phase,
> > > we could take that default
> > > into account for unspecified operators.
> > >
> > > @FLIP-156
> > > > Expose operator chaining. (Cons fo task level resource specifying)
> > >
> > > Is it inherent for all group level resource specifying ? They will
> either
> > > break chaining or obey it,
> > > or event could not work with.
> > >
> > > To sum up above, my suggestions are:
> > >
> > > In api side:
> > > * StreamExecutionEnvironment: A global default(ResourceSpec.ZERO if
> > > unspecified).
> > > * Operator: ResourceSpec.ZERO(unspecified) as default.
> > > * Task: sum of requirements from specified operators + global
> default(if
> > > there are any unspecified operators)
> > > * SSG: additional resource to physical slot.
> > >
> > > In runtime side:
> > > * Task: ResourceSpec.Task or ResourceSpec.ZERO
> > > * SSG: ResourceSpec.SSG or ResourceSpec.ZERO
> > >
> > > Physical slot gets sum up resources from logical slots and SSG, if it
> > gets
> > > ResourceSpec.ZERO, it is
> > > just a default sized slot.
> > >
> > > In short, turn SSG resource specifying as "add" and drop
> > > ResourceSpec.UNKNOWN.
> > >
> > >
> > > Questions/Issues:
> > > * Could SSG express negative resource requirement ?
> > > * Is there concrete bar for partial resource configured not function ?
> I
> > > saw it will fail job submission in Dispatcher.submitJob.
> > > * An option(cluster/job level) to force slot sharing in scheduler ?
> This
> > > could be useful in case of migration from FLIP-156 to future approach.
> > > * An option(cluster) to ignore resource specifying(allow resource
> > specified
> > > job to run on open box environment) for no production usage ?
> > >
> > >
> > >
> > > On February 1, 2021 at 11:54:10, Yangze Guo (karmagyz@gmail.com)
> wrote:
> > >
> > > Thanks for reply, Till and Xintong!
> > >
> > > I update the FLIP, including:
> > > - Edit the JavaDoc of the proposed
> > > StreamGraphGenerator#setSlotSharingGroupResource.
> > > - Add "Future Plan" section, which contains the potential follow-up
> > > issues and the limitations to be documented when fine-grained resource
> > > management is exposed to users.
> > >
> > > I'll start a vote in another thread.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Fri, Jan 29, 2021 at 10:07 PM Till Rohrmann <tr...@apache.org>
> > > wrote:
> > > >
> > > > Thanks for summarizing the discussion, Yangze. I agree that setting
> > > > resource requirements per operator is not very user friendly.
> > Moreover, I
> > > > couldn't come up with a different proposal which would be as easy to
> > use
> > > > and wouldn't expose internal scheduling details. In fact, following
> > this
> > > > argument then we shouldn't have exposed the slot sharing groups in
> the
> > > > first place.
> > > >
> > > > What is important for the user is that we properly document the
> > > limitations
> > > > and constraints the fine grained resource specification has. For
> > example,
> > > > we should explain how optimizations like chaining are affected by it
> > and
> > > > how different execution modes (batch vs. streaming) affect the
> > execution
> > > of
> > > > operators which have specified resources. These things shouldn't
> become
> > > > part of the contract of this feature and are more caused by internal
> > > > implementation details but it will be important to understand these
> > > things
> > > > properly in order to use this feature effectively.
> > > >
> > > > Hence, +1 for starting the vote for this FLIP.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Tue, Jan 26, 2021 at 4:37 AM Xintong Song <to...@gmail.com>
> > > wrote:
> > > >
> > > > > Thanks for the summary, Yangze.
> > > > >
> > > > > The changes and follow-up issues LGTM. Let's wait for responses
> from
> > > the
> > > > > others before starting a vote.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Jan 26, 2021 at 11:08 AM Yangze Guo <ka...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Thanks everyone for the lively discussion. I'd like to try to
> > > > > > summarize the current convergence in the discussion. Please let
> me
> > > > > > know if I got things wrong or missed something crucial here.
> > > > > >
> > > > > > Change of this FLIP:
> > > > > > > - Treat the SSG resource requirements as a hint instead of a
> > > > > > > restriction for the runtime. That should be explicitly
> > > > > > > explained in the JavaDocs.
> > > > > >
> > > > > > Potential follow-up issues if needed:
> > > > > > - Provide operator-level resource configuration interface.
> > > > > > - Provide multiple options for deciding resources for SSGs whose
> > > > > > requirement is not specified:
> > > > > > ** Default slot resource.
> > > > > > ** Default operator resource times number of operators.
> > > > > >
> > > > > > If there are no other issues, I'll update the FLIP accordingly
> and
> > > > > > start a vote thread. Thanks all for the valuable feedback again.
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > > Best,
> > > > > > Yangze Guo
> > > > > >
> > > > > >
> > > > > > On Fri, Jan 22, 2021 at 11:30 AM Xintong Song <
> > tonysong820@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > >
> > > > > > > FGRuntimeInterface.png
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <
> > > tonysong820@gmail.com>
> > >
> > > > > > wrote:
> > > > > > >>
> > > > > > >> I think Chesnay's proposal could actually work. IIUC, the
> > keypoint
> > > is
> > > > > > to derive operator requirements from SSG requirements on the API
> > > side, so
> > > > > > that the runtime only deals with operator requirements. It's
> > > debatable
> > > > > how
> > > > > > the deriving should be done though. E.g., an alternative could be
> > to
> > > > > evenly
> > > > > > divide the SSG requirement into requirements of operators in the
> > > group.
> > > > > > >>
> > > > > > >>
> > > > > > >> However, I'm not entirely sure which option is more desired.
> > > > > > Illustrating my understanding in the following figure, in which
> on
> > > the
> > > > > top
> > > > > > is Chesnay's proposal and on the bottom is the SSG-based proposal
> > in
> > > this
> > > > > > FLIP.
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> I think the major difference between the two approaches is
> where
> > > > > > deriving operator requirements from SSG requirements happens.
> > > > > > >>
> > > > > > >> - Chesnay's proposal simplifies the runtime logic and the
> > > interface to
> > > > > > expose, at the price of moving more complexity (i.e. the
> deriving)
> > to
> > > the
> > > > > > API side. The question is, where do we prefer to keep the
> > complexity?
> > > I'm
> > > > > > slightly leaning towards having a thin API and keep the
> complexity
> > in
> > > > > > runtime if possible.
> > > > > > >>
> > > > > > >> - Notice that the dash line arrows represent optional steps
> that
> > > are
> > > > > > needed only for schedulers that do not respect SSGs, which we
> don't
> > > have
> > > > > at
> > > > > > the moment. If we only look at the solid line arrows, then the
> > > SSG-based
> > > > > > approach is much simpler, without needing to derive and aggregate
> > the
> > > > > > requirements back and forth. I'm not sure about complicating the
> > > current
> > > > > > design only for the potential future needs.
> > > > > > >>
> > > > > > >>
> > > > > > >> Thank you~
> > > > > > >>
> > > > > > >> Xintong Song
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <
> > > chesnay@apache.org>
> > > > > > wrote:
> > > > > > >>>
> > > > > > >>> You're raising a good point, but I think I can rectify that
> > with
> > > a
> > > > > > minor
> > > > > > >>> adjustment.
> > > > > > >>>
> > > > > > >>> Default requirements are whatever the default requirements
> are,
> > > > > setting
> > > > > > >>> the requirements for one operator has no effect on other
> > > operators.
> > > > > > >>>
> > > > > > >>> With these rules, and some API enhancements, the following
> > mockup
> > > > > would
> > > > > > >>> replicate the SSG-based behavior:
> > > > > > >>>
> > > > > > > >>> Map<SlotSharingGroupId, Requirements> requirements = ...
> > > > > > > >>> for slotSharingGroup in env.getSlotSharingGroups() {
> > > > > > > >>>     vertices = slotSharingGroup.getVertices()
> > > > > > > >>>     vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
> > > > > > > >>>     vertices.remaining().setRequirements(ZERO)
> > > > > > > >>> }
> > > > > > >>>
> > > > > > >>> We could even allow setting requirements on
> slotsharing-groups
> > > > > > >>> colocation-groups and internally translate them accordingly.
> > > > > > >>> I can't help but feel this is a plain API issue.
> > > > > > >>>
> > > > > > >>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
> > > > > > >>> > If I understand you correctly Chesnay, then you want to
> > > decouple
> > > > > the
> > > > > > >>> > resource requirement specification from the slot sharing
> > group
> > > > > > >>> > assignment. Hence, per default all operators would be in
> the
> > > same
> > > > > > slot
> > > > > > >>> > sharing group. If there is no operator with a resource
> > > > > specification,
> > > > > > >>> > then the system would allocate a default slot for it. If
> > there
> > > is
> > > > > at
> > > > > > >>> > least one operator, then the system would sum up all the
> > > specified
> > > > > > >>> > resources and allocate a slot of this size. This
> effectively
> > > means
> > > > > > >>> > that all unspecified operators will implicitly have a zero
> > > resource
> > > > > > >>> > requirement. Did I understand your idea correctly?
> > > > > > >>> >
> > > > > > >>> > I am wondering whether this wouldn't lead to a surprising
> > > behaviour
> > > > > > >>> > for the user. If the user specifies the resource
> requirements
> > > for a
> > > > > > >>> > single operator, then he probably will assume that the
> other
> > > > > > operators
> > > > > > >>> > will get the default share of resources and not nothing.
> > > > > > >>> >
> > > > > > >>> > Cheers,
> > > > > > >>> > Till
> > > > > > >>> >
> > > > > > > >>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <chesnay@apache.org> wrote:
> > > > > > >>> >
> > > > > > >>> > Is there even a functional difference between specifying
> the
> > > > > > >>> > requirements for an SSG vs specifying the same requirements
> > on
> > > > > a
> > > > > > >>> > single
> > > > > > >>> > operator within that group (ideally a colocation group to
> > avoid
> > > > > > this
> > > > > > >>> > whole hint business)?
> > > > > > >>> >
> > > > > > >>> > Wouldn't we get the best of both worlds in the latter case?
> > > > > > >>> >
> > > > > > >>> > Users can take shortcuts to define shared requirements,
> > > > > > >>> > but refine them further as needed on a per-operator basis,
> > > > > > >>> > without changing semantics of slotsharing groups
> > > > > > >>> > nor the runtime being locked into SSG-based requirements.
> > > > > > >>> >
> > > > > > >>> > (And before anyone argues what happens if slotsharing
> groups
> > > > > > >>> > change or
> > > > > > >>> > whatnot, that's a plain API issue that we could surely
> solve.
> > > > > (A
> > > > > > >>> > plain
> > > > > > >>> > iteration over slotsharing groups and therein contained
> > > > > operators
> > > > > > >>> > would
> > > > > > >>> > suffice)).
> > > > > > >>> >
> > > > > > >>> > On 1/20/2021 6:48 PM, Till Rohrmann wrote:
> > > > > > >>> > > Maybe a different minor idea: Would it be possible to
> treat
> > > > > > the SSG
> > > > > > >>> > > resource requirements as a hint for the runtime similar
> to
> > > > > how
> > > > > > >>> > slot sharing
> > > > > > >>> > > groups are designed at the moment? Meaning that we don't
> > give
> > > > > > >>> > the guarantee
> > > > > > >>> > > that Flink will always deploy this set of tasks together
> no
> > > > > > >>> > matter what
> > > > > > >>> > > comes. If, for example, the runtime can derive by some
> > means
> > > > > > the
> > > > > > >>> > resource
> > > > > > >>> > > requirements for each task based on the requirements for
> > the
> > > > > > >>> > SSG, this
> > > > > > >>> > > could be possible. One easy strategy would be to give
> every
> > > > > > task
> > > > > > >>> > the same
> > > > > > >>> > > resources as the whole slot sharing group. Another one
> > could
> > > > > be
> > > > > > >>> > > distributing the resources equally among the tasks. This
> > does
> > > > > > >>> > not even have
> > > > > > >>> > > to be implemented but we would give ourselves the freedom
> > to
> > > > > > change
> > > > > > >>> > > scheduling if need should arise.
> > > > > > >>> > >
> > > > > > >>> > > Cheers,
> > > > > > >>> > > Till
> > > > > > >>> > >
> > > > > > > >>> > > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > > > >>> > >
> > > > > > >>> > >> Thanks for the responses, Till and Xintong.
> > > > > > >>> > >>
> > > > > > >>> > >> I second Xintong's comment that SSG-based runtime
> > interface
> > > > > > >>> > will give
> > > > > > >>> > >> us the flexibility to achieve op/task-based approach.
> > That's
> > > > > > one of
> > > > > > >>> > >> the most important reasons for our design choice.
> > > > > > >>> > >>
> > > > > > >>> > >> Some cents regarding the default operator resource:
> > > > > > >>> > >> - It might be good for the scenario of DataStream jobs.
> > > > > > >>> > >> ** For light-weight operators, the accumulative
> > > > > > >>> > configuration error
> > > > > > >>> > >> will not be significant. Then, the resource of a task
> used
> > > > > is
> > > > > > >>> > >> proportional to the number of operators it contains.
> > > > > > >>> > >> ** For heavy operators like join and window or operators
> > > > > > >>> > using the
> > > > > > >>> > >> external resources, user will turn to the fine-grained
> > > > > > resource
> > > > > > >>> > >> configuration.
> > > > > > >>> > >> - It can increase the stability for the standalone
> cluster
> > > > > > >>> > where task
> > > > > > >>> > >> executors registered are heterogeneous(with different
> > > > > default
> > > > > > slot
> > > > > > >>> > >> resources).
> > > > > > >>> > >> - It might not be good for SQL users. The operators that
> > SQL
> > > > > > >>> > will be
> > > > > > >>> > >> transferred to is a black box to the user. We also do
> not
> > > > > > guarantee
> > > > > > >>> > >> the cross-version of consistency of the transformation
> so
> > > > > far.
> > > > > > >>> > >>
> > > > > > >>> > >> I think it can be treated as a follow-up work when the
> > > > > > fine-grained
> > > > > > >>> > >> resource management is end-to-end ready.
> > > > > > >>> > >>
> > > > > > >>> > >> Best,
> > > > > > >>> > >> Yangze Guo
> > > > > > >>> > >>
> > > > > > >>> > >>
> > > > > > > >>> > >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > > >>> > >>> Thanks for the feedback, Till.
> > > > > > >>> > >>>
> > > > > > >>> > >>> ## I feel that what you proposed (operator-based +
> > default
> > > > > > >>> > value) might
> > > > > > >>> > >> be
> > > > > > >>> > >>> subsumed by the SSG-based approach.
> > > > > > >>> > >>> Thinking of op_1 -> op_2, there are the following 4
> > cases,
> > > > > > >>> > categorized by
> > > > > > >>> > >>> whether the resource requirements are known to the
> users.
> > > > > > >>> > >>>
> > > > > > >>> > >>> 1. *Both known.* As previously mentioned, there's no
> > > > > > >>> > reason to put
> > > > > > >>> > >>> multiple operators whose individual resource
> > > > > requirements
> > > > > > >>> > are already
> > > > > > >>> > >> known
> > > > > > >>> > >>> into the same group in fine-grained resource
> > > > > management.
> > > > > > >>> > And if op_1
> > > > > > >>> > >> and
> > > > > > >>> > >>> op_2 are in different groups, there should be no
> > > > > problem
> > > > > > >>> > switching
> > > > > > >>> > >> data
> > > > > > >>> > >>> exchange mode from pipelined to blocking. This is
> > > > > > >>> > equivalent to
> > > > > > >>> > >> specifying
> > > > > > >>> > >>> operator resource requirements in your proposal.
> > > > > > >>> > >>> 2. *op_1 known, op_2 unknown.* Similar to 1), except
> > > > > that
> > > > > > >>> > op_2 is in a
> > > > > > >>> > >>> SSG whose resource is not specified thus would have the
> > > > > > >>> > default slot
> > > > > > >>> > >>> resource. This is equivalent to having default operator
> > > > > > >>> > resources in
> > > > > > >>> > >> your
> > > > > > >>> > >>> proposal.
> > > > > > >>> > >>> 3. *Both unknown*. The user can either set op_1 and
> > > > > op_2
> > > > > > >>> > to the same
> > > > > > >>> > >> SSG
> > > > > > >>> > >>> or separate SSGs.
> > > > > > >>> > >>> - If op_1 and op_2 are in the same SSG, it will be
> > > > > > >>> > equivalent to
> > > > > > >>> > >> the
> > > > > > >>> > >>> coarse-grained resource management, where op_1 and
> > > > > > op_2
> > > > > > >>> > share a
> > > > > > >>> > >> default
> > > > > > >>> > >>> size slot no matter which data exchange mode is
> > > > > used.
> > > > > > >>> > >>> - If op_1 and op_2 are in different SSGs, then each
> > > > > of
> > > > > > >>> > them will
> > > > > > >>> > >> use
> > > > > > >>> > >>> a default size slot. This is equivalent to setting
> > > > > > them
> > > > > > >>> > with
> > > > > > >>> > >> default
> > > > > > >>> > >>> operator resources in your proposal.
> > > > > > >>> > >>> 4. *Total (pipeline) or max (blocking) of op_1 and op_2
> > > > > > is
> > > > > > >>> > known.*
> > > > > > >>> > >>> - It is possible that the user learns the total /
> > > > > max
> > > > > > >>> > resource
> > > > > > >>> > >>> requirement from executing and monitoring the job,
> > > > > > >>> > while not
> > > > > > >>> > >>> being aware of
> > > > > > >>> > >>> individual operator requirements.
> > > > > > >>> > >>> - I believe this is the case your proposal does not
> > > > > > >>> > cover. And TBH,
> > > > > > >>> > >>> this is probably how most users learn the resource
> > > > > > >>> > requirements,
> > > > > > >>> > >>> according
> > > > > > >>> > >>> to my experiences.
> > > > > > >>> > >>> - In this case, the user might need to specify
> > > > > > >>> > different resources
> > > > > > >>> > >> if
> > > > > > >>> > >>> he wants to switch the execution mode, which should
> > > > > > not
> > > > > > >>> > be worse
> > > > > > >>> > >> than not
> > > > > > >>> > >>> being able to use fine-grained resource management.
> > > > > > >>> > >>>
> > > > > > >>> > >>>
> > > > > > >>> > >>> ## An additional idea inspired by your proposal.
> > > > > > >>> > >>> We may provide multiple options for deciding resources
> > for
> > > > > > >>> > SSGs whose
> > > > > > >>> > >>> requirement is not specified, if needed.
> > > > > > >>> > >>>
> > > > > > >>> > >>> - Default slot resource (current design)
> > > > > > >>> > >>> - Default operator resource times number of operators
> > > > > > >>> > (equivalent to
> > > > > > >>> > >>> your proposal)
> > > > > > >>> > >>>
> > > > > > >>> > >>>
> > > > > > >>> > >>> ## Exposing internal runtime strategies
> > > > > > >>> > >>> Theoretically, yes. Tying to the SSGs, the resource
> > > > > > >>> > requirements might be
> > > > > > >>> > >>> affected if how SSGs are internally handled changes in
> > > > > > future.
> > > > > > >>> > >> Practically,
> > > > > > >>> > >>> I do not concretely see at the moment what kind of
> > changes
> > > > > we
> > > > > > >>> > may want in
> > > > > > >>> > >>> future that might conflict with this FLIP proposal, as
> > the
> > > > > > >>> > question of
> > > > > > >>> > >>> switching data exchange mode answered above. I'd
> suggest
> > to
> > > > > > >>> > not give up
> > > > > > >>> > >> the
> > > > > > >>> > >>> user friendliness we may gain now for the future
> problems
> > > > > > that
> > > > > > >>> > may or may
> > > > > > >>> > >>> not exist.
> > > > > > >>> > >>>
> > > > > > >>> > >>> Moreover, the SSG-based approach has the flexibility to
> > > > > > >>> > achieve the
> > > > > > >>> > >>> equivalent behavior as the operator-based approach, if
> we
> > > > > > set each
> > > > > > >>> > >> operator
> > > > > > >>> > >>> (or task) to a separate SSG. We can even provide a
> > shortcut
> > > > > > >>> > option to
> > > > > > >>> > >>> automatically do that for users, if needed.
> > > > > > >>> > >>>
> > > > > > >>> > >>>
> > > > > > >>> > >>> Thank you~
> > > > > > >>> > >>>
> > > > > > >>> > >>> Xintong Song
> > > > > > >>> > >>>
> > > > > > >>> > >>>
> > > > > > >>> > >>>
> > > > > > >>> > >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann
> > > > > > >>> > <trohrmann@apache.org <ma...@apache.org>>
> > > > > > >>> > >> wrote:
> > > > > > >>> > >>>> Thanks for the responses Xintong and Stephan,
> > > > > > >>> > >>>>
> > > > > > >>> > >>>> I agree that being able to define the resource
> > > > > requirements
> > > > > > for a
> > > > > > >>> > >> group of
> > > > > > >>> > >>>> operators is more user friendly. However, my concern
> is
> > > > > that
> > > > > > >>> > we are
> > > > > > >>> > >>>> exposing thereby internal runtime strategies which
> might
> > > > > > >>> > limit our
> > > > > > >>> > >>>> flexibility to execute a given job. Moreover, the
> > > > > semantics
> > > > > > of
> > > > > > >>> > >> configuring
> > > > > > >>> > >>>> resource requirements for SSGs could break if
> switching
> > > > > from
> > > > > > >>> > streaming
> > > > > > >>> > >> to
> > > > > > >>> > >>>> batch execution. If one defines the resource
> > requirements
> > > > > > for
> > > > > > >>> > op_1 ->
> > > > > > >>> > >> op_2
> > > > > > >>> > >>>> which run in pipelined mode when using the streaming
> > > > > > >>> > execution, then
> > > > > > >>> > >> how do
> > > > > > >>> > >>>> we interpret these requirements when op_1 -> op_2 are
> > > > > > >>> > executed with a
> > > > > > >>> > >>>> blocking data exchange in batch execution mode?
> > > > > > Consequently,
> > > > > > >>> > I am
> > > > > > >>> > >> still
> > > > > > >>> > >>>> leaning towards Stephan's proposal to set the resource
> > > > > > >>> > requirements per
> > > > > > >>> > >>>> operator.
> > > > > > >>> > >>>>
> > > > > > >>> > >>>> Maybe the following proposal makes the configuration
> > > > > easier:
> > > > > > >>> > If the
> > > > > > >>> > >> user
> > > > > > >>> > >>>> wants to use fine-grained resource requirements, then
> > she
> > > > > > >>> > needs to
> > > > > > >>> > >> specify
> > > > > > >>> > >>>> the default size which is used for operators which
> have
> > no
> > > > > > >>> > explicit
> > > > > > >>> > >>>> resource annotation. If this holds true, then every
> > > > > operator
> > > > > > >>> > would
> > > > > > >>> > >> have a
> > > > > > >>> > >>>> resource requirement and the system can try to execute
> > the
> > > > > > >>> > operators
> > > > > > >>> > >> in the
> > > > > > >>> > >>>> best possible manner w/o being constrained by how the
> > user
> > > > > > >>> > set the SSG
> > > > > > >>> > >>>> requirements.
> > > > > > >>> > >>>>
> > > > > > >>> > >>>> Cheers,
> > > > > > >>> > >>>> Till
> > > > > > >>> > >>>>
> > > > > > >>> > >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song
> > > > > > >>> > <tonysong820@gmail.com <ma...@gmail.com>>
> > > > > > >>> > >>>> wrote:
> > > > > > >>> > >>>>
> > > > > > >>> > >>>>> Thanks for the feedback, Stephan.
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>> Actually, your proposal has also come to my mind at
> > some
> > > > > > >>> > point. And I
> > > > > > >>> > >>>> have
> > > > > > >>> > >>>>> some concerns about it.
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>> 1. It does not give users the same control as the
> > > > > SSG-based
> > > > > > >>> > approach.
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>> While both approaches do not require specifying for
> > each
> > > > > > >>> > operator,
> > > > > > >>> > >>>>> SSG-based approach supports the semantic that "some
> > > > > > operators
> > > > > > >>> > >> together
> > > > > > >>> > >>>> use
> > > > > > >>> > >>>>> this much resource" while the operator-based approach
> > > > > > doesn't.
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>> Think of a long pipeline with m operators (o_1, o_2,
> > ...,
> > > > > > >>> > o_m), and
> > > > > > >>> > >> at
> > > > > > >>> > >>>> some
> > > > > > >>> > >>>>> point there's an agg o_n (1 < n < m) which
> > significantly
> > > > > > >>> > reduces the
> > > > > > >>> > >> data
> > > > > > >>> > >>>>> amount. One can separate the pipeline into 2 groups
> > SSG_1
> > > > > > >>> > (o_1, ...,
> > > > > > >>> > >> o_n)
> > > > > > > >>> > >>>>> and SSG_2 (o_n+1, ..., o_m), so that configuring much
> > > > > higher
> > > > > > >>> > >> parallelisms
> > > > > > >>> > >>>>> for operators in SSG_1 than for operators in SSG_2
> > won't
> > > > > > >>> > lead to too
> > > > > > >>> > >> much
> > > > > > >>> > >>>>> wasting of resources. If the two SSGs end up needing
> > > > > > different
> > > > > > >>> > >> resources,
> > > > > > >>> > >>>>> with the SSG-based approach one can directly specify
> > > > > > >>> > resources for
> > > > > > >>> > >> the
> > > > > > >>> > >>>> two
> > > > > > >>> > >>>>> groups. However, with the operator-based approach,
> the
> > > > > > user will
> > > > > > >>> > >> have to
> > > > > > >>> > >>>>> specify resources for each operator in one of the two
> > > > > > >>> > groups, and
> > > > > > >>> > >> tune
> > > > > > >>> > >>>> the
> > > > > > >>> > >>>>> default slot resource via configurations to fit the
> > other
> > > > > > group.
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>> 2. It increases the chance of breaking operator
> chains.
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>>
> > > > > > > >>> > >>>>> Setting chainable operators into different slot
> > sharing
> > > > > > >>> > groups will
> > > > > > >>> > >>>>> prevent them from being chained. In the current
> > > > > > implementation,
> > > > > > >>> > >>>> downstream
> > > > > > >>> > >>>>> operators, if SSG not explicitly specified, will be
> set
> > > > > to
> > > > > > >>> > the same
> > > > > > >>> > >> group
> > > > > > >>> > >>>>> as the chainable upstream operators (unless multiple
> > > > > > upstream
> > > > > > >>> > >> operators
> > > > > > >>> > >>>> in
> > > > > > >>> > >>>>> different groups), to reduce the chance of breaking
> > > > > chains.
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 ->
> > o_4,
> > > > > > >>> > deciding
> > > > > > >>> > >> SSGs
> > > > > > >>> > >>>>> based on whether resource is specified we will easily
> > get
> > > > > > >>> > groups like
> > > > > > >>> > >>>> (o_1,
> > > > > > >>> > >>>>> o_3) & (o_2, o_4), where none of the operators can be
> > > > > > >>> > chained. This
> > > > > > >>> > >> is
> > > > > > >>> > >>>> also
> > > > > > >>> > >>>>> possible for the SSG-based approach, but I believe
> the
> > > > > > >>> > chance is much
> > > > > > >>> > >>>>> smaller because there's no strong reason for users to
> > > > > > >>> > specify the
> > > > > > >>> > >> groups
> > > > > > >>> > >>>>> with alternate operators like that. We are more
> likely
> > to
> > > > > > >>> > get groups
> > > > > > >>> > >> like
> > > > > > >>> > >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks only
> > > > > > between
> > > > > > >>> > o_2 and
> > > > > > >>> > >> o_3.
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>> 3. It complicates the system by having two different
> > > > > > >>> > mechanisms for
> > > > > > >>> > >>>> sharing
> > > > > > >>> > >>>>> managed memory in a slot.
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>> - In FLIP-141, we introduced the intra-slot managed
> > > > > memory
> > > > > > >>> > sharing
> > > > > > >>> > >>>>> mechanism, where managed memory is first distributed
> > > > > > >>> > according to the
> > > > > > >>> > >>>>> consumer type, then further distributed across
> > operators
> > > > > > of that
> > > > > > >>> > >> consumer
> > > > > > >>> > >>>>> type.
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>> - With the operator-based approach, managed memory
> size
> > > > > > >>> > specified
> > > > > > >>> > >> for an
> > > > > > >>> > >>>>> operator should account for all the consumer types of
> > > > > that
> > > > > > >>> > operator.
> > > > > > >>> > >> That
> > > > > > >>> > >>>>> means the managed memory is first distributed across
> > > > > > >>> > operators, then
> > > > > > >>> > >>>>> distributed to different consumer types of each
> > operator.
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>> Unfortunately, the different order of the two
> > calculation
> > > > > > >>> > steps can
> > > > > > >>> > >> lead
> > > > > > >>> > >>>> to
> > > > > > >>> > >>>>> different results. To be specific, the semantic of
> the
> > > > > > >>> > configuration
> > > > > > >>> > >>>> option
> > > > > > >>> > >>>>> `consumer-weights` changed (within a slot vs. within
> an
> > > > > > >>> > operator).
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>> To sum up things:
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>> While (3) might be a bit more implementation related,
> I
> > > > > > >>> > think (1)
> > > > > > >>> > >> and (2)
> > > > > > >>> > >>>>> somehow suggest that, the price for the proposed
> > approach
> > > > > > to
> > > > > > >>> > avoid
> > > > > > >>> > >>>>> specifying resource for every operator is that it's
> not
> > > > > as
> > > > > > >>> > >> independent
> > > > > > >>> > >>>> from
> > > > > > >>> > >>>>> operator chaining and slot sharing as the
> > operator-based
> > > > > > >>> > approach
> > > > > > >>> > >>>> discussed
> > > > > > >>> > >>>>> in the FLIP.
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>> Thank you~
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>> Xintong Song
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen
> > > > > > >>> > <sewen@apache.org <ma...@apache.org>>
> > > > > > >>> > >> wrote:
> > > > > > >>> > >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
> > > > > > >>> > >>>>>>
> > > > > > >>> > >>>>>> I want to say, first of all, that this is super well
> > > > > > >>> > written. And
> > > > > > >>> > >> the
> > > > > > >>> > >>>>>> points that the FLIP makes about how to expose the
> > > > > > >>> > configuration to
> > > > > > >>> > >>>> users
> > > > > > >>> > >>>>>> is exactly the right thing to figure out first.
> > > > > > >>> > >>>>>> So good job here!
> > > > > > >>> > >>>>>>
> > > > > > >>> > >>>>>> About how to let users specify the resource
> profiles.
> > > > > If I
> > > > > > >>> > can sum
> > > > > > >>> > >> the
> > > > > > >>> > >>>>> FLIP
> > > > > > >>> > >>>>>> and previous discussion up in my own words, the
> > problem
> > > > > > is the
> > > > > > >>> > >>>> following:
> > > > > > >>> > >>>>>> Operator-level specification is the simplest and
> > > > > cleanest
> > > > > > >>> > approach,
> > > > > > >>> > >>>>> because
> > > > > > >>> > >>>>>>> it avoids mixing operator configuration (resource)
> > and
> > > > > > >>> > >> scheduling. No
> > > > > > >>> > >>>>>>> matter what other parameters change (chaining, slot
> > > > > > sharing,
> > > > > > >>> > >>>> switching
> > > > > > >>> > >>>>>>> pipelined and blocking shuffles), the resource
> > profiles
> > > > > > >>> > stay the
> > > > > > >>> > >>>> same.
> > > > > > >>> > >>>>>>> But it would require that a user specifies
> resources
> > on
> > > > > > all
> > > > > > >>> > >>>> operators,
> > > > > > >>> > >>>>>>> which makes it hard to use. That's why the FLIP
> > > > > suggests
> > > > > > going
> > > > > > >>> > >> with
> > > > > > >>> > >>>>>>> specifying resources on a Sharing-Group.
> > > > > > >>> > >>>>>>
> > > > > > >>> > >>>>>> I think both thoughts are important, so can we find
> a
> > > > > > solution
> > > > > > >>> > >> where
> > > > > > >>> > >>>> the
> > > > > > >>> > >>>>>> Resource Profiles are specified on an Operator, but
> we
> > > > > > >>> > still avoid
> > > > > > >>> > >> that
> > > > > > >>> > >>>>> we
> > > > > > >>> > >>>>>> need to specify a resource profile on every
> operator?
> > > > > > >>> > >>>>>>
> > > > > > >>> > >>>>>> What do you think about something like the
> following:
> > > > > > >>> > >>>>>> - Resource Profiles are specified on an operator
> > > > > level.
> > > > > > >>> > >>>>>> - Not all operators need profiles
> > > > > > > >>> > >>>>>> - All Operators without a Resource Profile end up
> > > > > in
> > > > > > the
> > > > > > >>> > >> default
> > > > > > >>> > >>>> slot
> > > > > > >>> > >>>>>> sharing group with a default profile (will get a
> > default
> > > > > > slot).
> > > > > > >>> > >>>>>> - All Operators with a Resource Profile will go into
> > > > > > >>> > another slot
> > > > > > >>> > >>>>> sharing
> > > > > > >>> > >>>>>> group (the resource-specified-group).
> > > > > > >>> > >>>>>> - Users can define different slot sharing groups for
> > > > > > >>> > operators
> > > > > > >>> > >> like
> > > > > > >>> > >>>>> they
> > > > > > >>> > >>>>>> do now, with the exception that you cannot mix
> > operators
> > > > > > >>> > that have
> > > > > > >>> > >> a
> > > > > > >>> > >>>>>> resource profile and operators that have no resource
> > > > > > profile.
> > > > > > >>> > >>>>>> - The default case where no operator has a resource
> > > > > > >>> > profile is
> > > > > > >>> > >> just a
> > > > > > >>> > >>>>>> special case of this model
> > > > > > >>> > >>>>>> - The chaining logic sums up the profiles per
> > > > > operator,
> > > > > > >>> > like it
> > > > > > >>> > >> does
> > > > > > >>> > >>>>> now,
> > > > > > >>> > >>>>>> and the scheduler sums up the profiles of the tasks
> > that
> > > > > > it
> > > > > > >>> > >> schedules
> > > > > > >>> > >>>>>> together.
> > > > > > >>> > >>>>>>
> > > > > > >>> > >>>>>>
> > > > > > >>> > >>>>>> There is another question about reactive scaling
> > raised
> > > > > > in the
> > > > > > >>> > >> FLIP. I
> > > > > > >>> > >>>>> need
> > > > > > >>> > >>>>>> to think a bit about that. That is indeed a bit more
> > > > > > tricky
> > > > > > >>> > once we
> > > > > > >>> > >>>> have
> > > > > > >>> > >>>>>> slots of different sizes.
> > > > > > >>> > >>>>>> It is not clear then which of the different slot
> > > > > requests
> > > > > > the
> > > > > > >>> > >>>>>> ResourceManager should fulfill when new resources
> > (TMs)
> > > > > > >>> > show up,
> > > > > > >>> > >> or how
> > > > > > >>> > >>>>> the
> > > > > > >>> > >>>>>> JobManager redistributes the slots resources when
> > > > > > resources
> > > > > > >>> > (TMs)
> > > > > > > >>> > >>>>> disappear.
> > > > > > >>> > >>>>>> This question is pretty orthogonal, though, to the
> > "how
> > > > > to
> > > > > > >>> > specify
> > > > > > >>> > >> the
> > > > > > >>> > >>>>>> resources".
> > > > > > >>> > >>>>>>
> > > > > > >>> > >>>>>>
> > > > > > >>> > >>>>>> Best,
> > > > > > >>> > >>>>>> Stephan
> > > > > > >>> > >>>>>>
> > > > > > >>> > >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song
> > > > > > >>> > <tonysong820@gmail.com <ma...@gmail.com>
> > > > > > >>> > >>>>> wrote:
> > > > > > >>> > >>>>>>> Thanks for drafting the FLIP and driving the
> > > > > discussion,
> > > > > > >>> > Yangze.
> > > > > > >>> > >>>>>>> And Thanks for the feedback, Till and Chesnay.
> > > > > > >>> > >>>>>>>
> > > > > > >>> > >>>>>>> @Till,
> > > > > > >>> > >>>>>>>
> > > > > > >>> > >>>>>>> I agree that specifying requirements for SSGs means
> > > > > that
> > > > > > SSGs
> > > > > > >>> > >> need to
> > > > > > >>> > >>>>> be
> > > > > > >>> > >>>>>>> supported in fine-grained resource management,
> > > > > otherwise
> > > > > > each
> > > > > > >>> > >>>> operator
> > > > > > >>> > >>>>>>> might use as many resources as the whole group.
> > > > > However,
> > > > > > I
> > > > > > >>> > cannot
> > > > > > >>> > >>>> think
> > > > > > >>> > >>>>>> of
> > > > > > >>> > >>>>>>> a strong reason for not supporting SSGs in
> > fine-grained
> > > > > > >>> > resource
> > > > > > >>> > >>>>>>> management.
> > > > > > >>> > >>>>>>>
> > > > > > >>> > >>>>>>>
> > > > > > >>> > >>>>>>>> Interestingly, if all operators have their
> resources
> > > > > > properly
> > > > > > >>> > >>>>>> specified,
> > > > > > >>> > >>>>>>>> then slot sharing is no longer needed because
> Flink
> > > > > > could
> > > > > > >>> > >> slice off
> > > > > > >>> > >>>>> the
> > > > > > >>> > >>>>>>>> appropriately sized slots for every Task
> > individually.
> > > > > > >>> > >>>>>>>>
> > > > > > >>> > >>>>>>> So for example, if we have a job consisting of two
> > > > > > > >>> > operators op_1
> > > > > > >>> > >> and
> > > > > > >>> > >>>>> op_2
> > > > > > >>> > >>>>>>>> where each op needs 100 MB of memory, we would
> then
> > > > > say
> > > > > > that
> > > > > > >>> > >> the
> > > > > > >>> > >>>> slot
> > > > > > >>> > >>>>>>>> sharing group needs 200 MB of memory to run. If we
> > > > > have
> > > > > > a
> > > > > > >>> > >> cluster
> > > > > > >>> > >>>>> with
> > > > > > >>> > >>>>>> 2
> > > > > > >>> > >>>>>>>> TMs with one slot of 100 MB each, then the system
> > > > > > cannot run
> > > > > > >>> > >> this
> > > > > > >>> > >>>>> job.
> > > > > > >>> > >>>>>> If
> > > > > > >>> > >>>>>>>> the resources were specified on an operator level,
> > > > > then
> > > > > > the
> > > > > > >>> > >> system
> > > > > > >>> > >>>>>> could
> > > > > > >>> > >>>>>>>> still make the decision to deploy op_1 to TM_1 and
> > > > > op_2
> > > > > > to
> > > > > > >>> > >> TM_2.
> > > > > > >>> > >>>>>>>
> > > > > > >>> > >>>>>>> Couldn't agree more that if all operators'
> > requirements
> > > > > > are
> > > > > > >>> > >> properly
> > > > > > >>> > >>>>>>> specified, slot sharing should be no longer needed.
> I
> > > > > > >>> > think this
> > > > > > >>> > >>>>> exactly
> > > > > > >>> > >>>>>>> disproves the example. If we already know op_1 and
> > op_2
> > > > > > each
> > > > > > > >>> > >> need
> > > > > > >>> > >>>> 100
> > > > > > >>> > >>>>> MB
> > > > > > >>> > >>>>>>> of memory, why would we put them in the same group?
> > If
> > > > > > >>> > they are
> > > > > > >>> > >> in
> > > > > > >>> > >>>>>> separate
> > > > > > >>> > >>>>>>> groups, with the proposed approach the system can
> > > > > freely
> > > > > > >>> > deploy
> > > > > > >>> > >> them
> > > > > > >>> > >>>> to
> > > > > > >>> > >>>>>>> either a 200 MB TM or two 100 MB TMs.
> > > > > > >>> > >>>>>>>
> > > > > > >>> > >>>>>>> Moreover, the precondition for not needing slot
> > sharing
> > > > > > is
> > > > > > >>> > having
> > > > > > >>> > >>>>>> resource
> > > > > > >>> > >>>>>>> requirements properly specified for all operators.
> > This
> > > > > > is not
> > > > > > >>> > >> always
> > > > > > >>> > >>>>>>> possible, and usually requires tremendous efforts.
> > One
> > > > > > of the
> > > > > > >>> > >>>> benefits
> > > > > > >>> > >>>>>> for
> > > > > > >>> > >>>>>>> SSG-based requirements is that it allows the user
> to
> > > > > > freely
> > > > > > >>> > >> decide
> > > > > > >>> > >>>> the
> > > > > > >>> > >>>>>>> granularity, thus efforts they want to pay. I would
> > > > > > >>> > consider SSG
> > > > > > >>> > >> in
> > > > > > >>> > >>>>>>> fine-grained resource management as a group of
> > > > > operators
> > > > > > >>> > that the
> > > > > > >>> > >>>> user
> > > > > > >>> > >>>>>>> would like to specify the total resource for. There
> > can
> > > > > > be
> > > > > > >>> > only
> > > > > > >>> > >> one
> > > > > > >>> > >>>>> group
> > > > > > >>> > >>>>>>> in the job, 2~3 groups dividing the job into a few
> > > > > major
> > > > > > >>> > parts,
> > > > > > >>> > >> or as
> > > > > > >>> > >>>>>> many
> > > > > > >>> > >>>>>>> groups as the number of tasks/operators, depending
> on
> > > > > how
> > > > > > >>> > >>>> fine-grained
> > > > > > >>> > >>>>>> the
> > > > > > >>> > >>>>>>> user is able to specify the resources.
> > > > > > >>> > >>>>>>>
> > > > > > >>> > >>>>>>> Having to support SSGs might be a constraint. But
> > given
> > > > > > >>> > that all
> > > > > > >>> > >> the
> > > > > > >>> > >>>>>>> current scheduler implementations already support
> > > > > SSGs, I
> > > > > > >>> > tend to
> > > > > > >>> > >>>> think
> > > > > > >>> > >>>>>>> that as an acceptable price for the above discussed
> > > > > > >>> > usability and
> > > > > > >>> > >>>>>>> flexibility.
> > > > > > >>> > >>>>>>>
> > > > > > >>> > >>>>>>> @Chesnay
> > > > > > >>> > >>>>>>>
> > > > > > >>> > >>>>>>> Will declaring them on slot sharing groups not also
> > > > > waste
> > > > > > >>> > >> resources
> > > > > > >>> > >>>> if
> > > > > > >>> > >>>>>> the
> > > > > > >>> > >>>>>>>> parallelism of operators within that group are
> > > > > > different?
> > > > > > >>> > >>>>>>>>
> > > > > > >>> > >>>>>>> Yes. It's a trade-off between usability and
> resource
> > > > > > >>> > >> utilization. To
> > > > > > >>> > >>>>>> avoid
> > > > > > >>> > >>>>>>> such wasting, the user can define more groups, so
> > that
> > > > > > >>> > each group
> > > > > > >>> > >>>>>> contains
> > > > > > > >>> > >>>>>>> fewer operators and the chance of having operators
> > with
> > > > > > >>> > different
> > > > > > >>> > >>>>>>> parallelism will be reduced. The price is to have
> > more
> > > > > > >>> > resource
> > > > > > >>> > >>>>>>> requirements to specify.
> > > > > > >>> > >>>>>>>
> > > > > > >>> > >>>>>>> It also seems like quite a hassle for users having
> to
> > > > > > >>> > >> recalculate the
> > > > > > >>> > >>>>>>>> resource requirements if they change the slot
> > sharing.
> > > > > > >>> > >>>>>>>> I'd think that it's not really workable for users
> > that
> > > > > > create
> > > > > > >>> > >> a set
> > > > > > >>> > >>>>> of
> > > > > > >>> > >>>>>>>> re-usable operators which are mixed and matched in
> > > > > their
> > > > > > >>> > >>>>> applications;
> > > > > > >>> > >>>>>>>> managing the resources requirements in such a
> > setting
> > > > > > >>> > would be
> > > > > > >>> > >> a
> > > > > > >>> > >>>>>>>> nightmare, and in the end would require
> > operator-level
> > > > > > >>> > >> requirements
> > > > > > >>> > >>>>> any
> > > > > > >>> > >>>>>>>> way.
> > > > > > >>> > >>>>>>>> In that sense, I'm not even sure whether it really
> > > > > > increases
> > > > > > >>> > >>>>> usability.
> > > > > > >>> > >>>>>>> - As mentioned in my reply to Till's comment,
> > > > > > there's no
> > > > > > >>> > >> reason to
> > > > > > >>> > >>>>> put
> > > > > > >>> > >>>>>>> multiple operators whose individual resource
> > > > > > >>> > requirements are
> > > > > > >>> > >>>>> already
> > > > > > >>> > >>>>>>> known
> > > > > > >>> > >>>>>>> into the same group in fine-grained resource
> > > > > > management.
> > > > > > > >>> > >>>>>>> - Even if an operator implementation is reused for
> > > > > > multiple
> > > > > > >>> > >>>>> applications,
> > > > > > >>> > >>>>>>> it does not guarantee the same resource
> > > > > requirements.
> > > > > > >>> > During
> > > > > > >>> > >> our
> > > > > > >>> > >>>>> years
> > > > > > >>> > >>>>>>> of
> > > > > > >>> > >>>>>>> practices in Alibaba, with per-operator
> > > > > requirements
> > > > > > >>> > >> specified for
> > > > > > >>> > >>>>>>> Blink's
> > > > > > >>> > >>>>>>> fine-grained resource management, very few users
> > > > > > >>> > (including
> > > > > > >>> > >> our
> > > > > > >>> > >>>>>>> specialists
> > > > > > >>> > >>>>>>> who are dedicated to supporting Blink users) are as
> > > > > > >>> > >> experienced as
> > > > > > >>> > >>>>> to
> > > > > > >>> > >>>>>>> accurately predict/estimate the operator resource
> > > > > > >>> > >> requirements.
> > > > > > >>> > >>>> Most
> > > > > > >>> > >>>>>>> people
> > > > > > >>> > >>>>>>> rely on the execution-time metrics (throughput,
> > > > > > delay, cpu
> > > > > > >>> > >> load,
> > > > > > >>> > >>>>>> memory
> > > > > > >>> > >>>>>>> usage, GC pressure, etc.) to improve the
> > > > > > specification.
> > > > > > >>> > >>>>>>>
> > > > > > >>> > >>>>>>> To sum up:
> > > > > > >>> > >>>>>>> If the user is capable of providing proper resource
> > > > > > >>> > requirements
> > > > > > >>> > >> for
> > > > > > >>> > >>>>>> every
> > > > > > >>> > >>>>>>> operator, that's definitely a good thing and we
> would
> > > > > not
> > > > > > >>> > need to
> > > > > > >>> > >>>> rely
> > > > > > >>> > >>>>> on
> > > > > > >>> > >>>>>>> the SSGs. However, that shouldn't be a *must* for
> the
> > > > > > >>> > >> fine-grained
> > > > > > >>> > >>>>>> resource
> > > > > > >>> > >>>>>>> management to work. For those users who are capable
> > and
> > > > > > do not
> > > > > > >>> > >> like
> > > > > > >>> > >>>>>> having
> > > > > > >>> > >>>>>>> to set each operator to a separate SSG, I would be
> ok
> > > > > to
> > > > > > have
> > > > > > >>> > >> both
> > > > > > >>> > >>>>>>> SSG-based and operator-based runtime interfaces and
> > to
> > > > > > only
> > > > > > >>> > >> fallback
> > > > > > >>> > >>>> to
> > > > > > >>> > >>>>>> the
> > > > > > >>> > >>>>>>> SSG requirements when the operator requirements are
> > not
> > > > > > >>> > >> specified.
> > > > > > >>> > >>>>>> However,
> > > > > > >>> > >>>>>>> as the first step, I think we should prioritise the
> > use
> > > > > > cases
> > > > > > >>> > >> where
> > > > > > >>> > >>>>> users
> > > > > > >>> > >>>>>>> are not that experienced.
> > > > > > >>> > >>>>>>>
> > > > > > >>> > >>>>>>> Thank you~
> > > > > > >>> > >>>>>>>
> > > > > > >>> > >>>>>>> Xintong Song
> > > > > > >>> > >>>>>>>
> > > > > > >>> > >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
> > > > > > >>> > >> chesnay@apache.org <ma...@apache.org>>
> > > > > > >>> > >>>>>>> wrote:
> > > > > > >>> > >>>>>>>
> > > > > > >>> > >>>>>>>> Will declaring them on slot sharing groups not
> also
> > > > > > waste
> > > > > > >>> > >> resources
> > > > > > >>> > >>>>> if
> > > > > > >>> > >>>>>>>> the parallelism of operators within that group are
> > > > > > different?
> > > > > > >>> > >>>>>>>>
> > > > > > >>> > >>>>>>>> It also seems like quite a hassle for users having
> > to
> > > > > > >>> > >> recalculate
> > > > > > >>> > >>>> the
> > > > > > >>> > >>>>>>>> resource requirements if they change the slot
> > sharing.
> > > > > > >>> > >>>>>>>> I'd think that it's not really workable for users
> > that
> > > > > > create
> > > > > > >>> > >> a set
> > > > > > >>> > >>>>> of
> > > > > > >>> > >>>>>>>> re-usable operators which are mixed and matched in
> > > > > their
> > > > > > >>> > >>>>> applications;
> > > > > > >>> > >>>>>>>> managing the resources requirements in such a
> > setting
> > > > > > >>> > would be
> > > > > > >>> > >> a
> > > > > > >>> > >>>>>>>> nightmare, and in the end would require
> > operator-level
> > > > > > >>> > >> requirements
> > > > > > >>> > >>>>> any
> > > > > > >>> > >>>>>>>> way.
> > > > > > >>> > >>>>>>>> In that sense, I'm not even sure whether it really
> > > > > > increases
> > > > > > >>> > >>>>> usability.
> > > > > > > >>> > >>>>>>>> My main worry is that if we wire the runtime to
> > > > > work
> > > > > > >>> > on SSGs
> > > > > > >>> > >>>> it's
> > > > > > >>> > >>>>>>>> gonna be difficult to implement more fine-grained
> > > > > > approaches,
> > > > > > >>> > >> which
> > > > > > >>> > >>>>>>>> would not be the case if, for the runtime, they
> are
> > > > > > always
> > > > > > >>> > >> defined
> > > > > > >>> > >>>> on
> > > > > > >>> > >>>>>> an
> > > > > > >>> > >>>>>>>> operator-level.
> > > > > > >>> > >>>>>>>>
> > > > > > >>> > >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > > > > > >>> > >>>>>>>>> Thanks for drafting this FLIP and starting this
> > > > > > discussion
> > > > > > >>> > >>>> Yangze.
> > > > > > >>> > >>>>>>>>> I like that defining resource requirements on a
> > slot
> > > > > > sharing
> > > > > > >>> > >>>> group
> > > > > > >>> > >>>>>>> makes
> > > > > > >>> > >>>>>>>>> the overall setup easier and improves usability
> of
> > > > > > resource
> > > > > > >>> > >>>>>>> requirements.
> > > > > > >>> > >>>>>>>>> What I do not like about it is that it changes
> slot
> > > > > > sharing
> > > > > > >>> > >>>> groups
> > > > > > >>> > >>>>>> from
> > > > > > >>> > >>>>>>>>> being a scheduling hint to something which needs
> to
> > > > > be
> > > > > > >>> > >> supported
> > > > > > >>> > >>>> in
> > > > > > >>> > >>>>>>> order
> > > > > > >>> > >>>>>>>>> to support fine grained resource requirements. So
> > > > > far,
> > > > > > the
> > > > > > >>> > >> idea
> > > > > > >>> > >>>> of
> > > > > > >>> > >>>>>> slot
> > > > > > >>> > >>>>>>>>> sharing groups was that it tells the system that
> a
> > > > > set
> > > > > > of
> > > > > > >>> > >>>> operators
> > > > > > >>> > >>>>>> can
> > > > > > >>> > >>>>>>>> be
> > > > > > >>> > >>>>>>>>> deployed in the same slot. But the system still
> had
> > > > > the
> > > > > > >>> > >> freedom
> > > > > > >>> > >>>> to
> > > > > > >>> > >>>>>> say
> > > > > > >>> > >>>>>>>> that
> > > > > > >>> > >>>>>>>>> it would rather place these tasks in different
> > slots
> > > > > > if it
> > > > > > >>> > >>>> wanted.
> > > > > > >>> > >>>>> If
> > > > > > >>> > >>>>>>> we
> > > > > > >>> > >>>>>>>>> now specify resource requirements on a per slot
> > > > > sharing
> > > > > > >>> > >> group,
> > > > > > >>> > >>>> then
> > > > > > >>> > >>>>>> the
> > > > > > >>> > >>>>>>>>> only option for a scheduler which does not
> support
> > > > > slot
> > > > > > >>> > >> sharing
> > > > > > >>> > >>>>>> groups
> > > > > > >>> > >>>>>>> is
> > > > > > >>> > >>>>>>>>> to say that every operator in this slot sharing
> > group
> > > > > > >>> > needs a
> > > > > > >>> > >>>> slot
> > > > > > >>> > >>>>>> with
> > > > > > >>> > >>>>>>>> the
> > > > > > >>> > >>>>>>>>> same resources as the whole group.
> > > > > > >>> > >>>>>>>>>
> > > > > > >>> > >>>>>>>>> So for example, if we have a job consisting of
> two
> > > > > > > operators
> > > > > > >>> > >> op_1
> > > > > > >>> > >>>>> and
> > > > > > >>> > >>>>>>> op_2
> > > > > > >>> > >>>>>>>>> where each op needs 100 MB of memory, we would
> then
> > > > > > say that
> > > > > > >>> > >> the
> > > > > > >>> > >>>>> slot
> > > > > > >>> > >>>>>>>>> sharing group needs 200 MB of memory to run. If
> we
> > > > > > have a
> > > > > > >>> > >> cluster
> > > > > > >>> > >>>>>> with
> > > > > > >>> > >>>>>>> 2
> > > > > > >>> > >>>>>>>>> TMs with one slot of 100 MB each, then the system
> > > > > > cannot run
> > > > > > >>> > >> this
> > > > > > >>> > >>>>>> job.
> > > > > > >>> > >>>>>>> If
> > > > > > >>> > >>>>>>>>> the resources were specified on an operator
> level,
> > > > > > then the
> > > > > > >>> > >>>> system
> > > > > > >>> > >>>>>>> could
> > > > > > >>> > >>>>>>>>> still make the decision to deploy op_1 to TM_1
> and
> > > > > > op_2 to
> > > > > > >>> > >> TM_2.
> > > > > > >>> > >>>>>>>>> Originally, one of the primary goals of slot
> > sharing
> > > > > > groups
> > > > > > >>> > >> was
> > > > > > >>> > >>>> to
> > > > > > >>> > >>>>>> make
> > > > > > >>> > >>>>>>>> it
> > > > > > >>> > >>>>>>>>> easier for the user to reason about how many
> slots
> > a
> > > > > > job
> > > > > > >>> > >> needs
> > > > > > >>> > >>>>>>>> independent
> > > > > > >>> > >>>>>>>>> of the actual number of operators in the job.
> > > > > > Interestingly,
> > > > > > >>> > >> if
> > > > > > >>> > >>>> all
> > > > > > >>> > >>>>>>>>> operators have their resources properly
> specified,
> > > > > > then slot
> > > > > > >>> > >>>>> sharing
> > > > > > >>> > >>>>>> is
> > > > > > >>> > >>>>>>>> no
> > > > > > >>> > >>>>>>>>> longer needed because Flink could slice off the
> > > > > > >>> > appropriately
> > > > > > >>> > >>>> sized
> > > > > > >>> > >>>>>>> slots
> > > > > > >>> > >>>>>>>>> for every Task individually. What matters is
> > whether
> > > > > > the
> > > > > > >>> > >> whole
> > > > > > >>> > >>>>>> cluster
> > > > > > >>> > >>>>>>>> has
> > > > > > >>> > >>>>>>>>> enough resources to run all tasks or not.
> > > > > > >>> > >>>>>>>>>
> > > > > > >>> > >>>>>>>>> Cheers,
> > > > > > >>> > >>>>>>>>> Till
> > > > > > >>> > >>>>>>>>>
> > > > > > >>> > >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
> > > > > > >>> > >> karmagyz@gmail.com <ma...@gmail.com>>
> > > > > > >>> > >>>>>> wrote:
> > > > > > >>> > >>>>>>>>>> Hi, there,
> > > > > > >>> > >>>>>>>>>>
> > > > > > >>> > >>>>>>>>>> We would like to start a discussion thread on
> > > > > > "FLIP-156:
> > > > > > >>> > >> Runtime
> > > > > > >>> > >>>>>>>>>> Interfaces for Fine-Grained Resource
> > > > > Requirements"[1],
> > > > > > >>> > >> where we
> > > > > > >>> > >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime
> > > > > > interfaces
> > > > > > >>> > >> for
> > > > > > >>> > >>>>>>>>>> specifying fine-grained resource requirements.
> > > > > > >>> > >>>>>>>>>>
> > > > > > >>> > >>>>>>>>>> In this FLIP:
> > > > > > >>> > >>>>>>>>>> - Expound the user story of fine-grained
> resource
> > > > > > >>> > >> management.
> > > > > > >>> > >>>>>>>>>> - Propose runtime interfaces for specifying
> > > > > SSG-based
> > > > > > >>> > >> resource
> > > > > > >>> > >>>>>>>>>> requirements.
> > > > > > >>> > >>>>>>>>>> - Discuss the pros and cons of the three
> potential
> > > > > > >>> > >> granularities
> > > > > > >>> > >>>>> for
> > > > > > >>> > >>>>>>>>>> specifying the resource requirements (op, task
> and
> > > > > > slot
> > > > > > >>> > >> sharing
> > > > > > >>> > >>>>>> group)
> > > > > > >>> > >>>>>>>>>> and explain why we choose the slot sharing
> group.
> > > > > > >>> > >>>>>>>>>>
> > > > > > >>> > >>>>>>>>>> Please find more details in the FLIP wiki
> document
> > > > > > [1].
> > > > > > >>> > >> Looking
> > > > > > >>> > >>>>>>>>>> forward to your feedback.
> > > > > > >>> > >>>>>>>>>>
> > > > > > >>> > >>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > > > > >>> > >>>>>>>>>>
> > > > > > >>> > >>>>>>>>>> Best,
> > > > > > >>> > >>>>>>>>>> Yangze Guo
> > > > > > >>> > >>>>>>>>>>
>
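
Till's quoted example above (two operators of 100 MB each, and two TMs with one 100 MB slot each) can be sketched as a small feasibility check. This is only an illustration under stated assumptions: the class and method names are hypothetical and not Flink APIs, memory is a plain integer in MB, and each request is assumed to occupy exactly one slot.

```java
import java.util.Arrays;

/** Hypothetical illustration only; none of these names are Flink APIs. */
final class SlotFitSketch {

    /**
     * Returns true if every request can be placed in its own slot.
     * With one request per slot, it is enough to compare the i-th largest
     * request against the i-th largest slot.
     */
    static boolean fits(int[] requestsMb, int[] slotsMb) {
        if (requestsMb.length > slotsMb.length) {
            return false;
        }
        int[] requests = requestsMb.clone();
        int[] slots = slotsMb.clone();
        Arrays.sort(requests);
        Arrays.sort(slots);
        for (int i = 1; i <= requests.length; i++) {
            if (requests[requests.length - i] > slots[slots.length - i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] cluster = {100, 100}; // two TMs, one 100 MB slot each

        // SSG-level requirement: op_1 + op_2 must share one 200 MB slot.
        System.out.println(fits(new int[] {200}, cluster)); // prints false

        // Operator-level requirements: op_1 and op_2 may go to different TMs.
        System.out.println(fits(new int[] {100, 100}, cluster)); // prints true
    }
}
```

The first call fails only because the group requirement is treated as indivisible, which is the situation Till describes for a scheduler that cannot split the group.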

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Kezhu Wang <ke...@gmail.com>.

Hi Till,

If I understand correctly, the door is not closed once SSG resource
specification is introduced. So I hope these ideas prove useful for
potential future improvements.

Best,
Kezhu Wang


On February 3, 2021 at 18:07:21, Till Rohrmann (trohrmann@apache.org) wrote:

Thanks for sharing your thoughts Kezhu. I like your ideas of how
per-operator and SSG requirements can be combined. I've also thought about
defining a default resource profile for all tasks which have no resources
configured. That way all operators would have resources assigned if the
user chooses to use this feature.

As Yangze and Xintong have said, we have decided to first only support
specifying resources for SSGs as this seems more user friendly. Based on
the feedback for this feature one potential development direction might be
to allow the resource specification on per-operator basis. Here we could
pick up your ideas.

Cheers,
Till
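
A minimal sketch of how the combination mentioned above might be computed, following Kezhu's summary quoted further down in this message (a task is the sum of its specified operators plus a global default, and a slot is the sum of its tasks plus an SSG "add"). Everything here is hypothetical: plain MB integers stand in for ResourceSpec, the class and method names are invented for illustration, and adding the global default once per task is only one possible reading of the proposal. None of this is Flink API.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; plain MB integers stand in for Flink's ResourceSpec. */
final class AddSemanticsSketch {

    /** Plays the role of ResourceSpec.ZERO (unspecified) in the proposal. */
    static final int UNSPECIFIED = 0;

    /**
     * Task requirement: sum of the specified operators, plus the global
     * default once if at least one chained operator is unspecified
     * (assumption: the default is not multiplied per operator).
     */
    static int taskMb(List<Integer> operatorMb, int globalDefaultMb) {
        int sum = 0;
        boolean anyUnspecified = false;
        for (int mb : operatorMb) {
            if (mb == UNSPECIFIED) {
                anyUnspecified = true;
            } else {
                sum += mb;
            }
        }
        return anyUnspecified ? sum + globalDefaultMb : sum;
    }

    /** Physical slot: sum of its tasks plus the extra amount "added" by the SSG. */
    static int slotMb(List<Integer> tasksMb, int ssgExtraMb) {
        int sum = ssgExtraMb;
        for (int mb : tasksMb) {
            sum += mb;
        }
        return sum;
    }

    public static void main(String[] args) {
        int chained = taskMb(Arrays.asList(100, UNSPECIFIED), 50); // 100 + default 50
        int single = taskMb(Arrays.asList(200), 50);               // 200, no default
        System.out.println(slotMb(Arrays.asList(chained, single), 64)); // prints 414
    }
}
```

Under this reading, a slot whose total comes out as 0 would simply be a default-sized slot, as in the quoted proposal.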

On Wed, Feb 3, 2021 at 7:31 AM Xintong Song <to...@gmail.com> wrote:

> Thanks for your feedback, Kezhu.
>
> > I think the Flink *runtime* already has an ideal granularity for resource
> > management: the task. If a slot is shared by multiple tasks, that slot's
> > resource requirement is simply the sum of all its logical slots. So
> > basically, there has been no resource requirement for SlotSharingGroup
> > in the runtime until now, right?
>
> That is a half-baked implementation, left over from previous attempts
> (years ago) to deliver the fine-grained resource management feature, and
> it was never really put into use.
>
> > From the FLIP and discussion, I assume that SSG resource specification
> > will override operator-level resource specification if both are
> > specified?
> >
> Actually, I think we should use the finer-grained resources (i.e. operator
> level) if both are specified. And more importantly, that is based on the
> assumption that we do need two different levels of interfaces.
>
> > So, I wonder whether we could interpret SSG resource specification as
> > an "add" rather than a "set" on the resource requirement?
> >
> IIUC, this is the core idea behind your proposal. I think it provides an
> interesting idea of how we combine operator level and SSG level resources,
> *if we allow configuring resources at both levels*. However, I'm not sure
> whether configuring resources on the operator level is indeed needed.
> Therefore, as a first step, this FLIP proposes to only introduce the
> SSG-level interfaces. As listed in the future plan, we would consider
> allowing operator level resource configuration later if we do see a need
> for it. At that time, we definitely should discuss what to do if resources
> are configured at both levels.
>
> > * Could an SSG express a negative resource requirement?
> >
> No.
>
> > Is there a concrete reason why partially configured resources cannot
> > work? I saw that it fails job submission in Dispatcher.submitJob.
> >
> With the SSG-based approach, this should no longer be needed. The
> constraint was introduced because we can properly define neither the
> resource of a task chained from an operator with specified resources and
> another with unspecified resources, nor the resource of a slot shared by
> a task with specified resources and another with unspecified resources.
> With the SSG-based approach, we no longer have those problems.
>
> > An option (cluster/job level) to force slot sharing in the scheduler?
> > This could be useful for migrating from FLIP-156 to a future approach.
> >
> I think this is exactly what we are trying to avoid: requiring the
> scheduler to enforce slot sharing.
>
> > An option (cluster level) to ignore resource specifications (allowing
> > a job with specified resources to run in an out-of-the-box environment)
> > for non-production usage?
> >
> That's possible. Actually, we are planning to introduce an option for
> activating the fine-grained resource management, for development purposes.
> We might consider keeping that option after the feature is completed, so
> that the feature can be disabled without touching the job code.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Feb 3, 2021 at 1:28 PM Kezhu Wang <ke...@gmail.com> wrote:
>
> > Hi all, sorry for joining the discussion after voting has started.
> >
> > I want to share my thoughts on this after reading above discussions.
> >
> > I think the Flink *runtime* already has an ideal granularity for resource
> > management: the task. If a slot is shared by multiple tasks, that slot's
> > resource requirement is simply the sum of all its logical slots. So
> > basically, there has been no resource requirement for SlotSharingGroup
> > in the runtime until now, right?
> >
> > As discussed, we already agree that "if all operators have their
> > resources properly specified, then slot sharing is no longer needed."
> >
> > So it seems to me that the natural question to discuss is how to bridge
> > the impractical operator-level resource specification to the runtime's
> > task-level resource requirements. This is actually a pure API matter,
> > as Chesnay has pointed out.
> >
> > But FLIP-156 brings another direction to the table: how about using SSGs
> > for both API and runtime resource specification?
> >
> > From the FLIP and discussion, I assume that SSG resource specification
> > will override operator-level resource specification if both are
> > specified?
> >
> > So, I wonder whether we could interpret SSG resource specification as
> > an "add" rather than a "set" on the resource requirement?
> >
> > The semantics would be that an SSG resource specification adds extra
> > resources to the shared slot, to account for possibly high throughput
> > and resource requirements of the tasks in one physical slot.
> >
> > The result is that if the scheduler does respect slot sharing, the
> > allocated slot gains the extra resources specified for that SSG.
> >
> > I think one coding barrier to the "add" approach is
> > ResourceSpec.UNKNOWN, which does not support the 'merge' operation. I
> > tend to use ResourceSpec.ZERO as the default; the task executor should
> > be aware of this.
> >
> > @Chesnay
> > > My main worry is that if we wire the runtime to work on SSGs it's
> > > gonna be difficult to implement more fine-grained approaches, which
> > > would not be the case if, for the runtime, they are always defined on
> > > an operator-level.
> >
> > An "add" operation should be less invasive and pose a lower barrier to
> > future fine-grained approaches.
> >
> > @Stephan
> > > - Users can define different slot sharing groups for operators like
> > > they do now, with the exception that you cannot mix operators that
> > > have a resource profile and operators that have no resource profile.
> >
> > @Till
> > > This effectively means that all unspecified operators
> > > will implicitly have a zero resource requirement.
> > > I am wondering whether this wouldn't lead to a surprising behaviour
> > > for the user. If the user specifies the resource requirements for a
> > > single operator, then he probably will assume that the other operators
> > > will get the default share of resources and not nothing.
> >
> > I think this is inherent, because we cannot define ResourceSpec.ONE,
> > i.e. the resource requirement of exactly one default slot, with concrete
> > numbers. I tend to squash out the unspecified operators if they are
> > chained with operators that have explicit resource specifications.
> > Otherwise, the protocol tends to become verbose, as in "give me this
> > much resource and a default". I think that specifying resources
> > explicitly for only some operators just says "I don't care about the
> > other operators that much, just get them places to run". Most likely
> > these are stateless filter/map or other less resource-consuming
> > operators. If this is indeed a problem, I think clients can specify a
> > global default (or a default at another level in the future). In the
> > job graph generation phase, we could take that default into account for
> > unspecified operators.
> >
> > @FLIP-156
> > > Expose operator chaining. (Cons of task level resource specifying)
> >
> > Isn't this inherent to all group-level resource specification? It will
> > either break chaining, obey it, or not work with it at all.
> >
> > To sum up above, my suggestions are:
> >
> > On the API side:
> > * StreamExecutionEnvironment: a global default (ResourceSpec.ZERO if
> > unspecified).
> > * Operator: ResourceSpec.ZERO (unspecified) as default.
> > * Task: sum of requirements from specified operators + global default
> > (if there are any unspecified operators).
> > * SSG: additional resource for the physical slot.
> >
> > On the runtime side:
> > * Task: ResourceSpec.Task or ResourceSpec.ZERO
> > * SSG: ResourceSpec.SSG or ResourceSpec.ZERO
> >
> > The physical slot gets the summed-up resources from its logical slots
> > and the SSG; if the sum is ResourceSpec.ZERO, it is just a default-sized
> > slot.
> >
> > In short, turn SSG resource specification into an "add" and drop
> > ResourceSpec.UNKNOWN.
> >
> >
> > Questions/Issues:
> > * Could an SSG express a negative resource requirement?
> > * Is there a concrete reason why partially configured resources cannot
> > work? I saw that it fails job submission in Dispatcher.submitJob.
> > * An option (cluster/job level) to force slot sharing in the scheduler?
> > This could be useful for migrating from FLIP-156 to a future approach.
> > * An option (cluster level) to ignore resource specifications (allowing
> > a job with specified resources to run in an out-of-the-box environment)
> > for non-production usage?
> >
> >
> >
> > On February 1, 2021 at 11:54:10, Yangze Guo (karmagyz@gmail.com) wrote:
> >
> > Thanks for the replies, Till and Xintong!
> >
> > I've updated the FLIP, including:
> > - Edit the JavaDoc of the proposed
> > StreamGraphGenerator#setSlotSharingGroupResource.
> > - Add "Future Plan" section, which contains the potential follow-up
> > issues and the limitations to be documented when fine-grained resource
> > management is exposed to users.
> >
> > I'll start a vote in another thread.
> >
> > Best,
> > Yangze Guo
> >
> > On Fri, Jan 29, 2021 at 10:07 PM Till Rohrmann <tr...@apache.org>
> > wrote:
> > >
> > > Thanks for summarizing the discussion, Yangze. I agree that setting
> > > resource requirements per operator is not very user friendly. Moreover,
> > > I couldn't come up with a different proposal which would be as easy to
> > > use and wouldn't expose internal scheduling details. In fact, following
> > > this argument then we shouldn't have exposed the slot sharing groups in
> > > the first place.
> > >
> > > What is important for the user is that we properly document the
> > > limitations and constraints the fine grained resource specification
> > > has. For example, we should explain how optimizations like chaining are
> > > affected by it and how different execution modes (batch vs. streaming)
> > > affect the execution of operators which have specified resources. These
> > > things shouldn't become part of the contract of this feature and are
> > > more caused by internal implementation details but it will be important
> > > to understand these things properly in order to use this feature
> > > effectively.
> > >
> > > Hence, +1 for starting the vote for this FLIP.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Jan 26, 2021 at 4:37 AM Xintong Song <to...@gmail.com>
> > wrote:
> > >
> > > > Thanks for the summary, Yangze.
> > > >
> > > > The changes and follow-up issues LGTM. Let's wait for responses
from
> > the
> > > > others before starting a vote.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Tue, Jan 26, 2021 at 11:08 AM Yangze Guo <ka...@gmail.com>
> > wrote:
> > > >
> > > > > > Thanks everyone for the lively discussion. I'd like to try to
> > > > > > summarize the current convergence in the discussion. Please let
> > > > > > me know if I got things wrong or missed something crucial here.
> > > > > >
> > > > > > Change of this FLIP:
> > > > > > - Treat the SSG resource requirements as a hint instead of a
> > > > > > restriction for the runtime. That should be explicitly explained
> > > > > > in the JavaDocs.
> > > > > >
> > > > > > Potential follow-up issues if needed:
> > > > > > - Provide operator-level resource configuration interface.
> > > > > > - Provide multiple options for deciding resources for SSGs whose
> > > > > > requirement is not specified:
> > > > > > ** Default slot resource.
> > > > > > ** Default operator resource times number of operators.
> > > > > >
> > > > > > If there are no other issues, I'll update the FLIP accordingly
> > > > > > and start a vote thread. Thanks all for the valuable feedback again.
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > >
> > > > > On Fri, Jan 22, 2021 at 11:30 AM Xintong Song <
> tonysong820@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > > FGRuntimeInterface.png
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <
> > tonysong820@gmail.com>
> >
> > > > > wrote:
> > > > > >>
> > > > > > >> I think Chesnay's proposal could actually work. IIUC, the key
> > > > > > >> point is to derive operator requirements from SSG requirements on
> > > > > > >> the API side, so that the runtime only deals with operator
> > > > > > >> requirements. It's debatable how the deriving should be done
> > > > > > >> though. E.g., an alternative could be to evenly divide the SSG
> > > > > > >> requirement into requirements of operators in the group.
> > > > > > >>
> > > > > > >> However, I'm not entirely sure which option is more desirable. My
> > > > > > >> understanding is illustrated in the following figure, with
> > > > > > >> Chesnay's proposal on top and the SSG-based proposal of this FLIP
> > > > > > >> at the bottom.
> > > > > > >>
> > > > > > >> I think the major difference between the two approaches is where
> > > > > > >> deriving operator requirements from SSG requirements happens.
> > > > > > >>
> > > > > > >> - Chesnay's proposal simplifies the runtime logic and the
> > > > > > >> interface to expose, at the price of moving more complexity (i.e.
> > > > > > >> the deriving) to the API side. The question is, where do we prefer
> > > > > > >> to keep the complexity? I'm slightly leaning towards having a thin
> > > > > > >> API and keeping the complexity in the runtime if possible.
> > > > > > >>
> > > > > > >> - Notice that the dashed arrows represent optional steps that are
> > > > > > >> needed only for schedulers that do not respect SSGs, which we
> > > > > > >> don't have at the moment. If we only look at the solid arrows,
> > > > > > >> then the SSG-based approach is much simpler, without needing to
> > > > > > >> derive and aggregate the requirements back and forth. I'm not sure
> > > > > > >> about complicating the current design only for potential future
> > > > > > >> needs.
> > > > > >>
> > > > > >>
> > > > > >> Thank you~
> > > > > >>
> > > > > >> Xintong Song
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <
> > chesnay@apache.org>
> > > > > wrote:
> > > > > >>>
> > > > > > >>> You're raising a good point, but I think I can rectify that with
> > > > > > >>> a minor adjustment.
> > > > > > >>>
> > > > > > >>> Default requirements are whatever the default requirements are,
> > > > > > >>> setting the requirements for one operator has no effect on other
> > > > > > >>> operators.
> > > > > > >>>
> > > > > > >>> With these rules, and some API enhancements, the following mockup
> > > > > > >>> would replicate the SSG-based behavior:
> > > > > >>>
> > > > > > >>> Map<SlotSharingGroupId, Requirements> requirements = ...
> > > > > > >>> for slotSharingGroup in env.getSlotSharingGroups() {
> > > > > > >>>     vertices = slotSharingGroup.getVertices()
> > > > > > >>>     // the first vertex carries the whole group's requirements
> > > > > > >>>     vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
> > > > > > >>>     // all remaining vertices explicitly require nothing
> > > > > > >>>     vertices.remaining().setRequirements(ZERO)
> > > > > > >>> }
> > > > > >>>
> > > > > >>> We could even allow setting requirements on
slotsharing-groups
> > > > > >>> colocation-groups and internally translate them accordingly.
> > > > > >>> I can't help but feel this is a plain API issue.
> > > > > >>>
> > > > > >>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
> > > > > >>> > If I understand you correctly Chesnay, then you want to
> > decouple
> > > > the
> > > > > >>> > resource requirement specification from the slot sharing
> group
> > > > > >>> > assignment. Hence, per default all operators would be in
the
> > same
> > > > > slot
> > > > > >>> > sharing group. If there is no operator with a resource
> > > > specification,
> > > > > >>> > then the system would allocate a default slot for it. If
> there
> > is
> > > > at
> > > > > >>> > least one operator, then the system would sum up all the
> > specified
> > > > > >>> > resources and allocate a slot of this size. This
effectively
> > means
> > > > > >>> > that all unspecified operators will implicitly have a zero
> > resource
> > > > > >>> > requirement. Did I understand your idea correctly?
> > > > > >>> >
> > > > > >>> > I am wondering whether this wouldn't lead to a surprising
> > behaviour
> > > > > >>> > for the user. If the user specifies the resource
requirements
> > for a
> > > > > >>> > single operator, then he probably will assume that the
other
> > > > > operators
> > > > > >>> > will get the default share of resources and not nothing.
> > > > > >>> >
> > > > > >>> > Cheers,
> > > > > >>> > Till
> > > > > >>> >
> > > > > >>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <
> > > > chesnay@apache.org
> > > > > >>> > <ma...@apache.org>> wrote:
> > > > > >>> >
> > > > > >>> > Is there even a functional difference between specifying
the
> > > > > >>> > requirements for an SSG vs specifying the same requirements
> on
> > > > a
> > > > > >>> > single
> > > > > >>> > operator within that group (ideally a colocation group to
> avoid
> > > > > this
> > > > > >>> > whole hint business)?
> > > > > >>> >
> > > > > >>> > Wouldn't we get the best of both worlds in the latter case?
> > > > > >>> >
> > > > > >>> > Users can take shortcuts to define shared requirements,
> > > > > >>> > but refine them further as needed on a per-operator basis,
> > > > > >>> > without changing semantics of slotsharing groups
> > > > > >>> > nor the runtime being locked into SSG-based requirements.
> > > > > >>> >
> > > > > >>> > (And before anyone argues what happens if slotsharing
groups
> > > > > >>> > change or
> > > > > >>> > whatnot, that's a plain API issue that we could surely
solve.
> > > > (A
> > > > > >>> > plain
> > > > > >>> > iteration over slotsharing groups and therein contained
> > > > operators
> > > > > >>> > would
> > > > > >>> > suffice)).
> > > > > >>> >
> > > > > >>> > On 1/20/2021 6:48 PM, Till Rohrmann wrote:
> > > > > >>> > > Maybe a different minor idea: Would it be possible to
treat
> > > > > the SSG
> > > > > >>> > > resource requirements as a hint for the runtime similar
to
> > > > how
> > > > > >>> > slot sharing
> > > > > >>> > > groups are designed at the moment? Meaning that we don't
> give
> > > > > >>> > the guarantee
> > > > > >>> > > that Flink will always deploy this set of tasks together
no
> > > > > >>> > matter what
> > > > > >>> > > comes. If, for example, the runtime can derive by some
> means
> > > > > the
> > > > > >>> > resource
> > > > > >>> > > requirements for each task based on the requirements for
> the
> > > > > >>> > SSG, this
> > > > > >>> > > could be possible. One easy strategy would be to give
every
> > > > > task
> > > > > >>> > the same
> > > > > >>> > > resources as the whole slot sharing group. Another one
> could
> > > > be
> > > > > >>> > > distributing the resources equally among the tasks. This
> does
> > > > > >>> > not even have
> > > > > >>> > > to be implemented but we would give ourselves the freedom
> to
> > > > > change
> > > > > >>> > > scheduling if need should arise.
> > > > > >>> > >
> > > > > >>> > > Cheers,
> > > > > >>> > > Till
> > > > > >>> > >
> > > > > >>> > > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <
> > > > karmagyz@gmail.com
> > > > > >>> > <ma...@gmail.com>> wrote:
> > > > > >>> > >
> > > > > >>> > >> Thanks for the responses, Till and Xintong.
> > > > > >>> > >>
> > > > > >>> > >> I second Xintong's comment that SSG-based runtime
> interface
> > > > > >>> > will give
> > > > > >>> > >> us the flexibility to achieve op/task-based approach.
> That's
> > > > > one of
> > > > > >>> > >> the most important reasons for our design choice.
> > > > > >>> > >>
> > > > > >>> > >> Some cents regarding the default operator resource:
> > > > > >>> > >> - It might be good for the scenario of DataStream jobs.
> > > > > >>> > >> ** For light-weight operators, the accumulative
> > > > > >>> > configuration error
> > > > > >>> > >> will not be significant. Then, the resource of a task
used
> > > > is
> > > > > >>> > >> proportional to the number of operators it contains.
> > > > > >>> > >> ** For heavy operators like join and window or operators
> > > > > >>> > using the
> > > > > >>> > >> external resources, user will turn to the fine-grained
> > > > > resource
> > > > > >>> > >> configuration.
> > > > > >>> > >> - It can increase the stability for the standalone
cluster
> > > > > >>> > where task
> > > > > >>> > >> executors registered are heterogeneous(with different
> > > > default
> > > > > slot
> > > > > >>> > >> resources).
> > > > > >>> > >> - It might not be good for SQL users. The operators that
> SQL
> > > > > >>> > will be
> > > > > >>> > >> transferred to is a black box to the user. We also do
not
> > > > > guarantee
> > > > > >>> > >> the cross-version of consistency of the transformation
so
> > > > far.
> > > > > >>> > >>
> > > > > >>> > >> I think it can be treated as a follow-up work when the
> > > > > fine-grained
> > > > > >>> > >> resource management is end-to-end ready.
> > > > > >>> > >>
> > > > > >>> > >> Best,
> > > > > >>> > >> Yangze Guo
> > > > > >>> > >>
> > > > > >>> > >>
> > > > > >>> > >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song
> > > > > >>> > <tonysong820@gmail.com <ma...@gmail.com>>
> > > > > >>> > >> wrote:
> > > > > >>> > >>> Thanks for the feedback, Till.
> > > > > >>> > >>>
> > > > > >>> > >>> ## I feel that what you proposed (operator-based +
> default
> > > > > >>> > value) might
> > > > > >>> > >> be
> > > > > >>> > >>> subsumed by the SSG-based approach.
> > > > > >>> > >>> Thinking of op_1 -> op_2, there are the following 4
> cases,
> > > > > >>> > categorized by
> > > > > >>> > >>> whether the resource requirements are known to the
users.
> > > > > >>> > >>>
> > > > > >>> > >>> 1. *Both known.* As previously mentioned, there's no
> > > > > >>> > reason to put
> > > > > >>> > >>> multiple operators whose individual resource
> > > > requirements
> > > > > >>> > are already
> > > > > >>> > >> known
> > > > > >>> > >>> into the same group in fine-grained resource
> > > > management.
> > > > > >>> > And if op_1
> > > > > >>> > >> and
> > > > > >>> > >>> op_2 are in different groups, there should be no
> > > > problem
> > > > > >>> > switching
> > > > > >>> > >> data
> > > > > >>> > >>> exchange mode from pipelined to blocking. This is
> > > > > >>> > equivalent to
> > > > > >>> > >> specifying
> > > > > >>> > >>> operator resource requirements in your proposal.
> > > > > >>> > >>> 2. *op_1 known, op_2 unknown.* Similar to 1), except
> > > > that
> > > > > >>> > op_2 is in a
> > > > > >>> > >>> SSG whose resource is not specified thus would have the
> > > > > >>> > default slot
> > > > > >>> > >>> resource. This is equivalent to having default operator
> > > > > >>> > resources in
> > > > > >>> > >> your
> > > > > >>> > >>> proposal.
> > > > > >>> > >>> 3. *Both unknown*. The user can either set op_1 and
> > > > op_2
> > > > > >>> > to the same
> > > > > >>> > >> SSG
> > > > > >>> > >>> or separate SSGs.
> > > > > >>> > >>> - If op_1 and op_2 are in the same SSG, it will be
> > > > > >>> > equivalent to
> > > > > >>> > >> the
> > > > > >>> > >>> coarse-grained resource management, where op_1 and
> > > > > op_2
> > > > > >>> > share a
> > > > > >>> > >> default
> > > > > >>> > >>> size slot no matter which data exchange mode is
> > > > used.
> > > > > >>> > >>> - If op_1 and op_2 are in different SSGs, then each
> > > > of
> > > > > >>> > them will
> > > > > >>> > >> use
> > > > > >>> > >>> a default size slot. This is equivalent to setting
> > > > > them
> > > > > >>> > with
> > > > > >>> > >> default
> > > > > >>> > >>> operator resources in your proposal.
> > > > > >>> > >>> 4. *Total (pipeline) or max (blocking) of op_1 and op_2
> > > > > is
> > > > > >>> > known.*
> > > > > >>> > >>> - It is possible that the user learns the total /
> > > > max
> > > > > >>> > resource
> > > > > >>> > >>> requirement from executing and monitoring the job,
> > > > > >>> > while not
> > > > > >>> > >>> being aware of
> > > > > >>> > >>> individual operator requirements.
> > > > > >>> > >>> - I believe this is the case your proposal does not
> > > > > >>> > cover. And TBH,
> > > > > >>> > >>> this is probably how most users learn the resource
> > > > > >>> > requirements,
> > > > > >>> > >>> according
> > > > > >>> > >>> to my experiences.
> > > > > >>> > >>> - In this case, the user might need to specify
> > > > > >>> > different resources
> > > > > >>> > >> if
> > > > > >>> > >>> he wants to switch the execution mode, which should
> > > > > not
> > > > > >>> > be worse
> > > > > >>> > >> than not
> > > > > >>> > >>> being able to use fine-grained resource management.
> > > > > >>> > >>>
> > > > > >>> > >>>
> > > > > >>> > >>> ## An additional idea inspired by your proposal.
> > > > > >>> > >>> We may provide multiple options for deciding resources
> for
> > > > > >>> > SSGs whose
> > > > > >>> > >>> requirement is not specified, if needed.
> > > > > >>> > >>>
> > > > > >>> > >>> - Default slot resource (current design)
> > > > > >>> > >>> - Default operator resource times number of operators
> > > > > >>> > (equivalent to
> > > > > >>> > >>> your proposal)
> > > > > >>> > >>>
> > > > > >>> > >>>
> > > > > >>> > >>> ## Exposing internal runtime strategies
> > > > > >>> > >>> Theoretically, yes. Tying to the SSGs, the resource
> > > > > >>> > requirements might be
> > > > > >>> > >>> affected if how SSGs are internally handled changes in
> > > > > future.
> > > > > >>> > >> Practically,
> > > > > >>> > >>> I do not concretely see at the moment what kind of
> changes
> > > > we
> > > > > >>> > may want in
> > > > > >>> > >>> future that might conflict with this FLIP proposal, as
> the
> > > > > >>> > question of
> > > > > >>> > >>> switching data exchange mode answered above. I'd
suggest
> to
> > > > > >>> > not give up
> > > > > >>> > >> the
> > > > > >>> > >>> user friendliness we may gain now for the future
problems
> > > > > that
> > > > > >>> > may or may
> > > > > >>> > >>> not exist.
> > > > > >>> > >>>
> > > > > >>> > >>> Moreover, the SSG-based approach has the flexibility to
> > > > > >>> > achieve the
> > > > > >>> > >>> equivalent behavior as the operator-based approach, if
we
> > > > > set each
> > > > > >>> > >> operator
> > > > > >>> > >>> (or task) to a separate SSG. We can even provide a
> shortcut
> > > > > >>> > option to
> > > > > >>> > >>> automatically do that for users, if needed.
> > > > > >>> > >>>
> > > > > >>> > >>>
> > > > > >>> > >>> Thank you~
> > > > > >>> > >>>
> > > > > >>> > >>> Xintong Song
> > > > > >>> > >>>
> > > > > >>> > >>>
> > > > > >>> > >>>
> > > > > >>> > >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann
> > > > > >>> > <trohrmann@apache.org <ma...@apache.org>>
> > > > > >>> > >> wrote:
> > > > > >>> > >>>> Thanks for the responses Xintong and Stephan,
> > > > > >>> > >>>>
> > > > > >>> > >>>> I agree that being able to define the resource
> > > > requirements
> > > > > for a
> > > > > >>> > >> group of
> > > > > >>> > >>>> operators is more user friendly. However, my concern
is
> > > > that
> > > > > >>> > we are
> > > > > >>> > >>>> exposing thereby internal runtime strategies which
might
> > > > > >>> > limit our
> > > > > >>> > >>>> flexibility to execute a given job. Moreover, the
> > > > semantics
> > > > > of
> > > > > >>> > >> configuring
> > > > > >>> > >>>> resource requirements for SSGs could break if
switching
> > > > from
> > > > > >>> > streaming
> > > > > >>> > >> to
> > > > > >>> > >>>> batch execution. If one defines the resource
> requirements
> > > > > for
> > > > > >>> > op_1 ->
> > > > > >>> > >> op_2
> > > > > >>> > >>>> which run in pipelined mode when using the streaming
> > > > > >>> > execution, then
> > > > > >>> > >> how do
> > > > > >>> > >>>> we interpret these requirements when op_1 -> op_2 are
> > > > > >>> > executed with a
> > > > > >>> > >>>> blocking data exchange in batch execution mode?
> > > > > Consequently,
> > > > > >>> > I am
> > > > > >>> > >> still
> > > > > >>> > >>>> leaning towards Stephan's proposal to set the resource
> > > > > >>> > requirements per
> > > > > >>> > >>>> operator.
> > > > > >>> > >>>>
> > > > > >>> > >>>> Maybe the following proposal makes the configuration
> > > > easier:
> > > > > >>> > If the
> > > > > >>> > >> user
> > > > > >>> > >>>> wants to use fine-grained resource requirements, then
> she
> > > > > >>> > needs to
> > > > > >>> > >> specify
> > > > > >>> > >>>> the default size which is used for operators which
have
> no
> > > > > >>> > explicit
> > > > > >>> > >>>> resource annotation. If this holds true, then every
> > > > operator
> > > > > >>> > would
> > > > > >>> > >> have a
> > > > > >>> > >>>> resource requirement and the system can try to execute
> the
> > > > > >>> > operators
> > > > > >>> > >> in the
> > > > > >>> > >>>> best possible manner w/o being constrained by how the
> user
> > > > > >>> > set the SSG
> > > > > >>> > >>>> requirements.
> > > > > >>> > >>>>
> > > > > >>> > >>>> Cheers,
> > > > > >>> > >>>> Till
> > > > > >>> > >>>>
> > > > > >>> > >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song
> > > > > >>> > <tonysong820@gmail.com <ma...@gmail.com>>
> > > > > >>> > >>>> wrote:
> > > > > >>> > >>>>
> > > > > >>> > >>>>> Thanks for the feedback, Stephan.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> Actually, your proposal has also come to my mind at
> some
> > > > > >>> > point. And I
> > > > > >>> > >>>> have
> > > > > >>> > >>>>> some concerns about it.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> 1. It does not give users the same control as the
> > > > SSG-based
> > > > > >>> > approach.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> While both approaches do not require specifying for
> each
> > > > > >>> > operator,
> > > > > >>> > >>>>> SSG-based approach supports the semantic that "some
> > > > > operators
> > > > > >>> > >> together
> > > > > >>> > >>>> use
> > > > > >>> > >>>>> this much resource" while the operator-based approach
> > > > > doesn't.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> Think of a long pipeline with m operators (o_1, o_2,
> ...,
> > > > > >>> > o_m), and
> > > > > >>> > >> at
> > > > > >>> > >>>> some
> > > > > >>> > >>>>> point there's an agg o_n (1 < n < m) which
> significantly
> > > > > >>> > reduces the
> > > > > >>> > >> data
> > > > > >>> > >>>>> amount. One can separate the pipeline into 2 groups
> SSG_1
> > > > > >>> > (o_1, ...,
> > > > > >>> > >> o_n)
> > > > > >>> > >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much
> > > > higher
> > > > > >>> > >> parallelisms
> > > > > >>> > >>>>> for operators in SSG_1 than for operators in SSG_2
> won't
> > > > > >>> > lead to too
> > > > > >>> > >> much
> > > > > >>> > >>>>> wasting of resources. If the two SSGs end up needing
> > > > > different
> > > > > >>> > >> resources,
> > > > > >>> > >>>>> with the SSG-based approach one can directly specify
> > > > > >>> > resources for
> > > > > >>> > >> the
> > > > > >>> > >>>> two
> > > > > >>> > >>>>> groups. However, with the operator-based approach,
the
> > > > > user will
> > > > > >>> > >> have to
> > > > > >>> > >>>>> specify resources for each operator in one of the two
> > > > > >>> > groups, and
> > > > > >>> > >> tune
> > > > > >>> > >>>> the
> > > > > >>> > >>>>> default slot resource via configurations to fit the
> other
> > > > > group.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> 2. It increases the chance of breaking operator
chains.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > > >>> > >>>>> Setting chainable operators into different slot
> sharing
> > > > > >>> > groups will
> > > > > >>> > >>>>> prevent them from being chained. In the current
> > > > > implementation,
> > > > > >>> > >>>> downstream
> > > > > >>> > >>>>> operators, if SSG not explicitly specified, will be
set
> > > > to
> > > > > >>> > the same
> > > > > >>> > >> group
> > > > > >>> > >>>>> as the chainable upstream operators (unless multiple
> > > > > upstream
> > > > > >>> > >> operators
> > > > > >>> > >>>> in
> > > > > >>> > >>>>> different groups), to reduce the chance of breaking
> > > > chains.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 ->
> o_4,
> > > > > >>> > deciding
> > > > > >>> > >> SSGs
> > > > > >>> > >>>>> based on whether resource is specified we will easily
> get
> > > > > >>> > groups like
> > > > > >>> > >>>> (o_1,
> > > > > >>> > >>>>> o_3) & (o_2, o_4), where none of the operators can be
> > > > > >>> > chained. This
> > > > > >>> > >> is
> > > > > >>> > >>>> also
> > > > > >>> > >>>>> possible for the SSG-based approach, but I believe
the
> > > > > >>> > chance is much
> > > > > >>> > >>>>> smaller because there's no strong reason for users to
> > > > > >>> > specify the
> > > > > >>> > >> groups
> > > > > >>> > >>>>> with alternate operators like that. We are more
likely
> to
> > > > > >>> > get groups
> > > > > >>> > >> like
> > > > > >>> > >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks only
> > > > > between
> > > > > >>> > o_2 and
> > > > > >>> > >> o_3.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> 3. It complicates the system by having two different
> > > > > >>> > mechanisms for
> > > > > >>> > >>>> sharing
> > > > > >>> > >>>>> managed memory in a slot.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> - In FLIP-141, we introduced the intra-slot managed
> > > > memory
> > > > > >>> > sharing
> > > > > >>> > >>>>> mechanism, where managed memory is first distributed
> > > > > >>> > according to the
> > > > > >>> > >>>>> consumer type, then further distributed across
> operators
> > > > > of that
> > > > > >>> > >> consumer
> > > > > >>> > >>>>> type.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> - With the operator-based approach, managed memory
size
> > > > > >>> > specified
> > > > > >>> > >> for an
> > > > > >>> > >>>>> operator should account for all the consumer types of
> > > > that
> > > > > >>> > operator.
> > > > > >>> > >> That
> > > > > >>> > >>>>> means the managed memory is first distributed across
> > > > > >>> > operators, then
> > > > > >>> > >>>>> distributed to different consumer types of each
> operator.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> Unfortunately, the different order of the two
> calculation
> > > > > >>> > steps can
> > > > > >>> > >> lead
> > > > > >>> > >>>> to
> > > > > >>> > >>>>> different results. To be specific, the semantic of
the
> > > > > >>> > configuration
> > > > > >>> > >>>> option
> > > > > >>> > >>>>> `consumer-weights` changed (within a slot vs. within
an
> > > > > >>> > operator).
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> To sum up things:
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> While (3) might be a bit more implementation related,
I
> > > > > >>> > think (1)
> > > > > >>> > >> and (2)
> > > > > >>> > >>>>> somehow suggest that, the price for the proposed
> approach
> > > > > to
> > > > > >>> > avoid
> > > > > >>> > >>>>> specifying resource for every operator is that it's
not
> > > > as
> > > > > >>> > >> independent
> > > > > >>> > >>>> from
> > > > > >>> > >>>>> operator chaining and slot sharing as the
> operator-based
> > > > > >>> > approach
> > > > > >>> > >>>> discussed
> > > > > >>> > >>>>> in the FLIP.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> Thank you~
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> Xintong Song
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen
> > > > > >>> > <sewen@apache.org <ma...@apache.org>>
> > > > > >>> > >> wrote:
> > > > > >>> > >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> I want to say, first of all, that this is super well
> > > > > >>> > written. And
> > > > > >>> > >> the
> > > > > >>> > >>>>>> points that the FLIP makes about how to expose the
> > > > > >>> > configuration to
> > > > > >>> > >>>> users
> > > > > >>> > >>>>>> is exactly the right thing to figure out first.
> > > > > >>> > >>>>>> So good job here!
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> About how to let users specify the resource
profiles.
> > > > If I
> > > > > >>> > can sum
> > > > > >>> > >> the
> > > > > >>> > >>>>> FLIP
> > > > > >>> > >>>>>> and previous discussion up in my own words, the
> problem
> > > > > is the
> > > > > >>> > >>>> following:
> > > > > >>> > >>>>>> Operator-level specification is the simplest and
> > > > cleanest
> > > > > >>> > approach,
> > > > > >>> > >>>>> because
> > > > > >>> > >>>>>>> it avoids mixing operator configuration (resource)
> and
> > > > > >>> > >> scheduling. No
> > > > > >>> > >>>>>>> matter what other parameters change (chaining, slot
> > > > > sharing,
> > > > > >>> > >>>> switching
> > > > > >>> > >>>>>>> pipelined and blocking shuffles), the resource
> profiles
> > > > > >>> > stay the
> > > > > >>> > >>>> same.
> > > > > >>> > >>>>>>> But it would require that a user specifies
resources
> on
> > > > > all
> > > > > >>> > >>>> operators,
> > > > > >>> > >>>>>>> which makes it hard to use. That's why the FLIP
> > > > suggests
> > > > > going
> > > > > >>> > >> with
> > > > > >>> > >>>>>>> specifying resources on a Sharing-Group.
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> I think both thoughts are important, so can we find
a
> > > > > solution
> > > > > >>> > >> where
> > > > > >>> > >>>> the
> > > > > >>> > >>>>>> Resource Profiles are specified on an Operator, but
we
> > > > > >>> > still avoid
> > > > > >>> > >> that
> > > > > >>> > >>>>> we
> > > > > >>> > >>>>>> need to specify a resource profile on every
operator?
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> What do you think about something like the
following:
> > > > > >>> > >>>>>> - Resource Profiles are specified on an operator
> > > > level.
> > > > > >>> > >>>>>> - Not all operators need profiles
> > > > > >>> > >>>>>> - All Operators without a Resource Profile ended up
> > > > in
> > > > > the
> > > > > >>> > >> default
> > > > > >>> > >>>> slot
> > > > > >>> > >>>>>> sharing group with a default profile (will get a
> default
> > > > > slot).
> > > > > >>> > >>>>>> - All Operators with a Resource Profile will go into
> > > > > >>> > another slot
> > > > > >>> > >>>>> sharing
> > > > > >>> > >>>>>> group (the resource-specified-group).
> > > > > >>> > >>>>>> - Users can define different slot sharing groups for
> > > > > >>> > operators
> > > > > >>> > >> like
> > > > > >>> > >>>>> they
> > > > > >>> > >>>>>> do now, with the exception that you cannot mix
> operators
> > > > > >>> > that have
> > > > > >>> > >> a
> > > > > >>> > >>>>>> resource profile and operators that have no resource
> > > > > profile.
> > > > > >>> > >>>>>> - The default case where no operator has a resource
> > > > > >>> > profile is
> > > > > >>> > >> just a
> > > > > >>> > >>>>>> special case of this model
> > > > > >>> > >>>>>> - The chaining logic sums up the profiles per
> > > > operator,
> > > > > >>> > like it
> > > > > >>> > >> does
> > > > > >>> > >>>>> now,
> > > > > >>> > >>>>>> and the scheduler sums up the profiles of the tasks
> that
> > > > > it
> > > > > >>> > >> schedules
> > > > > >>> > >>>>>> together.
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> There is another question about reactive scaling
> raised
> > > > > in the
> > > > > >>> > >> FLIP. I
> > > > > >>> > >>>>> need
> > > > > >>> > >>>>>> to think a bit about that. That is indeed a bit more
> > > > > tricky
> > > > > >>> > once we
> > > > > >>> > >>>> have
> > > > > >>> > >>>>>> slots of different sizes.
> > > > > >>> > >>>>>> It is not clear then which of the different slot
> > > > requests
> > > > > the
> > > > > >>> > >>>>>> ResourceManager should fulfill when new resources
> (TMs)
> > > > > >>> > show up,
> > > > > >>> > >> or how
> > > > > >>> > >>>>> the
> > > > > >>> > >>>>>> JobManager redistributes the slots resources when
> > > > > resources
> > > > > >>> > (TMs)
> > > > > >>> > >>>>> disappear
> > > > > >>> > >>>>>> This question is pretty orthogonal, though, to the
> "how
> > > > to
> > > > > >>> > specify
> > > > > >>> > >> the
> > > > > >>> > >>>>>> resources".
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> Best,
> > > > > >>> > >>>>>> Stephan
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song
> > > > > >>> > <tonysong820@gmail.com <ma...@gmail.com>
> > > > > >>> > >>>>> wrote:
> > > > > >>> > >>>>>>> Thanks for drafting the FLIP and driving the
> > > > discussion,
> > > > > >>> > Yangze.
> > > > > >>> > >>>>>>> And Thanks for the feedback, Till and Chesnay.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> @Till,
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> I agree that specifying requirements for SSGs means
> > > > that
> > > > > SSGs
> > > > > >>> > >> need to
> > > > > >>> > >>>>> be
> > > > > >>> > >>>>>>> supported in fine-grained resource management,
> > > > otherwise
> > > > > each
> > > > > >>> > >>>> operator
> > > > > >>> > >>>>>>> might use as many resources as the whole group.
> > > > However,
> > > > > I
> > > > > >>> > cannot
> > > > > >>> > >>>> think
> > > > > >>> > >>>>>> of
> > > > > >>> > >>>>>>> a strong reason for not supporting SSGs in
> fine-grained
> > > > > >>> > resource
> > > > > >>> > >>>>>>> management.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>>> Interestingly, if all operators have their
resources
> > > > > properly
> > > > > >>> > >>>>>> specified,
> > > > > >>> > >>>>>>>> then slot sharing is no longer needed because
Flink
> > > > > could
> > > > > >>> > >> slice off
> > > > > >>> > >>>>> the
> > > > > >>> > >>>>>>>> appropriately sized slots for every Task
> individually.
> > > > > >>> > >>>>>>>>
> > > > > >>> > >>>>>>> So for example, if we have a job consisting of two
> > > > > >>> > operator op_1
> > > > > >>> > >> and
> > > > > >>> > >>>>> op_2
> > > > > >>> > >>>>>>>> where each op needs 100 MB of memory, we would
then
> > > > say
> > > > > that
> > > > > >>> > >> the
> > > > > >>> > >>>> slot
> > > > > >>> > >>>>>>>> sharing group needs 200 MB of memory to run. If we
> > > > have
> > > > > a
> > > > > >>> > >> cluster
> > > > > >>> > >>>>> with
> > > > > >>> > >>>>>> 2
> > > > > >>> > >>>>>>>> TMs with one slot of 100 MB each, then the system
> > > > > cannot run
> > > > > >>> > >> this
> > > > > >>> > >>>>> job.
> > > > > >>> > >>>>>> If
> > > > > >>> > >>>>>>>> the resources were specified on an operator level,
> > > > then
> > > > > the
> > > > > >>> > >> system
> > > > > >>> > >>>>>> could
> > > > > >>> > >>>>>>>> still make the decision to deploy op_1 to TM_1 and
> > > > op_2
> > > > > to
> > > > > >>> > >> TM_2.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> Couldn't agree more that if all operators'
> requirements
> > > > > are
> > > > > >>> > >> properly
> > > > > >>> > >>>>>>> specified, slot sharing should be no longer needed.
I
> > > > > >>> > think this
> > > > > >>> > >>>>> exactly
> > > > > >>> > >>>>>>> disproves the example. If we already know op_1 and
> op_2
> > > > > each
> > > > > >>> > >> needs
> > > > > >>> > >>>> 100
> > > > > >>> > >>>>> MB
> > > > > >>> > >>>>>>> of memory, why would we put them in the same group?
> If
> > > > > >>> > they are
> > > > > >>> > >> in
> > > > > >>> > >>>>>> separate
> > > > > >>> > >>>>>>> groups, with the proposed approach the system can
> > > > freely
> > > > > >>> > deploy
> > > > > >>> > >> them
> > > > > >>> > >>>> to
> > > > > >>> > >>>>>>> either a 200 MB TM or two 100 MB TMs.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> Moreover, the precondition for not needing slot
> sharing
> > > > > is
> > > > > >>> > having
> > > > > >>> > >>>>>> resource
> > > > > >>> > >>>>>>> requirements properly specified for all operators.
> This
> > > > > is not
> > > > > >>> > >> always
> > > > > >>> > >>>>>>> possible, and usually requires tremendous efforts.
> One
> > > > > of the
> > > > > >>> > >>>> benefits
> > > > > >>> > >>>>>> for
> > > > > >>> > >>>>>>> SSG-based requirements is that it allows the user
to
> > > > > freely
> > > > > >>> > >> decide
> > > > > >>> > >>>> the
> > > > > >>> > >>>>>>> granularity, thus efforts they want to pay. I would
> > > > > >>> > consider SSG
> > > > > >>> > >> in
> > > > > >>> > >>>>>>> fine-grained resource management as a group of
> > > > operators
> > > > > >>> > that the
> > > > > >>> > >>>> user
> > > > > >>> > >>>>>>> would like to specify the total resource for. There
> can
> > > > > be
> > > > > >>> > only
> > > > > >>> > >> one
> > > > > >>> > >>>>> group
> > > > > >>> > >>>>>>> in the job, 2~3 groups dividing the job into a few
> > > > major
> > > > > >>> > parts,
> > > > > >>> > >> or as
> > > > > >>> > >>>>>> many
> > > > > >>> > >>>>>>> groups as the number of tasks/operators, depending
on
> > > > how
> > > > > >>> > >>>> fine-grained
> > > > > >>> > >>>>>> the
> > > > > >>> > >>>>>>> user is able to specify the resources.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> Having to support SSGs might be a constraint. But
> given
> > > > > >>> > that all
> > > > > >>> > >> the
> > > > > >>> > >>>>>>> current scheduler implementations already support
> > > > SSGs, I
> > > > > >>> > tend to
> > > > > >>> > >>>> think
> > > > > >>> > >>>>>>> that as an acceptable price for the above discussed
> > > > > >>> > usability and
> > > > > >>> > >>>>>>> flexibility.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> @Chesnay
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> Will declaring them on slot sharing groups not also
> > > > waste
> > > > > >>> > >> resources
> > > > > >>> > >>>> if
> > > > > >>> > >>>>>> the
> > > > > >>> > >>>>>>>> parallelism of operators within that group are
> > > > > different?
> > > > > >>> > >>>>>>>>
> > > > > >>> > >>>>>>> Yes. It's a trade-off between usability and
resource
> > > > > >>> > >> utilization. To
> > > > > >>> > >>>>>> avoid
> > > > > >>> > >>>>>>> such wasting, the user can define more groups, so
> that
> > > > > >>> > each group
> > > > > >>> > >>>>>> contains
> > > > > > >>> > >>>>>>> fewer operators and the chance of having operators
> with
> > > > > >>> > different
> > > > > >>> > >>>>>>> parallelism will be reduced. The price is to have
> more
> > > > > >>> > resource
> > > > > >>> > >>>>>>> requirements to specify.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> It also seems like quite a hassle for users having
to
> > > > > >>> > >> recalculate the
> > > > > >>> > >>>>>>>> resource requirements if they change the slot
> sharing.
> > > > > >>> > >>>>>>>> I'd think that it's not really workable for users
> that
> > > > > create
> > > > > >>> > >> a set
> > > > > >>> > >>>>> of
> > > > > >>> > >>>>>>>> re-usable operators which are mixed and matched in
> > > > their
> > > > > >>> > >>>>> applications;
> > > > > >>> > >>>>>>>> managing the resources requirements in such a
> setting
> > > > > >>> > would be
> > > > > >>> > >> a
> > > > > >>> > >>>>>>>> nightmare, and in the end would require
> operator-level
> > > > > >>> > >> requirements
> > > > > >>> > >>>>> any
> > > > > >>> > >>>>>>>> way.
> > > > > >>> > >>>>>>>> In that sense, I'm not even sure whether it really
> > > > > increases
> > > > > >>> > >>>>> usability.
> > > > > >>> > >>>>>>> - As mentioned in my reply to Till's comment,
> > > > > there's no
> > > > > >>> > >> reason to
> > > > > >>> > >>>>> put
> > > > > >>> > >>>>>>> multiple operators whose individual resource
> > > > > >>> > requirements are
> > > > > >>> > >>>>> already
> > > > > >>> > >>>>>>> known
> > > > > >>> > >>>>>>> into the same group in fine-grained resource
> > > > > management.
> > > > > > >>> > >>>>>>> - Even if an operator implementation is reused for
> > > > > multiple
> > > > > >>> > >>>>> applications,
> > > > > >>> > >>>>>>> it does not guarantee the same resource
> > > > requirements.
> > > > > >>> > During
> > > > > >>> > >> our
> > > > > >>> > >>>>> years
> > > > > >>> > >>>>>>> of
> > > > > >>> > >>>>>>> practices in Alibaba, with per-operator
> > > > requirements
> > > > > >>> > >> specified for
> > > > > >>> > >>>>>>> Blink's
> > > > > >>> > >>>>>>> fine-grained resource management, very few users
> > > > > >>> > (including
> > > > > >>> > >> our
> > > > > >>> > >>>>>>> specialists
> > > > > >>> > >>>>>>> who are dedicated to supporting Blink users) are as
> > > > > >>> > >> experienced as
> > > > > >>> > >>>>> to
> > > > > >>> > >>>>>>> accurately predict/estimate the operator resource
> > > > > >>> > >> requirements.
> > > > > >>> > >>>> Most
> > > > > >>> > >>>>>>> people
> > > > > >>> > >>>>>>> rely on the execution-time metrics (throughput,
> > > > > delay, cpu
> > > > > >>> > >> load,
> > > > > >>> > >>>>>> memory
> > > > > >>> > >>>>>>> usage, GC pressure, etc.) to improve the
> > > > > specification.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> To sum up:
> > > > > >>> > >>>>>>> If the user is capable of providing proper resource
> > > > > >>> > requirements
> > > > > >>> > >> for
> > > > > >>> > >>>>>> every
> > > > > >>> > >>>>>>> operator, that's definitely a good thing and we
would
> > > > not
> > > > > >>> > need to
> > > > > >>> > >>>> rely
> > > > > >>> > >>>>> on
> > > > > >>> > >>>>>>> the SSGs. However, that shouldn't be a *must* for
the
> > > > > >>> > >> fine-grained
> > > > > >>> > >>>>>> resource
> > > > > >>> > >>>>>>> management to work. For those users who are capable
> and
> > > > > do not
> > > > > >>> > >> like
> > > > > >>> > >>>>>> having
> > > > > >>> > >>>>>>> to set each operator to a separate SSG, I would be
ok
> > > > to
> > > > > have
> > > > > >>> > >> both
> > > > > >>> > >>>>>>> SSG-based and operator-based runtime interfaces and
> to
> > > > > only
> > > > > >>> > >> fallback
> > > > > >>> > >>>> to
> > > > > >>> > >>>>>> the
> > > > > >>> > >>>>>>> SSG requirements when the operator requirements are
> not
> > > > > >>> > >> specified.
> > > > > >>> > >>>>>> However,
> > > > > >>> > >>>>>>> as the first step, I think we should prioritise the
> use
> > > > > cases
> > > > > >>> > >> where
> > > > > >>> > >>>>> users
> > > > > >>> > >>>>>>> are not that experienced.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> Thank you~
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> Xintong Song
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
> > > > > >>> > >> chesnay@apache.org <ma...@apache.org>>
> > > > > >>> > >>>>>>> wrote:
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>>> Will declaring them on slot sharing groups not
also
> > > > > waste
> > > > > >>> > >> resources
> > > > > >>> > >>>>> if
> > > > > >>> > >>>>>>>> the parallelism of operators within that group are
> > > > > different?
> > > > > >>> > >>>>>>>>
> > > > > >>> > >>>>>>>> It also seems like quite a hassle for users having
> to
> > > > > >>> > >> recalculate
> > > > > >>> > >>>> the
> > > > > >>> > >>>>>>>> resource requirements if they change the slot
> sharing.
> > > > > >>> > >>>>>>>> I'd think that it's not really workable for users
> that
> > > > > create
> > > > > >>> > >> a set
> > > > > >>> > >>>>> of
> > > > > >>> > >>>>>>>> re-usable operators which are mixed and matched in
> > > > their
> > > > > >>> > >>>>> applications;
> > > > > >>> > >>>>>>>> managing the resources requirements in such a
> setting
> > > > > >>> > would be
> > > > > >>> > >> a
> > > > > >>> > >>>>>>>> nightmare, and in the end would require
> operator-level
> > > > > >>> > >> requirements
> > > > > >>> > >>>>> any
> > > > > >>> > >>>>>>>> way.
> > > > > >>> > >>>>>>>> In that sense, I'm not even sure whether it really
> > > > > increases
> > > > > >>> > >>>>> usability.
> > > > > > >>> > >>>>>>>> My main worry is that if we wire the runtime to
> > > > work
> > > > > >>> > on SSGs
> > > > > >>> > >>>> it's
> > > > > >>> > >>>>>>>> gonna be difficult to implement more fine-grained
> > > > > approaches,
> > > > > >>> > >> which
> > > > > >>> > >>>>>>>> would not be the case if, for the runtime, they
are
> > > > > always
> > > > > >>> > >> defined
> > > > > >>> > >>>> on
> > > > > >>> > >>>>>> an
> > > > > >>> > >>>>>>>> operator-level.
> > > > > >>> > >>>>>>>>
> > > > > >>> > >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > > > > >>> > >>>>>>>>> Thanks for drafting this FLIP and starting this
> > > > > discussion
> > > > > >>> > >>>> Yangze.
> > > > > >>> > >>>>>>>>> I like that defining resource requirements on a
> slot
> > > > > sharing
> > > > > >>> > >>>> group
> > > > > >>> > >>>>>>> makes
> > > > > >>> > >>>>>>>>> the overall setup easier and improves usability
of
> > > > > resource
> > > > > >>> > >>>>>>> requirements.
> > > > > >>> > >>>>>>>>> What I do not like about it is that it changes
slot
> > > > > sharing
> > > > > >>> > >>>> groups
> > > > > >>> > >>>>>> from
> > > > > >>> > >>>>>>>>> being a scheduling hint to something which needs
to
> > > > be
> > > > > >>> > >> supported
> > > > > >>> > >>>> in
> > > > > >>> > >>>>>>> order
> > > > > >>> > >>>>>>>>> to support fine grained resource requirements. So
> > > > far,
> > > > > the
> > > > > >>> > >> idea
> > > > > >>> > >>>> of
> > > > > >>> > >>>>>> slot
> > > > > >>> > >>>>>>>>> sharing groups was that it tells the system that
a
> > > > set
> > > > > of
> > > > > >>> > >>>> operators
> > > > > >>> > >>>>>> can
> > > > > >>> > >>>>>>>> be
> > > > > >>> > >>>>>>>>> deployed in the same slot. But the system still
had
> > > > the
> > > > > >>> > >> freedom
> > > > > >>> > >>>> to
> > > > > >>> > >>>>>> say
> > > > > >>> > >>>>>>>> that
> > > > > >>> > >>>>>>>>> it would rather place these tasks in different
> slots
> > > > > if it
> > > > > >>> > >>>> wanted.
> > > > > >>> > >>>>> If
> > > > > >>> > >>>>>>> we
> > > > > >>> > >>>>>>>>> now specify resource requirements on a per slot
> > > > sharing
> > > > > >>> > >> group,
> > > > > >>> > >>>> then
> > > > > >>> > >>>>>> the
> > > > > >>> > >>>>>>>>> only option for a scheduler which does not
support
> > > > slot
> > > > > >>> > >> sharing
> > > > > >>> > >>>>>> groups
> > > > > >>> > >>>>>>> is
> > > > > >>> > >>>>>>>>> to say that every operator in this slot sharing
> group
> > > > > >>> > needs a
> > > > > >>> > >>>> slot
> > > > > >>> > >>>>>> with
> > > > > >>> > >>>>>>>> the
> > > > > >>> > >>>>>>>>> same resources as the whole group.
> > > > > >>> > >>>>>>>>>
> > > > > >>> > >>>>>>>>> So for example, if we have a job consisting of
two
> > > > > operator
> > > > > >>> > >> op_1
> > > > > >>> > >>>>> and
> > > > > >>> > >>>>>>> op_2
> > > > > >>> > >>>>>>>>> where each op needs 100 MB of memory, we would
then
> > > > > say that
> > > > > >>> > >> the
> > > > > >>> > >>>>> slot
> > > > > >>> > >>>>>>>>> sharing group needs 200 MB of memory to run. If
we
> > > > > have a
> > > > > >>> > >> cluster
> > > > > >>> > >>>>>> with
> > > > > >>> > >>>>>>> 2
> > > > > >>> > >>>>>>>>> TMs with one slot of 100 MB each, then the system
> > > > > cannot run
> > > > > >>> > >> this
> > > > > >>> > >>>>>> job.
> > > > > >>> > >>>>>>> If
> > > > > >>> > >>>>>>>>> the resources were specified on an operator
level,
> > > > > then the
> > > > > >>> > >>>> system
> > > > > >>> > >>>>>>> could
> > > > > >>> > >>>>>>>>> still make the decision to deploy op_1 to TM_1
and
> > > > > op_2 to
> > > > > >>> > >> TM_2.
> > > > > >>> > >>>>>>>>> Originally, one of the primary goals of slot
> sharing
> > > > > groups
> > > > > >>> > >> was
> > > > > >>> > >>>> to
> > > > > >>> > >>>>>> make
> > > > > >>> > >>>>>>>> it
> > > > > >>> > >>>>>>>>> easier for the user to reason about how many
slots
> a
> > > > > job
> > > > > >>> > >> needs
> > > > > >>> > >>>>>>>> independent
> > > > > >>> > >>>>>>>>> of the actual number of operators in the job.
> > > > > Interestingly,
> > > > > >>> > >> if
> > > > > >>> > >>>> all
> > > > > >>> > >>>>>>>>> operators have their resources properly
specified,
> > > > > then slot
> > > > > >>> > >>>>> sharing
> > > > > >>> > >>>>>> is
> > > > > >>> > >>>>>>>> no
> > > > > >>> > >>>>>>>>> longer needed because Flink could slice off the
> > > > > >>> > appropriately
> > > > > >>> > >>>> sized
> > > > > >>> > >>>>>>> slots
> > > > > >>> > >>>>>>>>> for every Task individually. What matters is
> whether
> > > > > the
> > > > > >>> > >> whole
> > > > > >>> > >>>>>> cluster
> > > > > >>> > >>>>>>>> has
> > > > > >>> > >>>>>>>>> enough resources to run all tasks or not.
> > > > > >>> > >>>>>>>>>
> > > > > >>> > >>>>>>>>> Cheers,
> > > > > >>> > >>>>>>>>> Till
> > > > > >>> > >>>>>>>>>
> > > > > >>> > >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
> > > > > >>> > >> karmagyz@gmail.com <ma...@gmail.com>>
> > > > > >>> > >>>>>> wrote:
> > > > > >>> > >>>>>>>>>> Hi, there,
> > > > > >>> > >>>>>>>>>>
> > > > > >>> > >>>>>>>>>> We would like to start a discussion thread on
> > > > > "FLIP-156:
> > > > > >>> > >> Runtime
> > > > > >>> > >>>>>>>>>> Interfaces for Fine-Grained Resource
> > > > Requirements"[1],
> > > > > >>> > >> where we
> > > > > >>> > >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime
> > > > > interfaces
> > > > > >>> > >> for
> > > > > >>> > >>>>>>>>>> specifying fine-grained resource requirements.
> > > > > >>> > >>>>>>>>>>
> > > > > >>> > >>>>>>>>>> In this FLIP:
> > > > > >>> > >>>>>>>>>> - Expound the user story of fine-grained
resource
> > > > > >>> > >> management.
> > > > > >>> > >>>>>>>>>> - Propose runtime interfaces for specifying
> > > > SSG-based
> > > > > >>> > >> resource
> > > > > >>> > >>>>>>>>>> requirements.
> > > > > >>> > >>>>>>>>>> - Discuss the pros and cons of the three
potential
> > > > > >>> > >> granularities
> > > > > >>> > >>>>> for
> > > > > >>> > >>>>>>>>>> specifying the resource requirements (op, task
and
> > > > > slot
> > > > > >>> > >> sharing
> > > > > >>> > >>>>>> group)
> > > > > >>> > >>>>>>>>>> and explain why we choose the slot sharing
group.
> > > > > >>> > >>>>>>>>>>
> > > > > >>> > >>>>>>>>>> Please find more details in the FLIP wiki
document
> > > > > [1].
> > > > > >>> > >> Looking
> > > > > >>> > >>>>>>>>>> forward to your feedback.
> > > > > >>> > >>>>>>>>>>
> > > > > >>> > >>>>>>>>>> [1]
> > > > > >>> > >>>>>>>>>>
> > > > > >>> > >>
> > > > > >>> >
> > > > >
> > > >
> >
> >
>
https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > > > >>> > <
> > > > >
> > > >
> >
> >
>
https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > > > >
> > > > > >>> > >>>>>>>>>> Best,
> > > > > >>> > >>>>>>>>>> Yangze Guo
> > > > > >>> > >>>>>>>>>>
> > > > > >>> > >>>>>>>>
> > > > > >>> >
> > > > > >>>
> > > > >
> > > >
> >
>

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Till Rohrmann <tr...@apache.org>.

Thanks for sharing your thoughts Kezhu. I like your ideas of how
per-operator and SSG requirements can be combined. I've also thought about
defining a default resource profile for all tasks which have no resources
configured. That way all operators would have resources assigned if the
user chooses to use this feature.
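
To illustrate the default-profile idea, here is a minimal sketch. It is not
part of the FLIP and does not use Flink's API: `ResourceProfile`,
`apply_default_profile`, the operator names and all numbers are hypothetical
and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ResourceProfile:
    """Hypothetical stand-in for a resource spec; not Flink's actual class."""
    cpu_cores: float
    memory_mb: int


def apply_default_profile(
    operator_profiles: Dict[str, Optional[ResourceProfile]],
    default: ResourceProfile,
) -> Dict[str, ResourceProfile]:
    """Return a copy in which every operator without a profile gets the default."""
    return {
        name: profile if profile is not None else default
        for name, profile in operator_profiles.items()
    }


# Hypothetical job: only the aggregation carries an explicit profile.
job = {"source": None, "agg": ResourceProfile(2.0, 1024), "sink": None}
resolved = apply_default_profile(job, ResourceProfile(0.5, 256))
```

After this step every operator has a profile, so a scheduler would no longer
have to deal with a mix of specified and unspecified resources.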

As Yangze and Xintong have said, we have decided to first only support
specifying resources for SSGs as this seems more user friendly. Based on
the feedback for this feature, one potential development direction might be
to allow the resource specification on a per-operator basis. Here we could
pick up your ideas.
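
For reference, the two readings of an SSG-level requirement raised in the
quoted discussion ("set" vs. "add") could be sketched as follows. This is
only an illustration under assumed semantics, nothing here has been decided:
the function names are hypothetical, only memory is modelled, and the numbers
are made up.

```python
from typing import List, Optional


def slot_memory_set(operator_mb: List[int], ssg_mb: Optional[int]) -> int:
    """'set' reading: an SSG-level value, if present, replaces the operator sum."""
    return ssg_mb if ssg_mb is not None else sum(operator_mb)


def slot_memory_add(operator_mb: List[int], ssg_mb: Optional[int]) -> int:
    """'add' reading: an SSG-level value is added on top of the operator sum."""
    return sum(operator_mb) + (ssg_mb or 0)


# Two operators with 100 MB each, plus an SSG-level value of 300 MB.
ops = [100, 100]
set_result = slot_memory_set(ops, 300)  # SSG value wins: 300
add_result = slot_memory_add(ops, 300)  # operator sum plus SSG value: 500
```

With no SSG-level value, both readings fall back to the plain sum of the
operator requirements.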

Cheers,
Till

On Wed, Feb 3, 2021 at 7:31 AM Xintong Song <to...@gmail.com> wrote:

> Thanks for your feedback, Kezhu.
>
> > I think Flink *runtime* already has an ideal granularity for resource
> > management: 'task'. If there is a slot shared by multiple tasks, that
> > slot's resource requirement is the simple sum of all its logical slots.
> > So basically, there is no resource requirement for SlotSharingGroup in
> > runtime until now, right ?
>
> That is a halfly-cooked implementation, coming from the previous attempts
> (years ago) trying to deliver the fine-grained resource management feature,
> and never really put into use.
>
> > From the FLIP and discussion, I assume that SSG resource specifying will
> > override operator level resource specifying if both are specified?
> >
> Actually, I think we should use the finer-grained resources (i.e. operator
> level) if both are specified. And more importantly, that is based on the
> assumption that we do need two different levels of interfaces.
>
> > So, I wonder whether we could interpret SSG resource specifying as an
> > "add" rather than a "set" on the resource requirement?
> >
> IIUC, this is the core idea behind your proposal. I think it provides an
> interesting idea of how we combine operator level and SSG level resources,
> *if we allow configuring resources at both levels*. However, I'm not sure
> whether configuring resources on the operator level is indeed needed.
> Therefore, as a first step, this FLIP proposes to only introduce the
> SSG-level interfaces. As listed in the future plan, we would consider
> allowing operator level resource configuration later if we do see a need
> for it. At that time, we definitely should discuss what to do if resources
> are configured at both levels.
>
> > * Could SSG express negative resource requirement?
> >
> No.
>
> > Is there a concrete obstacle that keeps partially configured resources
> > from working? I saw it will fail job submission in Dispatcher.submitJob.
> >
> With the SSG-based approach, this should no longer be needed. The
> constraint was introduced because we can neither properly define what is
> the resource of a task chained from an operator with specified resource and
> another with unspecified resource, nor for a slot shared by a task with
> specified resource and another with unspecified resource. With the
> SSG-based approach, we no longer have those problems.
>
> > An option (cluster/job level) to force slot sharing in scheduler? This
> > could be useful in case of migration from FLIP-156 to future approach.
> >
> I think this is exactly what we are trying to avoid, requiring the
> scheduler to enforce slot sharing.
>
> > An option (cluster) to ignore resource specifying (allow resource
> > specified job to run on open box environment) for non-production usage?
> >
> That's possible. Actually, we are planning to introduce an option for
> activating the fine-grained resource management, for development purposes.
> We might consider keeping that option after the feature is completed, to
> allow disabling the feature without having to touch the job code.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Feb 3, 2021 at 1:28 PM Kezhu Wang <ke...@gmail.com> wrote:
>
> > Hi all, sorry for joining the discussion after voting has started.
> >
> > I want to share my thoughts on this after reading the above discussions.
> >
> > I think Flink *runtime* already has an ideal granularity for resource
> > management: 'task'. If there is a slot shared by multiple tasks, that
> > slot's resource requirement is the simple sum of all its logical slots.
> > So basically, there is no resource requirement for SlotSharingGroup in
> > runtime until now, right?
> >
> > As in the discussion, we already agree that: "If all operators have their
> > resources properly specified, then slot sharing is no longer needed."
> >
> > So it seems to me that, naturally, what we would discuss is how to bridge
> > the impractical operator level resource specifying to the runtime task
> > level resource requirement. This is actually a pure API thing, as Chesnay
> > has pointed out.
> >
> > But FLIP-156 brings another direction to the table: how about using SSG
> > for both API and runtime resource specifying?
> >
> > From the FLIP and discussion, I assume that SSG resource specifying will
> > override operator level resource specifying if both are specified?
> >
> > So, I wonder whether we could interpret SSG resource specifying as an
> > "add" rather than a "set" on the resource requirement?
> >
> > The semantics is that SSG resource specifying adds additional resource to
> > the shared slot to express concerns about possible high throughput and
> > resource requirements for tasks in one physical slot.
> >
> > The result is that if the scheduler indeed respects slot sharing, the
> > allocated slot will gain the extra resource specified for that SSG.
> >
> > I think one coding barrier for the "add" approach is ResourceSpec.UNKNOWN,
> > which doesn't support the 'merge' operation. I tend to use
> > ResourceSpec.ZERO as the default; the task executor should be aware of
> > this.
> >
> > @Chesnay
> > > My main worry is that if we wire the runtime to work on SSGs it's
> > > gonna be difficult to implement more fine-grained approaches, which
> > > would not be the case if, for the runtime, they are always defined on
> > > an operator-level.
> >
> > An "add" operation should be less invasive and set a low barrier for
> > future fine-grained approaches.
> >
> > @Stephan
> > >   - Users can define different slot sharing groups for operators like
> > > they do now, with the exception that you cannot mix operators that have
> > > a resource profile and operators that have no resource profile.
> >
> > @Till
> > > This effectively means that all unspecified operators
> > > will implicitly have a zero resource requirement.
> > > I am wondering whether this wouldn't lead to a surprising behaviour for
> > > the user. If the user specifies the resource requirements for a single
> > > operator, then he probably will assume that the other operators will
> > > get the default share of resources and not nothing.
> >
> > I think this is inherent, due to the fact that we cannot define
> > ResourceSpec.ONE, e.g. the resource requirement for exactly one default
> > slot, with concrete numbers. I tend to squash out the unspecified ones if
> > there are operators in the chain with explicit resource specifying.
> > Otherwise, the protocol tends to be verbose, as in "give me this much
> > resource and a default". I think if we have explicit resource specifying
> > for only some operators, it is just saying "I don't care about the other
> > operators that much, just get them places to run". These are most likely
> > stateless filter/map or other less resource-consuming operators. If there
> > is indeed a problem, I think clients can specify a global default (or
> > another level of default in future). In the job graph generating phase, we
> > could take that default into account for unspecified operators.
> >
> > @FLIP-156
> > > Expose operator chaining. (Cons of task level resource specifying)
> >
> > Isn't that inherent to all group level resource specifying? They will
> > either break chaining or obey it, or even not work with it at all.
> >
> > To sum up the above, my suggestions are:
> >
> > On the API side:
> > * StreamExecutionEnvironment: A global default (ResourceSpec.ZERO if
> > unspecified).
> > * Operator: ResourceSpec.ZERO (unspecified) as default.
> > * Task: sum of requirements from specified operators + global default (if
> > there are any unspecified operators)
> > * SSG: additional resource to physical slot.
> >
> > On the runtime side:
> > * Task: ResourceSpec.Task or ResourceSpec.ZERO
> > * SSG: ResourceSpec.SSG or ResourceSpec.ZERO
> >
> > A physical slot gets the summed-up resources from logical slots and SSG;
> > if it gets ResourceSpec.ZERO, it is just a default sized slot.
> >
> > In short, turn SSG resource specifying into an "add" and drop
> > ResourceSpec.UNKNOWN.
> >
> >
> > Questions/Issues:
> > * Could SSG express negative resource requirement?
> > * Is there a concrete obstacle that keeps partially configured resources
> > from working? I saw it will fail job submission in Dispatcher.submitJob.
> > * An option (cluster/job level) to force slot sharing in scheduler? This
> > could be useful in case of migration from FLIP-156 to future approach.
> > * An option (cluster) to ignore resource specifying (allow resource
> > specified job to run on open box environment) for non-production usage?
> >
> >
> >
> > On February 1, 2021 at 11:54:10, Yangze Guo (karmagyz@gmail.com) wrote:
> >
> > Thanks for the reply, Till and Xintong!
> >
> > I've updated the FLIP, including:
> > - Edit the JavaDoc of the proposed
> > StreamGraphGenerator#setSlotSharingGroupResource.
> > - Add "Future Plan" section, which contains the potential follow-up
> > issues and the limitations to be documented when fine-grained resource
> > management is exposed to users.
> >
> > I'll start a vote in another thread.
> >
> > Best,
> > Yangze Guo
> >
> > On Fri, Jan 29, 2021 at 10:07 PM Till Rohrmann <tr...@apache.org> wrote:
> > > >
> > > > Thanks for summarizing the discussion, Yangze. I agree that setting
> > > > resource requirements per operator is not very user friendly. Moreover,
> > > > I couldn't come up with a different proposal which would be as easy to
> > > > use and wouldn't expose internal scheduling details. In fact, following
> > > > this argument, we shouldn't have exposed the slot sharing groups in the
> > > > first place.
> > > >
> > > > What is important for the user is that we properly document the
> > > > limitations and constraints the fine grained resource specification
> > > > has. For example, we should explain how optimizations like chaining are
> > > > affected by it and how different execution modes (batch vs. streaming)
> > > > affect the execution of operators which have specified resources. These
> > > > things shouldn't become part of the contract of this feature and are
> > > > more caused by internal implementation details but it will be important
> > > > to understand these things properly in order to use this feature
> > > > effectively.
> > > >
> > > > Hence, +1 for starting the vote for this FLIP.
> > >
> > > Cheers,
> > > Till
> > >
> > > > On Tue, Jan 26, 2021 at 4:37 AM Xintong Song <to...@gmail.com> wrote:
> > > >
> > > > > Thanks for the summary, Yangze.
> > > > >
> > > > > The changes and follow-up issues LGTM. Let's wait for responses from
> > > > > the others before starting a vote.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > > On Tue, Jan 26, 2021 at 11:08 AM Yangze Guo <ka...@gmail.com> wrote:
> > > > >
> > > > > > Thanks everyone for the lively discussion. I'd like to try to
> > > > > > summarize the current convergence in the discussion. Please let me
> > > > > > know if I got things wrong or missed something crucial here.
> > > > > >
> > > > > > Change of this FLIP:
> > > > > > - Treat the SSG resource requirements as a hint instead of a
> > > > > > restriction for the runtime. That should be explicitly explained in
> > > > > > the JavaDocs.
> > > > >
> > > > > Potential follow-up issues if needed:
> > > > > - Provide operator-level resource configuration interface.
> > > > > - Provide multiple options for deciding resources for SSGs whose
> > > > > requirement is not specified:
> > > > > ** Default slot resource.
> > > > > ** Default operator resource times number of operators.
> > > > >
> > > > > If there are no other issues, I'll update the FLIP accordingly and
> > > > > start a vote thread. Thanks all for the valuable feedback again.
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > >
> > > > > > On Fri, Jan 22, 2021 at 11:30 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > >
> > > > > >
> > > > > > FGRuntimeInterface.png
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > > >>
> > > > > > >> I think Chesnay's proposal could actually work. IIUC, the key point
> > > > > > >> is to derive operator requirements from SSG requirements on the API
> > > > > > >> side, so that the runtime only deals with operator requirements.
> > > > > > >> It's debatable how the deriving should be done though. E.g., an
> > > > > > >> alternative could be to evenly divide the SSG requirement into
> > > > > > >> requirements of operators in the group.
> > > > > > >>
> > > > > > >>
> > > > > > >> However, I'm not entirely sure which option is more desired.
> > > > > > >> Illustrating my understanding in the following figure, in which on
> > > > > > >> the top is Chesnay's proposal and on the bottom is the SSG-based
> > > > > > >> proposal in this FLIP.
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> I think the major difference between the two approaches is where
> > > > > > >> deriving operator requirements from SSG requirements happens.
> > > > > > >>
> > > > > > >> - Chesnay's proposal simplifies the runtime logic and the interface
> > > > > > >> to expose, at the price of moving more complexity (i.e. the
> > > > > > >> deriving) to the API side. The question is, where do we prefer to
> > > > > > >> keep the complexity? I'm slightly leaning towards having a thin API
> > > > > > >> and keeping the complexity in runtime if possible.
> > > > > > >>
> > > > > > >> - Notice that the dashed line arrows represent optional steps that
> > > > > > >> are needed only for schedulers that do not respect SSGs, which we
> > > > > > >> don't have at the moment. If we only look at the solid line arrows,
> > > > > > >> then the SSG-based approach is much simpler, without needing to
> > > > > > >> derive and aggregate the requirements back and forth. I'm not sure
> > > > > > >> about complicating the current design only for the potential future
> > > > > > >> needs.
> > > > > >>
> > > > > >>
> > > > > >> Thank you~
> > > > > >>
> > > > > >> Xintong Song
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > > >> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <chesnay@apache.org> wrote:
> > > > > >>>
> > > > > > >>> You're raising a good point, but I think I can rectify that with a
> > > > > > >>> minor adjustment.
> > > > > > >>>
> > > > > > >>> Default requirements are whatever the default requirements are;
> > > > > > >>> setting the requirements for one operator has no effect on other
> > > > > > >>> operators.
> > > > > > >>>
> > > > > > >>> With these rules, and some API enhancements, the following mockup
> > > > > > >>> would replicate the SSG-based behavior:
> > > > > >>>
> > > > > >>> Map<SlotSharingGroupId, Requirements> requirements = ...
> > > > > >>> for slotSharingGroup in env.getSlotSharingGroups() {
> > > > > >>> vertices = slotSharingGroup.getVertices()
> > > > > >>>
> > > > >
> > > >
> >
> vertices.first().setRequirements(requirements.get(slotSharingGroup.getID())
> > > > > >>> vertices.remainint().setRequirements(ZERO)
> > > > > >>> }
> > > > > >>>
> > > > > >>> We could even allow setting requirements on slotsharing-groups
> > > > > >>> colocation-groups and internally translate them accordingly.
> > > > > >>> I can't help but feel this is a plain API issue.
> > > > > >>>
> > > > > >>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
> > > > > > >>> > If I understand you correctly Chesnay, then you want to decouple
> > > > > > >>> > the resource requirement specification from the slot sharing
> > > > > > >>> > group assignment. Hence, per default all operators would be in
> > > > > > >>> > the same slot sharing group. If there is no operator with a
> > > > > > >>> > resource specification, then the system would allocate a default
> > > > > > >>> > slot for it. If there is at least one operator, then the system
> > > > > > >>> > would sum up all the specified resources and allocate a slot of
> > > > > > >>> > this size. This effectively means that all unspecified operators
> > > > > > >>> > will implicitly have a zero resource requirement. Did I
> > > > > > >>> > understand your idea correctly?
> > > > > > >>> >
> > > > > > >>> > I am wondering whether this wouldn't lead to a surprising
> > > > > > >>> > behaviour for the user. If the user specifies the resource
> > > > > > >>> > requirements for a single operator, then he probably will assume
> > > > > > >>> > that the other operators will get the default share of resources
> > > > > > >>> > and not nothing.
> > > > > >>> >
> > > > > >>> > Cheers,
> > > > > >>> > Till
> > > > > >>> >
> > > > > > >>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler
> > > > > > >>> > <chesnay@apache.org> wrote:
> > > > > > >>> >
> > > > > > >>> > Is there even a functional difference between specifying the
> > > > > > >>> > requirements for an SSG vs specifying the same requirements on a
> > > > > > >>> > single operator within that group (ideally a colocation group to
> > > > > > >>> > avoid this whole hint business)?
> > > > > > >>> >
> > > > > > >>> > Wouldn't we get the best of both worlds in the latter case?
> > > > > > >>> >
> > > > > > >>> > Users can take shortcuts to define shared requirements,
> > > > > > >>> > but refine them further as needed on a per-operator basis,
> > > > > > >>> > without changing semantics of slotsharing groups
> > > > > > >>> > nor the runtime being locked into SSG-based requirements.
> > > > > > >>> >
> > > > > > >>> > (And before anyone argues what happens if slotsharing groups
> > > > > > >>> > change or whatnot, that's a plain API issue that we could surely
> > > > > > >>> > solve. (A plain iteration over slotsharing groups and therein
> > > > > > >>> > contained operators would suffice)).
> > > > > >>> >
> > > > > >>> > On 1/20/2021 6:48 PM, Till Rohrmann wrote:
> > > > > > >>> > > Maybe a different minor idea: Would it be possible to treat the
> > > > > > >>> > > SSG resource requirements as a hint for the runtime similar to
> > > > > > >>> > > how slot sharing groups are designed at the moment? Meaning that
> > > > > > >>> > > we don't give the guarantee that Flink will always deploy this
> > > > > > >>> > > set of tasks together no matter what comes. If, for example, the
> > > > > > >>> > > runtime can derive by some means the resource requirements for
> > > > > > >>> > > each task based on the requirements for the SSG, this could be
> > > > > > >>> > > possible. One easy strategy would be to give every task the same
> > > > > > >>> > > resources as the whole slot sharing group. Another one could be
> > > > > > >>> > > distributing the resources equally among the tasks. This does
> > > > > > >>> > > not even have to be implemented but we would give ourselves the
> > > > > > >>> > > freedom to change scheduling if need should arise.
> > > > > >>> > >
> > > > > >>> > > Cheers,
> > > > > >>> > > Till
> > > > > >>> > >
> > > > > > >>> > > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <karmagyz@gmail.com>
> > > > > > >>> > > wrote:
> > > > > > >>> > >
> > > > > > >>> > >> Thanks for the responses, Till and Xintong.
> > > > > > >>> > >>
> > > > > > >>> > >> I second Xintong's comment that SSG-based runtime interface
> > > > > > >>> > >> will give us the flexibility to achieve op/task-based approach.
> > > > > > >>> > >> That's one of the most important reasons for our design choice.
> > > > > > >>> > >>
> > > > > > >>> > >> Some cents regarding the default operator resource:
> > > > > > >>> > >> - It might be good for the scenario of DataStream jobs.
> > > > > > >>> > >> ** For light-weight operators, the accumulative configuration
> > > > > > >>> > >> error will not be significant. Then, the resource of a task
> > > > > > >>> > >> used is proportional to the number of operators it contains.
> > > > > > >>> > >> ** For heavy operators like join and window or operators using
> > > > > > >>> > >> the external resources, user will turn to the fine-grained
> > > > > > >>> > >> resource configuration.
> > > > > > >>> > >> - It can increase the stability for the standalone cluster
> > > > > > >>> > >> where task executors registered are heterogeneous (with
> > > > > > >>> > >> different default slot resources).
> > > > > > >>> > >> - It might not be good for SQL users. The operators that SQL
> > > > > > >>> > >> will be translated to are a black box to the user. We also do
> > > > > > >>> > >> not guarantee cross-version consistency of the transformation
> > > > > > >>> > >> so far.
> > > > > > >>> > >>
> > > > > > >>> > >> I think it can be treated as a follow-up work when the
> > > > > > >>> > >> fine-grained resource management is end-to-end ready.
> > > > > >>> > >>
> > > > > >>> > >> Best,
> > > > > >>> > >> Yangze Guo
> > > > > >>> > >>
> > > > > >>> > >>
> > > > > > >>> > >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song
> > > > > > >>> > >> <tonysong820@gmail.com> wrote:
> > > > > > >>> > >>> Thanks for the feedback, Till.
> > > > > > >>> > >>>
> > > > > > >>> > >>> ## I feel that what you proposed (operator-based + default
> > > > > > >>> > >>> value) might be subsumed by the SSG-based approach.
> > > > > > >>> > >>> Thinking of op_1 -> op_2, there are the following 4 cases,
> > > > > > >>> > >>> categorized by whether the resource requirements are known to
> > > > > > >>> > >>> the users.
> > > > > > >>> > >>>
> > > > > > >>> > >>> 1. *Both known.* As previously mentioned, there's no reason to
> > > > > > >>> > >>>    put multiple operators whose individual resource
> > > > > > >>> > >>>    requirements are already known into the same group in
> > > > > > >>> > >>>    fine-grained resource management. And if op_1 and op_2 are
> > > > > > >>> > >>>    in different groups, there should be no problem switching
> > > > > > >>> > >>>    data exchange mode from pipelined to blocking. This is
> > > > > > >>> > >>>    equivalent to specifying operator resource requirements in
> > > > > > >>> > >>>    your proposal.
> > > > > > >>> > >>> 2. *op_1 known, op_2 unknown.* Similar to 1), except that op_2
> > > > > > >>> > >>>    is in a SSG whose resource is not specified thus would have
> > > > > > >>> > >>>    the default slot resource. This is equivalent to having
> > > > > > >>> > >>>    default operator resources in your proposal.
> > > > > > >>> > >>> 3. *Both unknown*. The user can either set op_1 and op_2 to
> > > > > > >>> > >>>    the same SSG or separate SSGs.
> > > > > > >>> > >>>    - If op_1 and op_2 are in the same SSG, it will be
> > > > > > >>> > >>>      equivalent to the coarse-grained resource management,
> > > > > > >>> > >>>      where op_1 and op_2 share a default size slot no matter
> > > > > > >>> > >>>      which data exchange mode is used.
> > > > > > >>> > >>>    - If op_1 and op_2 are in different SSGs, then each of
> > > > > > >>> > >>>      them will use a default size slot. This is equivalent to
> > > > > > >>> > >>>      setting them with default operator resources in your
> > > > > > >>> > >>>      proposal.
> > > > > > >>> > >>> 4. *Total (pipeline) or max (blocking) of op_1 and op_2 is
> > > > > > >>> > >>>    known.*
> > > > > > >>> > >>>    - It is possible that the user learns the total / max
> > > > > > >>> > >>>      resource requirement from executing and monitoring the
> > > > > > >>> > >>>      job, while not being aware of individual operator
> > > > > > >>> > >>>      requirements.
> > > > > > >>> > >>>    - I believe this is the case your proposal does not
> > > > > > >>> > >>>      cover. And TBH, this is probably how most users learn
> > > > > > >>> > >>>      the resource requirements, according to my experiences.
> > > > > > >>> > >>>    - In this case, the user might need to specify different
> > > > > > >>> > >>>      resources if he wants to switch the execution mode,
> > > > > > >>> > >>>      which should not be worse than not being able to use
> > > > > > >>> > >>>      fine-grained resource management.
> > > > > >>> > >>>
> > > > > >>> > >>>
> > > > > > >>> > >>> ## An additional idea inspired by your proposal.
> > > > > > >>> > >>> We may provide multiple options for deciding resources for
> > > > > > >>> > >>> SSGs whose requirement is not specified, if needed.
> > > > > > >>> > >>>
> > > > > > >>> > >>> - Default slot resource (current design)
> > > > > > >>> > >>> - Default operator resource times number of operators
> > > > > > >>> > >>>   (equivalent to your proposal)
> > > > > >>> > >>>
> > > > > >>> > >>>
> > > > > > >>> > >>> ## Exposing internal runtime strategies
> > > > > > >>> > >>> Theoretically, yes. Tying to the SSGs, the resource
> > > > > > >>> > >>> requirements might be affected if how SSGs are internally
> > > > > > >>> > >>> handled changes in future. Practically, I do not concretely
> > > > > > >>> > >>> see at the moment what kind of changes we may want in future
> > > > > > >>> > >>> that might conflict with this FLIP proposal, as the question
> > > > > > >>> > >>> of switching data exchange mode answered above. I'd suggest
> > > > > > >>> > >>> not giving up the user friendliness we may gain now for
> > > > > > >>> > >>> future problems that may or may not exist.
> > > > > > >>> > >>>
> > > > > > >>> > >>> Moreover, the SSG-based approach has the flexibility to
> > > > > > >>> > >>> achieve the equivalent behavior as the operator-based
> > > > > > >>> > >>> approach, if we set each operator (or task) to a separate
> > > > > > >>> > >>> SSG. We can even provide a shortcut option to automatically
> > > > > > >>> > >>> do that for users, if needed.
> > > > > >>> > >>>
> > > > > >>> > >>>
> > > > > >>> > >>> Thank you~
> > > > > >>> > >>>
> > > > > >>> > >>> Xintong Song
> > > > > >>> > >>>
> > > > > >>> > >>>
> > > > > >>> > >>>
> > > > > > >>> > >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann
> > > > > > >>> > >>> <trohrmann@apache.org> wrote:
> > > > > > >>> > >>>> Thanks for the responses Xintong and Stephan,
> > > > > > >>> > >>>>
> > > > > > >>> > >>>> I agree that being able to define the resource requirements
> > > > > > >>> > >>>> for a group of operators is more user friendly. However, my
> > > > > > >>> > >>>> concern is that we are thereby exposing internal runtime
> > > > > > >>> > >>>> strategies which might limit our flexibility to execute a
> > > > > > >>> > >>>> given job. Moreover, the semantics of configuring resource
> > > > > > >>> > >>>> requirements for SSGs could break if switching from
> > > > > > >>> > >>>> streaming to batch execution. If one defines the resource
> > > > > > >>> > >>>> requirements for op_1 -> op_2 which run in pipelined mode
> > > > > > >>> > >>>> when using the streaming execution, then how do we
> > > > > > >>> > >>>> interpret these requirements when op_1 -> op_2 are executed
> > > > > > >>> > >>>> with a blocking data exchange in batch execution mode?
> > > > > > >>> > >>>> Consequently, I am still leaning towards Stephan's proposal
> > > > > > >>> > >>>> to set the resource requirements per operator.
> > > > > > >>> > >>>>
> > > > > > >>> > >>>> Maybe the following proposal makes the configuration
> > > > > > >>> > >>>> easier: If the user wants to use fine-grained resource
> > > > > > >>> > >>>> requirements, then she needs to specify the default size
> > > > > > >>> > >>>> which is used for operators which have no explicit resource
> > > > > > >>> > >>>> annotation. If this holds true, then every operator would
> > > > > > >>> > >>>> have a resource requirement and the system can try to
> > > > > > >>> > >>>> execute the operators in the best possible manner w/o being
> > > > > > >>> > >>>> constrained by how the user set the SSG requirements.
> > > > > >>> > >>>>
> > > > > >>> > >>>> Cheers,
> > > > > >>> > >>>> Till
> > > > > >>> > >>>>
> > > > > > >>> > >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song
> > > > > > >>> > >>>> <tonysong820@gmail.com> wrote:
> > > > > > >>> > >>>>
> > > > > > >>> > >>>>> Thanks for the feedback, Stephan.
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>> Actually, your proposal has also come to my mind at some
> > > > > > >>> > >>>>> point. And I have some concerns about it.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > > >>> > >>>>> 1. It does not give users the same control as the SSG-based
> > > > > > >>> > >>>>> approach.
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>> While neither approach requires specifying resources for
> > > > > > >>> > >>>>> each operator, the SSG-based approach supports the semantic
> > > > > > >>> > >>>>> that "some operators together use this much resource" while
> > > > > > >>> > >>>>> the operator-based approach doesn't.
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>>
> > > > > > >>> > >>>>> Think of a long pipeline with m operators (o_1, o_2, ...,
> > > > > > >>> > >>>>> o_m), and at some point there's an agg o_n (1 < n < m)
> > > > > > >>> > >>>>> which significantly reduces the data amount. One can
> > > > > > >>> > >>>>> separate the pipeline into 2 groups SSG_1 (o_1, ..., o_n)
> > > > > > >>> > >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much higher
> > > > > > >>> > >>>>> parallelisms for operators in SSG_1 than for operators in
> > > > > > >>> > >>>>> SSG_2 won't lead to too much wasting of resources. If the
> > > > > > >>> > >>>>> two SSGs end up needing different resources, with the
> > > > > > >>> > >>>>> SSG-based approach one can directly specify resources for
> > > > > > >>> > >>>>> the two groups. However, with the operator-based approach,
> > > > > > >>> > >>>>> the user will have to specify resources for each operator
> > > > > > >>> > >>>>> in one of the two groups, and tune the default slot
> > > > > > >>> > >>>>> resource via configurations to fit the other group.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> 2. It increases the chance of breaking operator chains.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> Setting chainnable operators into different slot
> sharing
> > > > > >>> > groups will
> > > > > >>> > >>>>> prevent them from being chained. In the current
> > > > > implementation,
> > > > > >>> > >>>> downstream
> > > > > >>> > >>>>> operators, if SSG not explicitly specified, will be set
> > > > to
> > > > > >>> > the same
> > > > > >>> > >> group
> > > > > >>> > >>>>> as the chainable upstream operators (unless multiple
> > > > > upstream
> > > > > >>> > >> operators
> > > > > >>> > >>>> in
> > > > > >>> > >>>>> different groups), to reduce the chance of breaking
> > > > chains.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 ->
> o_4,
> > > > > >>> > deciding
> > > > > >>> > >> SSGs
> > > > > >>> > >>>>> based on whether resource is specified we will easily
> get
> > > > > >>> > groups like
> > > > > >>> > >>>> (o_1,
> > > > > >>> > >>>>> o_3) & (o_2, o_4), where none of the operators can be
> > > > > >>> > chained. This
> > > > > >>> > >> is
> > > > > >>> > >>>> also
> > > > > >>> > >>>>> possible for the SSG-based approach, but I believe the
> > > > > >>> > chance is much
> > > > > >>> > >>>>> smaller because there's no strong reason for users to
> > > > > >>> > specify the
> > > > > >>> > >> groups
> > > > > >>> > >>>>> with alternate operators like that. We are more likely
> to
> > > > > >>> > get groups
> > > > > >>> > >> like
> > > > > >>> > >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks only
> > > > > between
> > > > > >>> > o_2 and
> > > > > >>> > >> o_3.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> 3. It complicates the system by having two different
> > > > > >>> > mechanisms for
> > > > > >>> > >>>> sharing
> > > > > >>> > >>>>> managed memory in a slot.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> - In FLIP-141, we introduced the intra-slot managed
> > > > memory
> > > > > >>> > sharing
> > > > > >>> > >>>>> mechanism, where managed memory is first distributed
> > > > > >>> > according to the
> > > > > >>> > >>>>> consumer type, then further distributed across
> operators
> > > > > of that
> > > > > >>> > >> consumer
> > > > > >>> > >>>>> type.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> - With the operator-based approach, managed memory size
> > > > > >>> > specified
> > > > > >>> > >> for an
> > > > > >>> > >>>>> operator should account for all the consumer types of
> > > > that
> > > > > >>> > operator.
> > > > > >>> > >> That
> > > > > >>> > >>>>> means the managed memory is first distributed across
> > > > > >>> > operators, then
> > > > > >>> > >>>>> distributed to different consumer types of each
> operator.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> Unfortunately, the different order of the two
> calculation
> > > > > >>> > steps can
> > > > > >>> > >> lead
> > > > > >>> > >>>> to
> > > > > >>> > >>>>> different results. To be specific, the semantic of the
> > > > > >>> > configuration
> > > > > >>> > >>>> option
> > > > > >>> > >>>>> `consumer-weights` changed (within a slot vs. within an
> > > > > >>> > operator).
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> To sum up things:
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> While (3) might be a bit more implementation related, I
> > > > > >>> > think (1)
> > > > > >>> > >> and (2)
> > > > > >>> > >>>>> somehow suggest that, the price for the proposed
> approach
> > > > > to
> > > > > >>> > avoid
> > > > > >>> > >>>>> specifying resource for every operator is that it's not
> > > > as
> > > > > >>> > >> independent
> > > > > >>> > >>>> from
> > > > > >>> > >>>>> operator chaining and slot sharing as the
> operator-based
> > > > > >>> > approach
> > > > > >>> > >>>> discussed
> > > > > >>> > >>>>> in the FLIP.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> Thank you~
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> Xintong Song
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen
> > > > > >>> > <sewen@apache.org <ma...@apache.org>>
> > > > > >>> > >> wrote:
> > > > > >>> > >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> I want to say, first of all, that this is super well
> > > > > >>> > written. And
> > > > > >>> > >> the
> > > > > >>> > >>>>>> points that the FLIP makes about how to expose the
> > > > > >>> > configuration to
> > > > > >>> > >>>> users
> > > > > >>> > >>>>>> is exactly the right thing to figure out first.
> > > > > >>> > >>>>>> So good job here!
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> About how to let users specify the resource profiles.
> > > > If I
> > > > > >>> > can sum
> > > > > >>> > >> the
> > > > > >>> > >>>>> FLIP
> > > > > >>> > >>>>>> and previous discussion up in my own words, the
> problem
> > > > > is the
> > > > > >>> > >>>> following:
> > > > > >>> > >>>>>> Operator-level specification is the simplest and
> > > > cleanest
> > > > > >>> > approach,
> > > > > >>> > >>>>> because
> > > > > >>> > >>>>>>> it avoids mixing operator configuration (resource)
> and
> > > > > >>> > >> scheduling. No
> > > > > >>> > >>>>>>> matter what other parameters change (chaining, slot
> > > > > sharing,
> > > > > >>> > >>>> switching
> > > > > >>> > >>>>>>> pipelined and blocking shuffles), the resource
> profiles
> > > > > >>> > stay the
> > > > > >>> > >>>> same.
> > > > > >>> > >>>>>>> But it would require that a user specifies resources
> on
> > > > > all
> > > > > >>> > >>>> operators,
> > > > > >>> > >>>>>>> which makes it hard to use. That's why the FLIP
> > > > suggests
> > > > > going
> > > > > >>> > >> with
> > > > > >>> > >>>>>>> specifying resources on a Sharing-Group.
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> I think both thoughts are important, so can we find a
> > > > > solution
> > > > > >>> > >> where
> > > > > >>> > >>>> the
> > > > > >>> > >>>>>> Resource Profiles are specified on an Operator, but we
> > > > > >>> > still avoid
> > > > > >>> > >> that
> > > > > >>> > >>>>> we
> > > > > >>> > >>>>>> need to specify a resource profile on every operator?
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> What do you think about something like the following:
> > > > > >>> > >>>>>> - Resource Profiles are specified on an operator
> > > > level.
> > > > > >>> > >>>>>> - Not all operators need profiles
> > > > > >>> > >>>>>> - All Operators without a Resource Profile ended up
> > > > in
> > > > > the
> > > > > >>> > >> default
> > > > > >>> > >>>> slot
> > > > > >>> > >>>>>> sharing group with a default profile (will get a
> default
> > > > > slot).
> > > > > >>> > >>>>>> - All Operators with a Resource Profile will go into
> > > > > >>> > another slot
> > > > > >>> > >>>>> sharing
> > > > > >>> > >>>>>> group (the resource-specified-group).
> > > > > >>> > >>>>>> - Users can define different slot sharing groups for
> > > > > >>> > operators
> > > > > >>> > >> like
> > > > > >>> > >>>>> they
> > > > > >>> > >>>>>> do now, with the exception that you cannot mix
> operators
> > > > > >>> > that have
> > > > > >>> > >> a
> > > > > >>> > >>>>>> resource profile and operators that have no resource
> > > > > profile.
> > > > > >>> > >>>>>> - The default case where no operator has a resource
> > > > > >>> > profile is
> > > > > >>> > >> just a
> > > > > >>> > >>>>>> special case of this model
> > > > > >>> > >>>>>> - The chaining logic sums up the profiles per
> > > > operator,
> > > > > >>> > like it
> > > > > >>> > >> does
> > > > > >>> > >>>>> now,
> > > > > >>> > >>>>>> and the scheduler sums up the profiles of the tasks
> that
> > > > > it
> > > > > >>> > >> schedules
> > > > > >>> > >>>>>> together.
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> There is another question about reactive scaling
> raised
> > > > > in the
> > > > > >>> > >> FLIP. I
> > > > > >>> > >>>>> need
> > > > > >>> > >>>>>> to think a bit about that. That is indeed a bit more
> > > > > tricky
> > > > > >>> > once we
> > > > > >>> > >>>> have
> > > > > >>> > >>>>>> slots of different sizes.
> > > > > >>> > >>>>>> It is not clear then which of the different slot
> > > > requests
> > > > > the
> > > > > >>> > >>>>>> ResourceManager should fulfill when new resources
> (TMs)
> > > > > >>> > show up,
> > > > > >>> > >> or how
> > > > > >>> > >>>>> the
> > > > > >>> > >>>>>> JobManager redistributes the slots resources when
> > > > > resources
> > > > > >>> > (TMs)
> > > > > >>> > >>>>> disappear
> > > > > >>> > >>>>>> This question is pretty orthogonal, though, to the
> "how
> > > > to
> > > > > >>> > specify
> > > > > >>> > >> the
> > > > > >>> > >>>>>> resources".
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> Best,
> > > > > >>> > >>>>>> Stephan
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song
> > > > > >>> > <tonysong820@gmail.com <ma...@gmail.com>
> > > > > >>> > >>>>> wrote:
> > > > > >>> > >>>>>>> Thanks for drafting the FLIP and driving the
> > > > discussion,
> > > > > >>> > Yangze.
> > > > > >>> > >>>>>>> And Thanks for the feedback, Till and Chesnay.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> @Till,
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> I agree that specifying requirements for SSGs means
> > > > that
> > > > > SSGs
> > > > > >>> > >> need to
> > > > > >>> > >>>>> be
> > > > > >>> > >>>>>>> supported in fine-grained resource management,
> > > > otherwise
> > > > > each
> > > > > >>> > >>>> operator
> > > > > >>> > >>>>>>> might use as many resources as the whole group.
> > > > However,
> > > > > I
> > > > > >>> > cannot
> > > > > >>> > >>>> think
> > > > > >>> > >>>>>> of
> > > > > >>> > >>>>>>> a strong reason for not supporting SSGs in
> fine-grained
> > > > > >>> > resource
> > > > > >>> > >>>>>>> management.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>>> Interestingly, if all operators have their resources
> > > > > properly
> > > > > >>> > >>>>>> specified,
> > > > > >>> > >>>>>>>> then slot sharing is no longer needed because Flink
> > > > > could
> > > > > >>> > >> slice off
> > > > > >>> > >>>>> the
> > > > > >>> > >>>>>>>> appropriately sized slots for every Task
> individually.
> > > > > >>> > >>>>>>>>
> > > > > >>> > >>>>>>> So for example, if we have a job consisting of two
> > > > > >>> > operator op_1
> > > > > >>> > >> and
> > > > > >>> > >>>>> op_2
> > > > > >>> > >>>>>>>> where each op needs 100 MB of memory, we would then
> > > > say
> > > > > that
> > > > > >>> > >> the
> > > > > >>> > >>>> slot
> > > > > >>> > >>>>>>>> sharing group needs 200 MB of memory to run. If we
> > > > have
> > > > > a
> > > > > >>> > >> cluster
> > > > > >>> > >>>>> with
> > > > > >>> > >>>>>> 2
> > > > > >>> > >>>>>>>> TMs with one slot of 100 MB each, then the system
> > > > > cannot run
> > > > > >>> > >> this
> > > > > >>> > >>>>> job.
> > > > > >>> > >>>>>> If
> > > > > >>> > >>>>>>>> the resources were specified on an operator level,
> > > > then
> > > > > the
> > > > > >>> > >> system
> > > > > >>> > >>>>>> could
> > > > > >>> > >>>>>>>> still make the decision to deploy op_1 to TM_1 and
> > > > op_2
> > > > > to
> > > > > >>> > >> TM_2.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> Couldn't agree more that if all operators'
> requirements
> > > > > are
> > > > > >>> > >> properly
> > > > > >>> > >>>>>>> specified, slot sharing should be no longer needed. I
> > > > > >>> > think this
> > > > > >>> > >>>>> exactly
> > > > > >>> > >>>>>>> disproves the example. If we already know op_1 and
> op_2
> > > > > each
> > > > > >>> > >> needs
> > > > > >>> > >>>> 100
> > > > > >>> > >>>>> MB
> > > > > >>> > >>>>>>> of memory, why would we put them in the same group?
> If
> > > > > >>> > they are
> > > > > >>> > >> in
> > > > > >>> > >>>>>> separate
> > > > > >>> > >>>>>>> groups, with the proposed approach the system can
> > > > freely
> > > > > >>> > deploy
> > > > > >>> > >> them
> > > > > >>> > >>>> to
> > > > > >>> > >>>>>>> either a 200 MB TM or two 100 MB TMs.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> Moreover, the precondition for not needing slot
> sharing
> > > > > is
> > > > > >>> > having
> > > > > >>> > >>>>>> resource
> > > > > >>> > >>>>>>> requirements properly specified for all operators.
> This
> > > > > is not
> > > > > >>> > >> always
> > > > > >>> > >>>>>>> possible, and usually requires tremendous efforts.
> One
> > > > > of the
> > > > > >>> > >>>> benefits
> > > > > >>> > >>>>>> for
> > > > > >>> > >>>>>>> SSG-based requirements is that it allows the user to
> > > > > freely
> > > > > >>> > >> decide
> > > > > >>> > >>>> the
> > > > > >>> > >>>>>>> granularity, thus efforts they want to pay. I would
> > > > > >>> > consider SSG
> > > > > >>> > >> in
> > > > > >>> > >>>>>>> fine-grained resource management as a group of
> > > > operators
> > > > > >>> > that the
> > > > > >>> > >>>> user
> > > > > >>> > >>>>>>> would like to specify the total resource for. There
> can
> > > > > be
> > > > > >>> > only
> > > > > >>> > >> one
> > > > > >>> > >>>>> group
> > > > > >>> > >>>>>>> in the job, 2~3 groups dividing the job into a few
> > > > major
> > > > > >>> > parts,
> > > > > >>> > >> or as
> > > > > >>> > >>>>>> many
> > > > > >>> > >>>>>>> groups as the number of tasks/operators, depending on
> > > > how
> > > > > >>> > >>>> fine-grained
> > > > > >>> > >>>>>> the
> > > > > >>> > >>>>>>> user is able to specify the resources.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> Having to support SSGs might be a constraint. But
> given
> > > > > >>> > that all
> > > > > >>> > >> the
> > > > > >>> > >>>>>>> current scheduler implementations already support
> > > > SSGs, I
> > > > > >>> > tend to
> > > > > >>> > >>>> think
> > > > > >>> > >>>>>>> that as an acceptable price for the above discussed
> > > > > >>> > usability and
> > > > > >>> > >>>>>>> flexibility.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> @Chesnay
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> Will declaring them on slot sharing groups not also
> > > > waste
> > > > > >>> > >> resources
> > > > > >>> > >>>> if
> > > > > >>> > >>>>>> the
> > > > > >>> > >>>>>>>> parallelism of operators within that group are
> > > > > different?
> > > > > >>> > >>>>>>>>
> > > > > >>> > >>>>>>> Yes. It's a trade-off between usability and resource
> > > > > >>> > >> utilization. To
> > > > > >>> > >>>>>> avoid
> > > > > >>> > >>>>>>> such wasting, the user can define more groups, so
> that
> > > > > >>> > each group
> > > > > >>> > >>>>>> contains
> > > > > >>> > >>>>>>> less operators and the chance of having operators
> with
> > > > > >>> > different
> > > > > >>> > >>>>>>> parallelism will be reduced. The price is to have
> more
> > > > > >>> > resource
> > > > > >>> > >>>>>>> requirements to specify.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> It also seems like quite a hassle for users having to
> > > > > >>> > >> recalculate the
> > > > > >>> > >>>>>>>> resource requirements if they change the slot
> sharing.
> > > > > >>> > >>>>>>>> I'd think that it's not really workable for users
> that
> > > > > create
> > > > > >>> > >> a set
> > > > > >>> > >>>>> of
> > > > > >>> > >>>>>>>> re-usable operators which are mixed and matched in
> > > > their
> > > > > >>> > >>>>> applications;
> > > > > >>> > >>>>>>>> managing the resources requirements in such a
> setting
> > > > > >>> > would be
> > > > > >>> > >> a
> > > > > >>> > >>>>>>>> nightmare, and in the end would require
> operator-level
> > > > > >>> > >> requirements
> > > > > >>> > >>>>> any
> > > > > >>> > >>>>>>>> way.
> > > > > >>> > >>>>>>>> In that sense, I'm not even sure whether it really
> > > > > increases
> > > > > >>> > >>>>> usability.
> > > > > >>> > >>>>>>> - As mentioned in my reply to Till's comment,
> > > > > there's no
> > > > > >>> > >> reason to
> > > > > >>> > >>>>> put
> > > > > >>> > >>>>>>> multiple operators whose individual resource
> > > > > >>> > requirements are
> > > > > >>> > >>>>> already
> > > > > >>> > >>>>>>> known
> > > > > >>> > >>>>>>> into the same group in fine-grained resource
> > > > > management.
> > > > > >>> > >>>>>>> - Even an operator implementation is reused for
> > > > > multiple
> > > > > >>> > >>>>> applications,
> > > > > >>> > >>>>>>> it does not guarantee the same resource
> > > > requirements.
> > > > > >>> > During
> > > > > >>> > >> our
> > > > > >>> > >>>>> years
> > > > > >>> > >>>>>>> of
> > > > > >>> > >>>>>>> practices in Alibaba, with per-operator
> > > > requirements
> > > > > >>> > >> specified for
> > > > > >>> > >>>>>>> Blink's
> > > > > >>> > >>>>>>> fine-grained resource management, very few users
> > > > > >>> > (including
> > > > > >>> > >> our
> > > > > >>> > >>>>>>> specialists
> > > > > >>> > >>>>>>> who are dedicated to supporting Blink users) are as
> > > > > >>> > >> experienced as
> > > > > >>> > >>>>> to
> > > > > >>> > >>>>>>> accurately predict/estimate the operator resource
> > > > > >>> > >> requirements.
> > > > > >>> > >>>> Most
> > > > > >>> > >>>>>>> people
> > > > > >>> > >>>>>>> rely on the execution-time metrics (throughput,
> > > > > delay, cpu
> > > > > >>> > >> load,
> > > > > >>> > >>>>>> memory
> > > > > >>> > >>>>>>> usage, GC pressure, etc.) to improve the
> > > > > specification.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> To sum up:
> > > > > >>> > >>>>>>> If the user is capable of providing proper resource
> > > > > >>> > requirements
> > > > > >>> > >> for
> > > > > >>> > >>>>>> every
> > > > > >>> > >>>>>>> operator, that's definitely a good thing and we would
> > > > not
> > > > > >>> > need to
> > > > > >>> > >>>> rely
> > > > > >>> > >>>>> on
> > > > > >>> > >>>>>>> the SSGs. However, that shouldn't be a *must* for the
> > > > > >>> > >> fine-grained
> > > > > >>> > >>>>>> resource
> > > > > >>> > >>>>>>> management to work. For those users who are capable
> and
> > > > > do not
> > > > > >>> > >> like
> > > > > >>> > >>>>>> having
> > > > > >>> > >>>>>>> to set each operator to a separate SSG, I would be ok
> > > > to
> > > > > have
> > > > > >>> > >> both
> > > > > >>> > >>>>>>> SSG-based and operator-based runtime interfaces and
> to
> > > > > only
> > > > > >>> > >> fallback
> > > > > >>> > >>>> to
> > > > > >>> > >>>>>> the
> > > > > >>> > >>>>>>> SSG requirements when the operator requirements are
> not
> > > > > >>> > >> specified.
> > > > > >>> > >>>>>> However,
> > > > > >>> > >>>>>>> as the first step, I think we should prioritise the
> use
> > > > > cases
> > > > > >>> > >> where
> > > > > >>> > >>>>> users
> > > > > >>> > >>>>>>> are not that experienced.
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> Thank you~
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> Xintong Song
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
> > > > > >>> > >> chesnay@apache.org <ma...@apache.org>>
> > > > > >>> > >>>>>>> wrote:
> > > > > >>> > >>>>>>>
> > > > > >>> > >>>>>>>> Will declaring them on slot sharing groups not also
> > > > > waste
> > > > > >>> > >> resources
> > > > > >>> > >>>>> if
> > > > > >>> > >>>>>>>> the parallelism of operators within that group are
> > > > > different?
> > > > > >>> > >>>>>>>>
> > > > > >>> > >>>>>>>> It also seems like quite a hassle for users having
> to
> > > > > >>> > >> recalculate
> > > > > >>> > >>>> the
> > > > > >>> > >>>>>>>> resource requirements if they change the slot
> sharing.
> > > > > >>> > >>>>>>>> I'd think that it's not really workable for users
> that
> > > > > create
> > > > > >>> > >> a set
> > > > > >>> > >>>>> of
> > > > > >>> > >>>>>>>> re-usable operators which are mixed and matched in
> > > > their
> > > > > >>> > >>>>> applications;
> > > > > >>> > >>>>>>>> managing the resources requirements in such a
> setting
> > > > > >>> > would be
> > > > > >>> > >> a
> > > > > >>> > >>>>>>>> nightmare, and in the end would require
> operator-level
> > > > > >>> > >> requirements
> > > > > >>> > >>>>> any
> > > > > >>> > >>>>>>>> way.
> > > > > >>> > >>>>>>>> In that sense, I'm not even sure whether it really
> > > > > increases
> > > > > >>> > >>>>> usability.
> > > > > > >>> > >>>>>>>> My main worry is that if we wire the runtime to
> > > > work
> > > > > >>> > on SSGs
> > > > > >>> > >>>> it's
> > > > > >>> > >>>>>>>> gonna be difficult to implement more fine-grained
> > > > > approaches,
> > > > > >>> > >> which
> > > > > >>> > >>>>>>>> would not be the case if, for the runtime, they are
> > > > > always
> > > > > >>> > >> defined
> > > > > >>> > >>>> on
> > > > > >>> > >>>>>> an
> > > > > >>> > >>>>>>>> operator-level.
> > > > > >>> > >>>>>>>>
> > > > > >>> > >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > > > > >>> > >>>>>>>>> Thanks for drafting this FLIP and starting this
> > > > > discussion
> > > > > >>> > >>>> Yangze.
> > > > > >>> > >>>>>>>>> I like that defining resource requirements on a
> slot
> > > > > sharing
> > > > > >>> > >>>> group
> > > > > >>> > >>>>>>> makes
> > > > > >>> > >>>>>>>>> the overall setup easier and improves usability of
> > > > > resource
> > > > > >>> > >>>>>>> requirements.
> > > > > >>> > >>>>>>>>> What I do not like about it is that it changes slot
> > > > > sharing
> > > > > >>> > >>>> groups
> > > > > >>> > >>>>>> from
> > > > > >>> > >>>>>>>>> being a scheduling hint to something which needs to
> > > > be
> > > > > >>> > >> supported
> > > > > >>> > >>>> in
> > > > > >>> > >>>>>>> order
> > > > > >>> > >>>>>>>>> to support fine grained resource requirements. So
> > > > far,
> > > > > the
> > > > > >>> > >> idea
> > > > > >>> > >>>> of
> > > > > >>> > >>>>>> slot
> > > > > >>> > >>>>>>>>> sharing groups was that it tells the system that a
> > > > set
> > > > > of
> > > > > >>> > >>>> operators
> > > > > >>> > >>>>>> can
> > > > > >>> > >>>>>>>> be
> > > > > >>> > >>>>>>>>> deployed in the same slot. But the system still had
> > > > the
> > > > > >>> > >> freedom
> > > > > >>> > >>>> to
> > > > > >>> > >>>>>> say
> > > > > >>> > >>>>>>>> that
> > > > > >>> > >>>>>>>>> it would rather place these tasks in different
> slots
> > > > > if it
> > > > > >>> > >>>> wanted.
> > > > > >>> > >>>>> If
> > > > > >>> > >>>>>>> we
> > > > > >>> > >>>>>>>>> now specify resource requirements on a per slot
> > > > sharing
> > > > > >>> > >> group,
> > > > > >>> > >>>> then
> > > > > >>> > >>>>>> the
> > > > > >>> > >>>>>>>>> only option for a scheduler which does not support
> > > > slot
> > > > > >>> > >> sharing
> > > > > >>> > >>>>>> groups
> > > > > >>> > >>>>>>> is
> > > > > >>> > >>>>>>>>> to say that every operator in this slot sharing
> group
> > > > > >>> > needs a
> > > > > >>> > >>>> slot
> > > > > >>> > >>>>>> with
> > > > > >>> > >>>>>>>> the
> > > > > >>> > >>>>>>>>> same resources as the whole group.
> > > > > >>> > >>>>>>>>>
> > > > > >>> > >>>>>>>>> So for example, if we have a job consisting of two
> > > > > operator
> > > > > >>> > >> op_1
> > > > > >>> > >>>>> and
> > > > > >>> > >>>>>>> op_2
> > > > > >>> > >>>>>>>>> where each op needs 100 MB of memory, we would then
> > > > > say that
> > > > > >>> > >> the
> > > > > >>> > >>>>> slot
> > > > > >>> > >>>>>>>>> sharing group needs 200 MB of memory to run. If we
> > > > > have a
> > > > > >>> > >> cluster
> > > > > >>> > >>>>>> with
> > > > > >>> > >>>>>>> 2
> > > > > >>> > >>>>>>>>> TMs with one slot of 100 MB each, then the system
> > > > > cannot run
> > > > > >>> > >> this
> > > > > >>> > >>>>>> job.
> > > > > >>> > >>>>>>> If
> > > > > >>> > >>>>>>>>> the resources were specified on an operator level,
> > > > > then the
> > > > > >>> > >>>> system
> > > > > >>> > >>>>>>> could
> > > > > >>> > >>>>>>>>> still make the decision to deploy op_1 to TM_1 and
> > > > > op_2 to
> > > > > >>> > >> TM_2.
> > > > > >>> > >>>>>>>>> Originally, one of the primary goals of slot
> sharing
> > > > > groups
> > > > > >>> > >> was
> > > > > >>> > >>>> to
> > > > > >>> > >>>>>> make
> > > > > >>> > >>>>>>>> it
> > > > > >>> > >>>>>>>>> easier for the user to reason about how many slots
> a
> > > > > job
> > > > > >>> > >> needs
> > > > > >>> > >>>>>>>> independent
> > > > > >>> > >>>>>>>>> of the actual number of operators in the job.
> > > > > Interestingly,
> > > > > >>> > >> if
> > > > > >>> > >>>> all
> > > > > >>> > >>>>>>>>> operators have their resources properly specified,
> > > > > then slot
> > > > > >>> > >>>>> sharing
> > > > > >>> > >>>>>> is
> > > > > >>> > >>>>>>>> no
> > > > > >>> > >>>>>>>>> longer needed because Flink could slice off the
> > > > > >>> > appropriately
> > > > > >>> > >>>> sized
> > > > > >>> > >>>>>>> slots
> > > > > >>> > >>>>>>>>> for every Task individually. What matters is
> whether
> > > > > the
> > > > > >>> > >> whole
> > > > > >>> > >>>>>> cluster
> > > > > >>> > >>>>>>>> has
> > > > > >>> > >>>>>>>>> enough resources to run all tasks or not.
> > > > > >>> > >>>>>>>>>
> > > > > >>> > >>>>>>>>> Cheers,
> > > > > >>> > >>>>>>>>> Till
> > > > > >>> > >>>>>>>>>
> > > > > >>> > >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
> > > > > >>> > >> karmagyz@gmail.com <ma...@gmail.com>>
> > > > > >>> > >>>>>> wrote:
> > > > > >>> > >>>>>>>>>> Hi, there,
> > > > > >>> > >>>>>>>>>>
> > > > > >>> > >>>>>>>>>> We would like to start a discussion thread on
> > > > > "FLIP-156:
> > > > > >>> > >> Runtime
> > > > > >>> > >>>>>>>>>> Interfaces for Fine-Grained Resource
> > > > Requirements"[1],
> > > > > >>> > >> where we
> > > > > >>> > >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime
> > > > > interfaces
> > > > > >>> > >> for
> > > > > >>> > >>>>>>>>>> specifying fine-grained resource requirements.
> > > > > >>> > >>>>>>>>>>
> > > > > >>> > >>>>>>>>>> In this FLIP:
> > > > > >>> > >>>>>>>>>> - Expound the user story of fine-grained resource
> > > > > >>> > >> management.
> > > > > >>> > >>>>>>>>>> - Propose runtime interfaces for specifying
> > > > SSG-based
> > > > > >>> > >> resource
> > > > > >>> > >>>>>>>>>> requirements.
> > > > > >>> > >>>>>>>>>> - Discuss the pros and cons of the three potential
> > > > > >>> > >> granularities
> > > > > >>> > >>>>> for
> > > > > >>> > >>>>>>>>>> specifying the resource requirements (op, task and
> > > > > slot
> > > > > >>> > >> sharing
> > > > > >>> > >>>>>> group)
> > > > > >>> > >>>>>>>>>> and explain why we choose the slot sharing group.
> > > > > >>> > >>>>>>>>>>
> > > > > >>> > >>>>>>>>>> Please find more details in the FLIP wiki document
> > > > > [1].
> > > > > >>> > >> Looking
> > > > > >>> > >>>>>>>>>> forward to your feedback.
> > > > > >>> > >>>>>>>>>>
> > > > > >>> > >>>>>>>>>> [1]
> > > > > >>> > >>>>>>>>>>
> > > > > >>> > >>
> > > > > >>> >
> > > > >
> > > >
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > > > >>> > <
> > > > >
> > > >
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > > > >
> > > > > >>> > >>>>>>>>>> Best,
> > > > > >>> > >>>>>>>>>> Yangze Guo
> > > > > >>> > >>>>>>>>>>
> > > > > >>> > >>>>>>>>
> > > > > >>> >
> > > > > >>>
> > > > >
> > > >
> >
>

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Kezhu Wang <ke...@gmail.com>.

Hi Yangze and Xintong, thank you for your replies.

I am indeed making some assumptions; I list them here in order:
1. There is only a task/LogicalSlot-level resource specification in the runtime.
It comes from the API side and is respected by the runtime.
2. The current operator-level resource specification on the client side is
respected and used to aggregate
   the task resource specification for runtime usage.
3. It is possible that other fine-grained, group-level ways of specifying resources,
which could respect chaining, will emerge in the future.

My proposal is based on the first, and tries to make room for the last two in
SSG resource specification.

@Xintong

> I think this is exactly what we are trying to avoid, requiring the
scheduler to enforce slot sharing.

I saw the discussion about keeping slot sharing as a hint, but in reality, won't
SSG jobs be expected to fail or
run slowly if the scheduler does not respect it? A slot with 20GB of memory is
different from two 1GB
default-sized slots. So we actually depend on the scheduler's
version/implementation/de facto behavior if we
claim it is a hint.

@Xintong
> So, I wonder whether we could interpret SSG resource specifying as an
"add"
> but not an "set" on resource requirement ?

> IIUC, this is the core idea behind your proposal.

You are right, all the other changes serve this. It is also the
semantic divergence between
the two: my suggestion treats SSG as a hint and as an extra place to specify
resources, while FLIP-156 tends
to treat SSG as a restriction and as the authoritative resource specification. With
this change, I think FLIP-156
is just a special case that forces SSG-only specification and no others.
That is, if there are no other
resource specifications, "set" equals "add" to zero. So if this is the
case after FLIP-156, then
there is still room for this direction, if indeed required.
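
To make the "set" versus "add" distinction concrete, here is a minimal
sketch. The Resources class and both functions are hypothetical
illustrations, not Flink's ResourceSpec or any existing runtime interface.
It only shows that the two readings agree when no task-level
specification exists and diverge otherwise.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource profile; NOT Flink's ResourceSpec."""
    cpu_cores: float = 0.0
    memory_mb: int = 0

    def merge(self, other: "Resources") -> "Resources":
        # Component-wise sum of two profiles.
        return Resources(self.cpu_cores + other.cpu_cores,
                         self.memory_mb + other.memory_mb)


ZERO = Resources()


def slot_requirement_set(ssg_spec: Resources, task_specs: List[Resources]) -> Resources:
    # "set": the SSG specification is authoritative, task-level specs are ignored.
    return ssg_spec


def slot_requirement_add(ssg_spec: Resources, task_specs: List[Resources]) -> Resources:
    # "add": sum the task-level specs, then add the SSG spec on top as extra resource.
    total = ZERO
    for spec in task_specs:
        total = total.merge(spec)
    return total.merge(ssg_spec)
```

With an empty list of task-level specifications both functions return the
SSG specification unchanged, which is the "add to zero" special case
described above.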

@Yangze, @Xintong
> never really used

Do you mean the code path or the production environment? If it is the code path, could
you please point out where
the story breaks?

From the discussion and history, could I consider FLIP-156 a redirection
rather than an inheritance/enhancement
of the current half-baked/ancient implementation?

Thank you, Yangze and Xintong.


On February 3, 2021 at 14:31:28, Xintong Song (tonysong820@gmail.com) wrote:

Thanks for your feedback, Kezhu.

I think Flink *runtime* already has an ideal granularity for resource
> management 'task'. If there is
> a slot shared by multiple tasks, that slot's resource requirement is
simple
> sum of all its logical
> slots. So basically, this is no resource requirement for SlotSharingGroup
> in runtime until now,
> right ?

That is a half-baked implementation, coming from previous attempts
(years ago) to deliver the fine-grained resource management feature,
and it was never really put into use.

From the FLIP and discussion, I assume that SSG resource specifying will
> override operator level
> resource specifying if both are specified ?
>
Actually, I think we should use the finer-grained resources (i.e. operator
level) if both are specified. And more importantly, that is based on the
assumption that we do need two different levels of interfaces.
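
A minimal sketch of that precedence rule, under stated assumptions: the
function name and the memory-only model are hypothetical, and falling back
to the SSG value as soon as any operator lacks a specification is an
assumption of this sketch, not something the FLIP defines.

```python
from typing import Dict, Optional


def resolve_group_memory_mb(
    ssg_memory_mb: Optional[int],
    operator_memory_mb: Dict[str, Optional[int]],
) -> Optional[int]:
    """Pick the memory requirement for one slot sharing group.

    Hypothetical rule: use the finer-grained operator-level values when every
    operator in the group has one, otherwise fall back to the SSG-level value.
    None means "not specified".
    """
    values = list(operator_memory_mb.values())
    if values and all(v is not None for v in values):
        # All operators are specified: the finer-grained level wins.
        return sum(values)
    # At least one operator is unspecified: use the SSG-level requirement.
    return ssg_memory_mb
```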

So, I wonder whether we could interpret SSG resource specifying as an "add"
> but not an "set" on
> resource requirement ?
>
IIUC, this is the core idea behind your proposal. I think it provides an
interesting idea of how we combine operator level and SSG level resources,
*if
we allow configuring resources at both levels*. However, I'm not sure
whether configuring resources on the operator level is indeed needed.
Therefore, as a first step, this FLIP proposes to only introduce the
SSG-level interfaces. As listed in the future plan, we would consider
allowing operator level resource configuration later if we do see a need
for it. At that time, we definitely should discuss what to do if resources
are configured at both levels.

* Could SSG express negative resource requirement ?
>
No.

Is there concrete bar for partial resource configured not function ? I
> saw it will fail job submission in Dispatcher.submitJob.
>
With the SSG-based approach, this should no longer be needed. The
constraint was introduced because we can neither properly define what is
the resource of a task chained from an operator with specified resource and
another with unspecified resource, nor for a slot shared by a task with
specified resource and another with unspecified resource. With the
SSG-based approach, we no longer have those problems.
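
The kind of constraint described here can be sketched as follows. This is
an illustration only: the function and its error message are hypothetical
and are not the check performed in Dispatcher.submitJob.

```python
from typing import Dict, Optional


def check_no_partial_specs(operator_memory_mb: Dict[str, Optional[int]]) -> None:
    """Reject a job in which only some operators carry a resource spec.

    Hypothetical operator-based validation: a task chained from a specified
    and an unspecified operator has no well-defined total, so mixing is refused.
    None means "not specified".
    """
    specified = [name for name, mem in operator_memory_mb.items() if mem is not None]
    if specified and len(specified) != len(operator_memory_mb):
        missing = sorted(set(operator_memory_mb) - set(specified))
        raise ValueError(f"resources missing for operators: {missing}")


def ssg_slot_memory_mb(ssg_memory_mb: int, operator_names: list) -> int:
    # SSG-based: one value covers the whole group, whatever operators it contains,
    # so there is nothing to validate per operator.
    return ssg_memory_mb
```

In the SSG-based variant the per-operator question never arises, which is
why the partial-configuration check becomes unnecessary.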

An option(cluster/job level) to force slot sharing in scheduler ? This
> could be useful in case of migration from FLIP-156 to future approach.
>
I think this is exactly what we are trying to avoid, requiring the
scheduler to enforce slot sharing.

An option(cluster) to ignore resource specifying(allow resource specified
> job to run on open box environment) for no production usage ?
>
That's possible. Actually, we are planning to introduce an option for
activating the fine-grained resource management, for development purposes.
We might consider keeping that option after the feature is completed, to
allow disabling the feature without having to touch the job code.

Thank you~

Xintong Song



On Wed, Feb 3, 2021 at 1:28 PM Kezhu Wang <ke...@gmail.com> wrote:

> Hi all, sorry for joining the discussion after voting has started.
>
> I want to share my thoughts on this after reading above discussions.
>
> I think the Flink *runtime* already has an ideal granularity for resource
> management: the task. If a slot is shared by multiple tasks, that slot's
> resource requirement is simply the sum of all its logical slots. So
> basically, there has been no resource requirement for SlotSharingGroup in
> the runtime until now, right?
>
> As in the discussion, we already agree that: "If all operators have their
> resources properly specified, then slot sharing is no longer needed."
>
> So it seems to me that the natural question to discuss is: how do we bridge
> the impractical operator-level resource specification to the runtime
> task-level resource requirement? This is actually a pure API matter, as
> Chesnay has pointed out.
>
> But FLIP-156 brings another direction to the table: how about using SSG
> for both API and runtime resource specification?
>
> From the FLIP and discussion, I assume that the SSG resource specification
> will override the operator-level resource specification if both are
> specified?
>
> So, I wonder whether we could interpret the SSG resource specification as
> an "add" rather than a "set" on the resource requirement?
>
> The semantics would be that the SSG resource specification adds extra
> resources to the shared slot, to express concerns about possibly high
> throughput and resource requirements for the tasks in one physical slot.
>
> The result is that if the scheduler indeed respects slot sharing, the
> allocated slot will gain the extra resources specified for that SSG.
>
> I think one coding barrier to the "add" approach is ResourceSpec.UNKNOWN,
> which does not support the 'merge' operation. I tend to use
> ResourceSpec.ZERO as the default; the task executor should be aware of this.
>
> @Chesnay
> > My main worry is that if we wire the runtime to work on SSGs it's
> > gonna be difficult to implement more fine-grained approaches, which
> > would not be the case if, for the runtime, they are always defined on an
> > operator-level.
>
> An "add" operation should be less invasive and pose a lower barrier for
> future fine-grained approaches.
>
> @Stephan
> > - Users can define different slot sharing groups for operators like they
> > do now, with the exception that you cannot mix operators that have a
> > resource profile and operators that have no resource profile.
>
> @Till
> > This effectively means that all unspecified operators
> > will implicitly have a zero resource requirement.
> > I am wondering whether this wouldn't lead to a surprising behaviour for
> > the user. If the user specifies the resource requirements for a single
> > operator, then he probably will assume that the other operators will get
> > the default share of resources and not nothing.
>
> I think this is inherent, because we cannot define ResourceSpec.ONE, i.e.
> the resource requirement for exactly one default slot, with concrete
> numbers? I tend to squash out the unspecified ones if there are operators
> in the chain with an explicit resource specification. Otherwise, the
> protocol tends to be verbose, as in "give me this much resource and a
> default". I think that if we have explicit resource specifications for only
> some operators, it is just saying "I don't care about the other operators
> that much, just get them places to run". These are most likely stateless
> filter/map or other less resource-consuming operators. If there is indeed a
> problem, I think clients can specify a global default (or another level of
> default in the future). In the job graph generation phase, we could take
> that default into account for unspecified operators.
>
> @FLIP-156
> > Expose operator chaining. (Cons of task level resource specifying)
>
> Isn't that inherent to all group-level resource specification? It will
> either break chaining or obey it, or might not even work with it.
>
> To sum up the above, my suggestions are:
>
> On the API side:
> * StreamExecutionEnvironment: a global default (ResourceSpec.ZERO if
> unspecified).
> * Operator: ResourceSpec.ZERO (unspecified) as default.
> * Task: sum of requirements from specified operators + global default (if
> there are any unspecified operators).
> * SSG: additional resource for the physical slot.
>
> On the runtime side:
> * Task: ResourceSpec.Task or ResourceSpec.ZERO
> * SSG: ResourceSpec.SSG or ResourceSpec.ZERO
>
> The physical slot gets the sum of the resources from its logical slots and
> the SSG; if the result is ResourceSpec.ZERO, it is just a default-sized
> slot.
>
> In short, turn the SSG resource specification into an "add" and drop
> ResourceSpec.UNKNOWN.
>
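
A minimal sketch contrasting the "add" semantics suggested above with the "set" semantics the mail assumes for the FLIP. The Spec class is a hypothetical stand-in for a resource specification (CPU cores and memory only), not Flink's ResourceSpec, and a null group spec stands in for "not specified".

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: Spec is a hypothetical stand-in, not Flink's ResourceSpec.
public final class AddVsSetSketch {

    static final class Spec {
        final double cpuCores;
        final int memoryMb;

        Spec(double cpuCores, int memoryMb) {
            this.cpuCores = cpuCores;
            this.memoryMb = memoryMb;
        }

        Spec plus(Spec other) {
            return new Spec(cpuCores + other.cpuCores, memoryMb + other.memoryMb);
        }

        boolean isZero() {
            return cpuCores == 0.0 && memoryMb == 0;
        }
    }

    static final Spec ZERO = new Spec(0.0, 0);

    // "add": the slot is the sum of the task requirements plus the group's extra;
    // if everything is ZERO, fall back to a default-sized slot.
    static Spec slotWithAdd(List<Spec> tasks, Spec groupExtra, Spec defaultSlot) {
        Spec total = groupExtra;
        for (Spec task : tasks) {
            total = total.plus(task);
        }
        return total.isZero() ? defaultSlot : total;
    }

    // "set": the group's requirement, if given (non-null), determines the slot on its own.
    static Spec slotWithSet(Spec groupSpec, Spec defaultSlot) {
        return groupSpec != null ? groupSpec : defaultSlot;
    }

    public static void main(String[] args) {
        Spec defaultSlot = new Spec(1.0, 1024);
        List<Spec> tasks = Arrays.asList(new Spec(1.0, 100), new Spec(0.5, 200));
        Spec added = slotWithAdd(tasks, new Spec(0.5, 50), defaultSlot);
        System.out.println("add: " + added.cpuCores + " cores, " + added.memoryMb + " MB");
        Spec set = slotWithSet(new Spec(0.5, 50), defaultSlot);
        System.out.println("set: " + set.cpuCores + " cores, " + set.memoryMb + " MB");
    }
}
```

With "add", ZERO is a neutral element for the sum, which is why no separate UNKNOWN value is needed in this sketch.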
>
> Questions/Issues:
> * Could SSG express negative resource requirement ?
> * Is there concrete bar for partial resource configured not function ? I
> saw it will fail job submission in Dispatcher.submitJob.
> * An option(cluster/job level) to force slot sharing in scheduler ? This
> could be useful in case of migration from FLIP-156 to future approach.
> * An option(cluster) to ignore resource specifying(allow resource specified
> job to run on open box environment) for non-production usage ?
>
>
>
> On February 1, 2021 at 11:54:10, Yangze Guo (karmagyz@gmail.com) wrote:
>
> Thanks for reply, Till and Xintong!
>
> I've updated the FLIP, including:
> - Edit the JavaDoc of the proposed
> StreamGraphGenerator#setSlotSharingGroupResource.
> - Add "Future Plan" section, which contains the potential follow-up
> issues and the limitations to be documented when fine-grained resource
> management is exposed to users.
>
> I'll start a vote in another thread.
>
> Best,
> Yangze Guo
>
> On Fri, Jan 29, 2021 at 10:07 PM Till Rohrmann <tr...@apache.org>
> wrote:
> >
> > Thanks for summarizing the discussion, Yangze. I agree that setting
> > resource requirements per operator is not very user friendly. Moreover, I
> > couldn't come up with a different proposal which would be as easy to use
> > and wouldn't expose internal scheduling details. In fact, following this
> > argument then we shouldn't have exposed the slot sharing groups in the
> > first place.
> >
> > What is important for the user is that we properly document the
> > limitations and constraints the fine grained resource specification has.
> > For example, we should explain how optimizations like chaining are
> > affected by it and how different execution modes (batch vs. streaming)
> > affect the execution of operators which have specified resources. These
> > things shouldn't become part of the contract of this feature and are more
> > caused by internal implementation details but it will be important to
> > understand these things properly in order to use this feature effectively.
> >
> > Hence, +1 for starting the vote for this FLIP.
> >
> > Cheers,
> > Till
> >
> > On Tue, Jan 26, 2021 at 4:37 AM Xintong Song <to...@gmail.com>
> > wrote:
> >
> > > Thanks for the summary, Yangze.
> > >
> > > The changes and follow-up issues LGTM. Let's wait for responses from the
> > > others before starting a vote.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Tue, Jan 26, 2021 at 11:08 AM Yangze Guo <ka...@gmail.com>
> > > wrote:
> > >
> > > > Thanks everyone for the lively discussion. I'd like to try to
> > > > summarize the current convergence in the discussion. Please let me
> > > > know if I got things wrong or missed something crucial here.
> > > >
> > > > Change of this FLIP:
> > > > - Treat the SSG resource requirements as a hint instead of a
> > > > restriction for the runtime. That should be explicitly explained in
> > > > the JavaDocs.
> > > >
> > > > Potential follow-up issues if needed:
> > > > - Provide operator-level resource configuration interface.
> > > > - Provide multiple options for deciding resources for SSGs whose
> > > > requirement is not specified:
> > > > ** Default slot resource.
> > > > ** Default operator resource times number of operators.
> > > >
> > > > If there are no other issues, I'll update the FLIP accordingly and
> > > > start a vote thread. Thanks all for the valuable feedback again.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > >
> > > > On Fri, Jan 22, 2021 at 11:30 AM Xintong Song <tonysong820@gmail.com>
> > > > wrote:
> > > > >
> > > > >
> > > > > FGRuntimeInterface.png
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <tonysong820@gmail.com>
> > > > > wrote:
> > > > >>
> > > > > >> I think Chesnay's proposal could actually work. IIUC, the key point
> > > > > >> is to derive operator requirements from SSG requirements on the API
> > > > > >> side, so that the runtime only deals with operator requirements.
> > > > > >> It's debatable how the deriving should be done though. E.g., an
> > > > > >> alternative could be to evenly divide the SSG requirement into
> > > > > >> requirements of operators in the group.
> > > > > >>
> > > > > >>
> > > > > >> However, I'm not entirely sure which option is more desired.
> > > > > >> Illustrating my understanding in the following figure, in which on
> > > > > >> the top is Chesnay's proposal and on the bottom is the SSG-based
> > > > > >> proposal in this FLIP.
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> I think the major difference between the two approaches is where
> > > > > >> deriving operator requirements from SSG requirements happens.
> > > > > >>
> > > > > >> - Chesnay's proposal simplifies the runtime logic and the interface
> > > > > >> to expose, at the price of moving more complexity (i.e. the
> > > > > >> deriving) to the API side. The question is, where do we prefer to
> > > > > >> keep the complexity? I'm slightly leaning towards having a thin API
> > > > > >> and keeping the complexity in runtime if possible.
> > > > > >>
> > > > > >> - Notice that the dashed-line arrows represent optional steps that
> > > > > >> are needed only for schedulers that do not respect SSGs, which we
> > > > > >> don't have at the moment. If we only look at the solid-line arrows,
> > > > > >> then the SSG-based approach is much simpler, without needing to
> > > > > >> derive and aggregate the requirements back and forth. I'm not sure
> > > > > >> about complicating the current design only for the potential future
> > > > > >> needs.
> > > > >>
> > > > >>
> > > > >> Thank you~
> > > > >>
> > > > >> Xintong Song
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > > >> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <chesnay@apache.org>
> > > > > >> wrote:
> > > > >>>
> > > > > >>> You're raising a good point, but I think I can rectify that with a
> > > > > >>> minor adjustment.
> > > > > >>>
> > > > > >>> Default requirements are whatever the default requirements are,
> > > > > >>> setting the requirements for one operator has no effect on other
> > > > > >>> operators.
> > > > > >>>
> > > > > >>> With these rules, and some API enhancements, the following mockup
> > > > > >>> would replicate the SSG-based behavior:
> > > > >>>
> > > > > >>> Map<SlotSharingGroupId, Requirements> requirements = ...
> > > > > >>> for slotSharingGroup in env.getSlotSharingGroups() {
> > > > > >>>     vertices = slotSharingGroup.getVertices()
> > > > > >>>     vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
> > > > > >>>     vertices.remaining().setRequirements(ZERO)
> > > > > >>> }
> > > > >>>
> > > > >>> We could even allow setting requirements on slotsharing-groups
> > > > >>> colocation-groups and internally translate them accordingly.
> > > > >>> I can't help but feel this is a plain API issue.
> > > > >>>
> > > > >>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
> > > > > >>> > If I understand you correctly Chesnay, then you want to decouple the
> > > > > >>> > resource requirement specification from the slot sharing group
> > > > > >>> > assignment. Hence, per default all operators would be in the same
> > > > > >>> > slot sharing group. If there is no operator with a resource
> > > > > >>> > specification, then the system would allocate a default slot for it.
> > > > > >>> > If there is at least one operator, then the system would sum up all
> > > > > >>> > the specified resources and allocate a slot of this size. This
> > > > > >>> > effectively means that all unspecified operators will implicitly
> > > > > >>> > have a zero resource requirement. Did I understand your idea
> > > > > >>> > correctly?
> > > > > >>> >
> > > > > >>> > I am wondering whether this wouldn't lead to a surprising behaviour
> > > > > >>> > for the user. If the user specifies the resource requirements for a
> > > > > >>> > single operator, then he probably will assume that the other
> > > > > >>> > operators will get the default share of resources and not nothing.
> > > > >>> >
> > > > >>> > Cheers,
> > > > >>> > Till
> > > > >>> >
> > > > > >>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <chesnay@apache.org>
> > > > > >>> > wrote:
> > > > >>> >
> > > > > >>> > Is there even a functional difference between specifying the
> > > > > >>> > requirements for an SSG vs specifying the same requirements on a
> > > > > >>> > single operator within that group (ideally a colocation group to
> > > > > >>> > avoid this whole hint business)?
> > > > > >>> >
> > > > > >>> > Wouldn't we get the best of both worlds in the latter case?
> > > > > >>> >
> > > > > >>> > Users can take shortcuts to define shared requirements,
> > > > > >>> > but refine them further as needed on a per-operator basis,
> > > > > >>> > without changing semantics of slotsharing groups
> > > > > >>> > nor the runtime being locked into SSG-based requirements.
> > > > > >>> >
> > > > > >>> > (And before anyone argues what happens if slotsharing groups change
> > > > > >>> > or whatnot, that's a plain API issue that we could surely solve. (A
> > > > > >>> > plain iteration over slotsharing groups and therein contained
> > > > > >>> > operators would suffice)).
> > > > >>> >
> > > > >>> > On 1/20/2021 6:48 PM, Till Rohrmann wrote:
> > > > > >>> > > Maybe a different minor idea: Would it be possible to treat the SSG
> > > > > >>> > > resource requirements as a hint for the runtime similar to how slot
> > > > > >>> > > sharing groups are designed at the moment? Meaning that we don't
> > > > > >>> > > give the guarantee that Flink will always deploy this set of tasks
> > > > > >>> > > together no matter what comes. If, for example, the runtime can
> > > > > >>> > > derive by some means the resource requirements for each task based
> > > > > >>> > > on the requirements for the SSG, this could be possible. One easy
> > > > > >>> > > strategy would be to give every task the same resources as the
> > > > > >>> > > whole slot sharing group. Another one could be distributing the
> > > > > >>> > > resources equally among the tasks. This does not even have to be
> > > > > >>> > > implemented but we would give ourselves the freedom to change
> > > > > >>> > > scheduling if need should arise.
> > > > >>> > >
> > > > >>> > > Cheers,
> > > > >>> > > Till
> > > > >>> > >
> > > > > >>> > > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <karmagyz@gmail.com>
> > > > > >>> > > wrote:
> > > > >>> > >
> > > > > >>> > >> Thanks for the responses, Till and Xintong.
> > > > > >>> > >>
> > > > > >>> > >> I second Xintong's comment that the SSG-based runtime interface will
> > > > > >>> > >> give us the flexibility to achieve the op/task-based approach. That's
> > > > > >>> > >> one of the most important reasons for our design choice.
> > > > > >>> > >>
> > > > > >>> > >> Some cents regarding the default operator resource:
> > > > > >>> > >> - It might be good for the scenario of DataStream jobs.
> > > > > >>> > >> ** For light-weight operators, the accumulative configuration error
> > > > > >>> > >> will not be significant. Then, the resource used by a task is
> > > > > >>> > >> proportional to the number of operators it contains.
> > > > > >>> > >> ** For heavy operators like join and window or operators using the
> > > > > >>> > >> external resources, users will turn to the fine-grained resource
> > > > > >>> > >> configuration.
> > > > > >>> > >> - It can increase the stability for the standalone cluster where the
> > > > > >>> > >> registered task executors are heterogeneous (with different default
> > > > > >>> > >> slot resources).
> > > > > >>> > >> - It might not be good for SQL users. The operators that SQL will be
> > > > > >>> > >> translated into are a black box to the user. We also do not guarantee
> > > > > >>> > >> the cross-version consistency of the transformation so far.
> > > > > >>> > >>
> > > > > >>> > >> I think it can be treated as a follow-up work when the fine-grained
> > > > > >>> > >> resource management is end-to-end ready.
> > > > >>> > >>
> > > > >>> > >> Best,
> > > > >>> > >> Yangze Guo
> > > > >>> > >>
> > > > >>> > >>
> > > > > >>> > >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song <tonysong820@gmail.com>
> > > > > >>> > >> wrote:
> > > > >>> > >>> Thanks for the feedback, Till.
> > > > >>> > >>>
> > > > > >>> > >>> ## I feel that what you proposed (operator-based + default value)
> > > > > >>> > >>> might be subsumed by the SSG-based approach.
> > > > > >>> > >>> Thinking of op_1 -> op_2, there are the following 4 cases,
> > > > > >>> > >>> categorized by whether the resource requirements are known to the
> > > > > >>> > >>> users.
> > > > > >>> > >>>
> > > > > >>> > >>> 1. *Both known.* As previously mentioned, there's no reason to put
> > > > > >>> > >>>    multiple operators whose individual resource requirements are
> > > > > >>> > >>>    already known into the same group in fine-grained resource
> > > > > >>> > >>>    management. And if op_1 and op_2 are in different groups, there
> > > > > >>> > >>>    should be no problem switching data exchange mode from pipelined
> > > > > >>> > >>>    to blocking. This is equivalent to specifying operator resource
> > > > > >>> > >>>    requirements in your proposal.
> > > > > >>> > >>> 2. *op_1 known, op_2 unknown.* Similar to 1), except that op_2 is in
> > > > > >>> > >>>    an SSG whose resource is not specified and thus would have the
> > > > > >>> > >>>    default slot resource. This is equivalent to having default
> > > > > >>> > >>>    operator resources in your proposal.
> > > > > >>> > >>> 3. *Both unknown.* The user can either set op_1 and op_2 to the same
> > > > > >>> > >>>    SSG or separate SSGs.
> > > > > >>> > >>>    - If op_1 and op_2 are in the same SSG, it will be equivalent to
> > > > > >>> > >>>      the coarse-grained resource management, where op_1 and op_2
> > > > > >>> > >>>      share a default size slot no matter which data exchange mode is
> > > > > >>> > >>>      used.
> > > > > >>> > >>>    - If op_1 and op_2 are in different SSGs, then each of them will
> > > > > >>> > >>>      use a default size slot. This is equivalent to setting them
> > > > > >>> > >>>      with default operator resources in your proposal.
> > > > > >>> > >>> 4. *Total (pipeline) or max (blocking) of op_1 and op_2 is known.*
> > > > > >>> > >>>    - It is possible that the user learns the total / max resource
> > > > > >>> > >>>      requirement from executing and monitoring the job, while not
> > > > > >>> > >>>      being aware of individual operator requirements.
> > > > > >>> > >>>    - I believe this is the case your proposal does not cover. And
> > > > > >>> > >>>      TBH, this is probably how most users learn the resource
> > > > > >>> > >>>      requirements, according to my experiences.
> > > > > >>> > >>>    - In this case, the user might need to specify different
> > > > > >>> > >>>      resources if he wants to switch the execution mode, which
> > > > > >>> > >>>      should not be worse than not being able to use fine-grained
> > > > > >>> > >>>      resource management.
> > > > >>> > >>>
> > > > >>> > >>>
> > > > > >>> > >>> ## An additional idea inspired by your proposal.
> > > > > >>> > >>> We may provide multiple options for deciding resources for SSGs
> > > > > >>> > >>> whose requirement is not specified, if needed.
> > > > > >>> > >>>
> > > > > >>> > >>> - Default slot resource (current design)
> > > > > >>> > >>> - Default operator resource times number of operators (equivalent to
> > > > > >>> > >>> your proposal)
> > > > > >>> > >>>
> > > > > >>> > >>>
> > > > > >>> > >>> ## Exposing internal runtime strategies
> > > > > >>> > >>> Theoretically, yes. Tying to the SSGs, the resource requirements
> > > > > >>> > >>> might be affected if how SSGs are internally handled changes in
> > > > > >>> > >>> future. Practically, I do not concretely see at the moment what kind
> > > > > >>> > >>> of changes we may want in future that might conflict with this FLIP
> > > > > >>> > >>> proposal, as the question of switching data exchange mode answered
> > > > > >>> > >>> above. I'd suggest to not give up the user friendliness we may gain
> > > > > >>> > >>> now for the future problems that may or may not exist.
> > > > > >>> > >>>
> > > > > >>> > >>> Moreover, the SSG-based approach has the flexibility to achieve the
> > > > > >>> > >>> equivalent behavior as the operator-based approach, if we set each
> > > > > >>> > >>> operator (or task) to a separate SSG. We can even provide a shortcut
> > > > > >>> > >>> option to automatically do that for users, if needed.
> > > > >>> > >>>
> > > > >>> > >>>
> > > > >>> > >>> Thank you~
> > > > >>> > >>>
> > > > >>> > >>> Xintong Song
> > > > >>> > >>>
> > > > >>> > >>>
> > > > >>> > >>>
> > > > > >>> > >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann <trohrmann@apache.org>
> > > > > >>> > >>> wrote:
> > > > >>> > >>>> Thanks for the responses Xintong and Stephan,
> > > > >>> > >>>>
> > > > > >>> > >>>> I agree that being able to define the resource requirements for a
> > > > > >>> > >>>> group of operators is more user friendly. However, my concern is
> > > > > >>> > >>>> that we are exposing thereby internal runtime strategies which
> > > > > >>> > >>>> might limit our flexibility to execute a given job. Moreover, the
> > > > > >>> > >>>> semantics of configuring resource requirements for SSGs could break
> > > > > >>> > >>>> if switching from streaming to batch execution. If one defines the
> > > > > >>> > >>>> resource requirements for op_1 -> op_2 which run in pipelined mode
> > > > > >>> > >>>> when using the streaming execution, then how do we interpret these
> > > > > >>> > >>>> requirements when op_1 -> op_2 are executed with a blocking data
> > > > > >>> > >>>> exchange in batch execution mode? Consequently, I am still leaning
> > > > > >>> > >>>> towards Stephan's proposal to set the resource requirements per
> > > > > >>> > >>>> operator.
> > > > > >>> > >>>>
> > > > > >>> > >>>> Maybe the following proposal makes the configuration easier: If the
> > > > > >>> > >>>> user wants to use fine-grained resource requirements, then she
> > > > > >>> > >>>> needs to specify the default size which is used for operators which
> > > > > >>> > >>>> have no explicit resource annotation. If this holds true, then
> > > > > >>> > >>>> every operator would have a resource requirement and the system can
> > > > > >>> > >>>> try to execute the operators in the best possible manner w/o being
> > > > > >>> > >>>> constrained by how the user set the SSG requirements.
> > > > >>> > >>>>
> > > > >>> > >>>> Cheers,
> > > > >>> > >>>> Till
> > > > >>> > >>>>
> > > > > >>> > >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song <tonysong820@gmail.com>
> > > > > >>> > >>>> wrote:
> > > > >>> > >>>>
> > > > >>> > >>>>> Thanks for the feedback, Stephan.
> > > > >>> > >>>>>
> > > > > >>> > >>>>> Actually, your proposal has also come to my mind at some point.
> > > > > >>> > >>>>> And I have some concerns about it.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> 1. It does not give users the same control as the SSG-based
> > > > > >>> > >>>>> approach.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> While both approaches do not require specifying for each operator,
> > > > > >>> > >>>>> SSG-based approach supports the semantic that "some operators
> > > > > >>> > >>>>> together use this much resource" while the operator-based approach
> > > > > >>> > >>>>> doesn't.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> Think of a long pipeline with m operators (o_1, o_2, ..., o_m),
> > > > > >>> > >>>>> and at some point there's an agg o_n (1 < n < m) which
> > > > > >>> > >>>>> significantly reduces the data amount. One can separate the
> > > > > >>> > >>>>> pipeline into 2 groups SSG_1 (o_1, ..., o_n) and SSG_2 (o_n+1, ...
> > > > > >>> > >>>>> o_m), so that configuring much higher parallelisms for operators
> > > > > >>> > >>>>> in SSG_1 than for operators in SSG_2 won't lead to too much
> > > > > >>> > >>>>> wasting of resources. If the two SSGs end up needing different
> > > > > >>> > >>>>> resources, with the SSG-based approach one can directly specify
> > > > > >>> > >>>>> resources for the two groups. However, with the operator-based
> > > > > >>> > >>>>> approach, the user will have to specify resources for each
> > > > > >>> > >>>>> operator in one of the two groups, and tune the default slot
> > > > > >>> > >>>>> resource via configurations to fit the other group.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > > >>> > >>>>> 2. It increases the chance of breaking operator chains.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> Setting chainable operators into different slot sharing groups
> > > > > >>> > >>>>> will prevent them from being chained. In the current
> > > > > >>> > >>>>> implementation, downstream operators, if SSG not explicitly
> > > > > >>> > >>>>> specified, will be set to the same group as the chainable upstream
> > > > > >>> > >>>>> operators (unless multiple upstream operators in different
> > > > > >>> > >>>>> groups), to reduce the chance of breaking chains.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4, deciding
> > > > > >>> > >>>>> SSGs based on whether resource is specified we will easily get
> > > > > >>> > >>>>> groups like (o_1, o_3) & (o_2, o_4), where none of the operators
> > > > > >>> > >>>>> can be chained. This is also possible for the SSG-based approach,
> > > > > >>> > >>>>> but I believe the chance is much smaller because there's no strong
> > > > > >>> > >>>>> reason for users to specify the groups with alternate operators
> > > > > >>> > >>>>> like that. We are more likely to get groups like (o_1, o_2) &
> > > > > >>> > >>>>> (o_3, o_4), where the chain breaks only between o_2 and o_3.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> 3. It complicates the system by having two different mechanisms
> > > > > >>> > >>>>> for sharing managed memory in a slot.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > > >>> > >>>>> - In FLIP-141, we introduced the intra-slot managed memory sharing
> > > > > >>> > >>>>> mechanism, where managed memory is first distributed according to
> > > > > >>> > >>>>> the consumer type, then further distributed across operators of
> > > > > >>> > >>>>> that consumer type.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> - With the operator-based approach, managed memory size specified
> > > > > >>> > >>>>> for an operator should account for all the consumer types of that
> > > > > >>> > >>>>> operator. That means the managed memory is first distributed
> > > > > >>> > >>>>> across operators, then distributed to different consumer types of
> > > > > >>> > >>>>> each operator.
> > > > > >>> > >>>>>
> > > > > >>> > >>>>>
> > > > > >>> > >>>>> Unfortunately, the different order of the two calculation steps
> > > > > >>> > >>>>> can lead to different results. To be specific, the semantic of the
> > > > > >>> > >>>>> configuration option `consumer-weights` changes (within a slot vs.
> > > > > >>> > >>>>> within an operator).
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> To sum up things:
> > > > >>> > >>>>>
> > > > > >>> > >>>>> While (3) might be a bit more implementation related, I think (1)
> > > > > >>> > >>>>> and (2) somehow suggest that, the price for the proposed approach
> > > > > >>> > >>>>> to avoid specifying resource for every operator is that it's not
> > > > > >>> > >>>>> as independent from operator chaining and slot sharing as the
> > > > > >>> > >>>>> operator-based approach discussed in the FLIP.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> Thank you~
> > > > >>> > >>>>>
> > > > >>> > >>>>> Xintong Song
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > > >>> > >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen <sewen@apache.org>
> > > > > >>> > >>>>> wrote:
> > > > >>> > >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
> > > > >>> > >>>>>>
> > > > > >>> > >>>>>> I want to say, first of all, that this is super well written. And
> > > > > >>> > >>>>>> the points that the FLIP makes about how to expose the
> > > > > >>> > >>>>>> configuration to users are exactly the right thing to figure out
> > > > > >>> > >>>>>> first.
> > > > > >>> > >>>>>> So good job here!
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> About how to let users specify the resource profiles. If I can
> > > > > >>> > >>>>>> sum the FLIP and previous discussion up in my own words, the
> > > > > >>> > >>>>>> problem is the following:
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>>> Operator-level specification is the simplest and cleanest
> > > > > >>> > >>>>>>> approach, because it avoids mixing operator configuration
> > > > > >>> > >>>>>>> (resource) and scheduling. No matter what other parameters
> > > > > >>> > >>>>>>> change (chaining, slot sharing, switching pipelined and blocking
> > > > > >>> > >>>>>>> shuffles), the resource profiles stay the same.
> > > > > >>> > >>>>>>> But it would require that a user specifies resources on all
> > > > > >>> > >>>>>>> operators, which makes it hard to use. That's why the FLIP
> > > > > >>> > >>>>>>> suggests going with specifying resources on a Sharing-Group.
> > > > > >>> > >>>>>>
> > > > > >>> > >>>>>> I think both thoughts are important, so can we find a solution
> > > > > >>> > >>>>>> where the Resource Profiles are specified on an Operator, but we
> > > > > >>> > >>>>>> still avoid that we need to specify a resource profile on every
> > > > > >>> > >>>>>> operator?
> > > > >>> > >>>>>>
> > > > >>> > >>>>>> What do you think about something like the following:
> > > > >>> > >>>>>> - Resource Profiles are specified on an operator
> > > level.
> > > > >>> > >>>>>> - Not all operators need profiles
> > > > >>> > >>>>>> - All Operators without a Resource Profile ended up
> > > in
> > > > the
> > > > >>> > >> default
> > > > >>> > >>>> slot
> > > > >>> > >>>>>> sharing group with a default profile (will get a
default
> > > > slot).
> > > > >>> > >>>>>> - All Operators with a Resource Profile will go into
> > > > >>> > another slot
> > > > >>> > >>>>> sharing
> > > > >>> > >>>>>> group (the resource-specified-group).
> > > > >>> > >>>>>> - Users can define different slot sharing groups for
> > > > >>> > operators
> > > > >>> > >> like
> > > > >>> > >>>>> they
> > > > >>> > >>>>>> do now, with the exception that you cannot mix
operators
> > > > >>> > that have
> > > > >>> > >> a
> > > > >>> > >>>>>> resource profile and operators that have no resource
> > > > profile.
> > > > >>> > >>>>>> - The default case where no operator has a resource
> > > > >>> > profile is
> > > > >>> > >> just a
> > > > >>> > >>>>>> special case of this model
> > > > >>> > >>>>>> - The chaining logic sums up the profiles per
> > > operator,
> > > > >>> > like it
> > > > >>> > >> does
> > > > >>> > >>>>> now,
> > > > >>> > >>>>>> and the scheduler sums up the profiles of the tasks
that
> > > > it
> > > > >>> > >> schedules
> > > > >>> > >>>>>> together.
> > > > >>> > >>>>>>
> > > > >>> > >>>>>>
> > > > >>> > >>>>>> There is another question about reactive scaling
raised
> > > > in the
> > > > >>> > >> FLIP. I
> > > > >>> > >>>>> need
> > > > >>> > >>>>>> to think a bit about that. That is indeed a bit more
> > > > tricky
> > > > >>> > once we
> > > > >>> > >>>> have
> > > > >>> > >>>>>> slots of different sizes.
> > > > >>> > >>>>>> It is not clear then which of the different slot
> > > requests
> > > > the
> > > > >>> > >>>>>> ResourceManager should fulfill when new resources
(TMs)
> > > > >>> > show up,
> > > > >>> > >> or how
> > > > >>> > >>>>> the
> > > > >>> > >>>>>> JobManager redistributes the slots resources when
> > > > resources
> > > > >>> > (TMs)
> > > > >>> > >>>>> disappear
> > > > >>> > >>>>>> This question is pretty orthogonal, though, to the
"how
> > > to
> > > > >>> > specify
> > > > >>> > >> the
> > > > >>> > >>>>>> resources".
> > > > >>> > >>>>>>
> > > > >>> > >>>>>>
> > > > >>> > >>>>>> Best,
> > > > >>> > >>>>>> Stephan
> > > > >>> > >>>>>>
> > > > >>> > >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song
> > > > >>> > <tonysong820@gmail.com <ma...@gmail.com>
> > > > >>> > >>>>> wrote:
> > > > >>> > >>>>>>> Thanks for drafting the FLIP and driving the
> > > discussion,
> > > > >>> > Yangze.
> > > > >>> > >>>>>>> And Thanks for the feedback, Till and Chesnay.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> @Till,
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> I agree that specifying requirements for SSGs means
> > > that
> > > > SSGs
> > > > >>> > >> need to
> > > > >>> > >>>>> be
> > > > >>> > >>>>>>> supported in fine-grained resource management,
> > > otherwise
> > > > each
> > > > >>> > >>>> operator
> > > > >>> > >>>>>>> might use as many resources as the whole group.
> > > However,
> > > > I
> > > > >>> > cannot
> > > > >>> > >>>> think
> > > > >>> > >>>>>> of
> > > > >>> > >>>>>>> a strong reason for not supporting SSGs in
fine-grained
> > > > >>> > resource
> > > > >>> > >>>>>>> management.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>>> Interestingly, if all operators have their resources
> > > > properly
> > > > >>> > >>>>>> specified,
> > > > >>> > >>>>>>>> then slot sharing is no longer needed because Flink
> > > > could
> > > > >>> > >> slice off
> > > > >>> > >>>>> the
> > > > >>> > >>>>>>>> appropriately sized slots for every Task
individually.
> > > > >>> > >>>>>>>>
> > > > >>> > >>>>>>> So for example, if we have a job consisting of two
> > > > >>> > operator op_1
> > > > >>> > >> and
> > > > >>> > >>>>> op_2
> > > > >>> > >>>>>>>> where each op needs 100 MB of memory, we would then
> > > say
> > > > that
> > > > >>> > >> the
> > > > >>> > >>>> slot
> > > > >>> > >>>>>>>> sharing group needs 200 MB of memory to run. If we
> > > have
> > > > a
> > > > >>> > >> cluster
> > > > >>> > >>>>> with
> > > > >>> > >>>>>> 2
> > > > >>> > >>>>>>>> TMs with one slot of 100 MB each, then the system
> > > > cannot run
> > > > >>> > >> this
> > > > >>> > >>>>> job.
> > > > >>> > >>>>>> If
> > > > >>> > >>>>>>>> the resources were specified on an operator level,
> > > then
> > > > the
> > > > >>> > >> system
> > > > >>> > >>>>>> could
> > > > >>> > >>>>>>>> still make the decision to deploy op_1 to TM_1 and
> > > op_2
> > > > to
> > > > >>> > >> TM_2.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> Couldn't agree more that if all operators'
requirements
> > > > are
> > > > >>> > >> properly
> > > > >>> > >>>>>>> specified, slot sharing should be no longer needed. I
> > > > >>> > think this
> > > > >>> > >>>>> exactly
> > > > >>> > >>>>>>> disproves the example. If we already know op_1 and
op_2
> > > > each
> > > > >>> > >> needs
> > > > >>> > >>>> 100
> > > > >>> > >>>>> MB
> > > > >>> > >>>>>>> of memory, why would we put them in the same group?
If
> > > > >>> > they are
> > > > >>> > >> in
> > > > >>> > >>>>>> separate
> > > > >>> > >>>>>>> groups, with the proposed approach the system can
> > > freely
> > > > >>> > deploy
> > > > >>> > >> them
> > > > >>> > >>>> to
> > > > >>> > >>>>>>> either a 200 MB TM or two 100 MB TMs.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> Moreover, the precondition for not needing slot
sharing
> > > > is
> > > > >>> > having
> > > > >>> > >>>>>> resource
> > > > >>> > >>>>>>> requirements properly specified for all operators.
This
> > > > is not
> > > > >>> > >> always
> > > > >>> > >>>>>>> possible, and usually requires tremendous efforts.
One
> > > > of the
> > > > >>> > >>>> benefits
> > > > >>> > >>>>>> for
> > > > >>> > >>>>>>> SSG-based requirements is that it allows the user to
> > > > freely
> > > > >>> > >> decide
> > > > >>> > >>>> the
> > > > >>> > >>>>>>> granularity, thus efforts they want to pay. I would
> > > > >>> > consider SSG
> > > > >>> > >> in
> > > > >>> > >>>>>>> fine-grained resource management as a group of
> > > operators
> > > > >>> > that the
> > > > >>> > >>>> user
> > > > >>> > >>>>>>> would like to specify the total resource for. There
can
> > > > be
> > > > >>> > only
> > > > >>> > >> one
> > > > >>> > >>>>> group
> > > > >>> > >>>>>>> in the job, 2~3 groups dividing the job into a few
> > > major
> > > > >>> > parts,
> > > > >>> > >> or as
> > > > >>> > >>>>>> many
> > > > >>> > >>>>>>> groups as the number of tasks/operators, depending on
> > > how
> > > > >>> > >>>> fine-grained
> > > > >>> > >>>>>> the
> > > > >>> > >>>>>>> user is able to specify the resources.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> Having to support SSGs might be a constraint. But
given
> > > > >>> > that all
> > > > >>> > >> the
> > > > >>> > >>>>>>> current scheduler implementations already support
> > > SSGs, I
> > > > >>> > tend to
> > > > >>> > >>>> think
> > > > >>> > >>>>>>> that as an acceptable price for the above discussed
> > > > >>> > usability and
> > > > >>> > >>>>>>> flexibility.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> @Chesnay
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> Will declaring them on slot sharing groups not also
> > > waste
> > > > >>> > >> resources
> > > > >>> > >>>> if
> > > > >>> > >>>>>> the
> > > > >>> > >>>>>>>> parallelism of operators within that group are
> > > > different?
> > > > >>> > >>>>>>>>
> > > > >>> > >>>>>>> Yes. It's a trade-off between usability and resource
> > > > >>> > >> utilization. To
> > > > >>> > >>>>>> avoid
> > > > >>> > >>>>>>> such wasting, the user can define more groups, so
that
> > > > >>> > each group
> > > > >>> > >>>>>> contains
> > > > >>> > >>>>>>> less operators and the chance of having operators
with
> > > > >>> > different
> > > > >>> > >>>>>>> parallelism will be reduced. The price is to have
more
> > > > >>> > resource
> > > > >>> > >>>>>>> requirements to specify.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> It also seems like quite a hassle for users having to
> > > > >>> > >> recalculate the
> > > > >>> > >>>>>>>> resource requirements if they change the slot
sharing.
> > > > >>> > >>>>>>>> I'd think that it's not really workable for users
that
> > > > create
> > > > >>> > >> a set
> > > > >>> > >>>>> of
> > > > >>> > >>>>>>>> re-usable operators which are mixed and matched in
> > > their
> > > > >>> > >>>>> applications;
> > > > >>> > >>>>>>>> managing the resources requirements in such a
setting
> > > > >>> > would be
> > > > >>> > >> a
> > > > >>> > >>>>>>>> nightmare, and in the end would require
operator-level
> > > > >>> > >> requirements
> > > > >>> > >>>>> any
> > > > >>> > >>>>>>>> way.
> > > > >>> > >>>>>>>> In that sense, I'm not even sure whether it really
> > > > increases
> > > > >>> > >>>>> usability.
> > > > >>> > >>>>>>> - As mentioned in my reply to Till's comment,
> > > > there's no
> > > > >>> > >> reason to
> > > > >>> > >>>>> put
> > > > >>> > >>>>>>> multiple operators whose individual resource
> > > > >>> > requirements are
> > > > >>> > >>>>> already
> > > > >>> > >>>>>>> known
> > > > >>> > >>>>>>> into the same group in fine-grained resource
> > > > management.
> > > > >>> > >>>>>>> - Even an operator implementation is reused for
> > > > multiple
> > > > >>> > >>>>> applications,
> > > > >>> > >>>>>>> it does not guarantee the same resource
> > > requirements.
> > > > >>> > During
> > > > >>> > >> our
> > > > >>> > >>>>> years
> > > > >>> > >>>>>>> of
> > > > >>> > >>>>>>> practices in Alibaba, with per-operator
> > > requirements
> > > > >>> > >> specified for
> > > > >>> > >>>>>>> Blink's
> > > > >>> > >>>>>>> fine-grained resource management, very few users
> > > > >>> > (including
> > > > >>> > >> our
> > > > >>> > >>>>>>> specialists
> > > > >>> > >>>>>>> who are dedicated to supporting Blink users) are as
> > > > >>> > >> experienced as
> > > > >>> > >>>>> to
> > > > >>> > >>>>>>> accurately predict/estimate the operator resource
> > > > >>> > >> requirements.
> > > > >>> > >>>> Most
> > > > >>> > >>>>>>> people
> > > > >>> > >>>>>>> rely on the execution-time metrics (throughput,
> > > > delay, cpu
> > > > >>> > >> load,
> > > > >>> > >>>>>> memory
> > > > >>> > >>>>>>> usage, GC pressure, etc.) to improve the
> > > > specification.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> To sum up:
> > > > >>> > >>>>>>> If the user is capable of providing proper resource
> > > > >>> > requirements
> > > > >>> > >> for
> > > > >>> > >>>>>> every
> > > > >>> > >>>>>>> operator, that's definitely a good thing and we would
> > > not
> > > > >>> > need to
> > > > >>> > >>>> rely
> > > > >>> > >>>>> on
> > > > >>> > >>>>>>> the SSGs. However, that shouldn't be a *must* for the
> > > > >>> > >> fine-grained
> > > > >>> > >>>>>> resource
> > > > >>> > >>>>>>> management to work. For those users who are capable
and
> > > > do not
> > > > >>> > >> like
> > > > >>> > >>>>>> having
> > > > >>> > >>>>>>> to set each operator to a separate SSG, I would be ok
> > > to
> > > > have
> > > > >>> > >> both
> > > > >>> > >>>>>>> SSG-based and operator-based runtime interfaces and
to
> > > > only
> > > > >>> > >> fallback
> > > > >>> > >>>> to
> > > > >>> > >>>>>> the
> > > > >>> > >>>>>>> SSG requirements when the operator requirements are
not
> > > > >>> > >> specified.
> > > > >>> > >>>>>> However,
> > > > >>> > >>>>>>> as the first step, I think we should prioritise the
use
> > > > cases
> > > > >>> > >> where
> > > > >>> > >>>>> users
> > > > >>> > >>>>>>> are not that experienced.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> Thank you~
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> Xintong Song
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
> > > > >>> > >> chesnay@apache.org <ma...@apache.org>>
> > > > >>> > >>>>>>> wrote:
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>>> Will declaring them on slot sharing groups not also
> > > > waste
> > > > >>> > >> resources
> > > > >>> > >>>>> if
> > > > >>> > >>>>>>>> the parallelism of operators within that group are
> > > > different?
> > > > >>> > >>>>>>>>
> > > > >>> > >>>>>>>> It also seems like quite a hassle for users having
to
> > > > >>> > >> recalculate
> > > > >>> > >>>> the
> > > > >>> > >>>>>>>> resource requirements if they change the slot
sharing.
> > > > >>> > >>>>>>>> I'd think that it's not really workable for users
that
> > > > create
> > > > >>> > >> a set
> > > > >>> > >>>>> of
> > > > >>> > >>>>>>>> re-usable operators which are mixed and matched in
> > > their
> > > > >>> > >>>>> applications;
> > > > >>> > >>>>>>>> managing the resources requirements in such a
setting
> > > > >>> > would be
> > > > >>> > >> a
> > > > >>> > >>>>>>>> nightmare, and in the end would require
operator-level
> > > > >>> > >> requirements
> > > > >>> > >>>>> any
> > > > >>> > >>>>>>>> way.
> > > > >>> > >>>>>>>> In that sense, I'm not even sure whether it really
> > > > increases
> > > > >>> > >>>>> usability.
> > > > >>> > >>>>>>>> My main worry is that it if we wire the runtime to
> > > work
> > > > >>> > on SSGs
> > > > >>> > >>>> it's
> > > > >>> > >>>>>>>> gonna be difficult to implement more fine-grained
> > > > approaches,
> > > > >>> > >> which
> > > > >>> > >>>>>>>> would not be the case if, for the runtime, they are
> > > > always
> > > > >>> > >> defined
> > > > >>> > >>>> on
> > > > >>> > >>>>>> an
> > > > >>> > >>>>>>>> operator-level.
> > > > >>> > >>>>>>>>
> > > > >>> > >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > > > >>> > >>>>>>>>> Thanks for drafting this FLIP and starting this
> > > > discussion
> > > > >>> > >>>> Yangze.
> > > > >>> > >>>>>>>>> I like that defining resource requirements on a
slot
> > > > sharing
> > > > >>> > >>>> group
> > > > >>> > >>>>>>> makes
> > > > >>> > >>>>>>>>> the overall setup easier and improves usability of
> > > > resource
> > > > >>> > >>>>>>> requirements.
> > > > >>> > >>>>>>>>> What I do not like about it is that it changes slot
> > > > sharing
> > > > >>> > >>>> groups
> > > > >>> > >>>>>> from
> > > > >>> > >>>>>>>>> being a scheduling hint to something which needs to
> > > be
> > > > >>> > >> supported
> > > > >>> > >>>> in
> > > > >>> > >>>>>>> order
> > > > >>> > >>>>>>>>> to support fine grained resource requirements. So
> > > far,
> > > > the
> > > > >>> > >> idea
> > > > >>> > >>>> of
> > > > >>> > >>>>>> slot
> > > > >>> > >>>>>>>>> sharing groups was that it tells the system that a
> > > set
> > > > of
> > > > >>> > >>>> operators
> > > > >>> > >>>>>> can
> > > > >>> > >>>>>>>> be
> > > > >>> > >>>>>>>>> deployed in the same slot. But the system still had
> > > the
> > > > >>> > >> freedom
> > > > >>> > >>>> to
> > > > >>> > >>>>>> say
> > > > >>> > >>>>>>>> that
> > > > >>> > >>>>>>>>> it would rather place these tasks in different
slots
> > > > if it
> > > > >>> > >>>> wanted.
> > > > >>> > >>>>> If
> > > > >>> > >>>>>>> we
> > > > >>> > >>>>>>>>> now specify resource requirements on a per slot
> > > sharing
> > > > >>> > >> group,
> > > > >>> > >>>> then
> > > > >>> > >>>>>> the
> > > > >>> > >>>>>>>>> only option for a scheduler which does not support
> > > slot
> > > > >>> > >> sharing
> > > > >>> > >>>>>> groups
> > > > >>> > >>>>>>> is
> > > > >>> > >>>>>>>>> to say that every operator in this slot sharing
group
> > > > >>> > needs a
> > > > >>> > >>>> slot
> > > > >>> > >>>>>> with
> > > > >>> > >>>>>>>> the
> > > > >>> > >>>>>>>>> same resources as the whole group.
> > > > >>> > >>>>>>>>>
> > > > >>> > >>>>>>>>> So for example, if we have a job consisting of two
> > > > operator
> > > > >>> > >> op_1
> > > > >>> > >>>>> and
> > > > >>> > >>>>>>> op_2
> > > > >>> > >>>>>>>>> where each op needs 100 MB of memory, we would then
> > > > say that
> > > > >>> > >> the
> > > > >>> > >>>>> slot
> > > > >>> > >>>>>>>>> sharing group needs 200 MB of memory to run. If we
> > > > have a
> > > > >>> > >> cluster
> > > > >>> > >>>>>> with
> > > > >>> > >>>>>>> 2
> > > > >>> > >>>>>>>>> TMs with one slot of 100 MB each, then the system
> > > > cannot run
> > > > >>> > >> this
> > > > >>> > >>>>>> job.
> > > > >>> > >>>>>>> If
> > > > >>> > >>>>>>>>> the resources were specified on an operator level,
> > > > then the
> > > > >>> > >>>> system
> > > > >>> > >>>>>>> could
> > > > >>> > >>>>>>>>> still make the decision to deploy op_1 to TM_1 and
> > > > op_2 to
> > > > >>> > >> TM_2.
> > > > >>> > >>>>>>>>> Originally, one of the primary goals of slot
sharing
> > > > groups
> > > > >>> > >> was
> > > > >>> > >>>> to
> > > > >>> > >>>>>> make
> > > > >>> > >>>>>>>> it
> > > > >>> > >>>>>>>>> easier for the user to reason about how many slots
a
> > > > job
> > > > >>> > >> needs
> > > > >>> > >>>>>>>> independent
> > > > >>> > >>>>>>>>> of the actual number of operators in the job.
> > > > Interestingly,
> > > > >>> > >> if
> > > > >>> > >>>> all
> > > > >>> > >>>>>>>>> operators have their resources properly specified,
> > > > then slot
> > > > >>> > >>>>> sharing
> > > > >>> > >>>>>> is
> > > > >>> > >>>>>>>> no
> > > > >>> > >>>>>>>>> longer needed because Flink could slice off the
> > > > >>> > appropriately
> > > > >>> > >>>> sized
> > > > >>> > >>>>>>> slots
> > > > >>> > >>>>>>>>> for every Task individually. What matters is
whether
> > > > the
> > > > >>> > >> whole
> > > > >>> > >>>>>> cluster
> > > > >>> > >>>>>>>> has
> > > > >>> > >>>>>>>>> enough resources to run all tasks or not.
> > > > >>> > >>>>>>>>>
> > > > >>> > >>>>>>>>> Cheers,
> > > > >>> > >>>>>>>>> Till
> > > > >>> > >>>>>>>>>
> > > > >>> > >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
> > > > >>> > >> karmagyz@gmail.com <ma...@gmail.com>>
> > > > >>> > >>>>>> wrote:
> > > > >>> > >>>>>>>>>> Hi, there,
> > > > >>> > >>>>>>>>>>
> > > > >>> > >>>>>>>>>> We would like to start a discussion thread on
> > > > "FLIP-156:
> > > > >>> > >> Runtime
> > > > >>> > >>>>>>>>>> Interfaces for Fine-Grained Resource
> > > Requirements"[1],
> > > > >>> > >> where we
> > > > >>> > >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime
> > > > interfaces
> > > > >>> > >> for
> > > > >>> > >>>>>>>>>> specifying fine-grained resource requirements.
> > > > >>> > >>>>>>>>>>
> > > > >>> > >>>>>>>>>> In this FLIP:
> > > > >>> > >>>>>>>>>> - Expound the user story of fine-grained resource
> > > > >>> > >> management.
> > > > >>> > >>>>>>>>>> - Propose runtime interfaces for specifying
> > > SSG-based
> > > > >>> > >> resource
> > > > >>> > >>>>>>>>>> requirements.
> > > > >>> > >>>>>>>>>> - Discuss the pros and cons of the three potential
> > > > >>> > >> granularities
> > > > >>> > >>>>> for
> > > > >>> > >>>>>>>>>> specifying the resource requirements (op, task and
> > > > slot
> > > > >>> > >> sharing
> > > > >>> > >>>>>> group)
> > > > >>> > >>>>>>>>>> and explain why we choose the slot sharing group.
> > > > >>> > >>>>>>>>>>
> > > > >>> > >>>>>>>>>> Please find more details in the FLIP wiki document
> > > > [1].
> > > > >>> > >> Looking
> > > > >>> > >>>>>>>>>> forward to your feedback.
> > > > >>> > >>>>>>>>>>
> > > > >>> > >>>>>>>>>> [1]
> > > > >>> > >>>>>>>>>>
> > > > >>> > >>
> > > > >>> >
> > > >
> > >
>
>
https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > > >>> > <
> > > >
> > >
>
>
https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > > >
> > > > >>> > >>>>>>>>>> Best,
> > > > >>> > >>>>>>>>>> Yangze Guo
> > > > >>> > >>>>>>>>>>
> > > > >>> > >>>>>>>>
> > > > >>> >
> > > > >>>
> > > >
> > >
>

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Xintong Song <to...@gmail.com>.

Thanks for your feedback, Kezhu.

> I think Flink *runtime* already has an ideal granularity for resource
> management 'task'. If there is
> a slot shared by multiple tasks, that slot's resource requirement is simple
> sum of all its logical
> slots. So basically, there is no resource requirement for SlotSharingGroup
> in runtime until now,
> right ?

That is a half-baked implementation left over from earlier attempts
(years ago) to deliver the fine-grained resource management feature;
it was never really put into use.

> From the FLIP and discussion, I assume that SSG resource specifying will
> override operator level
> resource specifying if both are specified ?
>
Actually, I think we should use the finer-grained resources (i.e. operator
level) if both are specified. And more importantly, that is based on the
assumption that we do need two different levels of interfaces.

> So, I wonder whether we could interpret SSG resource specifying as an "add"
> but not an "set" on
> resource requirement ?
>
IIUC, this is the core idea behind your proposal. I think it provides an
interesting idea of how to combine operator-level and SSG-level resources, *if
we allow configuring resources at both levels*. However, I'm not sure
whether configuring resources at the operator level is indeed needed.
Therefore, as a first step, this FLIP proposes to introduce only the
SSG-level interfaces. As listed in the future plan, we would consider
allowing operator-level resource configuration later if we do see a need
for it. At that time, we definitely should discuss what to do if resources
are configured at both levels.
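
To make the options in this exchange concrete, here is a minimal sketch of the three ways a slot's resource could be derived when both operator-level and SSG-level values exist: "set" (SSG overrides), "add" (SSG is extra on top of the operator sum), and "prefer the finer-grained level". The `Resource` class and all function names are hypothetical and only illustrate the semantics; they are not Flink's `ResourceSpec` or any interface proposed in the FLIP.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Resource:
    """Toy resource profile (assumption: only CPU cores and memory in MB)."""
    cpu: float = 0.0
    memory_mb: int = 0

    def __add__(self, other: "Resource") -> "Resource":
        return Resource(self.cpu + other.cpu, self.memory_mb + other.memory_mb)


ZERO = Resource()


def total(resources: Sequence[Resource]) -> Resource:
    """Sum a list of resource profiles, starting from ZERO."""
    result = ZERO
    for r in resources:
        result = result + r
    return result


def slot_resource_set(ops: Sequence[Resource], ssg: Optional[Resource]) -> Resource:
    """'set' semantics: an SSG-level value replaces the operator sum."""
    return ssg if ssg is not None else total(ops)


def slot_resource_add(ops: Sequence[Resource], ssg: Optional[Resource]) -> Resource:
    """'add' semantics: the SSG-level value is extra resource on top of the operator sum."""
    return total(ops) + (ssg if ssg is not None else ZERO)


def slot_resource_prefer_operators(
    ops: Sequence[Optional[Resource]], ssg: Optional[Resource]
) -> Optional[Resource]:
    """Use the operator level when every operator is specified, else fall back to the SSG value."""
    if ops and all(op is not None for op in ops):
        return total(ops)
    return ssg


ops = [Resource(0.5, 100), Resource(0.5, 100)]
ssg = Resource(1.0, 300)
print(slot_resource_set(ops, ssg))               # SSG value wins
print(slot_resource_add(ops, ssg))               # operator sum plus SSG value
print(slot_resource_prefer_operators(ops, ssg))  # operator sum wins
```

With the same inputs the three interpretations yield different slot sizes, which is why the combination rule would need its own discussion if operator-level configuration is added later.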

> * Could SSG express negative resource requirement ?
>
No.

> Is there concrete bar for partial resource configured not function ? I
> saw it will fail job submission in Dispatcher.submitJob.
>
With the SSG-based approach, this should no longer be needed. The
constraint was introduced because we can neither properly define what is
the resource of a task chained from an operator with specified resource and
another with unspecified resource, nor for a slot shared by a task with
specified resource and another with unspecified resource. With the
SSG-based approach, we no longer have those problems.
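
As a rough illustration of why the old constraint existed, the sketch below models a chained task whose operators carry optional memory requirements: mixing specified and unspecified operators has no well-defined sum, whereas a single SSG-level value does not have that problem. The function names, the use of `None` for "unspecified", and the fallback to a default slot size for an unspecified group are all assumptions for illustration, not Flink behaviour.

```python
from typing import Optional, Sequence


def chained_task_memory_mb(operator_memory_mb: Sequence[Optional[int]]) -> Optional[int]:
    """Operator-level model: None means 'unspecified' (a stand-in for an unknown spec)."""
    specified = [m for m in operator_memory_mb if m is not None]
    if not specified:
        # Nothing specified at all: the task stays unspecified.
        return None
    if len(specified) < len(operator_memory_mb):
        # Partially specified: there is no meaningful total, so reject.
        raise ValueError("mixed specified and unspecified operator resources")
    return sum(specified)


def ssg_slot_memory_mb(ssg_memory_mb: Optional[int], default_slot_memory_mb: int) -> int:
    """SSG-level model: one value per group, so there is nothing to merge.

    Falling back to a default slot size for an unspecified group is an assumption here.
    """
    if ssg_memory_mb is None:
        return default_slot_memory_mb
    return ssg_memory_mb


print(chained_task_memory_mb([100, 100]))  # fully specified chain
print(ssg_slot_memory_mb(200, 1024))       # group with an explicit requirement
print(ssg_slot_memory_mb(None, 1024))      # group without one
```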

> An option(cluster/job level) to force slot sharing in scheduler ? This
> could be useful in case of migration from FLIP-156 to future approach.
>
I think this is exactly what we are trying to avoid, requiring the
scheduler to enforce slot sharing.

> An option(cluster) to ignore resource specifying(allow resource specified
> job to run on open box environment) for no production usage ?
>
That's possible. Actually, we are planning to introduce an option for
activating the fine-grained resource management, for development purposes.
We might consider keeping that option after the feature is completed, so
that the feature can be disabled without touching the job code.

Thank you~

Xintong Song



On Wed, Feb 3, 2021 at 1:28 PM Kezhu Wang <ke...@gmail.com> wrote:

> Hi all, sorry for joining the discussion after voting has started.
>
> I want to share my thoughts on this after reading above discussions.
>
> I think Flink *runtime* already has an ideal granularity for resource
> management 'task'. If there is
> a slot shared by multiple tasks, that slot's resource requirement is simple
> sum of all its logical
> slots. So basically, there is no resource requirement for SlotSharingGroup
> in runtime until now,
> right ?
>
> As in discussion, we already agree upon that: "If all operators have their
> resources properly
> specified, then slot sharing is no longer needed. "
>
> So it seems to me that the natural question to discuss is how
> to bridge impractical
> operator-level resource specifications to runtime task-level resource
> requirements. This is actually a
> pure API matter, as Chesnay has pointed out.
>
> But FLIP-156 brings another direction to the table: how about using SSG for
> both API and runtime
> resource specifying ?
>
> From the FLIP and discussion, I assume that SSG resource specifying will
> override operator level
> resource specifying if both are specified ?
>
> So, I wonder whether we could interpret SSG resource specifying as an "add"
> but not an "set" on
> resource requirement ?
>
> The semantics is that SSG resource specifying adds additional resource to
> the shared slot to express
> concerns about possible high throughput and resource requirements for tasks in
> one physical slot.
>
> The result is that if the scheduler indeed respects slot sharing, the allocated slot
> will gain the extra resource
> specified for that SSG.
>
> I think one coding barrier for the "add" approach is ResourceSpec.UNKNOWN,
> which does not support the
> 'merge' operation. I tend to use ResourceSpec.ZERO as the default; the task
> executor should be aware of
> this.
>
> @Chesnay
> > My main worry is that if we wire the runtime to work on SSGs it's
> > gonna be difficult to implement more fine-grained approaches, which
> > would not be the case if, for the runtime, they are always defined on an
> > operator-level.
>
> An "add" operation should be less invasive and impose a lower barrier for
> future fine-grained
> approaches.
>
> @Stephan
> >   - Users can define different slot sharing groups for operators like
> they
> > do now, with the exception that you cannot mix operators that have a
> > resource profile and operators that have no resource profile.
>
> @Till
> > This effectively means that all unspecified operators
> > will implicitly have a zero resource requirement.
> > I am wondering whether this wouldn't lead to a surprising behaviour for
> the
> > user. If the user specifies the resource requirements for a single
> > operator, then he probably will assume that the other operators will get
> > the default share of resources and not nothing.
>
> I think it is inherent, due to the fact that we cannot define
> ResourceSpec.ONE, i.e. the resource
> requirement for exactly one default slot, with concrete numbers ? I tend to
> squash out unspecified ones
> if there are operators in the chain with explicit resource specifications.
> Otherwise, the protocol tends
> to be verbose, as in "give me this much resource and a default". I think if we
> have explicit resource
> specifications for only some operators, it is just saying "I don't care about
> the other operators that much, just
> get them places to run". Most likely these are stateless
> filter/map or other less
> resource-consuming operators. If there is indeed a problem, I think clients
> can specify a global
> default (or a default at another level in future). In the job graph generation phase,
> we could take that default
> into account for unspecified operators.
>
> @FLIP-156
> > Expose operator chaining. (Cons of task level resource specifying)
>
> Is it inherent for all group level resource specifying ? They will either
> break chaining or obey it,
> or even could not work with it.
>
> To sum up above, my suggestions are:
>
> On the API side:
> * StreamExecutionEnvironment: A global default(ResourceSpec.ZERO if
> unspecified).
> * Operator: ResourceSpec.ZERO(unspecified) as default.
> * Task: sum of requirements from specified operators + global default(if
> there are any unspecified operators)
> * SSG: additional resource to physical slot.
>
> On the runtime side:
> * Task: ResourceSpec.Task or ResourceSpec.ZERO
> * SSG: ResourceSpec.SSG or ResourceSpec.ZERO
>
> A physical slot gets the summed resources of its logical slots and the SSG; if
> the sum is ResourceSpec.ZERO, it is
> just a default sized slot.
>
> In short, turn SSG resource specifying into an "add" and drop
> ResourceSpec.UNKNOWN.
>
>
> Questions/Issues:
> * Could SSG express negative resource requirement ?
> * Is there concrete bar for partial resource configured not function ? I
> saw it will fail job submission in Dispatcher.submitJob.
> * An option(cluster/job level) to force slot sharing in scheduler ? This
> could be useful in case of migration from FLIP-156 to future approach.
> * An option(cluster) to ignore resource specifying(allow resource specified
> job to run on open box environment) for no production usage ?
>
>
>
> On February 1, 2021 at 11:54:10, Yangze Guo (karmagyz@gmail.com) wrote:
>
> Thanks for the reply, Till and Xintong!
>
> I updated the FLIP, including:
> - Edit the JavaDoc of the proposed
> StreamGraphGenerator#setSlotSharingGroupResource.
> - Add "Future Plan" section, which contains the potential follow-up
> issues and the limitations to be documented when fine-grained resource
> management is exposed to users.
>
> I'll start a vote in another thread.
>
> Best,
> Yangze Guo
>
> On Fri, Jan 29, 2021 at 10:07 PM Till Rohrmann <tr...@apache.org>
> wrote:
> >
> > Thanks for summarizing the discussion, Yangze. I agree that setting
> > resource requirements per operator is not very user friendly. Moreover, I
> > couldn't come up with a different proposal which would be as easy to use
> > and wouldn't expose internal scheduling details. In fact, following this
> > argument then we shouldn't have exposed the slot sharing groups in the
> > first place.
> >
> > What is important for the user is that we properly document the
> limitations
> > and constraints the fine grained resource specification has. For example,
> > we should explain how optimizations like chaining are affected by it and
> > how different execution modes (batch vs. streaming) affect the execution
> of
> > operators which have specified resources. These things shouldn't become
> > part of the contract of this feature and are more caused by internal
> > implementation details but it will be important to understand these
> things
> > properly in order to use this feature effectively.
> >
> > Hence, +1 for starting the vote for this FLIP.
> >
> > Cheers,
> > Till
> >
> > On Tue, Jan 26, 2021 at 4:37 AM Xintong Song <to...@gmail.com>
> wrote:
> >
> > > Thanks for the summary, Yangze.
> > >
> > > The changes and follow-up issues LGTM. Let's wait for responses from
> the
> > > others before starting a vote.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Tue, Jan 26, 2021 at 11:08 AM Yangze Guo <ka...@gmail.com>
> wrote:
> > >
> > > > Thanks everyone for the lively discussion. I'd like to try to
> > > > summarize the current convergence in the discussion. Please let me
> > > > know if I got things wrong or missed something crucial here.
> > > >
> > > > Change of this FLIP:
> > > > - Treat the SSG resource requirements as a hint instead of a
> > > > restriction for the runtime. That should be explicitly explained in
> > > > the JavaDocs.
> > > >
> > > > Potential follow-up issues if needed:
> > > > - Provide operator-level resource configuration interface.
> > > > - Provide multiple options for deciding resources for SSGs whose
> > > > requirement is not specified:
> > > > ** Default slot resource.
> > > > ** Default operator resource times number of operators.
> > > >
> > > > If there are no other issues, I'll update the FLIP accordingly and
> > > > start a vote thread. Thanks all for the valuable feedback again.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > >
> > > > On Fri, Jan 22, 2021 at 11:30 AM Xintong Song <tonysong820@gmail.com
> >
> > > > wrote:
> > > > >
> > > > >
> > > > > FGRuntimeInterface.png
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <
> tonysong820@gmail.com>
>
> > > > wrote:
> > > > >>
> > > > > >> I think Chesnay's proposal could actually work. IIUC, the key point
> > > > > >> is to derive operator requirements from SSG requirements on the API
> > > > > >> side, so that the runtime only deals with operator requirements.
> > > > > >> It's debatable how the deriving should be done though. E.g., an
> > > > > >> alternative could be to evenly divide the SSG requirement into
> > > > > >> requirements of operators in the group.
> > > > >>
> > > > >>
> > > > >> However, I'm not entirely sure which option is more desired.
> > > > Illustrating my understanding in the following figure, in which on
> the
> > > top
> > > > is Chesnay's proposal and on the bottom is the SSG-based proposal in
> this
> > > > FLIP.
> > > > >>
> > > > >>
> > > > >>
> > > > >> I think the major difference between the two approaches is where
> > > > deriving operator requirements from SSG requirements happens.
> > > > >>
> > > > >> - Chesnay's proposal simplifies the runtime logic and the
> interface to
> > > > expose, at the price of moving more complexity (i.e. the deriving) to
> the
> > > > API side. The question is, where do we prefer to keep the complexity?
> I'm
> > > > slightly leaning towards having a thin API and keep the complexity in
> > > > runtime if possible.
> > > > >>
> > > > > >> - Notice that the dashed-line arrows represent optional steps that
> > > > > >> are needed only for schedulers that do not respect SSGs, which we
> > > > > >> don't have at the moment. If we only look at the solid-line arrows,
> > > > > >> then the SSG-based approach is much simpler, without needing to
> > > > > >> derive and aggregate the requirements back and forth. I'm not sure
> > > > > >> about complicating the current design only for the potential future
> > > > > >> needs.
> > > > >>
> > > > >>
> > > > >> Thank you~
> > > > >>
> > > > >> Xintong Song
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <
> chesnay@apache.org>
> > > > wrote:
> > > > >>>
> > > > >>> You're raising a good point, but I think I can rectify that with
> a
> > > > minor
> > > > >>> adjustment.
> > > > >>>
> > > > >>> Default requirements are whatever the default requirements are,
> > > setting
> > > > >>> the requirements for one operator has no effect on other
> operators.
> > > > >>>
> > > > >>> With these rules, and some API enhancements, the following mockup
> > > would
> > > > >>> replicate the SSG-based behavior:
> > > > >>>
> > > > > >>> Map<SlotSharingGroupId, Requirements> requirements = ...
> > > > > >>> for slotSharingGroup in env.getSlotSharingGroups() {
> > > > > >>>     vertices = slotSharingGroup.getVertices()
> > > > > >>>     vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
> > > > > >>>     vertices.remaining().setRequirements(ZERO)
> > > > > >>> }
> > > > >>>
> > > > > >>> We could even allow setting requirements on slotsharing-groups or
> > > > > >>> colocation-groups and internally translate them accordingly.
> > > > >>> I can't help but feel this is a plain API issue.
> > > > >>>
> > > > >>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
> > > > >>> > If I understand you correctly Chesnay, then you want to
> decouple
> > > the
> > > > >>> > resource requirement specification from the slot sharing group
> > > > >>> > assignment. Hence, per default all operators would be in the
> same
> > > > slot
> > > > >>> > sharing group. If there is no operator with a resource
> > > specification,
> > > > >>> > then the system would allocate a default slot for it. If there
> is
> > > at
> > > > >>> > least one operator, then the system would sum up all the
> specified
> > > > >>> > resources and allocate a slot of this size. This effectively
> means
> > > > >>> > that all unspecified operators will implicitly have a zero
> resource
> > > > >>> > requirement. Did I understand your idea correctly?
> > > > >>> >
> > > > >>> > I am wondering whether this wouldn't lead to a surprising
> behaviour
> > > > >>> > for the user. If the user specifies the resource requirements
> for a
> > > > >>> > single operator, then he probably will assume that the other
> > > > operators
> > > > >>> > will get the default share of resources and not nothing.
> > > > >>> >
> > > > >>> > Cheers,
> > > > >>> > Till
> > > > >>> >
> > > > >>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <
> > > chesnay@apache.org
> > > > >>> > <ma...@apache.org>> wrote:
> > > > >>> >
> > > > >>> > Is there even a functional difference between specifying the
> > > > >>> > requirements for an SSG vs specifying the same requirements on
> > > a
> > > > >>> > single
> > > > >>> > operator within that group (ideally a colocation group to avoid
> > > > this
> > > > >>> > whole hint business)?
> > > > >>> >
> > > > >>> > Wouldn't we get the best of both worlds in the latter case?
> > > > >>> >
> > > > >>> > Users can take shortcuts to define shared requirements,
> > > > >>> > but refine them further as needed on a per-operator basis,
> > > > >>> > without changing semantics of slotsharing groups
> > > > >>> > nor the runtime being locked into SSG-based requirements.
> > > > >>> >
> > > > >>> > (And before anyone argues what happens if slotsharing groups
> > > > >>> > change or
> > > > >>> > whatnot, that's a plain API issue that we could surely solve.
> > > (A
> > > > >>> > plain
> > > > >>> > iteration over slotsharing groups and therein contained
> > > operators
> > > > >>> > would
> > > > >>> > suffice)).
> > > > >>> >
> > > > >>> > On 1/20/2021 6:48 PM, Till Rohrmann wrote:
> > > > >>> > > Maybe a different minor idea: Would it be possible to treat
> > > > the SSG
> > > > >>> > > resource requirements as a hint for the runtime similar to
> > > how
> > > > >>> > slot sharing
> > > > >>> > > groups are designed at the moment? Meaning that we don't give
> > > > >>> > the guarantee
> > > > >>> > > that Flink will always deploy this set of tasks together no
> > > > >>> > matter what
> > > > >>> > > comes. If, for example, the runtime can derive by some means
> > > > the
> > > > >>> > resource
> > > > >>> > > requirements for each task based on the requirements for the
> > > > >>> > SSG, this
> > > > >>> > > could be possible. One easy strategy would be to give every
> > > > task
> > > > >>> > the same
> > > > >>> > > resources as the whole slot sharing group. Another one could
> > > be
> > > > >>> > > distributing the resources equally among the tasks. This does
> > > > >>> > not even have
> > > > >>> > > to be implemented but we would give ourselves the freedom to
> > > > change
> > > > >>> > > scheduling if need should arise.
> > > > >>> > >
> > > > >>> > > Cheers,
> > > > >>> > > Till
> > > > >>> > >
> > > > >>> > > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <
> > > karmagyz@gmail.com
> > > > >>> > <ma...@gmail.com>> wrote:
> > > > >>> > >
> > > > >>> > >> Thanks for the responses, Till and Xintong.
> > > > >>> > >>
> > > > >>> > >> I second Xintong's comment that SSG-based runtime interface
> > > > >>> > will give
> > > > >>> > >> us the flexibility to achieve op/task-based approach. That's
> > > > one of
> > > > >>> > >> the most important reasons for our design choice.
> > > > >>> > >>
> > > > >>> > >> Some cents regarding the default operator resource:
> > > > >>> > >> - It might be good for the scenario of DataStream jobs.
> > > > > >>> > >> ** For light-weight operators, the accumulated configuration
> > > > > >>> > >> error will not be significant. Then, the resource used by a
> > > > > >>> > >> task is proportional to the number of operators it contains.
> > > > >>> > >> ** For heavy operators like join and window or operators
> > > > >>> > using the
> > > > >>> > >> external resources, user will turn to the fine-grained
> > > > resource
> > > > >>> > >> configuration.
> > > > > >>> > >> - It can increase the stability for the standalone cluster
> > > > > >>> > >> where the registered task executors are heterogeneous (with
> > > > > >>> > >> different default slot resources).
> > > > > >>> > >> - It might not be good for SQL users. The operators that a SQL
> > > > > >>> > >> query is translated into are a black box to the user. We also
> > > > > >>> > >> do not guarantee cross-version consistency of the translation
> > > > > >>> > >> so far.
> > > > >>> > >>
> > > > >>> > >> I think it can be treated as a follow-up work when the
> > > > fine-grained
> > > > >>> > >> resource management is end-to-end ready.
> > > > >>> > >>
> > > > >>> > >> Best,
> > > > >>> > >> Yangze Guo
> > > > >>> > >>
> > > > >>> > >>
> > > > >>> > >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song
> > > > >>> > <tonysong820@gmail.com <ma...@gmail.com>>
> > > > >>> > >> wrote:
> > > > >>> > >>> Thanks for the feedback, Till.
> > > > >>> > >>>
> > > > >>> > >>> ## I feel that what you proposed (operator-based + default
> > > > >>> > value) might
> > > > >>> > >> be
> > > > >>> > >>> subsumed by the SSG-based approach.
> > > > >>> > >>> Thinking of op_1 -> op_2, there are the following 4 cases,
> > > > >>> > categorized by
> > > > >>> > >>> whether the resource requirements are known to the users.
> > > > >>> > >>>
> > > > >>> > >>> 1. *Both known.* As previously mentioned, there's no
> > > > >>> > reason to put
> > > > >>> > >>> multiple operators whose individual resource
> > > requirements
> > > > >>> > are already
> > > > >>> > >> known
> > > > >>> > >>> into the same group in fine-grained resource
> > > management.
> > > > >>> > And if op_1
> > > > >>> > >> and
> > > > >>> > >>> op_2 are in different groups, there should be no
> > > problem
> > > > >>> > switching
> > > > >>> > >> data
> > > > >>> > >>> exchange mode from pipelined to blocking. This is
> > > > >>> > equivalent to
> > > > >>> > >> specifying
> > > > >>> > >>> operator resource requirements in your proposal.
> > > > >>> > >>> 2. *op_1 known, op_2 unknown.* Similar to 1), except
> > > that
> > > > >>> > op_2 is in a
> > > > >>> > >>> SSG whose resource is not specified thus would have the
> > > > >>> > default slot
> > > > >>> > >>> resource. This is equivalent to having default operator
> > > > >>> > resources in
> > > > >>> > >> your
> > > > >>> > >>> proposal.
> > > > >>> > >>> 3. *Both unknown*. The user can either set op_1 and
> > > op_2
> > > > >>> > to the same
> > > > >>> > >> SSG
> > > > >>> > >>> or separate SSGs.
> > > > >>> > >>> - If op_1 and op_2 are in the same SSG, it will be
> > > > >>> > equivalent to
> > > > >>> > >> the
> > > > >>> > >>> coarse-grained resource management, where op_1 and
> > > > op_2
> > > > >>> > share a
> > > > >>> > >> default
> > > > >>> > >>> size slot no matter which data exchange mode is
> > > used.
> > > > >>> > >>> - If op_1 and op_2 are in different SSGs, then each
> > > of
> > > > >>> > them will
> > > > >>> > >> use
> > > > >>> > >>> a default size slot. This is equivalent to setting
> > > > them
> > > > >>> > with
> > > > >>> > >> default
> > > > >>> > >>> operator resources in your proposal.
> > > > >>> > >>> 4. *Total (pipeline) or max (blocking) of op_1 and op_2
> > > > is
> > > > >>> > known.*
> > > > >>> > >>> - It is possible that the user learns the total /
> > > max
> > > > >>> > resource
> > > > >>> > >>> requirement from executing and monitoring the job,
> > > > >>> > while not
> > > > >>> > >>> being aware of
> > > > >>> > >>> individual operator requirements.
> > > > >>> > >>> - I believe this is the case your proposal does not
> > > > >>> > cover. And TBH,
> > > > >>> > >>> this is probably how most users learn the resource
> > > > >>> > requirements,
> > > > >>> > >>> according
> > > > >>> > >>> to my experiences.
> > > > >>> > >>> - In this case, the user might need to specify
> > > > >>> > different resources
> > > > >>> > >> if
> > > > >>> > >>> he wants to switch the execution mode, which should
> > > > not
> > > > >>> > be worse
> > > > >>> > >> than not
> > > > >>> > >>> being able to use fine-grained resource management.
> > > > >>> > >>>
> > > > >>> > >>>
> > > > >>> > >>> ## An additional idea inspired by your proposal.
> > > > >>> > >>> We may provide multiple options for deciding resources for
> > > > >>> > SSGs whose
> > > > >>> > >>> requirement is not specified, if needed.
> > > > >>> > >>>
> > > > >>> > >>> - Default slot resource (current design)
> > > > >>> > >>> - Default operator resource times number of operators
> > > > >>> > (equivalent to
> > > > >>> > >>> your proposal)
> > > > >>> > >>>
> > > > >>> > >>>
> > > > >>> > >>> ## Exposing internal runtime strategies
> > > > >>> > >>> Theoretically, yes. Tying to the SSGs, the resource
> > > > >>> > requirements might be
> > > > >>> > >>> affected if how SSGs are internally handled changes in
> > > > future.
> > > > >>> > >> Practically,
> > > > >>> > >>> I do not concretely see at the moment what kind of changes
> > > we
> > > > >>> > may want in
> > > > >>> > >>> future that might conflict with this FLIP proposal, as the
> > > > >>> > question of
> > > > >>> > >>> switching data exchange mode answered above. I'd suggest to
> > > > >>> > not give up
> > > > >>> > >> the
> > > > >>> > >>> user friendliness we may gain now for the future problems
> > > > that
> > > > >>> > may or may
> > > > >>> > >>> not exist.
> > > > >>> > >>>
> > > > >>> > >>> Moreover, the SSG-based approach has the flexibility to
> > > > >>> > achieve the
> > > > >>> > >>> equivalent behavior as the operator-based approach, if we
> > > > set each
> > > > >>> > >> operator
> > > > >>> > >>> (or task) to a separate SSG. We can even provide a shortcut
> > > > >>> > option to
> > > > >>> > >>> automatically do that for users, if needed.
> > > > >>> > >>>
> > > > >>> > >>>
> > > > >>> > >>> Thank you~
> > > > >>> > >>>
> > > > >>> > >>> Xintong Song
> > > > >>> > >>>
> > > > >>> > >>>
> > > > >>> > >>>
> > > > >>> > >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann
> > > > >>> > <trohrmann@apache.org <ma...@apache.org>>
> > > > >>> > >> wrote:
> > > > >>> > >>>> Thanks for the responses Xintong and Stephan,
> > > > >>> > >>>>
> > > > >>> > >>>> I agree that being able to define the resource
> > > requirements
> > > > for a
> > > > >>> > >> group of
> > > > >>> > >>>> operators is more user friendly. However, my concern is
> > > that
> > > > >>> > we are
> > > > >>> > >>>> exposing thereby internal runtime strategies which might
> > > > >>> > limit our
> > > > >>> > >>>> flexibility to execute a given job. Moreover, the
> > > semantics
> > > > of
> > > > >>> > >> configuring
> > > > >>> > >>>> resource requirements for SSGs could break if switching
> > > from
> > > > >>> > streaming
> > > > >>> > >> to
> > > > >>> > >>>> batch execution. If one defines the resource requirements
> > > > for
> > > > >>> > op_1 ->
> > > > >>> > >> op_2
> > > > >>> > >>>> which run in pipelined mode when using the streaming
> > > > >>> > execution, then
> > > > >>> > >> how do
> > > > >>> > >>>> we interpret these requirements when op_1 -> op_2 are
> > > > >>> > executed with a
> > > > >>> > >>>> blocking data exchange in batch execution mode?
> > > > Consequently,
> > > > >>> > I am
> > > > >>> > >> still
> > > > >>> > >>>> leaning towards Stephan's proposal to set the resource
> > > > >>> > requirements per
> > > > >>> > >>>> operator.
> > > > >>> > >>>>
> > > > >>> > >>>> Maybe the following proposal makes the configuration
> > > easier:
> > > > >>> > If the
> > > > >>> > >> user
> > > > >>> > >>>> wants to use fine-grained resource requirements, then she
> > > > >>> > needs to
> > > > >>> > >> specify
> > > > >>> > >>>> the default size which is used for operators which have no
> > > > >>> > explicit
> > > > >>> > >>>> resource annotation. If this holds true, then every
> > > operator
> > > > >>> > would
> > > > >>> > >> have a
> > > > >>> > >>>> resource requirement and the system can try to execute the
> > > > >>> > operators
> > > > >>> > >> in the
> > > > >>> > >>>> best possible manner w/o being constrained by how the user
> > > > >>> > set the SSG
> > > > >>> > >>>> requirements.
> > > > >>> > >>>>
> > > > >>> > >>>> Cheers,
> > > > >>> > >>>> Till
> > > > >>> > >>>>
> > > > >>> > >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song
> > > > >>> > <tonysong820@gmail.com <ma...@gmail.com>>
> > > > >>> > >>>> wrote:
> > > > >>> > >>>>
> > > > >>> > >>>>> Thanks for the feedback, Stephan.
> > > > >>> > >>>>>
> > > > >>> > >>>>> Actually, your proposal has also come to my mind at some
> > > > >>> > point. And I
> > > > >>> > >>>> have
> > > > >>> > >>>>> some concerns about it.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> 1. It does not give users the same control as the
> > > SSG-based
> > > > >>> > approach.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> While both approaches do not require specifying for each
> > > > >>> > operator,
> > > > >>> > >>>>> SSG-based approach supports the semantic that "some
> > > > operators
> > > > >>> > >> together
> > > > >>> > >>>> use
> > > > >>> > >>>>> this much resource" while the operator-based approach
> > > > doesn't.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> Think of a long pipeline with m operators (o_1, o_2, ...,
> > > > >>> > o_m), and
> > > > >>> > >> at
> > > > >>> > >>>> some
> > > > >>> > >>>>> point there's an agg o_n (1 < n < m) which significantly
> > > > >>> > reduces the
> > > > >>> > >> data
> > > > >>> > >>>>> amount. One can separate the pipeline into 2 groups SSG_1
> > > > >>> > (o_1, ...,
> > > > >>> > >> o_n)
> > > > >>> > >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much
> > > higher
> > > > >>> > >> parallelisms
> > > > >>> > >>>>> for operators in SSG_1 than for operators in SSG_2 won't
> > > > >>> > lead to too
> > > > >>> > >> much
> > > > >>> > >>>>> wasting of resources. If the two SSGs end up needing
> > > > different
> > > > >>> > >> resources,
> > > > >>> > >>>>> with the SSG-based approach one can directly specify
> > > > >>> > resources for
> > > > >>> > >> the
> > > > >>> > >>>> two
> > > > >>> > >>>>> groups. However, with the operator-based approach, the
> > > > user will
> > > > >>> > >> have to
> > > > >>> > >>>>> specify resources for each operator in one of the two
> > > > >>> > groups, and
> > > > >>> > >> tune
> > > > >>> > >>>> the
> > > > >>> > >>>>> default slot resource via configurations to fit the other
> > > > group.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> 2. It increases the chance of breaking operator chains.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > > >>> > >>>>> Setting chainable operators into different slot sharing
> > > > >>> > groups will
> > > > >>> > >>>>> prevent them from being chained. In the current
> > > > implementation,
> > > > >>> > >>>> downstream
> > > > >>> > >>>>> operators, if SSG not explicitly specified, will be set
> > > to
> > > > >>> > the same
> > > > >>> > >> group
> > > > >>> > >>>>> as the chainable upstream operators (unless multiple
> > > > upstream
> > > > >>> > >> operators
> > > > >>> > >>>> in
> > > > >>> > >>>>> different groups), to reduce the chance of breaking
> > > chains.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > > >>> > >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4,
> > > > >>> > deciding
> > > > >>> > >> SSGs
> > > > >>> > >>>>> based on whether resource is specified we will easily get
> > > > >>> > groups like
> > > > >>> > >>>> (o_1,
> > > > >>> > >>>>> o_3) & (o_2, o_4), where none of the operators can be
> > > > >>> > chained. This
> > > > >>> > >> is
> > > > >>> > >>>> also
> > > > >>> > >>>>> possible for the SSG-based approach, but I believe the
> > > > >>> > chance is much
> > > > >>> > >>>>> smaller because there's no strong reason for users to
> > > > >>> > specify the
> > > > >>> > >> groups
> > > > >>> > >>>>> with alternate operators like that. We are more likely to
> > > > >>> > get groups
> > > > >>> > >> like
> > > > >>> > >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks only
> > > > between
> > > > >>> > o_2 and
> > > > >>> > >> o_3.
> > > > >>> > >>>>>
> > > > >>> > >>>>> 3. It complicates the system by having two different
> > > > >>> > mechanisms for
> > > > >>> > >>>> sharing
> > > > >>> > >>>>> managed memory in a slot.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> - In FLIP-141, we introduced the intra-slot managed
> > > memory
> > > > >>> > sharing
> > > > >>> > >>>>> mechanism, where managed memory is first distributed
> > > > >>> > according to the
> > > > >>> > >>>>> consumer type, then further distributed across operators
> > > > of that
> > > > >>> > >> consumer
> > > > >>> > >>>>> type.
> > > > >>> > >>>>>
> > > > >>> > >>>>> - With the operator-based approach, managed memory size
> > > > >>> > specified
> > > > >>> > >> for an
> > > > >>> > >>>>> operator should account for all the consumer types of
> > > that
> > > > >>> > operator.
> > > > >>> > >> That
> > > > >>> > >>>>> means the managed memory is first distributed across
> > > > >>> > operators, then
> > > > >>> > >>>>> distributed to different consumer types of each operator.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> Unfortunately, the different order of the two calculation
> > > > >>> > steps can
> > > > >>> > >> lead
> > > > >>> > >>>> to
> > > > >>> > >>>>> different results. To be specific, the semantic of the
> > > > >>> > configuration
> > > > >>> > >>>> option
> > > > >>> > >>>>> `consumer-weights` changed (within a slot vs. within an
> > > > >>> > operator).
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> To sum up things:
> > > > >>> > >>>>>
> > > > >>> > >>>>> While (3) might be a bit more implementation related, I
> > > > >>> > think (1)
> > > > >>> > >> and (2)
> > > > >>> > >>>>> somehow suggest that, the price for the proposed approach
> > > > to
> > > > >>> > avoid
> > > > >>> > >>>>> specifying resource for every operator is that it's not
> > > as
> > > > >>> > >> independent
> > > > >>> > >>>> from
> > > > >>> > >>>>> operator chaining and slot sharing as the operator-based
> > > > >>> > approach
> > > > >>> > >>>> discussed
> > > > >>> > >>>>> in the FLIP.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> Thank you~
> > > > >>> > >>>>>
> > > > >>> > >>>>> Xintong Song
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen
> > > > >>> > <sewen@apache.org <ma...@apache.org>>
> > > > >>> > >> wrote:
> > > > >>> > >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
> > > > >>> > >>>>>>
> > > > >>> > >>>>>> I want to say, first of all, that this is super well
> > > > >>> > written. And
> > > > >>> > >> the
> > > > >>> > >>>>>> points that the FLIP makes about how to expose the
> > > > >>> > configuration to
> > > > >>> > >>>> users
> > > > >>> > >>>>>> is exactly the right thing to figure out first.
> > > > >>> > >>>>>> So good job here!
> > > > >>> > >>>>>>
> > > > >>> > >>>>>> About how to let users specify the resource profiles.
> > > If I
> > > > >>> > can sum
> > > > >>> > >> the
> > > > >>> > >>>>> FLIP
> > > > >>> > >>>>>> and previous discussion up in my own words, the problem
> > > > is the
> > > > >>> > >>>> following:
> > > > >>> > >>>>>> Operator-level specification is the simplest and
> > > cleanest
> > > > >>> > approach,
> > > > >>> > >>>>> because
> > > > >>> > >>>>>>> it avoids mixing operator configuration (resource) and
> > > > >>> > >> scheduling. No
> > > > >>> > >>>>>>> matter what other parameters change (chaining, slot
> > > > sharing,
> > > > >>> > >>>> switching
> > > > >>> > >>>>>>> pipelined and blocking shuffles), the resource profiles
> > > > >>> > stay the
> > > > >>> > >>>> same.
> > > > >>> > >>>>>>> But it would require that a user specifies resources on
> > > > all
> > > > >>> > >>>> operators,
> > > > >>> > >>>>>>> which makes it hard to use. That's why the FLIP
> > > suggests
> > > > going
> > > > >>> > >> with
> > > > >>> > >>>>>>> specifying resources on a Sharing-Group.
> > > > >>> > >>>>>>
> > > > >>> > >>>>>> I think both thoughts are important, so can we find a
> > > > solution
> > > > >>> > >> where
> > > > >>> > >>>> the
> > > > >>> > >>>>>> Resource Profiles are specified on an Operator, but we
> > > > >>> > still avoid
> > > > >>> > >> that
> > > > >>> > >>>>> we
> > > > >>> > >>>>>> need to specify a resource profile on every operator?
> > > > >>> > >>>>>>
> > > > >>> > >>>>>> What do you think about something like the following:
> > > > >>> > >>>>>> - Resource Profiles are specified on an operator
> > > level.
> > > > >>> > >>>>>> - Not all operators need profiles
> > > > > >>> > >>>>>> - All Operators without a Resource Profile end up
> > > in
> > > > the
> > > > >>> > >> default
> > > > >>> > >>>> slot
> > > > >>> > >>>>>> sharing group with a default profile (will get a default
> > > > slot).
> > > > >>> > >>>>>> - All Operators with a Resource Profile will go into
> > > > >>> > another slot
> > > > >>> > >>>>> sharing
> > > > >>> > >>>>>> group (the resource-specified-group).
> > > > >>> > >>>>>> - Users can define different slot sharing groups for
> > > > >>> > operators
> > > > >>> > >> like
> > > > >>> > >>>>> they
> > > > >>> > >>>>>> do now, with the exception that you cannot mix operators
> > > > >>> > that have
> > > > >>> > >> a
> > > > >>> > >>>>>> resource profile and operators that have no resource
> > > > profile.
> > > > >>> > >>>>>> - The default case where no operator has a resource
> > > > >>> > profile is
> > > > >>> > >> just a
> > > > >>> > >>>>>> special case of this model
> > > > >>> > >>>>>> - The chaining logic sums up the profiles per
> > > operator,
> > > > >>> > like it
> > > > >>> > >> does
> > > > >>> > >>>>> now,
> > > > >>> > >>>>>> and the scheduler sums up the profiles of the tasks that
> > > > it
> > > > >>> > >> schedules
> > > > >>> > >>>>>> together.
> > > > >>> > >>>>>>
> > > > >>> > >>>>>>
> > > > >>> > >>>>>> There is another question about reactive scaling raised
> > > > in the
> > > > >>> > >> FLIP. I
> > > > >>> > >>>>> need
> > > > >>> > >>>>>> to think a bit about that. That is indeed a bit more
> > > > tricky
> > > > >>> > once we
> > > > >>> > >>>> have
> > > > >>> > >>>>>> slots of different sizes.
> > > > >>> > >>>>>> It is not clear then which of the different slot
> > > requests
> > > > the
> > > > >>> > >>>>>> ResourceManager should fulfill when new resources (TMs)
> > > > >>> > show up,
> > > > >>> > >> or how
> > > > >>> > >>>>> the
> > > > > >>> > >>>>>> JobManager redistributes the slot resources when resources
> > > > > >>> > >>>>>> (TMs) disappear.
> > > > >>> > >>>>>> This question is pretty orthogonal, though, to the "how
> > > to
> > > > >>> > specify
> > > > >>> > >> the
> > > > >>> > >>>>>> resources".
> > > > >>> > >>>>>>
> > > > >>> > >>>>>>
> > > > >>> > >>>>>> Best,
> > > > >>> > >>>>>> Stephan
> > > > >>> > >>>>>>
> > > > >>> > >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song
> > > > >>> > <tonysong820@gmail.com <ma...@gmail.com>
> > > > >>> > >>>>> wrote:
> > > > >>> > >>>>>>> Thanks for drafting the FLIP and driving the
> > > discussion,
> > > > >>> > Yangze.
> > > > >>> > >>>>>>> And Thanks for the feedback, Till and Chesnay.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> @Till,
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> I agree that specifying requirements for SSGs means
> > > that
> > > > SSGs
> > > > >>> > >> need to
> > > > >>> > >>>>> be
> > > > >>> > >>>>>>> supported in fine-grained resource management,
> > > otherwise
> > > > each
> > > > >>> > >>>> operator
> > > > >>> > >>>>>>> might use as many resources as the whole group.
> > > However,
> > > > I
> > > > >>> > cannot
> > > > >>> > >>>> think
> > > > >>> > >>>>>> of
> > > > >>> > >>>>>>> a strong reason for not supporting SSGs in fine-grained
> > > > >>> > resource
> > > > >>> > >>>>>>> management.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>>> Interestingly, if all operators have their resources
> > > > properly
> > > > >>> > >>>>>> specified,
> > > > >>> > >>>>>>>> then slot sharing is no longer needed because Flink
> > > > could
> > > > >>> > >> slice off
> > > > >>> > >>>>> the
> > > > >>> > >>>>>>>> appropriately sized slots for every Task individually.
> > > > >>> > >>>>>>>>
> > > > > >>> > >>>>>>>> So for example, if we have a job consisting of two
> > > > > >>> > >>>>>>>> operators op_1 and op_2
> > > > >>> > >>>>>>>> where each op needs 100 MB of memory, we would then
> > > say
> > > > that
> > > > >>> > >> the
> > > > >>> > >>>> slot
> > > > >>> > >>>>>>>> sharing group needs 200 MB of memory to run. If we
> > > have
> > > > a
> > > > >>> > >> cluster
> > > > >>> > >>>>> with
> > > > >>> > >>>>>> 2
> > > > >>> > >>>>>>>> TMs with one slot of 100 MB each, then the system
> > > > cannot run
> > > > >>> > >> this
> > > > >>> > >>>>> job.
> > > > >>> > >>>>>> If
> > > > >>> > >>>>>>>> the resources were specified on an operator level,
> > > then
> > > > the
> > > > >>> > >> system
> > > > >>> > >>>>>> could
> > > > >>> > >>>>>>>> still make the decision to deploy op_1 to TM_1 and
> > > op_2
> > > > to
> > > > >>> > >> TM_2.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> Couldn't agree more that if all operators' requirements
> > > > are
> > > > >>> > >> properly
> > > > >>> > >>>>>>> specified, slot sharing should be no longer needed. I
> > > > >>> > think this
> > > > >>> > >>>>> exactly
> > > > >>> > >>>>>>> disproves the example. If we already know op_1 and op_2
> > > > each
> > > > >>> > >> needs
> > > > >>> > >>>> 100
> > > > >>> > >>>>> MB
> > > > >>> > >>>>>>> of memory, why would we put them in the same group? If
> > > > >>> > they are
> > > > >>> > >> in
> > > > >>> > >>>>>> separate
> > > > >>> > >>>>>>> groups, with the proposed approach the system can
> > > freely
> > > > >>> > deploy
> > > > >>> > >> them
> > > > >>> > >>>> to
> > > > >>> > >>>>>>> either a 200 MB TM or two 100 MB TMs.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> Moreover, the precondition for not needing slot sharing
> > > > is
> > > > >>> > having
> > > > >>> > >>>>>> resource
> > > > >>> > >>>>>>> requirements properly specified for all operators. This
> > > > is not
> > > > >>> > >> always
> > > > >>> > >>>>>>> possible, and usually requires tremendous efforts. One
> > > > of the
> > > > >>> > >>>> benefits
> > > > >>> > >>>>>> for
> > > > >>> > >>>>>>> SSG-based requirements is that it allows the user to
> > > > freely
> > > > >>> > >> decide
> > > > >>> > >>>> the
> > > > >>> > >>>>>>> granularity, thus efforts they want to pay. I would
> > > > >>> > consider SSG
> > > > >>> > >> in
> > > > >>> > >>>>>>> fine-grained resource management as a group of
> > > operators
> > > > >>> > that the
> > > > >>> > >>>> user
> > > > >>> > >>>>>>> would like to specify the total resource for. There can
> > > > be
> > > > >>> > only
> > > > >>> > >> one
> > > > >>> > >>>>> group
> > > > >>> > >>>>>>> in the job, 2~3 groups dividing the job into a few
> > > major
> > > > >>> > parts,
> > > > >>> > >> or as
> > > > >>> > >>>>>> many
> > > > >>> > >>>>>>> groups as the number of tasks/operators, depending on
> > > how
> > > > >>> > >>>> fine-grained
> > > > >>> > >>>>>> the
> > > > >>> > >>>>>>> user is able to specify the resources.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> Having to support SSGs might be a constraint. But given
> > > > >>> > that all
> > > > >>> > >> the
> > > > >>> > >>>>>>> current scheduler implementations already support
> > > SSGs, I
> > > > >>> > tend to
> > > > >>> > >>>> think
> > > > >>> > >>>>>>> that as an acceptable price for the above discussed
> > > > >>> > usability and
> > > > >>> > >>>>>>> flexibility.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> @Chesnay
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>>> Will declaring them on slot sharing groups not also waste
> > > > >>> > >>>>>>>> resources if the parallelism of operators within that group
> > > > >>> > >>>>>>>> are different?
> > > > >>> > >>>>>>>>
> > > > >>> > >>>>>>> Yes. It's a trade-off between usability and resource
> > > > >>> > >>>>>>> utilization. To avoid such waste, the user can define more
> > > > >>> > >>>>>>> groups, so that each group contains fewer operators and the
> > > > >>> > >>>>>>> chance of having operators with different parallelism is
> > > > >>> > >>>>>>> reduced. The price is having more resource requirements to
> > > > >>> > >>>>>>> specify.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>>> It also seems like quite a hassle for users having to
> > > > >>> > >>>>>>>> recalculate the resource requirements if they change the slot
> > > > >>> > >>>>>>>> sharing.
> > > > >>> > >>>>>>>> I'd think that it's not really workable for users that create
> > > > >>> > >>>>>>>> a set of re-usable operators which are mixed and matched in
> > > > >>> > >>>>>>>> their applications; managing the resources requirements in
> > > > >>> > >>>>>>>> such a setting would be a nightmare, and in the end would
> > > > >>> > >>>>>>>> require operator-level requirements any way.
> > > > >>> > >>>>>>>> In that sense, I'm not even sure whether it really increases
> > > > >>> > >>>>>>>> usability.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> - As mentioned in my reply to Till's comment, there's no
> > > > >>> > >>>>>>> reason to put multiple operators whose individual resource
> > > > >>> > >>>>>>> requirements are already known into the same group in
> > > > >>> > >>>>>>> fine-grained resource management.
> > > > >>> > >>>>>>> - Even if an operator implementation is reused across
> > > > >>> > >>>>>>> multiple applications, that does not guarantee the same
> > > > >>> > >>>>>>> resource requirements. During our years of practice at
> > > > >>> > >>>>>>> Alibaba, with per-operator requirements specified for Blink's
> > > > >>> > >>>>>>> fine-grained resource management, very few users (including
> > > > >>> > >>>>>>> our specialists who are dedicated to supporting Blink users)
> > > > >>> > >>>>>>> are experienced enough to accurately predict/estimate the
> > > > >>> > >>>>>>> operator resource requirements. Most people rely on the
> > > > >>> > >>>>>>> execution-time metrics (throughput, delay, cpu load, memory
> > > > >>> > >>>>>>> usage, GC pressure, etc.) to improve the specification.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> To sum up:
> > > > >>> > >>>>>>> If the user is capable of providing proper resource
> > > > >>> > >>>>>>> requirements for every operator, that's definitely a good
> > > > >>> > >>>>>>> thing and we would not need to rely on the SSGs. However, that
> > > > >>> > >>>>>>> shouldn't be a *must* for the fine-grained resource management
> > > > >>> > >>>>>>> to work. For those users who are capable and do not like
> > > > >>> > >>>>>>> having to set each operator to a separate SSG, I would be ok
> > > > >>> > >>>>>>> to have both SSG-based and operator-based runtime interfaces
> > > > >>> > >>>>>>> and to only fall back to the SSG requirements when the
> > > > >>> > >>>>>>> operator requirements are not specified. However, as the first
> > > > >>> > >>>>>>> step, I think we should prioritise the use cases where users
> > > > >>> > >>>>>>> are not that experienced.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> Thank you~
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> Xintong Song
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
> > > > >>> > >> chesnay@apache.org <ma...@apache.org>>
> > > > >>> > >>>>>>> wrote:
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>>> Will declaring them on slot sharing groups not also
> > > > waste
> > > > >>> > >> resources
> > > > >>> > >>>>> if
> > > > >>> > >>>>>>>> the parallelism of operators within that group are
> > > > different?
> > > > >>> > >>>>>>>>
> > > > >>> > >>>>>>>> It also seems like quite a hassle for users having to
> > > > >>> > >> recalculate
> > > > >>> > >>>> the
> > > > >>> > >>>>>>>> resource requirements if they change the slot sharing.
> > > > >>> > >>>>>>>> I'd think that it's not really workable for users that
> > > > create
> > > > >>> > >> a set
> > > > >>> > >>>>> of
> > > > >>> > >>>>>>>> re-usable operators which are mixed and matched in
> > > their
> > > > >>> > >>>>> applications;
> > > > >>> > >>>>>>>> managing the resources requirements in such a setting
> > > > >>> > would be
> > > > >>> > >> a
> > > > >>> > >>>>>>>> nightmare, and in the end would require operator-level
> > > > >>> > >> requirements
> > > > >>> > >>>>> any
> > > > >>> > >>>>>>>> way.
> > > > >>> > >>>>>>>> In that sense, I'm not even sure whether it really
> > > > increases
> > > > >>> > >>>>> usability.
> > > > >>> > >>>>>>>> My main worry is that it if we wire the runtime to
> > > work
> > > > >>> > on SSGs
> > > > >>> > >>>> it's
> > > > >>> > >>>>>>>> gonna be difficult to implement more fine-grained
> > > > approaches,
> > > > >>> > >> which
> > > > >>> > >>>>>>>> would not be the case if, for the runtime, they are
> > > > always
> > > > >>> > >> defined
> > > > >>> > >>>> on
> > > > >>> > >>>>>> an
> > > > >>> > >>>>>>>> operator-level.
> > > > >>> > >>>>>>>>
> > > > >>> > >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > > > >>> > >>>>>>>>> Thanks for drafting this FLIP and starting this
> > > > discussion
> > > > >>> > >>>> Yangze.
> > > > >>> > >>>>>>>>> I like that defining resource requirements on a slot
> > > > sharing
> > > > >>> > >>>> group
> > > > >>> > >>>>>>> makes
> > > > >>> > >>>>>>>>> the overall setup easier and improves usability of
> > > > resource
> > > > >>> > >>>>>>> requirements.
> > > > >>> > >>>>>>>>> What I do not like about it is that it changes slot
> > > > sharing
> > > > >>> > >>>> groups
> > > > >>> > >>>>>> from
> > > > >>> > >>>>>>>>> being a scheduling hint to something which needs to
> > > be
> > > > >>> > >> supported
> > > > >>> > >>>> in
> > > > >>> > >>>>>>> order
> > > > >>> > >>>>>>>>> to support fine grained resource requirements. So
> > > far,
> > > > the
> > > > >>> > >> idea
> > > > >>> > >>>> of
> > > > >>> > >>>>>> slot
> > > > >>> > >>>>>>>>> sharing groups was that it tells the system that a
> > > set
> > > > of
> > > > >>> > >>>> operators
> > > > >>> > >>>>>> can
> > > > >>> > >>>>>>>> be
> > > > >>> > >>>>>>>>> deployed in the same slot. But the system still had
> > > the
> > > > >>> > >> freedom
> > > > >>> > >>>> to
> > > > >>> > >>>>>> say
> > > > >>> > >>>>>>>> that
> > > > >>> > >>>>>>>>> it would rather place these tasks in different slots
> > > > if it
> > > > >>> > >>>> wanted.
> > > > >>> > >>>>> If
> > > > >>> > >>>>>>> we
> > > > >>> > >>>>>>>>> now specify resource requirements on a per slot
> > > sharing
> > > > >>> > >> group,
> > > > >>> > >>>> then
> > > > >>> > >>>>>> the
> > > > >>> > >>>>>>>>> only option for a scheduler which does not support
> > > slot
> > > > >>> > >> sharing
> > > > >>> > >>>>>> groups
> > > > >>> > >>>>>>> is
> > > > >>> > >>>>>>>>> to say that every operator in this slot sharing group
> > > > >>> > needs a
> > > > >>> > >>>> slot
> > > > >>> > >>>>>> with
> > > > >>> > >>>>>>>> the
> > > > >>> > >>>>>>>>> same resources as the whole group.
> > > > >>> > >>>>>>>>>
> > > > >>> > >>>>>>>>> So for example, if we have a job consisting of two operators
> > > > >>> > >>>>>>>>> op_1 and op_2 where each op needs 100 MB of memory, we would
> > > > >>> > >>>>>>>>> then say that the slot sharing group needs 200 MB of memory
> > > > >>> > >>>>>>>>> to run. If we have a cluster with 2 TMs with one slot of
> > > > >>> > >>>>>>>>> 100 MB each, then the system cannot run this job. If the
> > > > >>> > >>>>>>>>> resources were specified on an operator level, then the
> > > > >>> > >>>>>>>>> system could still make the decision to deploy op_1 to TM_1
> > > > >>> > >>>>>>>>> and op_2 to TM_2.
> > > > >>> > >>>>>>>>>
> > > > >>> > >>>>>>>>> Originally, one of the primary goals of slot sharing
> > > > groups
> > > > >>> > >> was
> > > > >>> > >>>> to
> > > > >>> > >>>>>> make
> > > > >>> > >>>>>>>> it
> > > > >>> > >>>>>>>>> easier for the user to reason about how many slots a
> > > > job
> > > > >>> > >> needs
> > > > >>> > >>>>>>>> independent
> > > > >>> > >>>>>>>>> of the actual number of operators in the job.
> > > > Interestingly,
> > > > >>> > >> if
> > > > >>> > >>>> all
> > > > >>> > >>>>>>>>> operators have their resources properly specified,
> > > > then slot
> > > > >>> > >>>>> sharing
> > > > >>> > >>>>>> is
> > > > >>> > >>>>>>>> no
> > > > >>> > >>>>>>>>> longer needed because Flink could slice off the
> > > > >>> > appropriately
> > > > >>> > >>>> sized
> > > > >>> > >>>>>>> slots
> > > > >>> > >>>>>>>>> for every Task individually. What matters is whether
> > > > the
> > > > >>> > >> whole
> > > > >>> > >>>>>> cluster
> > > > >>> > >>>>>>>> has
> > > > >>> > >>>>>>>>> enough resources to run all tasks or not.
> > > > >>> > >>>>>>>>>
> > > > >>> > >>>>>>>>> Cheers,
> > > > >>> > >>>>>>>>> Till
> > > > >>> > >>>>>>>>>
> > > > >>> > >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
> > > > >>> > >> karmagyz@gmail.com <ma...@gmail.com>>
> > > > >>> > >>>>>> wrote:
> > > > >>> > >>>>>>>>>> Hi, there,
> > > > >>> > >>>>>>>>>>
> > > > >>> > >>>>>>>>>> We would like to start a discussion thread on
> > > > "FLIP-156:
> > > > >>> > >> Runtime
> > > > >>> > >>>>>>>>>> Interfaces for Fine-Grained Resource
> > > Requirements"[1],
> > > > >>> > >> where we
> > > > >>> > >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime
> > > > interfaces
> > > > >>> > >> for
> > > > >>> > >>>>>>>>>> specifying fine-grained resource requirements.
> > > > >>> > >>>>>>>>>>
> > > > >>> > >>>>>>>>>> In this FLIP:
> > > > >>> > >>>>>>>>>> - Expound the user story of fine-grained resource
> > > > >>> > >> management.
> > > > >>> > >>>>>>>>>> - Propose runtime interfaces for specifying
> > > SSG-based
> > > > >>> > >> resource
> > > > >>> > >>>>>>>>>> requirements.
> > > > >>> > >>>>>>>>>> - Discuss the pros and cons of the three potential
> > > > >>> > >> granularities
> > > > >>> > >>>>> for
> > > > >>> > >>>>>>>>>> specifying the resource requirements (op, task and
> > > > slot
> > > > >>> > >> sharing
> > > > >>> > >>>>>> group)
> > > > >>> > >>>>>>>>>> and explain why we choose the slot sharing group.
> > > > >>> > >>>>>>>>>>
> > > > >>> > >>>>>>>>>> Please find more details in the FLIP wiki document
> > > > [1].
> > > > >>> > >> Looking
> > > > >>> > >>>>>>>>>> forward to your feedback.
> > > > >>> > >>>>>>>>>>
> > > > >>> > >>>>>>>>>> [1]
> > > > >>> > >>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > > >>> > >>>>>>>>>>
> > > > >>> > >>>>>>>>>> Best,
> > > > >>> > >>>>>>>>>> Yangze Guo
> > > > >>> > >>>>>>>>>>
> > > > >>> > >>>>>>>>
> > > > >>> >
> > > > >>>
> > > >
> > >
>
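
Till's two-operator example quoted above can be checked with a few lines of Java. This is only a sketch under simplifying assumptions: memory is the only resource, a slot hosts either the whole group or exactly one operator, and the names SlotFitSketch, fitsAsGroup and fitsPerOperator are made up for illustration and are not Flink APIs.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Flink: compares SSG-level and operator-level placement. */
final class SlotFitSketch {

    /** SSG-level requirement: a single slot must hold the sum of all operator requirements. */
    static boolean fitsAsGroup(long[] operatorMb, long[] slotMb) {
        long groupMb = Arrays.stream(operatorMb).sum();
        return Arrays.stream(slotMb).anyMatch(slot -> slot >= groupMb);
    }

    /** Operator-level requirement: every operator may be placed in its own slot. */
    static boolean fitsPerOperator(long[] operatorMb, long[] slotMb) {
        if (operatorMb.length > slotMb.length) {
            return false;
        }
        long[] ops = operatorMb.clone();
        long[] slots = slotMb.clone();
        Arrays.sort(ops);
        Arrays.sort(slots);
        // Pair the largest operator with the largest slot, the second largest with the
        // second largest, and so on; if any pair does not fit, no assignment exists.
        for (int i = 1; i <= ops.length; i++) {
            if (ops[ops.length - i] > slots[slots.length - i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] operators = {100, 100}; // op_1 and op_2, 100 MB each
        long[] slots = {100, 100};     // TM_1 and TM_2, one 100 MB slot each
        System.out.println("SSG-level fits:      " + fitsAsGroup(operators, slots));
        System.out.println("operator-level fits: " + fitsPerOperator(operators, slots));
    }
}
```

With the numbers from the example, the SSG-level check fails (no single slot has 200 MB) while the operator-level check succeeds, which is exactly the gap Till describes. Splitting op_1 and op_2 into two slot sharing groups of 100 MB each makes the SSG-level check behave like the operator-level one.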

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Yangze Guo <ka...@gmail.com>.

Hi, Kezhu.

Thanks for your feedback.

> Flink *runtime* already has an ideal granularity for resource management 'task'.
As mentioned in the FLIP, there is some legacy code in the Flink code
base, but it has never really been used or exposed to users. So there
are actually no operator-level or SSG-level resource requirements today,
but the slot is already the basic unit for resource management in
Flink's runtime.

> that SSG resource specifying will override operator level resource specifying if both are specified
We currently treat operator-level resource specification as a potential
follow-up to fine-grained resource management. We need to collect more
feedback to decide whether we really need it. As for whether and how to
allow a hybrid (SSG + OP) configuration, I think there is little point
in discussing it at present.

IIUC, your proposal is based on the assumption that we already have the
operator-level resource configuration, and it aims to solve how to
determine the slot resource spec when both configurations exist.
- First, we are not sure yet that we need operator-level resource
configuration.
- Second, we are not even sure whether we need to support hybrid
configuration.

So, as written in the future plan, I tend to first collect feedback on
the operator-level resource configuration interface once fine-grained
resource management is ready. Then we can consider further
optimizations, such as your proposal.
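
For reference, the two readings of an SSG requirement that come up in the proposal quoted below can be sketched as follows: "set" means a specified SSG requirement replaces whatever the tasks specify, while "add" means it is added on top of the sum of the task requirements. This is a rough sketch that assumes a single memory dimension in MB and uses 0 in place of ResourceSpec.ZERO; SlotRequirementSketch and its methods are hypothetical names, not Flink APIs.

```java
import java.util.Arrays;

/** Hypothetical illustration only; none of these names exist in Flink. */
final class SlotRequirementSketch {

    /** Sum of the task-level requirements of all tasks sharing the slot. */
    static long sumOfTasks(long[] taskMb) {
        return Arrays.stream(taskMb).sum();
    }

    /** "set": a specified (non-zero) SSG requirement overrides the task-level sum. */
    static long slotMbWithSet(long[] taskMb, long ssgMb) {
        return ssgMb > 0 ? ssgMb : sumOfTasks(taskMb);
    }

    /** "add": the SSG requirement is extra resource on top of the task-level sum. */
    static long slotMbWithAdd(long[] taskMb, long ssgMb) {
        return sumOfTasks(taskMb) + ssgMb;
    }

    public static void main(String[] args) {
        long[] tasks = {100, 50}; // two tasks in the same slot sharing group
        long ssg = 64;            // requirement attached to the group itself
        System.out.println("set: " + slotMbWithSet(tasks, ssg) + " MB");
        System.out.println("add: " + slotMbWithAdd(tasks, ssg) + " MB");
    }
}
```

In this sketch a result of 0 would stand for "nothing specified anywhere", which the quoted proposal maps to a default-sized slot. Either reading only matters once operator-level or task-level requirements exist alongside SSG requirements.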


Best,
Yangze Guo

On Wed, Feb 3, 2021 at 1:28 PM Kezhu Wang <ke...@gmail.com> wrote:
>
> Hi all, sorry for joining the discussion after voting has started.
>
> I want to share my thoughts on this after reading above discussions.
>
> I think Flink *runtime* already has an ideal granularity for resource
> management 'task'. If there is
> a slot shared by multiple tasks, that slot's resource requirement is simply
> the sum of all its logical slots. So basically, there has been no resource
> requirement for SlotSharingGroup in the runtime until now, right?
>
> As in discussion, we already agree upon that: "If all operators have their
> resources properly
> specified, then slot sharing is no longer needed. "
>
> So it seems to me that the natural question to discuss is: how do we
> bridge the impractical operator-level resource specification to the
> runtime's task-level resource requirement? This is actually a pure API
> matter, as Chesnay has pointed out.
>
> But FLIP-156 brings another direction to the table: how about using SSG
> for both API and runtime resource specification?
>
> From the FLIP and discussion, I assume that SSG resource specifying will
> override operator level resource specifying if both are specified?
>
> So, I wonder whether we could interpret SSG resource specifying as an "add"
> rather than a "set" on the resource requirement?
>
> The semantics is that SSG resource specifying adds additional resource to
> the shared slot, to express concerns about possibly high throughput and
> resource requirements for tasks in one physical slot.
>
> The result is that if the scheduler indeed respects slot sharing, the
> allocated slot will gain the extra resource specified for that SSG.
>
> I think one coding barrier for the "add" approach is ResourceSpec.UNKNOWN,
> which doesn't support the 'merge' operation. I tend to use ResourceSpec.ZERO
> as the default; the task executor should be aware of this.
>
> @Chesnay
> > My main worry is that it if we wire the runtime to work on SSGs it's
> > gonna be difficult to implement more fine-grained approaches, which
> > would not be the case if, for the runtime, they are always defined on an
> > operator-level.
>
> An "add" operation should be less invasive and impose a lower barrier for
> future fine-grained approaches.
>
> @Stephan
> >   - Users can define different slot sharing groups for operators like
> > they do now, with the exception that you cannot mix operators that have a
> > resource profile and operators that have no resource profile.
>
> @Till
> > This effectively means that all unspecified operators
> > will implicitly have a zero resource requirement.
> > I am wondering whether this wouldn't lead to a surprising behaviour for
> > the user. If the user specifies the resource requirements for a single
> > operator, then he probably will assume that the other operators will get
> > the default share of resources and not nothing.
>
> I think this is inherent in the fact that we cannot define
> ResourceSpec.ONE, i.e. the resource requirement for exactly one default
> slot, with concrete numbers? I tend to squash out the unspecified
> operators if they are chained with operators that have explicit resource
> specifications. Otherwise, the protocol becomes verbose, as in "give me
> this much resource and a default". I think that if we have explicit
> resource specifications for only some operators, it is just saying "I
> don't care about the other operators that much, just get them a place to
> run". Most likely these are stateless filter/map or other less
> resource-consuming operators. If there is indeed a problem, I think
> clients can specify a global default (or another level of default in the
> future). In the job graph generation phase, we could take that default
> into account for unspecified operators.
>
> @FLIP-156
> > Expose operator chaining. (Con of task level resource specifying)
>
> Isn't this inherent in all group-level resource specifying? It will either
> break chaining or obey it, or may even not work with it at all.
>
> To sum up above, my suggestions are:
>
> In api side:
> * StreamExecutionEnvironment: A global default(ResourceSpec.ZERO if
> unspecified).
> * Operator: ResourceSpec.ZERO(unspecified) as default.
> * Task: sum of requirements from specified operators + global default(if
> there are any unspecified operators)
> * SSG: additional resource to physical slot.
>
> In runtime side:
> * Task: ResourceSpec.Task or ResourceSpec.ZERO
> * SSG: ResourceSpec.SSG or ResourceSpec.ZERO
>
> A physical slot gets the summed-up resources from its logical slots and
> the SSG; if the result is ResourceSpec.ZERO, it is just a default-sized
> slot.
>
> In short, turn SSG resource specifying into an "add" and drop
> ResourceSpec.UNKNOWN.
>
>
> Questions/Issues:
> * Could an SSG express a negative resource requirement?
> * Is there a concrete reason why partially configured resources cannot
> work? I saw that it will fail job submission in Dispatcher.submitJob.
> * An option (cluster/job level) to force slot sharing in the scheduler?
> This could be useful in case of migration from FLIP-156 to a future
> approach.
> * An option (cluster level) to ignore resource specifications (allowing a
> job with specified resources to run on an open box environment) for
> non-production usage?
>
>
>
> On February 1, 2021 at 11:54:10, Yangze Guo (karmagyz@gmail.com) wrote:
>
> Thanks for the replies, Till and Xintong!
>
> I updated the FLIP, including:
> - Edit the JavaDoc of the proposed
> StreamGraphGenerator#setSlotSharingGroupResource.
> - Add "Future Plan" section, which contains the potential follow-up
> issues and the limitations to be documented when fine-grained resource
> management is exposed to users.
>
> I'll start a vote in another thread.
>
> Best,
> Yangze Guo
>
> On Fri, Jan 29, 2021 at 10:07 PM Till Rohrmann <tr...@apache.org>
> wrote:
> >
> > Thanks for summarizing the discussion, Yangze. I agree that setting
> > resource requirements per operator is not very user friendly. Moreover, I
> > couldn't come up with a different proposal which would be as easy to use
> > and wouldn't expose internal scheduling details. In fact, following this
> > argument then we shouldn't have exposed the slot sharing groups in the
> > first place.
> >
> > What is important for the user is that we properly document the
> > limitations and constraints the fine grained resource specification has.
> > For example, we should explain how optimizations like chaining are
> > affected by it and how different execution modes (batch vs. streaming)
> > affect the execution of operators which have specified resources. These
> > things shouldn't become part of the contract of this feature and are more
> > caused by internal implementation details but it will be important to
> > understand these things properly in order to use this feature effectively.
> >
> > Hence, +1 for starting the vote for this FLIP.
> >
> > Cheers,
> > Till
> >
> > On Tue, Jan 26, 2021 at 4:37 AM Xintong Song <to...@gmail.com>
> wrote:
> >
> > > Thanks for the summary, Yangze.
> > >
> > > The changes and follow-up issues LGTM. Let's wait for responses from
> > > the others before starting a vote.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Tue, Jan 26, 2021 at 11:08 AM Yangze Guo <ka...@gmail.com> wrote:
> > >
> > > > Thanks everyone for the lively discussion. I'd like to try to
> > > > summarize the current convergence in the discussion. Please let me
> > > > know if I got things wrong or missed something crucial here.
> > > >
> > > > Change of this FLIP:
> > > > - Treat the SSG resource requirements as a hint instead of a
> > > > restriction for the runtime. That should be explicitly explained in
> > > > the JavaDocs.
> > > >
> > > > Potential follow-up issues if needed:
> > > > - Provide operator-level resource configuration interface.
> > > > - Provide multiple options for deciding resources for SSGs whose
> > > > requirement is not specified:
> > > > ** Default slot resource.
> > > > ** Default operator resource times number of operators.
> > > >
> > > > If there are no other issues, I'll update the FLIP accordingly and
> > > > start a vote thread. Thanks all for the valuable feedback again.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > >
> > > > On Fri, Jan 22, 2021 at 11:30 AM Xintong Song <to...@gmail.com>
> > > > wrote:
> > > > >
> > > > >
> > > > > FGRuntimeInterface.png
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <to...@gmail.com>
>
> > > > wrote:
> > > > >>
> > > > >> I think Chesnay's proposal could actually work. IIUC, the key point
> > > > >> is to derive operator requirements from SSG requirements on the API
> > > > >> side, so that the runtime only deals with operator requirements. It's
> > > > >> debatable how the deriving should be done though. E.g., an alternative
> > > > >> could be to evenly divide the SSG requirement into requirements of
> > > > >> operators in the group.
> > > > >>
> > > > >>
> > > > >> However, I'm not entirely sure which option is more desired.
> > > > >> Illustrating my understanding in the following figure, in which on the
> > > > >> top is Chesnay's proposal and on the bottom is the SSG-based proposal
> > > > >> in this FLIP.
> > > > >>
> > > > >>
> > > > >>
> > > > >> I think the major difference between the two approaches is where
> > > > >> deriving operator requirements from SSG requirements happens.
> > > > >>
> > > > >> - Chesnay's proposal simplifies the runtime logic and the interface to
> > > > >> expose, at the price of moving more complexity (i.e. the deriving) to
> > > > >> the API side. The question is, where do we prefer to keep the
> > > > >> complexity? I'm slightly leaning towards having a thin API and keeping
> > > > >> the complexity in the runtime if possible.
> > > > >>
> > > > >> - Notice that the dashed arrows represent optional steps that are
> > > > >> needed only for schedulers that do not respect SSGs, which we don't
> > > > >> have at the moment. If we only look at the solid arrows, then the
> > > > >> SSG-based approach is much simpler, without needing to derive and
> > > > >> aggregate the requirements back and forth. I'm not sure about
> > > > >> complicating the current design only for potential future needs.
> > > > >>
> > > > >>
> > > > >> Thank you~
> > > > >>
> > > > >> Xintong Song
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <chesnay@apache.org>
> > > > >> wrote:
> > > > >>>
> > > > >>> You're raising a good point, but I think I can rectify that with a
> > > > >>> minor adjustment.
> > > > >>>
> > > > >>> Default requirements are whatever the default requirements are;
> > > > >>> setting the requirements for one operator has no effect on other
> > > > >>> operators.
> > > > >>>
> > > > >>> With these rules, and some API enhancements, the following mockup
> > > > >>> would replicate the SSG-based behavior:
> > > > >>>
> > > > >>> Map<SlotSharingGroupId, Requirements> requirements = ...
> > > > >>> for slotSharingGroup in env.getSlotSharingGroups() {
> > > > >>>     vertices = slotSharingGroup.getVertices()
> > > > >>>     vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
> > > > >>>     vertices.remaining().setRequirements(ZERO)
> > > > >>> }
> > > > >>>
> > > > >>> We could even allow setting requirements on slotsharing-groups or
> > > > >>> colocation-groups and internally translate them accordingly.
> > > > >>> I can't help but feel this is a plain API issue.
> > > > >>>
> > > > >>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
> > > > >>> > If I understand you correctly Chesnay, then you want to decouple
> > > > >>> > the resource requirement specification from the slot sharing group
> > > > >>> > assignment. Hence, per default all operators would be in the same
> > > > >>> > slot sharing group. If there is no operator with a resource
> > > > >>> > specification, then the system would allocate a default slot for
> > > > >>> > it. If there is at least one such operator, then the system would
> > > > >>> > sum up all the specified resources and allocate a slot of this
> > > > >>> > size. This effectively means that all unspecified operators will
> > > > >>> > implicitly have a zero resource requirement. Did I understand your
> > > > >>> > idea correctly?
> > > > >>> >
> > > > >>> > I am wondering whether this wouldn't lead to a surprising behaviour
> > > > >>> > for the user. If the user specifies the resource requirements for a
> > > > >>> > single operator, then he probably will assume that the other
> > > > >>> > operators will get the default share of resources and not nothing.
> > > > >>> >
> > > > >>> > Cheers,
> > > > >>> > Till
> > > > >>> >
> > > > >>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <
> > > chesnay@apache.org
> > > > >>> > <ma...@apache.org>> wrote:
> > > > >>> >
> > > > >>> > Is there even a functional difference between specifying the
> > > > >>> > requirements for an SSG vs specifying the same requirements on a
> > > > >>> > single operator within that group (ideally a colocation group to
> > > > >>> > avoid this whole hint business)?
> > > > >>> >
> > > > >>> > Wouldn't we get the best of both worlds in the latter case?
> > > > >>> >
> > > > >>> > Users can take shortcuts to define shared requirements,
> > > > >>> > but refine them further as needed on a per-operator basis,
> > > > >>> > without changing semantics of slotsharing groups
> > > > >>> > nor the runtime being locked into SSG-based requirements.
> > > > >>> >
> > > > >>> > (And before anyone argues what happens if slotsharing groups
> > > > >>> > change or whatnot, that's a plain API issue that we could surely
> > > > >>> > solve. (A plain iteration over slotsharing groups and therein
> > > > >>> > contained operators would suffice)).
> > > > >>> >
> > > > >>> > On 1/20/2021 6:48 PM, Till Rohrmann wrote:
> > > > >>> > > Maybe a different minor idea: Would it be possible to treat the
> > > > >>> > > SSG resource requirements as a hint for the runtime similar to
> > > > >>> > > how slot sharing groups are designed at the moment? Meaning that
> > > > >>> > > we don't give the guarantee that Flink will always deploy this
> > > > >>> > > set of tasks together no matter what comes. If, for example, the
> > > > >>> > > runtime can derive by some means the resource requirements for
> > > > >>> > > each task based on the requirements for the SSG, this could be
> > > > >>> > > possible. One easy strategy would be to give every task the same
> > > > >>> > > resources as the whole slot sharing group. Another one could be
> > > > >>> > > distributing the resources equally among the tasks. This does not
> > > > >>> > > even have to be implemented but we would give ourselves the
> > > > >>> > > freedom to change scheduling if need should arise.
> > > > >>> > >
> > > > >>> > > Cheers,
> > > > >>> > > Till
> > > > >>> > >
> > > > >>> > > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <
> > > karmagyz@gmail.com
> > > > >>> > <ma...@gmail.com>> wrote:
> > > > >>> > >
> > > > >>> > >> Thanks for the responses, Till and Xintong.
> > > > >>> > >>
> > > > >>> > >> I second Xintong's comment that SSG-based runtime interface
> > > > >>> > will give
> > > > >>> > >> us the flexibility to achieve op/task-based approach. That's
> > > > one of
> > > > >>> > >> the most important reasons for our design choice.
> > > > >>> > >>
> > > > >>> > >> Some cents regarding the default operator resource:
> > > > >>> > >> - It might be good for the scenario of DataStream jobs.
> > > > >>> > >> ** For light-weight operators, the accumulative
> > > > >>> > configuration error
> > > > >>> > >> will not be significant. Then, the resource of a task used
> > > is
> > > > >>> > >> proportional to the number of operators it contains.
> > > > >>> > >> ** For heavy operators like join and window or operators
> > > > >>> > using the
> > > > >>> > >> external resources, user will turn to the fine-grained
> > > > resource
> > > > >>> > >> configuration.
> > > > >>> > >> - It can increase the stability for the standalone cluster
> > > > >>> > where task
> > > > >>> > >> executors registered are heterogeneous(with different
> > > default
> > > > slot
> > > > >>> > >> resources).
> > > > >>> > >> - It might not be good for SQL users. The operators that SQL
> > > > >>> > will be
> > > > >>> > >> transferred to is a black box to the user. We also do not
> > > > guarantee
> > > > >>> > >> the cross-version of consistency of the transformation so
> > > far.
> > > > >>> > >>
> > > > >>> > >> I think it can be treated as a follow-up work when the
> > > > fine-grained
> > > > >>> > >> resource management is end-to-end ready.
> > > > >>> > >>
> > > > >>> > >> Best,
> > > > >>> > >> Yangze Guo
> > > > >>> > >>
> > > > >>> > >>
> > > > >>> > >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song
> > > > >>> > <tonysong820@gmail.com <ma...@gmail.com>>
> > > > >>> > >> wrote:
> > > > >>> > >>> Thanks for the feedback, Till.
> > > > >>> > >>>
> > > > >>> > >>> ## I feel that what you proposed (operator-based + default
> > > > >>> > >>> value) might be subsumed by the SSG-based approach.
> > > > >>> > >>> Thinking of op_1 -> op_2, there are the following 4 cases,
> > > > >>> > >>> categorized by whether the resource requirements are known to
> > > > >>> > >>> the users.
> > > > >>> > >>>
> > > > >>> > >>> 1. *Both known.* As previously mentioned, there's no reason to
> > > > >>> > >>>    put multiple operators whose individual resource requirements
> > > > >>> > >>>    are already known into the same group in fine-grained resource
> > > > >>> > >>>    management. And if op_1 and op_2 are in different groups,
> > > > >>> > >>>    there should be no problem switching data exchange mode from
> > > > >>> > >>>    pipelined to blocking. This is equivalent to specifying
> > > > >>> > >>>    operator resource requirements in your proposal.
> > > > >>> > >>> 2. *op_1 known, op_2 unknown.* Similar to 1), except that op_2
> > > > >>> > >>>    is in a SSG whose resource is not specified thus would have
> > > > >>> > >>>    the default slot resource. This is equivalent to having
> > > > >>> > >>>    default operator resources in your proposal.
> > > > >>> > >>> 3. *Both unknown*. The user can either set op_1 and op_2 to the
> > > > >>> > >>>    same SSG or separate SSGs.
> > > > >>> > >>>    - If op_1 and op_2 are in the same SSG, it will be equivalent
> > > > >>> > >>>      to the coarse-grained resource management, where op_1 and
> > > > >>> > >>>      op_2 share a default size slot no matter which data exchange
> > > > >>> > >>>      mode is used.
> > > > >>> > >>>    - If op_1 and op_2 are in different SSGs, then each of them
> > > > >>> > >>>      will use a default size slot. This is equivalent to setting
> > > > >>> > >>>      them with default operator resources in your proposal.
> > > > >>> > >>> 4. *Total (pipeline) or max (blocking) of op_1 and op_2 is
> > > > >>> > >>>    known.*
> > > > >>> > >>>    - It is possible that the user learns the total / max
> > > > >>> > >>>      resource requirement from executing and monitoring the job,
> > > > >>> > >>>      while not being aware of individual operator requirements.
> > > > >>> > >>>    - I believe this is the case your proposal does not cover.
> > > > >>> > >>>      And TBH, this is probably how most users learn the resource
> > > > >>> > >>>      requirements, according to my experiences.
> > > > >>> > >>> - In this case, the user might need to specify
> > > > >>> > different resources
> > > > >>> > >> if
> > > > >>> > >>> he wants to switch the execution mode, which should
> > > > not
> > > > >>> > be worse
> > > > >>> > >> than not
> > > > >>> > >>> being able to use fine-grained resource management.
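A minimal sketch of case 4 above, assuming (as the "total / max" wording suggests) that operators in one SSG run concurrently under a pipelined exchange and one after another under a blocking exchange. The function name `ssg_requirement_mb` and the mode strings are hypothetical and not part of any Flink API.

```python
def ssg_requirement_mb(op_mb, exchange_mode):
    """Memory to request for one SSG, given per-operator needs in MB."""
    if exchange_mode == "pipelined":
        # All operators in the group run at the same time: needs add up.
        return sum(op_mb)
    if exchange_mode == "blocking":
        # Operators run one after another: the largest one dominates.
        return max(op_mb)
    raise ValueError(f"unknown exchange mode: {exchange_mode}")


# op_1 needs 100 MB and op_2 needs 150 MB (illustrative numbers).
pipelined_mb = ssg_requirement_mb([100, 150], "pipelined")
blocking_mb = ssg_requirement_mb([100, 150], "blocking")
```

This is why a user who only knows the group total may have to specify a different value after switching the data exchange mode.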
> > > > >>> > >>>
> > > > >>> > >>>
> > > > >>> > >>> ## An additional idea inspired by your proposal.
> > > > >>> > >>> We may provide multiple options for deciding resources for
> > > > >>> > SSGs whose
> > > > >>> > >>> requirement is not specified, if needed.
> > > > >>> > >>>
> > > > >>> > >>> - Default slot resource (current design)
> > > > >>> > >>> - Default operator resource times number of operators
> > > > >>> > (equivalent to
> > > > >>> > >>> your proposal)
> > > > >>> > >>>
> > > > >>> > >>>
> > > > >>> > >>> ## Exposing internal runtime strategies
> > > > >>> > >>> Theoretically, yes. Because the requirements are tied to SSGs,
> > > > >>> > >>> they might be affected if the way SSGs are handled internally
> > > > >>> > >>> changes in the future. Practically, I do not see at the moment
> > > > >>> > >>> any concrete change we may want in the future that would
> > > > >>> > >>> conflict with this FLIP proposal, given the answer above to the
> > > > >>> > >>> question of switching the data exchange mode. I'd suggest not
> > > > >>> > >>> giving up the user friendliness we may gain now because of
> > > > >>> > >>> future problems that may or may not exist.
> > > > >>> > >>>
> > > > >>> > >>> Moreover, the SSG-based approach has the flexibility to
> > > > >>> > achieve the
> > > > >>> > >>> equivalent behavior as the operator-based approach, if we
> > > > set each
> > > > >>> > >> operator
> > > > >>> > >>> (or task) to a separate SSG. We can even provide a shortcut
> > > > >>> > option to
> > > > >>> > >>> automatically do that for users, if needed.
> > > > >>> > >>>
> > > > >>> > >>>
> > > > >>> > >>> Thank you~
> > > > >>> > >>>
> > > > >>> > >>> Xintong Song
> > > > >>> > >>>
> > > > >>> > >>>
> > > > >>> > >>>
> > > > >>> > >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann
> > > > >>> > <trohrmann@apache.org <ma...@apache.org>>
> > > > >>> > >> wrote:
> > > > >>> > >>>> Thanks for the responses Xintong and Stephan,
> > > > >>> > >>>>
> > > > >>> > >>>> I agree that being able to define the resource
> > > requirements
> > > > for a
> > > > >>> > >> group of
> > > > >>> > >>>> operators is more user friendly. However, my concern is
> > > that
> > > > >>> > we are
> > > > >>> > >>>> exposing thereby internal runtime strategies which might
> > > > >>> > limit our
> > > > >>> > >>>> flexibility to execute a given job. Moreover, the
> > > semantics
> > > > of
> > > > >>> > >> configuring
> > > > >>> > >>>> resource requirements for SSGs could break if switching
> > > from
> > > > >>> > streaming
> > > > >>> > >> to
> > > > >>> > >>>> batch execution. If one defines the resource requirements
> > > > for
> > > > >>> > op_1 ->
> > > > >>> > >> op_2
> > > > >>> > >>>> which run in pipelined mode when using the streaming
> > > > >>> > execution, then
> > > > >>> > >> how do
> > > > >>> > >>>> we interpret these requirements when op_1 -> op_2 are
> > > > >>> > executed with a
> > > > >>> > >>>> blocking data exchange in batch execution mode?
> > > > Consequently,
> > > > >>> > I am
> > > > >>> > >> still
> > > > >>> > >>>> leaning towards Stephan's proposal to set the resource
> > > > >>> > requirements per
> > > > >>> > >>>> operator.
> > > > >>> > >>>>
> > > > >>> > >>>> Maybe the following proposal makes the configuration
> > > easier:
> > > > >>> > If the
> > > > >>> > >> user
> > > > >>> > >>>> wants to use fine-grained resource requirements, then she
> > > > >>> > needs to
> > > > >>> > >> specify
> > > > >>> > >>>> the default size which is used for operators which have no
> > > > >>> > explicit
> > > > >>> > >>>> resource annotation. If this holds true, then every
> > > operator
> > > > >>> > would
> > > > >>> > >> have a
> > > > >>> > >>>> resource requirement and the system can try to execute the
> > > > >>> > operators
> > > > >>> > >> in the
> > > > >>> > >>>> best possible manner w/o being constrained by how the user
> > > > >>> > set the SSG
> > > > >>> > >>>> requirements.
> > > > >>> > >>>>
> > > > >>> > >>>> Cheers,
> > > > >>> > >>>> Till
> > > > >>> > >>>>
> > > > >>> > >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song
> > > > >>> > <tonysong820@gmail.com <ma...@gmail.com>>
> > > > >>> > >>>> wrote:
> > > > >>> > >>>>
> > > > >>> > >>>>> Thanks for the feedback, Stephan.
> > > > >>> > >>>>>
> > > > >>> > >>>>> Actually, your proposal has also come to my mind at some
> > > > >>> > point. And I
> > > > >>> > >>>> have
> > > > >>> > >>>>> some concerns about it.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> 1. It does not give users the same control as the
> > > SSG-based
> > > > >>> > approach.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> While neither approach requires specifying resources for each
> > > > >>> > >>>>> operator, the SSG-based approach supports the semantic that
> > > > >>> > >>>>> "some operators together use this much resource" while the
> > > > >>> > >>>>> operator-based approach doesn't.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> Think of a long pipeline with m operators (o_1, o_2, ...,
> > > > >>> > o_m), and
> > > > >>> > >> at
> > > > >>> > >>>> some
> > > > >>> > >>>>> point there's an agg o_n (1 < n < m) which significantly
> > > > >>> > reduces the
> > > > >>> > >> data
> > > > >>> > >>>>> amount. One can separate the pipeline into 2 groups SSG_1
> > > > >>> > (o_1, ...,
> > > > >>> > >> o_n)
> > > > >>> > >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much
> > > higher
> > > > >>> > >> parallelisms
> > > > >>> > >>>>> for operators in SSG_1 than for operators in SSG_2 won't
> > > > >>> > lead to too
> > > > >>> > >> much
> > > > >>> > >>>>> wasting of resources. If the two SSGs end up needing
> > > > different
> > > > >>> > >> resources,
> > > > >>> > >>>>> with the SSG-based approach one can directly specify
> > > > >>> > resources for
> > > > >>> > >> the
> > > > >>> > >>>> two
> > > > >>> > >>>>> groups. However, with the operator-based approach, the
> > > > user will
> > > > >>> > >> have to
> > > > >>> > >>>>> specify resources for each operator in one of the two
> > > > >>> > groups, and
> > > > >>> > >> tune
> > > > >>> > >>>> the
> > > > >>> > >>>>> default slot resource via configurations to fit the other
> > > > group.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> 2. It increases the chance of breaking operator chains.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> Setting chainable operators into different slot sharing
> > > > >>> > groups will
> > > > >>> > >>>>> prevent them from being chained. In the current
> > > > implementation,
> > > > >>> > >>>> downstream
> > > > >>> > >>>>> operators, if SSG not explicitly specified, will be set
> > > to
> > > > >>> > the same
> > > > >>> > >> group
> > > > >>> > >>>>> as the chainable upstream operators (unless multiple
> > > > upstream
> > > > >>> > >> operators
> > > > >>> > >>>> in
> > > > >>> > >>>>> different groups), to reduce the chance of breaking
> > > chains.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4,
> > > > >>> > deciding
> > > > >>> > >> SSGs
> > > > >>> > >>>>> based on whether resource is specified we will easily get
> > > > >>> > groups like
> > > > >>> > >>>> (o_1,
> > > > >>> > >>>>> o_3) & (o_2, o_4), where none of the operators can be
> > > > >>> > chained. This
> > > > >>> > >> is
> > > > >>> > >>>> also
> > > > >>> > >>>>> possible for the SSG-based approach, but I believe the
> > > > >>> > chance is much
> > > > >>> > >>>>> smaller because there's no strong reason for users to
> > > > >>> > specify the
> > > > >>> > >> groups
> > > > >>> > >>>>> with alternate operators like that. We are more likely to
> > > > >>> > get groups
> > > > >>> > >> like
> > > > >>> > >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks only
> > > > between
> > > > >>> > o_2 and
> > > > >>> > >> o_3.
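To make the chaining argument concrete, here is a small sketch that counts how many adjacent operator pairs can no longer be chained for a given grouping. It assumes a linear pipeline and uses only the rule stated above (operators in different slot sharing groups are not chained); `chain_breaks` and the group labels are hypothetical names.

```python
def chain_breaks(pipeline, group_of):
    """Count adjacent operator pairs that sit in different groups."""
    return sum(
        1
        for upstream, downstream in zip(pipeline, pipeline[1:])
        if group_of[upstream] != group_of[downstream]
    )


pipeline = ["o_1", "o_2", "o_3", "o_4"]

# Groups (o_1, o_3) & (o_2, o_4): every neighbouring pair is split.
alternating = {"o_1": "g1", "o_2": "g2", "o_3": "g1", "o_4": "g2"}

# Groups (o_1, o_2) & (o_3, o_4): only the o_2 -> o_3 edge is split.
contiguous = {"o_1": "g1", "o_2": "g1", "o_3": "g2", "o_4": "g2"}

alternating_breaks = chain_breaks(pipeline, alternating)
contiguous_breaks = chain_breaks(pipeline, contiguous)
```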
> > > > >>> > >>>>>
> > > > >>> > >>>>> 3. It complicates the system by having two different
> > > > >>> > mechanisms for
> > > > >>> > >>>> sharing
> > > > >>> > >>>>> managed memory in a slot.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> - In FLIP-141, we introduced the intra-slot managed
> > > memory
> > > > >>> > sharing
> > > > >>> > >>>>> mechanism, where managed memory is first distributed
> > > > >>> > according to the
> > > > >>> > >>>>> consumer type, then further distributed across operators
> > > > of that
> > > > >>> > >> consumer
> > > > >>> > >>>>> type.
> > > > >>> > >>>>>
> > > > >>> > >>>>> - With the operator-based approach, managed memory size
> > > > >>> > specified
> > > > >>> > >> for an
> > > > >>> > >>>>> operator should account for all the consumer types of
> > > that
> > > > >>> > operator.
> > > > >>> > >> That
> > > > >>> > >>>>> means the managed memory is first distributed across
> > > > >>> > operators, then
> > > > >>> > >>>>> distributed to different consumer types of each operator.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> Unfortunately, the different order of the two calculation steps
> > > > >>> > >>>>> can lead to different results. To be specific, the semantics of
> > > > >>> > >>>>> the configuration option `consumer-weights` would change
> > > > >>> > >>>>> (within a slot vs. within an operator).
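The effect of the calculation order can be illustrated with a simplified model. This is only a sketch under stated assumptions: the consumer type names and the 70/30 weights are made up for the example, memory of one consumer type is split evenly across the operators using it, and neither function mirrors Flink's actual implementation.

```python
# Hypothetical consumer weights, standing in for `consumer-weights`.
WEIGHTS = {"state_backend": 70, "python": 30}


def slot_first(slot_mb, ops):
    """Split slot memory by consumer type, then across operators of that type."""
    used = {c for consumers in ops.values() for c in consumers}
    total_weight = sum(WEIGHTS[c] for c in used)
    result = {op: {} for op in ops}
    for c in used:
        share = slot_mb * WEIGHTS[c] / total_weight
        users = [op for op, consumers in ops.items() if c in consumers]
        for op in users:
            result[op][c] = share / len(users)  # even split (assumption)
    return result


def operator_first(op_mb, ops):
    """Give each operator its own budget, then split it by consumer type."""
    result = {}
    for op, consumers in ops.items():
        total_weight = sum(WEIGHTS[c] for c in consumers)
        result[op] = {c: op_mb[op] * WEIGHTS[c] / total_weight for c in consumers}
    return result


# Operator A uses both consumer types, operator B only the state backend.
ops = {"A": ["state_backend", "python"], "B": ["state_backend"]}

by_slot = slot_first(100, ops)
# Reuse the per-operator totals from the slot-first result (A: 65, B: 35).
by_operator = operator_first({"A": 65, "B": 35}, ops)
```

Even with identical per-operator totals, operator A ends up with a different split between its two consumer types, because the weights are applied within the slot in one case and within the operator in the other.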
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> To sum up things:
> > > > >>> > >>>>>
> > > > >>> > >>>>> While (3) might be a bit more implementation related, I think
> > > > >>> > >>>>> (1) and (2) suggest that the price the proposed approach pays
> > > > >>> > >>>>> for not having to specify resources for every operator is that
> > > > >>> > >>>>> it is not as independent of operator chaining and slot sharing
> > > > >>> > >>>>> as the operator-based approach discussed in the FLIP.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> Thank you~
> > > > >>> > >>>>>
> > > > >>> > >>>>> Xintong Song
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen
> > > > >>> > <sewen@apache.org <ma...@apache.org>>
> > > > >>> > >> wrote:
> > > > >>> > >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
> > > > >>> > >>>>>>
> > > > >>> > >>>>>> I want to say, first of all, that this is super well
> > > > >>> > written. And
> > > > >>> > >> the
> > > > >>> > >>>>>> points that the FLIP makes about how to expose the
> > > > >>> > configuration to
> > > > >>> > >>>> users
> > > > >>> > >>>>>> is exactly the right thing to figure out first.
> > > > >>> > >>>>>> So good job here!
> > > > >>> > >>>>>>
> > > > >>> > >>>>>> About how to let users specify the resource profiles.
> > > If I
> > > > >>> > can sum
> > > > >>> > >> the
> > > > >>> > >>>>> FLIP
> > > > >>> > >>>>>> and previous discussion up in my own words, the problem
> > > > is the
> > > > >>> > >>>> following:
> > > > >>> > >>>>>> Operator-level specification is the simplest and
> > > cleanest
> > > > >>> > approach,
> > > > >>> > >>>>> because
> > > > >>> > >>>>>>> it avoids mixing operator configuration (resource) and
> > > > >>> > >> scheduling. No
> > > > >>> > >>>>>>> matter what other parameters change (chaining, slot
> > > > sharing,
> > > > >>> > >>>> switching
> > > > >>> > >>>>>>> pipelined and blocking shuffles), the resource profiles
> > > > >>> > stay the
> > > > >>> > >>>> same.
> > > > >>> > >>>>>>> But it would require that a user specifies resources on
> > > > all
> > > > >>> > >>>> operators,
> > > > >>> > >>>>>>> which makes it hard to use. That's why the FLIP
> > > suggests
> > > > going
> > > > >>> > >> with
> > > > >>> > >>>>>>> specifying resources on a Sharing-Group.
> > > > >>> > >>>>>>
> > > > >>> > >>>>>> I think both thoughts are important, so can we find a
> > > > solution
> > > > >>> > >> where
> > > > >>> > >>>> the
> > > > >>> > >>>>>> Resource Profiles are specified on an Operator, but we
> > > > >>> > still avoid
> > > > >>> > >> that
> > > > >>> > >>>>> we
> > > > >>> > >>>>>> need to specify a resource profile on every operator?
> > > > >>> > >>>>>>
> > > > >>> > >>>>>> What do you think about something like the following:
> > > > >>> > >>>>>> - Resource Profiles are specified on an operator
> > > level.
> > > > >>> > >>>>>> - Not all operators need profiles
> > > > >>> > >>>>>> - All Operators without a Resource Profile end up in the
> > > > >>> > >>>>>> default slot sharing group with a default profile (will get a
> > > > >>> > >>>>>> default slot).
> > > > >>> > >>>>>> - All Operators with a Resource Profile will go into
> > > > >>> > another slot
> > > > >>> > >>>>> sharing
> > > > >>> > >>>>>> group (the resource-specified-group).
> > > > >>> > >>>>>> - Users can define different slot sharing groups for
> > > > >>> > operators
> > > > >>> > >> like
> > > > >>> > >>>>> they
> > > > >>> > >>>>>> do now, with the exception that you cannot mix operators
> > > > >>> > that have
> > > > >>> > >> a
> > > > >>> > >>>>>> resource profile and operators that have no resource
> > > > profile.
> > > > >>> > >>>>>> - The default case where no operator has a resource
> > > > >>> > profile is
> > > > >>> > >> just a
> > > > >>> > >>>>>> special case of this model
> > > > >>> > >>>>>> - The chaining logic sums up the profiles per
> > > operator,
> > > > >>> > like it
> > > > >>> > >> does
> > > > >>> > >>>>> now,
> > > > >>> > >>>>>> and the scheduler sums up the profiles of the tasks that
> > > > it
> > > > >>> > >> schedules
> > > > >>> > >>>>>> together.
> > > > >>> > >>>>>>
> > > > >>> > >>>>>>
> > > > >>> > >>>>>> There is another question about reactive scaling raised
> > > > in the
> > > > >>> > >> FLIP. I
> > > > >>> > >>>>> need
> > > > >>> > >>>>>> to think a bit about that. That is indeed a bit more
> > > > tricky
> > > > >>> > once we
> > > > >>> > >>>> have
> > > > >>> > >>>>>> slots of different sizes.
> > > > >>> > >>>>>> It is not clear then which of the different slot requests the
> > > > >>> > >>>>>> ResourceManager should fulfill when new resources (TMs) show
> > > > >>> > >>>>>> up, or how the JobManager redistributes the slot resources
> > > > >>> > >>>>>> when resources (TMs) disappear.
> > > > >>> > >>>>>> This question is pretty orthogonal, though, to the "how
> > > to
> > > > >>> > specify
> > > > >>> > >> the
> > > > >>> > >>>>>> resources".
> > > > >>> > >>>>>>
> > > > >>> > >>>>>>
> > > > >>> > >>>>>> Best,
> > > > >>> > >>>>>> Stephan
> > > > >>> > >>>>>>
> > > > >>> > >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song
> > > > >>> > <tonysong820@gmail.com <ma...@gmail.com>
> > > > >>> > >>>>> wrote:
> > > > >>> > >>>>>>> Thanks for drafting the FLIP and driving the
> > > discussion,
> > > > >>> > Yangze.
> > > > >>> > >>>>>>> And Thanks for the feedback, Till and Chesnay.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> @Till,
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> I agree that specifying requirements for SSGs means
> > > that
> > > > SSGs
> > > > >>> > >> need to
> > > > >>> > >>>>> be
> > > > >>> > >>>>>>> supported in fine-grained resource management,
> > > otherwise
> > > > each
> > > > >>> > >>>> operator
> > > > >>> > >>>>>>> might use as many resources as the whole group.
> > > However,
> > > > I
> > > > >>> > cannot
> > > > >>> > >>>> think
> > > > >>> > >>>>>> of
> > > > >>> > >>>>>>> a strong reason for not supporting SSGs in fine-grained
> > > > >>> > resource
> > > > >>> > >>>>>>> management.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>>> Interestingly, if all operators have their resources
> > > > properly
> > > > >>> > >>>>>> specified,
> > > > >>> > >>>>>>>> then slot sharing is no longer needed because Flink
> > > > could
> > > > >>> > >> slice off
> > > > >>> > >>>>> the
> > > > >>> > >>>>>>>> appropriately sized slots for every Task individually.
> > > > >>> > >>>>>>>>
> > > > >>> > >>>>>>> So for example, if we have a job consisting of two
> > > > >>> > operator op_1
> > > > >>> > >> and
> > > > >>> > >>>>> op_2
> > > > >>> > >>>>>>>> where each op needs 100 MB of memory, we would then
> > > say
> > > > that
> > > > >>> > >> the
> > > > >>> > >>>> slot
> > > > >>> > >>>>>>>> sharing group needs 200 MB of memory to run. If we
> > > have
> > > > a
> > > > >>> > >> cluster
> > > > >>> > >>>>> with
> > > > >>> > >>>>>> 2
> > > > >>> > >>>>>>>> TMs with one slot of 100 MB each, then the system
> > > > cannot run
> > > > >>> > >> this
> > > > >>> > >>>>> job.
> > > > >>> > >>>>>> If
> > > > >>> > >>>>>>>> the resources were specified on an operator level,
> > > then
> > > > the
> > > > >>> > >> system
> > > > >>> > >>>>>> could
> > > > >>> > >>>>>>>> still make the decision to deploy op_1 to TM_1 and
> > > op_2
> > > > to
> > > > >>> > >> TM_2.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> Couldn't agree more that if all operators' requirements
> > > > are
> > > > >>> > >> properly
> > > > >>> > >>>>>>> specified, slot sharing should be no longer needed. I
> > > > >>> > think this
> > > > >>> > >>>>> exactly
> > > > >>> > >>>>>>> disproves the example. If we already know op_1 and op_2
> > > > each
> > > > >>> > >> needs
> > > > >>> > >>>> 100
> > > > >>> > >>>>> MB
> > > > >>> > >>>>>>> of memory, why would we put them in the same group? If
> > > > >>> > they are
> > > > >>> > >> in
> > > > >>> > >>>>>> separate
> > > > >>> > >>>>>>> groups, with the proposed approach the system can
> > > freely
> > > > >>> > deploy
> > > > >>> > >> them
> > > > >>> > >>>> to
> > > > >>> > >>>>>>> either a 200 MB TM or two 100 MB TMs.
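The 100 MB / 200 MB example can be checked with a toy placement routine. This is a sketch only: it assumes slots are carved out of a TM's free memory on demand, uses a greedy first-fit-decreasing heuristic (which may miss feasible placements in larger cases), and `can_place` is a hypothetical helper, not a Flink interface.

```python
def can_place(requests_mb, tm_free_mb):
    """Greedily place each slot request on the first TM with enough free memory."""
    free = list(tm_free_mb)
    for request in sorted(requests_mb, reverse=True):
        for i, capacity in enumerate(free):
            if capacity >= request:
                free[i] = capacity - request  # carve the slot out of this TM
                break
        else:
            return False  # no TM can host this request
    return True


# One SSG holding op_1 and op_2 asks for a single 200 MB slot.
one_group_two_small_tms = can_place([200], [100, 100])

# Two SSGs of 100 MB each fit on two 100 MB TMs or on one 200 MB TM.
two_groups_two_small_tms = can_place([100, 100], [100, 100])
two_groups_one_large_tm = can_place([100, 100], [200])
```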
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> Moreover, the precondition for not needing slot sharing
> > > > is
> > > > >>> > having
> > > > >>> > >>>>>> resource
> > > > >>> > >>>>>>> requirements properly specified for all operators. This
> > > > is not
> > > > >>> > >> always
> > > > >>> > >>>>>>> possible, and usually requires tremendous efforts. One
> > > > of the
> > > > >>> > >>>> benefits
> > > > >>> > >>>>>> for
> > > > >>> > >>>>>>> SSG-based requirements is that it allows the user to
> > > > freely
> > > > >>> > >> decide
> > > > >>> > >>>> the
> > > > >>> > >>>>>>> granularity, and thus the effort they want to invest. I would
> > > > >>> > consider SSG
> > > > >>> > >> in
> > > > >>> > >>>>>>> fine-grained resource management as a group of
> > > operators
> > > > >>> > that the
> > > > >>> > >>>> user
> > > > >>> > >>>>>>> would like to specify the total resource for. There can
> > > > be
> > > > >>> > only
> > > > >>> > >> one
> > > > >>> > >>>>> group
> > > > >>> > >>>>>>> in the job, 2~3 groups dividing the job into a few
> > > major
> > > > >>> > parts,
> > > > >>> > >> or as
> > > > >>> > >>>>>> many
> > > > >>> > >>>>>>> groups as the number of tasks/operators, depending on
> > > how
> > > > >>> > >>>> fine-grained
> > > > >>> > >>>>>> the
> > > > >>> > >>>>>>> user is able to specify the resources.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> Having to support SSGs might be a constraint. But given
> > > > >>> > that all
> > > > >>> > >> the
> > > > >>> > >>>>>>> current scheduler implementations already support
> > > SSGs, I
> > > > >>> > tend to
> > > > >>> > >>>> think
> > > > >>> > >>>>>>> that as an acceptable price for the above discussed
> > > > >>> > usability and
> > > > >>> > >>>>>>> flexibility.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> @Chesnay
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> Will declaring them on slot sharing groups not also
> > > waste
> > > > >>> > >> resources
> > > > >>> > >>>> if
> > > > >>> > >>>>>> the
> > > > >>> > >>>>>>>> parallelism of operators within that group are
> > > > different?
> > > > >>> > >>>>>>>>
> > > > >>> > >>>>>>> Yes. It's a trade-off between usability and resource
> > > > >>> > >> utilization. To
> > > > >>> > >>>>>> avoid
> > > > >>> > >>>>>>> such wasting, the user can define more groups, so that
> > > > >>> > each group
> > > > >>> > >>>>>> contains
> > > > >>> > >>>>>>> fewer operators and the chance of having operators with
> > > > >>> > different
> > > > >>> > >>>>>>> parallelism will be reduced. The price is to have more
> > > > >>> > resource
> > > > >>> > >>>>>>> requirements to specify.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> It also seems like quite a hassle for users having to
> > > > >>> > >> recalculate the
> > > > >>> > >>>>>>>> resource requirements if they change the slot sharing.
> > > > >>> > >>>>>>>> I'd think that it's not really workable for users that
> > > > create
> > > > >>> > >> a set
> > > > >>> > >>>>> of
> > > > >>> > >>>>>>>> re-usable operators which are mixed and matched in
> > > their
> > > > >>> > >>>>> applications;
> > > > >>> > >>>>>>>> managing the resources requirements in such a setting
> > > > >>> > would be
> > > > >>> > >> a
> > > > >>> > >>>>>>>> nightmare, and in the end would require operator-level
> > > > >>> > >> requirements
> > > > >>> > >>>>> any
> > > > >>> > >>>>>>>> way.
> > > > >>> > >>>>>>>> In that sense, I'm not even sure whether it really
> > > > increases
> > > > >>> > >>>>> usability.
> > > > >>> > >>>>>>> - As mentioned in my reply to Till's comment,
> > > > there's no
> > > > >>> > >> reason to
> > > > >>> > >>>>> put
> > > > >>> > >>>>>>> multiple operators whose individual resource
> > > > >>> > requirements are
> > > > >>> > >>>>> already
> > > > >>> > >>>>>>> known
> > > > >>> > >>>>>>> into the same group in fine-grained resource
> > > > management.
> > > > >>> > >>>>>>> - Even if an operator implementation is reused for
> > > > multiple
> > > > >>> > >>>>> applications,
> > > > >>> > >>>>>>> it does not guarantee the same resource
> > > requirements.
> > > > >>> > During
> > > > >>> > >> our
> > > > >>> > >>>>> years
> > > > >>> > >>>>>>> of
> > > > >>> > >>>>>>> practices in Alibaba, with per-operator
> > > requirements
> > > > >>> > >> specified for
> > > > >>> > >>>>>>> Blink's
> > > > >>> > >>>>>>> fine-grained resource management, very few users
> > > > >>> > (including
> > > > >>> > >> our
> > > > >>> > >>>>>>> specialists
> > > > >>> > >>>>>>> who are dedicated to supporting Blink users) are as
> > > > >>> > >> experienced as
> > > > >>> > >>>>> to
> > > > >>> > >>>>>>> accurately predict/estimate the operator resource
> > > > >>> > >> requirements.
> > > > >>> > >>>> Most
> > > > >>> > >>>>>>> people
> > > > >>> > >>>>>>> rely on the execution-time metrics (throughput,
> > > > delay, cpu
> > > > >>> > >> load,
> > > > >>> > >>>>>> memory
> > > > >>> > >>>>>>> usage, GC pressure, etc.) to improve the
> > > > specification.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> To sum up:
> > > > >>> > >>>>>>> If the user is capable of providing proper resource
> > > > >>> > requirements
> > > > >>> > >> for
> > > > >>> > >>>>>> every
> > > > >>> > >>>>>>> operator, that's definitely a good thing and we would
> > > not
> > > > >>> > need to
> > > > >>> > >>>> rely
> > > > >>> > >>>>> on
> > > > >>> > >>>>>>> the SSGs. However, that shouldn't be a *must* for the
> > > > >>> > >> fine-grained
> > > > >>> > >>>>>> resource
> > > > >>> > >>>>>>> management to work. For those users who are capable and
> > > > do not
> > > > >>> > >> like
> > > > >>> > >>>>>> having
> > > > >>> > >>>>>>> to set each operator to a separate SSG, I would be ok
> > > to
> > > > have
> > > > >>> > >> both
> > > > >>> > >>>>>>> SSG-based and operator-based runtime interfaces and to
> > > > only
> > > > >>> > >> fallback
> > > > >>> > >>>> to
> > > > >>> > >>>>>> the
> > > > >>> > >>>>>>> SSG requirements when the operator requirements are not
> > > > >>> > >> specified.
> > > > >>> > >>>>>> However,
> > > > >>> > >>>>>>> as the first step, I think we should prioritise the use
> > > > cases
> > > > >>> > >> where
> > > > >>> > >>>>> users
> > > > >>> > >>>>>>> are not that experienced.
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> Thank you~
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> Xintong Song
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
> > > > >>> > >> chesnay@apache.org <ma...@apache.org>>
> > > > >>> > >>>>>>> wrote:
> > > > >>> > >>>>>>>
> > > > >>> > >>>>>>>> Will declaring them on slot sharing groups not also
> > > > waste
> > > > >>> > >> resources
> > > > >>> > >>>>> if
> > > > >>> > >>>>>>>> the parallelism of operators within that group are
> > > > different?
> > > > >>> > >>>>>>>>
> > > > >>> > >>>>>>>> It also seems like quite a hassle for users having to
> > > > >>> > >> recalculate
> > > > >>> > >>>> the
> > > > >>> > >>>>>>>> resource requirements if they change the slot sharing.
> > > > >>> > >>>>>>>> I'd think that it's not really workable for users that
> > > > create
> > > > >>> > >> a set
> > > > >>> > >>>>> of
> > > > >>> > >>>>>>>> re-usable operators which are mixed and matched in
> > > their
> > > > >>> > >>>>> applications;
> > > > >>> > >>>>>>>> managing the resources requirements in such a setting
> > > > >>> > would be
> > > > >>> > >> a
> > > > >>> > >>>>>>>> nightmare, and in the end would require operator-level
> > > > >>> > >> requirements
> > > > >>> > >>>>> any
> > > > >>> > >>>>>>>> way.
> > > > >>> > >>>>>>>> In that sense, I'm not even sure whether it really
> > > > increases
> > > > >>> > >>>>> usability.
> > > > >>> > >>>>>>>> My main worry is that if we wire the runtime to
> > > work
> > > > >>> > on SSGs
> > > > >>> > >>>> it's
> > > > >>> > >>>>>>>> gonna be difficult to implement more fine-grained
> > > > approaches,
> > > > >>> > >> which
> > > > >>> > >>>>>>>> would not be the case if, for the runtime, they are
> > > > always
> > > > >>> > >> defined
> > > > >>> > >>>> on
> > > > >>> > >>>>>> an
> > > > >>> > >>>>>>>> operator-level.
> > > > >>> > >>>>>>>>
> > > > >>> > >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > > > >>> > >>>>>>>>> Thanks for drafting this FLIP and starting this
> > > > discussion
> > > > >>> > >>>> Yangze.
> > > > >>> > >>>>>>>>> I like that defining resource requirements on a slot
> > > > sharing
> > > > >>> > >>>> group
> > > > >>> > >>>>>>> makes
> > > > >>> > >>>>>>>>> the overall setup easier and improves usability of
> > > > resource
> > > > >>> > >>>>>>> requirements.
> > > > >>> > >>>>>>>>> What I do not like about it is that it changes slot
> > > > sharing
> > > > >>> > >>>> groups
> > > > >>> > >>>>>> from
> > > > >>> > >>>>>>>>> being a scheduling hint to something which needs to
> > > be
> > > > >>> > >> supported
> > > > >>> > >>>> in
> > > > >>> > >>>>>>> order
> > > > >>> > >>>>>>>>> to support fine grained resource requirements. So
> > > far,
> > > > the
> > > > >>> > >> idea
> > > > >>> > >>>> of
> > > > >>> > >>>>>> slot
> > > > >>> > >>>>>>>>> sharing groups was that it tells the system that a
> > > set
> > > > of
> > > > >>> > >>>> operators
> > > > >>> > >>>>>> can
> > > > >>> > >>>>>>>> be
> > > > >>> > >>>>>>>>> deployed in the same slot. But the system still had
> > > the
> > > > >>> > >> freedom
> > > > >>> > >>>> to
> > > > >>> > >>>>>> say
> > > > >>> > >>>>>>>> that
> > > > >>> > >>>>>>>>> it would rather place these tasks in different slots
> > > > if it
> > > > >>> > >>>> wanted.
> > > > >>> > >>>>> If
> > > > >>> > >>>>>>> we
> > > > >>> > >>>>>>>>> now specify resource requirements on a per slot
> > > sharing
> > > > >>> > >> group,
> > > > >>> > >>>> then
> > > > >>> > >>>>>> the
> > > > >>> > >>>>>>>>> only option for a scheduler which does not support
> > > slot
> > > > >>> > >> sharing
> > > > >>> > >>>>>> groups
> > > > >>> > >>>>>>> is
> > > > >>> > >>>>>>>>> to say that every operator in this slot sharing group
> > > > >>> > >>>>>>>>> needs a slot with the same resources as the whole group.
> > > > >>> > >>>>>>>>>
> > > > >>> > >>>>>>>>> So for example, if we have a job consisting of two operator op_1 and
> > > > >>> > >>>>>>>>> op_2 where each op needs 100 MB of memory, we would then say that the
> > > > >>> > >>>>>>>>> slot sharing group needs 200 MB of memory to run. If we have a cluster
> > > > >>> > >>>>>>>>> with 2 TMs with one slot of 100 MB each, then the system cannot run
> > > > >>> > >>>>>>>>> this job. If the resources were specified on an operator level, then
> > > > >>> > >>>>>>>>> the system could still make the decision to deploy op_1 to TM_1 and
> > > > >>> > >>>>>>>>> op_2 to TM_2.
> > > > >>> > >>>>>>>>> Originally, one of the primary goals of slot sharing groups was to make
> > > > >>> > >>>>>>>>> it easier for the user to reason about how many slots a job needs
> > > > >>> > >>>>>>>>> independent of the actual number of operators in the job. Interestingly,
> > > > >>> > >>>>>>>>> if all operators have their resources properly specified, then slot
> > > > >>> > >>>>>>>>> sharing is no longer needed because Flink could slice off the
> > > > >>> > >>>>>>>>> appropriately sized slots for every Task individually. What matters is
> > > > >>> > >>>>>>>>> whether the whole cluster has enough resources to run all tasks or not.
> > > > >>> > >>>>>>>>>
> > > > >>> > >>>>>>>>> Cheers,
> > > > >>> > >>>>>>>>> Till
> > > > >>> > >>>>>>>>>
> > > > >>> > >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > >>> > >>>>>>>>>> Hi, there,
> > > > >>> > >>>>>>>>>>
> > > > >>> > >>>>>>>>>> We would like to start a discussion thread on "FLIP-156: Runtime
> > > > >>> > >>>>>>>>>> Interfaces for Fine-Grained Resource Requirements"[1], where we
> > > > >>> > >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime interfaces for
> > > > >>> > >>>>>>>>>> specifying fine-grained resource requirements.
> > > > >>> > >>>>>>>>>>
> > > > >>> > >>>>>>>>>> In this FLIP:
> > > > >>> > >>>>>>>>>> - Expound the user story of fine-grained resource management.
> > > > >>> > >>>>>>>>>> - Propose runtime interfaces for specifying SSG-based resource
> > > > >>> > >>>>>>>>>> requirements.
> > > > >>> > >>>>>>>>>> - Discuss the pros and cons of the three potential granularities for
> > > > >>> > >>>>>>>>>> specifying the resource requirements (op, task and slot sharing group)
> > > > >>> > >>>>>>>>>> and explain why we choose the slot sharing group.
> > > > >>> > >>>>>>>>>>
> > > > >>> > >>>>>>>>>> Please find more details in the FLIP wiki document [1]. Looking
> > > > >>> > >>>>>>>>>> forward to your feedback.
> > > > >>> > >>>>>>>>>>
> > > > >>> > >>>>>>>>>> [1]
> > > > >>> > >>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > > >>> > >>>>>>>>>>
> > > > >>> > >>>>>>>>>> Best,
> > > > >>> > >>>>>>>>>> Yangze Guo
> > > > >>> > >>>>>>>>>>
> > > > >>> > >>>>>>>>
> > > > >>> >
> > > > >>>
> > > >
> > >
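
Till's example quoted above (op_1 and op_2 each needing 100 MB, and two TMs offering one 100 MB slot each) can be made concrete with a small sketch. This is not Flink code: `SlotFitSketch` and `canPlace` are hypothetical names, memory is the only resource modeled, and each slot is assumed to host exactly one request.

```java
import java.util.Arrays;

final class SlotFitSketch {

    /**
     * Returns true if every request can get its own slot that is at least
     * as large as the request (one request per slot, memory only).
     */
    static boolean canPlace(int[] requestsMb, int[] slotsMb) {
        if (requestsMb.length > slotsMb.length) {
            return false;
        }
        int[] requests = requestsMb.clone();
        int[] slots = slotsMb.clone();
        Arrays.sort(requests);
        Arrays.sort(slots);
        // Compare the k-th largest request with the k-th largest slot.
        for (int k = 1; k <= requests.length; k++) {
            if (requests[requests.length - k] > slots[slots.length - k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] slotsMb = {100, 100}; // TM_1 and TM_2, one 100 MB slot each

        // SSG-level requirement: op_1 and op_2 must share one 200 MB slot.
        System.out.println("SSG-level: " + canPlace(new int[] {200}, slotsMb));

        // Operator-level requirements: op_1 and op_2 may land on different TMs.
        System.out.println("Operator-level: " + canPlace(new int[] {100, 100}, slotsMb));
    }
}
```

Under these assumptions the single 200 MB request cannot be placed, while the two 100 MB requests can, which is the asymmetry the example points at.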

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Kezhu Wang <ke...@gmail.com>.

Hi all, sorry for joining the discussion after the voting has started.

I want to share my thoughts on this after reading the discussion above.

I think the Flink *runtime* already has an ideal granularity for resource
management: the 'task'. If a slot is shared by multiple tasks, that slot's
resource requirement is simply the sum of all its logical slots. So
basically, there has been no resource requirement for SlotSharingGroup in
the runtime until now, right?

As discussed, we already agree that: "If all operators have their
resources properly specified, then slot sharing is no longer needed."

So it seems to me that the natural next question is: how do we bridge the
impractical operator-level resource specification to the runtime
task-level resource requirement? This is actually a pure API matter, as
Chesnay has pointed out.

But FLIP-156 brings another direction to the table: how about using SSG
for both API and runtime resource specification?

From the FLIP and the discussion, I assume that the SSG resource
specification will override the operator-level resource specification if
both are specified?

So, I wonder whether we could interpret the SSG resource specification as
an "add" rather than a "set" on the resource requirement?

The semantics would be that the SSG resource specification adds additional
resources to the shared slot, to express concerns about possible high
throughput and resource requirements for tasks in one physical slot.

The result is that if the scheduler indeed respects slot sharing, the
allocated slot will gain the extra resources specified for that SSG.

I think one coding barrier for the "add" approach is ResourceSpec.UNKNOWN,
which doesn't support the 'merge' operation. I tend to use
ResourceSpec.ZERO as the default; the task executor should be aware of
this.
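
To make the difference between the two readings concrete, here is a minimal sketch. It is not based on Flink's `ResourceSpec`: `AddVsSetSketch` and its methods are hypothetical, only memory in MB is modeled, and a value of 0 stands in for "nothing specified" (the role ResourceSpec.ZERO would play above).

```java
final class AddVsSetSketch {

    /** Stand-in for "nothing specified" in this sketch. */
    static final long ZERO_MB = 0L;

    /** Sum of the requirements of all tasks sharing the slot. */
    static long sumMb(long[] taskMb) {
        long total = ZERO_MB;
        for (long mb : taskMb) {
            total += mb;
        }
        return total;
    }

    /** "set": a specified SSG value replaces whatever the tasks asked for. */
    static long slotWithSet(long[] taskMb, long ssgMb) {
        return ssgMb == ZERO_MB ? sumMb(taskMb) : ssgMb;
    }

    /** "add": the SSG value is extra headroom on top of the task sum. */
    static long slotWithAdd(long[] taskMb, long ssgMb) {
        return sumMb(taskMb) + ssgMb;
    }

    public static void main(String[] args) {
        long[] taskMb = {100L, 200L};
        System.out.println("set: " + slotWithSet(taskMb, 50L));
        System.out.println("add: " + slotWithAdd(taskMb, 50L));
    }
}
```

With ZERO as the neutral element, "add" never needs a special case for an unspecified SSG, whereas "set" has to branch on it.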

@Chesnay
> My main worry is that it if we wire the runtime to work on SSGs it's
> gonna be difficult to implement more fine-grained approaches, which
> would not be the case if, for the runtime, they are always defined on an
> operator-level.

An "add" operation should be less invasive and set a low barrier for
future fine-grained approaches.

@Stephan
>   - Users can define different slot sharing groups for operators like they
> do now, with the exception that you cannot mix operators that have a
> resource profile and operators that have no resource profile.

@Till
> This effectively means that all unspecified operators
> will implicitly have a zero resource requirement.
> I am wondering whether this wouldn't lead to a surprising behaviour for the
> user. If the user specifies the resource requirements for a single
> operator, then he probably will assume that the other operators will get
> the default share of resources and not nothing.

I think this is inherent, due to the fact that we cannot define
ResourceSpec.ONE, i.e. the resource requirement for exactly one default
slot, with concrete numbers? I tend to squash out unspecified operators if
they are chained with operators that have explicit resource
specifications. Otherwise, the protocol tends to be verbose, as in "give
me this much resource and a default". I think that if we have explicit
resource specifications for only some operators, it is just saying "I
don't care about the other operators that much, just get them places to
run". Most likely those are stateless filter/map or other less
resource-consuming operators. If there is indeed a problem, I think
clients can specify a global default (or a default at some other level in
future). In the job graph generation phase, we could take that default
into account for unspecified operators.

@FLIP-156
> Expose operator chaining. (Cons of task level resource specifying)

Isn't this inherent for all group-level resource specification? It will
either break chaining or obey it, or could not even work with it.

To sum up the above, my suggestions are:

On the API side:
* StreamExecutionEnvironment: a global default (ResourceSpec.ZERO if
unspecified).
* Operator: ResourceSpec.ZERO (unspecified) as default.
* Task: sum of requirements from specified operators + global default (if
there are any unspecified operators).
* SSG: additional resources for the physical slot.

On the runtime side:
* Task: ResourceSpec.Task or ResourceSpec.ZERO
* SSG: ResourceSpec.SSG or ResourceSpec.ZERO

The physical slot gets the summed-up resources from its logical slots and
the SSG; if it gets ResourceSpec.ZERO, it is just a default-sized slot.

In short, turn SSG resource specification into an "add" and drop
ResourceSpec.UNKNOWN.
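
The task-level rule in the summary above can be sketched as follows. This is only one possible reading, stated as an assumption: the global default is added once per task if at least one of its operators is unspecified, not once per unspecified operator. `TaskRequirementSketch` is a hypothetical name, memory in MB is the only resource, and 0 stands in for ResourceSpec.ZERO.

```java
final class TaskRequirementSketch {

    /** Stand-in for ResourceSpec.ZERO: the operator has no explicit requirement. */
    static final long UNSPECIFIED_MB = 0L;

    /**
     * Task requirement = sum of the specified operators, plus the global
     * default once if any operator in the task is unspecified.
     */
    static long taskMb(long[] operatorMb, long globalDefaultMb) {
        long total = 0L;
        boolean anyUnspecified = false;
        for (long mb : operatorMb) {
            if (mb == UNSPECIFIED_MB) {
                anyUnspecified = true;
            } else {
                total += mb;
            }
        }
        return anyUnspecified ? total + globalDefaultMb : total;
    }

    public static void main(String[] args) {
        // Two specified operators and one unspecified, global default of 32 MB.
        System.out.println(taskMb(new long[] {128L, 0L, 64L}, 32L));

        // Nothing specified anywhere: the result stays at zero, which the
        // runtime would then treat as "use a default-sized slot".
        System.out.println(taskMb(new long[] {0L, 0L}, 0L));
    }
}
```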


Questions/Issues:
* Could an SSG express a negative resource requirement?
* Is there a concrete reason why partially configured resources cannot
work? I saw that it fails job submission in Dispatcher.submitJob.
* An option (cluster/job level) to force slot sharing in the scheduler?
This could be useful when migrating from FLIP-156 to a future approach.
* An option (cluster level) to ignore resource specifications (allowing a
job with specified resources to run in an open box environment) for
non-production usage?



On February 1, 2021 at 11:54:10, Yangze Guo (karmagyz@gmail.com) wrote:

Thanks for the replies, Till and Xintong!

I've updated the FLIP, including:
- Edit the JavaDoc of the proposed
StreamGraphGenerator#setSlotSharingGroupResource.
- Add "Future Plan" section, which contains the potential follow-up
issues and the limitations to be documented when fine-grained resource
management is exposed to users.

I'll start a vote in another thread.

Best,
Yangze Guo

On Fri, Jan 29, 2021 at 10:07 PM Till Rohrmann <tr...@apache.org> wrote:
>
> Thanks for summarizing the discussion, Yangze. I agree that setting
> resource requirements per operator is not very user friendly. Moreover, I
> couldn't come up with a different proposal which would be as easy to use
> and wouldn't expose internal scheduling details. In fact, following this
> argument then we shouldn't have exposed the slot sharing groups in the
> first place.
>
> What is important for the user is that we properly document the limitations
> and constraints the fine grained resource specification has. For example,
> we should explain how optimizations like chaining are affected by it and
> how different execution modes (batch vs. streaming) affect the execution of
> operators which have specified resources. These things shouldn't become
> part of the contract of this feature and are more caused by internal
> implementation details but it will be important to understand these things
> properly in order to use this feature effectively.
>
> Hence, +1 for starting the vote for this FLIP.
>
> Cheers,
> Till
>
> On Tue, Jan 26, 2021 at 4:37 AM Xintong Song <to...@gmail.com> wrote:
>
> > Thanks for the summary, Yangze.
> >
> > The changes and follow-up issues LGTM. Let's wait for responses from the
> > others before starting a vote.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Tue, Jan 26, 2021 at 11:08 AM Yangze Guo <ka...@gmail.com> wrote:
> >
> > > Thanks everyone for the lively discussion. I'd like to try to
> > > summarize the current convergence in the discussion. Please let me
> > > know if I got things wrong or missed something crucial here.
> > >
> > > Change of this FLIP:
> > > - Treat the SSG resource requirements as a hint instead of a
> > > restriction for the runtime. That should be explicitly explained in
> > > the JavaDocs.
> > >
> > > Potential follow-up issues if needed:
> > > - Provide operator-level resource configuration interface.
> > > - Provide multiple options for deciding resources for SSGs whose
> > > requirement is not specified:
> > > ** Default slot resource.
> > > ** Default operator resource times number of operators.
> > >
> > > If there are no other issues, I'll update the FLIP accordingly and
> > > start a vote thread. Thanks all for the valuable feedback again.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > >
> > > On Fri, Jan 22, 2021 at 11:30 AM Xintong Song <to...@gmail.com>
> > > wrote:
> > > >
> > > >
> > > > FGRuntimeInterface.png
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <to...@gmail.com> wrote:
> > > > >>
> > > > >> I think Chesnay's proposal could actually work. IIUC, the keypoint is
> > > > >> to derive operator requirements from SSG requirements on the API side,
> > > > >> so that the runtime only deals with operator requirements. It's
> > > > >> debatable how the deriving should be done though. E.g., an alternative
> > > > >> could be to evenly divide the SSG requirement into requirements of
> > > > >> operators in the group.
> > > > >>
> > > > >>
> > > > >> However, I'm not entirely sure which option is more desired.
> > > > >> Illustrating my understanding in the following figure, in which on the
> > > > >> top is Chesnay's proposal and on the bottom is the SSG-based proposal
> > > > >> in this FLIP.
> > > > >>
> > > > >>
> > > > >>
> > > > >> I think the major difference between the two approaches is where
> > > > >> deriving operator requirements from SSG requirements happens.
> > > > >>
> > > > >> - Chesnay's proposal simplifies the runtime logic and the interface to
> > > > >> expose, at the price of moving more complexity (i.e. the deriving) to
> > > > >> the API side. The question is, where do we prefer to keep the
> > > > >> complexity? I'm slightly leaning towards having a thin API and keep the
> > > > >> complexity in runtime if possible.
> > > > >>
> > > > >> - Notice that the dash line arrows represent optional steps that are
> > > > >> needed only for schedulers that do not respect SSGs, which we don't
> > > > >> have at the moment. If we only look at the solid line arrows, then the
> > > > >> SSG-based approach is much simpler, without needing to derive and
> > > > >> aggregate the requirements back and forth. I'm not sure about
> > > > >> complicating the current design only for the potential future needs.
> > > >>
> > > >>
> > > >> Thank you~
> > > >>
> > > >> Xintong Song
> > > >>
> > > >>
> > > >>
> > > >>
> > > > >> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <chesnay@apache.org> wrote:
> > > > >>>
> > > > >>> You're raising a good point, but I think I can rectify that with a
> > > > >>> minor adjustment.
> > > > >>>
> > > > >>> Default requirements are whatever the default requirements are, setting
> > > > >>> the requirements for one operator has no effect on other operators.
> > > > >>>
> > > > >>> With these rules, and some API enhancements, the following mockup would
> > > > >>> replicate the SSG-based behavior:
> > > >>>
> > > > >>> Map<SlotSharingGroupId, Requirements> requirements = ...
> > > > >>> for slotSharingGroup in env.getSlotSharingGroups() {
> > > > >>>     vertices = slotSharingGroup.getVertices()
> > > > >>>     vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
> > > > >>>     vertices.remaining().setRequirements(ZERO)
> > > > >>> }
> > > >>>
> > > >>> We could even allow setting requirements on slotsharing-groups
> > > >>> colocation-groups and internally translate them accordingly.
> > > >>> I can't help but feel this is a plain API issue.
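
The mockup above (the first vertex of each slot sharing group carries the whole group requirement, every other vertex in the group gets ZERO) could look roughly like this in plain Java. None of the names below are Flink APIs: `SsgToOperatorSketch`, the string ids and the MB-only requirement are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SsgToOperatorSketch {

    /**
     * Translates group-level requirements into per-vertex requirements:
     * the first vertex of a group gets the group requirement, the rest get 0.
     */
    static Map<String, Long> deriveVertexRequirements(
            Map<String, List<String>> verticesByGroup,
            Map<String, Long> groupRequirementMb) {
        Map<String, Long> perVertex = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> group : verticesByGroup.entrySet()) {
            long groupMb = groupRequirementMb.getOrDefault(group.getKey(), 0L);
            List<String> vertices = group.getValue();
            for (int i = 0; i < vertices.size(); i++) {
                perVertex.put(vertices.get(i), i == 0 ? groupMb : 0L);
            }
        }
        return perVertex;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("ssg-1", Arrays.asList("source", "map"));
        groups.put("ssg-2", Arrays.asList("window", "sink"));

        Map<String, Long> requirements = new LinkedHashMap<>();
        requirements.put("ssg-1", 200L);
        requirements.put("ssg-2", 500L);

        System.out.println(deriveVertexRequirements(groups, requirements));
    }
}
```

Summing the derived per-vertex values within a group gives back the group requirement, which is why a scheduler that respects slot sharing would see the same slot size either way.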
> > > >>>
> > > >>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
> > > > >>> > If I understand you correctly Chesnay, then you want to decouple the
> > > > >>> > resource requirement specification from the slot sharing group
> > > > >>> > assignment. Hence, per default all operators would be in the same slot
> > > > >>> > sharing group. If there is no operator with a resource specification,
> > > > >>> > then the system would allocate a default slot for it. If there is at
> > > > >>> > least one operator, then the system would sum up all the specified
> > > > >>> > resources and allocate a slot of this size. This effectively means
> > > > >>> > that all unspecified operators will implicitly have a zero resource
> > > > >>> > requirement. Did I understand your idea correctly?
> > > > >>> >
> > > > >>> > I am wondering whether this wouldn't lead to a surprising behaviour
> > > > >>> > for the user. If the user specifies the resource requirements for a
> > > > >>> > single operator, then he probably will assume that the other operators
> > > > >>> > will get the default share of resources and not nothing.
> > > >>> >
> > > >>> > Cheers,
> > > >>> > Till
> > > >>> >
> > > > >>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <chesnay@apache.org> wrote:
> > > > >>> >
> > > > >>> > Is there even a functional difference between specifying the
> > > > >>> > requirements for an SSG vs specifying the same requirements on a single
> > > > >>> > operator within that group (ideally a colocation group to avoid this
> > > > >>> > whole hint business)?
> > > > >>> >
> > > > >>> > Wouldn't we get the best of both worlds in the latter case?
> > > > >>> >
> > > > >>> > Users can take shortcuts to define shared requirements,
> > > > >>> > but refine them further as needed on a per-operator basis,
> > > > >>> > without changing semantics of slotsharing groups
> > > > >>> > nor the runtime being locked into SSG-based requirements.
> > > > >>> >
> > > > >>> > (And before anyone argues what happens if slotsharing groups change or
> > > > >>> > whatnot, that's a plain API issue that we could surely solve. (A plain
> > > > >>> > iteration over slotsharing groups and therein contained operators would
> > > > >>> > suffice)).
> > > >>> >
> > > >>> > On 1/20/2021 6:48 PM, Till Rohrmann wrote:
> > > > >>> > > Maybe a different minor idea: Would it be possible to treat the SSG
> > > > >>> > > resource requirements as a hint for the runtime similar to how slot
> > > > >>> > > sharing groups are designed at the moment? Meaning that we don't give
> > > > >>> > > the guarantee that Flink will always deploy this set of tasks together
> > > > >>> > > no matter what comes. If, for example, the runtime can derive by some
> > > > >>> > > means the resource requirements for each task based on the requirements
> > > > >>> > > for the SSG, this could be possible. One easy strategy would be to give
> > > > >>> > > every task the same resources as the whole slot sharing group. Another
> > > > >>> > > one could be distributing the resources equally among the tasks. This
> > > > >>> > > does not even have to be implemented but we would give ourselves the
> > > > >>> > > freedom to change scheduling if need should arise.
> > > >>> > >
> > > >>> > > Cheers,
> > > >>> > > Till
> > > >>> > >
> > > > >>> > > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <karmagyz@gmail.com> wrote:
> > > > >>> > >
> > > > >>> > >> Thanks for the responses, Till and Xintong.
> > > > >>> > >>
> > > > >>> > >> I second Xintong's comment that SSG-based runtime interface will give
> > > > >>> > >> us the flexibility to achieve op/task-based approach. That's one of
> > > > >>> > >> the most important reasons for our design choice.
> > > > >>> > >>
> > > > >>> > >> Some cents regarding the default operator resource:
> > > > >>> > >> - It might be good for the scenario of DataStream jobs.
> > > > >>> > >> ** For light-weight operators, the accumulative configuration error
> > > > >>> > >> will not be significant. Then, the resource of a task used is
> > > > >>> > >> proportional to the number of operators it contains.
> > > > >>> > >> ** For heavy operators like join and window or operators using the
> > > > >>> > >> external resources, user will turn to the fine-grained resource
> > > > >>> > >> configuration.
> > > > >>> > >> - It can increase the stability for the standalone cluster where task
> > > > >>> > >> executors registered are heterogeneous(with different default slot
> > > > >>> > >> resources).
> > > > >>> > >> - It might not be good for SQL users. The operators that SQL will be
> > > > >>> > >> transferred to is a black box to the user. We also do not guarantee
> > > > >>> > >> the cross-version of consistency of the transformation so far.
> > > > >>> > >>
> > > > >>> > >> I think it can be treated as a follow-up work when the fine-grained
> > > > >>> > >> resource management is end-to-end ready.
> > > >>> > >>
> > > >>> > >> Best,
> > > >>> > >> Yangze Guo
> > > >>> > >>
> > > >>> > >>
> > > > >>> > >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > >>> > >>> Thanks for the feedback, Till.
> > > > >>> > >>>
> > > > >>> > >>> ## I feel that what you proposed (operator-based + default value) might
> > > > >>> > >>> be subsumed by the SSG-based approach.
> > > > >>> > >>> Thinking of op_1 -> op_2, there are the following 4 cases, categorized
> > > > >>> > >>> by whether the resource requirements are known to the users.
> > > > >>> > >>>
> > > > >>> > >>> 1. *Both known.* As previously mentioned, there's no reason to put
> > > > >>> > >>>    multiple operators whose individual resource requirements are already
> > > > >>> > >>>    known into the same group in fine-grained resource management. And if
> > > > >>> > >>>    op_1 and op_2 are in different groups, there should be no problem
> > > > >>> > >>>    switching data exchange mode from pipelined to blocking. This is
> > > > >>> > >>>    equivalent to specifying operator resource requirements in your
> > > > >>> > >>>    proposal.
> > > > >>> > >>> 2. *op_1 known, op_2 unknown.* Similar to 1), except that op_2 is in a
> > > > >>> > >>>    SSG whose resource is not specified thus would have the default slot
> > > > >>> > >>>    resource. This is equivalent to having default operator resources in
> > > > >>> > >>>    your proposal.
> > > > >>> > >>> 3. *Both unknown*. The user can either set op_1 and op_2 to the same SSG
> > > > >>> > >>>    or separate SSGs.
> > > > >>> > >>>    - If op_1 and op_2 are in the same SSG, it will be equivalent to the
> > > > >>> > >>>      coarse-grained resource management, where op_1 and op_2 share a
> > > > >>> > >>>      default size slot no matter which data exchange mode is used.
> > > > >>> > >>>    - If op_1 and op_2 are in different SSGs, then each of them will use
> > > > >>> > >>>      a default size slot. This is equivalent to setting them with default
> > > > >>> > >>>      operator resources in your proposal.
> > > > >>> > >>> 4. *Total (pipeline) or max (blocking) of op_1 and op_2 is known.*
> > > > >>> > >>>    - It is possible that the user learns the total / max resource
> > > > >>> > >>>      requirement from executing and monitoring the job, while not
> > > > >>> > >>>      being aware of individual operator requirements.
> > > > >>> > >>>    - I believe this is the case your proposal does not cover. And TBH,
> > > > >>> > >>>      this is probably how most users learn the resource requirements,
> > > > >>> > >>>      according to my experiences.
> > > > >>> > >>>    - In this case, the user might need to specify different resources if
> > > > >>> > >>>      he wants to switch the execution mode, which should not be worse
> > > > >>> > >>>      than not being able to use fine-grained resource management.
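
Case 4 above distinguishes the total (pipelined) from the max (blocking) requirement of op_1 and op_2. A minimal sketch of that distinction follows; it assumes that with a blocking exchange the two operators never occupy the slot at the same time, models memory in MB only, and uses the hypothetical name `ExchangeModeSketch`.

```java
final class ExchangeModeSketch {

    enum Mode { PIPELINED, BLOCKING }

    /**
     * Group requirement for operators sharing one slot: the sum if they run
     * concurrently (pipelined), the maximum if they run one after another
     * (blocking).
     */
    static long groupMb(Mode mode, long... operatorMb) {
        long result = 0L;
        for (long mb : operatorMb) {
            result = (mode == Mode.PIPELINED) ? result + mb : Math.max(result, mb);
        }
        return result;
    }

    public static void main(String[] args) {
        long op1Mb = 100L;
        long op2Mb = 150L;
        System.out.println("pipelined: " + groupMb(Mode.PIPELINED, op1Mb, op2Mb));
        System.out.println("blocking:  " + groupMb(Mode.BLOCKING, op1Mb, op2Mb));
    }
}
```

This is why a user who only knows the observed group-level number may have to re-specify it when switching the execution mode.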
> > > >>> > >>>
> > > >>> > >>>
> > > > >>> > >>> ## An additional idea inspired by your proposal.
> > > > >>> > >>> We may provide multiple options for deciding resources for SSGs whose
> > > > >>> > >>> requirement is not specified, if needed.
> > > > >>> > >>>
> > > > >>> > >>> - Default slot resource (current design)
> > > > >>> > >>> - Default operator resource times number of operators (equivalent to
> > > > >>> > >>>   your proposal)
> > > > >>> > >>>
> > > > >>> > >>>
> > > > >>> > >>> ## Exposing internal runtime strategies
> > > > >>> > >>> Theoretically, yes. Tying to the SSGs, the resource requirements might
> > > > >>> > >>> be affected if how SSGs are internally handled changes in future.
> > > > >>> > >>> Practically, I do not concretely see at the moment what kind of changes
> > > > >>> > >>> we may want in future that might conflict with this FLIP proposal, as
> > > > >>> > >>> the question of switching data exchange mode answered above. I'd suggest
> > > > >>> > >>> to not give up the user friendliness we may gain now for the future
> > > > >>> > >>> problems that may or may not exist.
> > > > >>> > >>>
> > > > >>> > >>> Moreover, the SSG-based approach has the flexibility to achieve the
> > > > >>> > >>> equivalent behavior as the operator-based approach, if we set each
> > > > >>> > >>> operator (or task) to a separate SSG. We can even provide a shortcut
> > > > >>> > >>> option to automatically do that for users, if needed.
> > > >>> > >>>
> > > >>> > >>>
> > > >>> > >>> Thank you~
> > > >>> > >>>
> > > >>> > >>> Xintong Song
> > > >>> > >>>
> > > >>> > >>>
> > > >>> > >>>
> > > > >>> > >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann <trohrmann@apache.org> wrote:
> > > > >>> > >>>> Thanks for the responses Xintong and Stephan,
> > > > >>> > >>>>
> > > > >>> > >>>> I agree that being able to define the resource requirements for a
> > > > >>> > >>>> group of operators is more user friendly. However, my concern is that
> > > > >>> > >>>> we are exposing thereby internal runtime strategies which might limit
> > > > >>> > >>>> our flexibility to execute a given job. Moreover, the semantics of
> > > > >>> > >>>> configuring resource requirements for SSGs could break if switching
> > > > >>> > >>>> from streaming to batch execution. If one defines the resource
> > > > >>> > >>>> requirements for op_1 -> op_2 which run in pipelined mode when using
> > > > >>> > >>>> the streaming execution, then how do we interpret these requirements
> > > > >>> > >>>> when op_1 -> op_2 are executed with a blocking data exchange in batch
> > > > >>> > >>>> execution mode? Consequently, I am still leaning towards Stephan's
> > > > >>> > >>>> proposal to set the resource requirements per operator.
> > > > >>> > >>>>
> > > > >>> > >>>> Maybe the following proposal makes the configuration easier: If the
> > > > >>> > >>>> user wants to use fine-grained resource requirements, then she needs to
> > > > >>> > >>>> specify the default size which is used for operators which have no
> > > > >>> > >>>> explicit resource annotation. If this holds true, then every operator
> > > > >>> > >>>> would have a resource requirement and the system can try to execute the
> > > > >>> > >>>> operators in the best possible manner w/o being constrained by how the
> > > > >>> > >>>> user set the SSG requirements.
> > > >>> > >>>>
> > > >>> > >>>> Cheers,
> > > >>> > >>>> Till
> > > >>> > >>>>
> > > > >>> > >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > >>> > >>>>
> > > > >>> > >>>>> Thanks for the feedback, Stephan.
> > > > >>> > >>>>>
> > > > >>> > >>>>> Actually, your proposal has also come to my mind at some point. And I
> > > > >>> > >>>>> have some concerns about it.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> 1. It does not give users the same control as the SSG-based approach.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> While both approaches do not require specifying for each operator,
> > > > >>> > >>>>> SSG-based approach supports the semantic that "some operators together
> > > > >>> > >>>>> use this much resource" while the operator-based approach doesn't.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> Think of a long pipeline with m operators (o_1, o_2, ..., o_m), and at
> > > > >>> > >>>>> some point there's an agg o_n (1 < n < m) which significantly reduces
> > > > >>> > >>>>> the data amount. One can separate the pipeline into 2 groups SSG_1
> > > > >>> > >>>>> (o_1, ..., o_n) and SSG_2 (o_n+1, ... o_m), so that configuring much
> > > > >>> > >>>>> higher parallelisms for operators in SSG_1 than for operators in SSG_2
> > > > >>> > >>>>> won't lead to too much wasting of resources. If the two SSGs end up
> > > > >>> > >>>>> needing different resources, with the SSG-based approach one can
> > > > >>> > >>>>> directly specify resources for the two groups. However, with the
> > > > >>> > >>>>> operator-based approach, the user will have to specify resources for
> > > > >>> > >>>>> each operator in one of the two groups, and tune the default slot
> > > > >>> > >>>>> resource via configurations to fit the other group.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> 2. It increases the chance of breaking operator chains.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> Setting chainable operators into different slot sharing groups will
> > > > >>> > >>>>> prevent them from being chained. In the current implementation,
> > > > >>> > >>>>> downstream operators, if SSG not explicitly specified, will be set to
> > > > >>> > >>>>> the same group as the chainable upstream operators (unless multiple
> > > > >>> > >>>>> upstream operators in different groups), to reduce the chance of
> > > > >>> > >>>>> breaking chains.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4, deciding SSGs
> > > > >>> > >>>>> based on whether resource is specified we will easily get groups like
> > > > >>> > >>>>> (o_1, o_3) & (o_2, o_4), where none of the operators can be chained.
> > > > >>> > >>>>> This is also possible for the SSG-based approach, but I believe the
> > > > >>> > >>>>> chance is much smaller because there's no strong reason for users to
> > > > >>> > >>>>> specify the groups with alternate operators like that. We are more
> > > > >>> > >>>>> likely to get groups like (o_1, o_2) & (o_3, o_4), where the chain
> > > > >>> > >>>>> breaks only between o_2 and o_3.
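
The chaining argument in point 2 above can be checked with a few lines of code. The sketch assumes a purely linear pipeline o_1 -> o_2 -> o_3 -> o_4 in which two neighbouring operators can be chained only if they are in the same slot sharing group; `ChainBreakSketch` is a hypothetical name.

```java
final class ChainBreakSketch {

    /**
     * groupOfOperator[i] is the slot sharing group of the i-th operator in a
     * linear pipeline. Returns how many neighbouring pairs cannot be chained
     * because they sit in different groups.
     */
    static int chainBreaks(int[] groupOfOperator) {
        int breaks = 0;
        for (int i = 1; i < groupOfOperator.length; i++) {
            if (groupOfOperator[i] != groupOfOperator[i - 1]) {
                breaks++;
            }
        }
        return breaks;
    }

    public static void main(String[] args) {
        // (o_1, o_3) & (o_2, o_4): groups alternate along the pipeline.
        System.out.println("alternating: " + chainBreaks(new int[] {0, 1, 0, 1}));

        // (o_1, o_2) & (o_3, o_4): groups are contiguous.
        System.out.println("contiguous:  " + chainBreaks(new int[] {0, 0, 1, 1}));
    }
}
```

The alternating grouping breaks all three links of the four-operator pipeline, while the contiguous grouping breaks only the one between o_2 and o_3.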
> > > >>> > >>>>>
> > > > >>> > >>>>> 3. It complicates the system by having two different mechanisms for
> > > > >>> > >>>>> sharing managed memory in a slot.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> - In FLIP-141, we introduced the intra-slot managed memory sharing
> > > > >>> > >>>>> mechanism, where managed memory is first distributed according to the
> > > > >>> > >>>>> consumer type, then further distributed across operators of that
> > > > >>> > >>>>> consumer type.
> > > > >>> > >>>>>
> > > > >>> > >>>>> - With the operator-based approach, managed memory size specified for
> > > > >>> > >>>>> an operator should account for all the consumer types of that operator.
> > > > >>> > >>>>> That means the managed memory is first distributed across operators,
> > > > >>> > >>>>> then distributed to different consumer types of each operator.
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> Unfortunately, the different order of the two calculation steps can
> > > > >>> > >>>>> lead to different results. To be specific, the semantic of the
> > > > >>> > >>>>> configuration option `consumer-weights` changed (within a slot vs.
> > > > >>> > >>>>> within an operator).
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>>
> > > > >>> > >>>>> To sum up things:
> > > > >>> > >>>>>
> > > > >>> > >>>>> While (3) might be a bit more implementation related, I think (1) and
> > > > >>> > >>>>> (2) somehow suggest that, the price for the proposed approach to avoid
> > > > >>> > >>>>> specifying resource for every operator is that it's not as independent
> > > > >>> > >>>>> from operator chaining and slot sharing as the operator-based approach
> > > > >>> > >>>>> discussed in the FLIP.
> > > >>> > >>>>>
> > > >>> > >>>>>
> > > >>> > >>>>> Thank you~
> > > >>> > >>>>>
> > > >>> > >>>>> Xintong Song
> > > >>> > >>>>>
> > > >>> > >>>>>
> > > >>> > >>>>>
> > > >>> > >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen
> > > >>> > <sewen@apache.org <ma...@apache.org>>
> > > >>> > >> wrote:
> > > >>> > >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
> > > >>> > >>>>>>
> > > >>> > >>>>>> I want to say, first of all, that this is super well
> > > >>> > written. And
> > > >>> > >> the
> > > >>> > >>>>>> points that the FLIP makes about how to expose the
> > > >>> > configuration to
> > > >>> > >>>> users
> > > >>> > >>>>>> is exactly the right thing to figure out first.
> > > >>> > >>>>>> So good job here!
> > > >>> > >>>>>>
> > > >>> > >>>>>> About how to let users specify the resource profiles.
> > If I
> > > >>> > can sum
> > > >>> > >> the
> > > >>> > >>>>> FLIP
> > > >>> > >>>>>> and previous discussion up in my own words, the problem
> > > is the
> > > >>> > >>>> following:
> > > >>> > >>>>>> Operator-level specification is the simplest and
> > cleanest
> > > >>> > approach,
> > > >>> > >>>>> because
> > > >>> > >>>>>>> it avoids mixing operator configuration (resource) and
> > > >>> > >> scheduling. No
> > > >>> > >>>>>>> matter what other parameters change (chaining, slot
> > > sharing,
> > > >>> > >>>> switching
> > > >>> > >>>>>>> pipelined and blocking shuffles), the resource profiles
> > > >>> > stay the
> > > >>> > >>>> same.
> > > >>> > >>>>>>> But it would require that a user specifies resources on
> > > all
> > > >>> > >>>> operators,
> > > >>> > >>>>>>> which makes it hard to use. That's why the FLIP
> > suggests
> > > going
> > > >>> > >> with
> > > >>> > >>>>>>> specifying resources on a Sharing-Group.
> > > >>> > >>>>>>
> > > >>> > >>>>>> I think both thoughts are important, so can we find a
> > > solution
> > > >>> > >> where
> > > >>> > >>>> the
> > > >>> > >>>>>> Resource Profiles are specified on an Operator, but we
> > > >>> > still avoid
> > > >>> > >> that
> > > >>> > >>>>> we
> > > >>> > >>>>>> need to specify a resource profile on every operator?
> > > >>> > >>>>>>
> > > >>> > >>>>>> What do you think about something like the following:
> > > >>> > >>>>>> - Resource Profiles are specified on an operator
> > level.
> > > >>> > >>>>>> - Not all operators need profiles
> > > >>> > >>>>>> - All Operators without a Resource Profile end up
> > in
> > > the
> > > >>> > >> default
> > > >>> > >>>> slot
> > > >>> > >>>>>> sharing group with a default profile (will get a default
> > > slot).
> > > >>> > >>>>>> - All Operators with a Resource Profile will go into
> > > >>> > another slot
> > > >>> > >>>>> sharing
> > > >>> > >>>>>> group (the resource-specified-group).
> > > >>> > >>>>>> - Users can define different slot sharing groups for
> > > >>> > operators
> > > >>> > >> like
> > > >>> > >>>>> they
> > > >>> > >>>>>> do now, with the exception that you cannot mix operators
> > > >>> > that have
> > > >>> > >> a
> > > >>> > >>>>>> resource profile and operators that have no resource
> > > profile.
> > > >>> > >>>>>> - The default case where no operator has a resource
> > > >>> > profile is
> > > >>> > >> just a
> > > >>> > >>>>>> special case of this model
> > > >>> > >>>>>> - The chaining logic sums up the profiles per
> > operator,
> > > >>> > like it
> > > >>> > >> does
> > > >>> > >>>>> now,
> > > >>> > >>>>>> and the scheduler sums up the profiles of the tasks that
> > > it
> > > >>> > >> schedules
> > > >>> > >>>>>> together.
> > > >>> > >>>>>>
> > > >>> > >>>>>>
> > > >>> > >>>>>> There is another question about reactive scaling raised
> > > in the
> > > >>> > >> FLIP. I
> > > >>> > >>>>> need
> > > >>> > >>>>>> to think a bit about that. That is indeed a bit more
> > > tricky
> > > >>> > once we
> > > >>> > >>>> have
> > > >>> > >>>>>> slots of different sizes.
> > > >>> > >>>>>> It is not clear then which of the different slot
> > requests
> > > the
> > > >>> > >>>>>> ResourceManager should fulfill when new resources (TMs)
> > > >>> > show up,
> > > >>> > >> or how
> > > >>> > >>>>> the
> > > >>> > >>>>>> JobManager redistributes the slots resources when
> > > resources
> > > >>> > (TMs)
> > > >>> > >>>>> disappear
> > > >>> > >>>>>> This question is pretty orthogonal, though, to the "how
> > to
> > > >>> > specify
> > > >>> > >> the
> > > >>> > >>>>>> resources".
> > > >>> > >>>>>>
> > > >>> > >>>>>>
> > > >>> > >>>>>> Best,
> > > >>> > >>>>>> Stephan
> > > >>> > >>>>>>
> > > >>> > >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song
> > > >>> > <tonysong820@gmail.com <ma...@gmail.com>
> > > >>> > >>>>> wrote:
> > > >>> > >>>>>>> Thanks for drafting the FLIP and driving the
> > discussion,
> > > >>> > Yangze.
> > > >>> > >>>>>>> And Thanks for the feedback, Till and Chesnay.
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> @Till,
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> I agree that specifying requirements for SSGs means
> > that
> > > SSGs
> > > >>> > >> need to
> > > >>> > >>>>> be
> > > >>> > >>>>>>> supported in fine-grained resource management,
> > otherwise
> > > each
> > > >>> > >>>> operator
> > > >>> > >>>>>>> might use as many resources as the whole group.
> > However,
> > > I
> > > >>> > cannot
> > > >>> > >>>> think
> > > >>> > >>>>>> of
> > > >>> > >>>>>>> a strong reason for not supporting SSGs in fine-grained
> > > >>> > resource
> > > >>> > >>>>>>> management.
> > > >>> > >>>>>>>
> > > >>> > >>>>>>>
> > > >>> > >>>>>>>> Interestingly, if all operators have their resources
> > > properly
> > > >>> > >>>>>> specified,
> > > >>> > >>>>>>>> then slot sharing is no longer needed because Flink
> > > could
> > > >>> > >> slice off
> > > >>> > >>>>> the
> > > >>> > >>>>>>>> appropriately sized slots for every Task individually.
> > > >>> > >>>>>>>>
> > > >>> > >>>>>>> So for example, if we have a job consisting of two
> > > >>> > operators op_1
> > > >>> > >> and
> > > >>> > >>>>> op_2
> > > >>> > >>>>>>>> where each op needs 100 MB of memory, we would then
> > say
> > > that
> > > >>> > >> the
> > > >>> > >>>> slot
> > > >>> > >>>>>>>> sharing group needs 200 MB of memory to run. If we
> > have
> > > a
> > > >>> > >> cluster
> > > >>> > >>>>> with
> > > >>> > >>>>>> 2
> > > >>> > >>>>>>>> TMs with one slot of 100 MB each, then the system
> > > cannot run
> > > >>> > >> this
> > > >>> > >>>>> job.
> > > >>> > >>>>>> If
> > > >>> > >>>>>>>> the resources were specified on an operator level,
> > then
> > > the
> > > >>> > >> system
> > > >>> > >>>>>> could
> > > >>> > >>>>>>>> still make the decision to deploy op_1 to TM_1 and
> > op_2
> > > to
> > > >>> > >> TM_2.
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> Couldn't agree more that if all operators' requirements
> > > are
> > > >>> > >> properly
> > > >>> > >>>>>>> specified, slot sharing should be no longer needed. I
> > > >>> > think this
> > > >>> > >>>>> exactly
> > > >>> > >>>>>>> disproves the example. If we already know op_1 and op_2
> > > each
> > > >>> > >> needs
> > > >>> > >>>> 100
> > > >>> > >>>>> MB
> > > >>> > >>>>>>> of memory, why would we put them in the same group? If
> > > >>> > they are
> > > >>> > >> in
> > > >>> > >>>>>> separate
> > > >>> > >>>>>>> groups, with the proposed approach the system can
> > freely
> > > >>> > deploy
> > > >>> > >> them
> > > >>> > >>>> to
> > > >>> > >>>>>>> either a 200 MB TM or two 100 MB TMs.
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> Moreover, the precondition for not needing slot sharing
> > > is
> > > >>> > having
> > > >>> > >>>>>> resource
> > > >>> > >>>>>>> requirements properly specified for all operators. This
> > > is not
> > > >>> > >> always
> > > >>> > >>>>>>> possible, and usually requires tremendous efforts. One
> > > of the
> > > >>> > >>>> benefits
> > > >>> > >>>>>> for
> > > >>> > >>>>>>> SSG-based requirements is that it allows the user to
> > > freely
> > > >>> > >> decide
> > > >>> > >>>> the
> > > >>> > >>>>>>> granularity, thus efforts they want to pay. I would
> > > >>> > consider SSG
> > > >>> > >> in
> > > >>> > >>>>>>> fine-grained resource management as a group of
> > operators
> > > >>> > that the
> > > >>> > >>>> user
> > > >>> > >>>>>>> would like to specify the total resource for. There can
> > > be
> > > >>> > only
> > > >>> > >> one
> > > >>> > >>>>> group
> > > >>> > >>>>>>> in the job, 2~3 groups dividing the job into a few
> > major
> > > >>> > parts,
> > > >>> > >> or as
> > > >>> > >>>>>> many
> > > >>> > >>>>>>> groups as the number of tasks/operators, depending on
> > how
> > > >>> > >>>> fine-grained
> > > >>> > >>>>>> the
> > > >>> > >>>>>>> user is able to specify the resources.
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> Having to support SSGs might be a constraint. But given
> > > >>> > that all
> > > >>> > >> the
> > > >>> > >>>>>>> current scheduler implementations already support
> > SSGs, I
> > > >>> > tend to
> > > >>> > >>>> think
> > > >>> > >>>>>>> that as an acceptable price for the above discussed
> > > >>> > usability and
> > > >>> > >>>>>>> flexibility.
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> @Chesnay
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> Will declaring them on slot sharing groups not also
> > waste
> > > >>> > >> resources
> > > >>> > >>>> if
> > > >>> > >>>>>> the
> > > >>> > >>>>>>>> parallelism of operators within that group are
> > > different?
> > > >>> > >>>>>>>>
> > > >>> > >>>>>>> Yes. It's a trade-off between usability and resource
> > > >>> > >> utilization. To
> > > >>> > >>>>>> avoid
> > > >>> > >>>>>>> such wasting, the user can define more groups, so that
> > > >>> > each group
> > > >>> > >>>>>> contains
> > > >>> > >>>>>>> fewer operators and the chance of having operators with
> > > >>> > different
> > > >>> > >>>>>>> parallelism will be reduced. The price is to have more
> > > >>> > resource
> > > >>> > >>>>>>> requirements to specify.
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> It also seems like quite a hassle for users having to
> > > >>> > >> recalculate the
> > > >>> > >>>>>>>> resource requirements if they change the slot sharing.
> > > >>> > >>>>>>>> I'd think that it's not really workable for users that
> > > create
> > > >>> > >> a set
> > > >>> > >>>>> of
> > > >>> > >>>>>>>> re-usable operators which are mixed and matched in
> > their
> > > >>> > >>>>> applications;
> > > >>> > >>>>>>>> managing the resources requirements in such a setting
> > > >>> > would be
> > > >>> > >> a
> > > >>> > >>>>>>>> nightmare, and in the end would require operator-level
> > > >>> > >> requirements
> > > >>> > >>>>> any
> > > >>> > >>>>>>>> way.
> > > >>> > >>>>>>>> In that sense, I'm not even sure whether it really
> > > increases
> > > >>> > >>>>> usability.
> > > >>> > >>>>>>> - As mentioned in my reply to Till's comment,
> > > there's no
> > > >>> > >> reason to
> > > >>> > >>>>> put
> > > >>> > >>>>>>> multiple operators whose individual resource
> > > >>> > requirements are
> > > >>> > >>>>> already
> > > >>> > >>>>>>> known
> > > >>> > >>>>>>> into the same group in fine-grained resource
> > > management.
> > > >>> > >>>>>>> - Even if an operator implementation is reused for
> > > multiple
> > > >>> > >>>>> applications,
> > > >>> > >>>>>>> it does not guarantee the same resource
> > requirements.
> > > >>> > During
> > > >>> > >> our
> > > >>> > >>>>> years
> > > >>> > >>>>>>> of
> > > >>> > >>>>>>> practices in Alibaba, with per-operator
> > requirements
> > > >>> > >> specified for
> > > >>> > >>>>>>> Blink's
> > > >>> > >>>>>>> fine-grained resource management, very few users
> > > >>> > (including
> > > >>> > >> our
> > > >>> > >>>>>>> specialists
> > > >>> > >>>>>>> who are dedicated to supporting Blink users) are as
> > > >>> > >> experienced as
> > > >>> > >>>>> to
> > > >>> > >>>>>>> accurately predict/estimate the operator resource
> > > >>> > >> requirements.
> > > >>> > >>>> Most
> > > >>> > >>>>>>> people
> > > >>> > >>>>>>> rely on the execution-time metrics (throughput,
> > > delay, cpu
> > > >>> > >> load,
> > > >>> > >>>>>> memory
> > > >>> > >>>>>>> usage, GC pressure, etc.) to improve the
> > > specification.
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> To sum up:
> > > >>> > >>>>>>> If the user is capable of providing proper resource
> > > >>> > requirements
> > > >>> > >> for
> > > >>> > >>>>>> every
> > > >>> > >>>>>>> operator, that's definitely a good thing and we would
> > not
> > > >>> > need to
> > > >>> > >>>> rely
> > > >>> > >>>>> on
> > > >>> > >>>>>>> the SSGs. However, that shouldn't be a *must* for the
> > > >>> > >> fine-grained
> > > >>> > >>>>>> resource
> > > >>> > >>>>>>> management to work. For those users who are capable and
> > > do not
> > > >>> > >> like
> > > >>> > >>>>>> having
> > > >>> > >>>>>>> to set each operator to a separate SSG, I would be ok
> > to
> > > have
> > > >>> > >> both
> > > >>> > >>>>>>> SSG-based and operator-based runtime interfaces and to
> > > only
> > > >>> > >> fallback
> > > >>> > >>>> to
> > > >>> > >>>>>> the
> > > >>> > >>>>>>> SSG requirements when the operator requirements are not
> > > >>> > >> specified.
> > > >>> > >>>>>> However,
> > > >>> > >>>>>>> as the first step, I think we should prioritise the use
> > > cases
> > > >>> > >> where
> > > >>> > >>>>> users
> > > >>> > >>>>>>> are not that experienced.
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> Thank you~
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> Xintong Song
> > > >>> > >>>>>>>
> > > >>> > >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
> > > >>> > >> chesnay@apache.org <ma...@apache.org>>
> > > >>> > >>>>>>> wrote:
> > > >>> > >>>>>>>
> > > >>> > >>>>>>>> Will declaring them on slot sharing groups not also
> > > waste
> > > >>> > >> resources
> > > >>> > >>>>> if
> > > >>> > >>>>>>>> the parallelism of operators within that group are
> > > different?
> > > >>> > >>>>>>>>
> > > >>> > >>>>>>>> It also seems like quite a hassle for users having to
> > > >>> > >> recalculate
> > > >>> > >>>> the
> > > >>> > >>>>>>>> resource requirements if they change the slot sharing.
> > > >>> > >>>>>>>> I'd think that it's not really workable for users that
> > > create
> > > >>> > >> a set
> > > >>> > >>>>> of
> > > >>> > >>>>>>>> re-usable operators which are mixed and matched in
> > their
> > > >>> > >>>>> applications;
> > > >>> > >>>>>>>> managing the resources requirements in such a setting
> > > >>> > would be
> > > >>> > >> a
> > > >>> > >>>>>>>> nightmare, and in the end would require operator-level
> > > >>> > >> requirements
> > > >>> > >>>>> any
> > > >>> > >>>>>>>> way.
> > > >>> > >>>>>>>> In that sense, I'm not even sure whether it really
> > > increases
> > > >>> > >>>>> usability.
> > > >>> > >>>>>>>> My main worry is that if we wire the runtime to
> > work
> > > >>> > on SSGs
> > > >>> > >>>> it's
> > > >>> > >>>>>>>> gonna be difficult to implement more fine-grained
> > > approaches,
> > > >>> > >> which
> > > >>> > >>>>>>>> would not be the case if, for the runtime, they are
> > > always
> > > >>> > >> defined
> > > >>> > >>>> on
> > > >>> > >>>>>> an
> > > >>> > >>>>>>>> operator-level.
> > > >>> > >>>>>>>>
> > > >>> > >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > > >>> > >>>>>>>>> Thanks for drafting this FLIP and starting this
> > > discussion
> > > >>> > >>>> Yangze.
> > > >>> > >>>>>>>>> I like that defining resource requirements on a slot
> > > sharing
> > > >>> > >>>> group
> > > >>> > >>>>>>> makes
> > > >>> > >>>>>>>>> the overall setup easier and improves usability of
> > > resource
> > > >>> > >>>>>>> requirements.
> > > >>> > >>>>>>>>> What I do not like about it is that it changes slot
> > > sharing
> > > >>> > >>>> groups
> > > >>> > >>>>>> from
> > > >>> > >>>>>>>>> being a scheduling hint to something which needs to
> > be
> > > >>> > >> supported
> > > >>> > >>>> in
> > > >>> > >>>>>>> order
> > > >>> > >>>>>>>>> to support fine grained resource requirements. So
> > far,
> > > the
> > > >>> > >> idea
> > > >>> > >>>> of
> > > >>> > >>>>>> slot
> > > >>> > >>>>>>>>> sharing groups was that it tells the system that a
> > set
> > > of
> > > >>> > >>>> operators
> > > >>> > >>>>>> can
> > > >>> > >>>>>>>> be
> > > >>> > >>>>>>>>> deployed in the same slot. But the system still had
> > the
> > > >>> > >> freedom
> > > >>> > >>>> to
> > > >>> > >>>>>> say
> > > >>> > >>>>>>>> that
> > > >>> > >>>>>>>>> it would rather place these tasks in different slots
> > > if it
> > > >>> > >>>> wanted.
> > > >>> > >>>>> If
> > > >>> > >>>>>>> we
> > > >>> > >>>>>>>>> now specify resource requirements on a per slot
> > sharing
> > > >>> > >> group,
> > > >>> > >>>> then
> > > >>> > >>>>>> the
> > > >>> > >>>>>>>>> only option for a scheduler which does not support
> > slot
> > > >>> > >> sharing
> > > >>> > >>>>>> groups
> > > >>> > >>>>>>> is
> > > >>> > >>>>>>>>> to say that every operator in this slot sharing group
> > > >>> > needs a
> > > >>> > >>>> slot
> > > >>> > >>>>>> with
> > > >>> > >>>>>>>> the
> > > >>> > >>>>>>>>> same resources as the whole group.
> > > >>> > >>>>>>>>>
> > > >>> > >>>>>>>>> So for example, if we have a job consisting of two
> > > > operators
> > > >>> > >> op_1
> > > >>> > >>>>> and
> > > >>> > >>>>>>> op_2
> > > >>> > >>>>>>>>> where each op needs 100 MB of memory, we would then
> > > say that
> > > >>> > >> the
> > > >>> > >>>>> slot
> > > >>> > >>>>>>>>> sharing group needs 200 MB of memory to run. If we
> > > have a
> > > >>> > >> cluster
> > > >>> > >>>>>> with
> > > >>> > >>>>>>> 2
> > > >>> > >>>>>>>>> TMs with one slot of 100 MB each, then the system
> > > cannot run
> > > >>> > >> this
> > > >>> > >>>>>> job.
> > > >>> > >>>>>>> If
> > > >>> > >>>>>>>>> the resources were specified on an operator level,
> > > then the
> > > >>> > >>>> system
> > > >>> > >>>>>>> could
> > > >>> > >>>>>>>>> still make the decision to deploy op_1 to TM_1 and
> > > op_2 to
> > > >>> > >> TM_2.
> > > >>> > >>>>>>>>> Originally, one of the primary goals of slot sharing
> > > groups
> > > >>> > >> was
> > > >>> > >>>> to
> > > >>> > >>>>>> make
> > > >>> > >>>>>>>> it
> > > >>> > >>>>>>>>> easier for the user to reason about how many slots a
> > > job
> > > >>> > >> needs
> > > >>> > >>>>>>>> independent
> > > >>> > >>>>>>>>> of the actual number of operators in the job.
> > > Interestingly,
> > > >>> > >> if
> > > >>> > >>>> all
> > > >>> > >>>>>>>>> operators have their resources properly specified,
> > > then slot
> > > >>> > >>>>> sharing
> > > >>> > >>>>>> is
> > > >>> > >>>>>>>> no
> > > >>> > >>>>>>>>> longer needed because Flink could slice off the
> > > >>> > appropriately
> > > >>> > >>>> sized
> > > >>> > >>>>>>> slots
> > > >>> > >>>>>>>>> for every Task individually. What matters is whether
> > > the
> > > >>> > >> whole
> > > >>> > >>>>>> cluster
> > > >>> > >>>>>>>> has
> > > >>> > >>>>>>>>> enough resources to run all tasks or not.
> > > >>> > >>>>>>>>>
> > > >>> > >>>>>>>>> Cheers,
> > > >>> > >>>>>>>>> Till
> > > >>> > >>>>>>>>>
> > > >>> > >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
> > > >>> > >> karmagyz@gmail.com <ma...@gmail.com>>
> > > >>> > >>>>>> wrote:
> > > >>> > >>>>>>>>>> Hi, there,
> > > >>> > >>>>>>>>>>
> > > >>> > >>>>>>>>>> We would like to start a discussion thread on
> > > "FLIP-156:
> > > >>> > >> Runtime
> > > >>> > >>>>>>>>>> Interfaces for Fine-Grained Resource
> > Requirements"[1],
> > > >>> > >> where we
> > > >>> > >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime
> > > interfaces
> > > >>> > >> for
> > > >>> > >>>>>>>>>> specifying fine-grained resource requirements.
> > > >>> > >>>>>>>>>>
> > > >>> > >>>>>>>>>> In this FLIP:
> > > >>> > >>>>>>>>>> - Expound the user story of fine-grained resource
> > > >>> > >> management.
> > > >>> > >>>>>>>>>> - Propose runtime interfaces for specifying
> > SSG-based
> > > >>> > >> resource
> > > >>> > >>>>>>>>>> requirements.
> > > >>> > >>>>>>>>>> - Discuss the pros and cons of the three potential
> > > >>> > >> granularities
> > > >>> > >>>>> for
> > > >>> > >>>>>>>>>> specifying the resource requirements (op, task and
> > > slot
> > > >>> > >> sharing
> > > >>> > >>>>>> group)
> > > >>> > >>>>>>>>>> and explain why we choose the slot sharing group.
> > > >>> > >>>>>>>>>>
> > > >>> > >>>>>>>>>> Please find more details in the FLIP wiki document
> > > [1].
> > > >>> > >> Looking
> > > >>> > >>>>>>>>>> forward to your feedback.
> > > >>> > >>>>>>>>>>
> > > >>> > >>>>>>>>>> [1]
> > > >>> > >>>>>>>>>>
> > > >>> > >>
> > > >>> >
> > >
> >
https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > >>> > <
> > >
> >
https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > >
> > > >>> > >>>>>>>>>> Best,
> > > >>> > >>>>>>>>>> Yangze Guo
> > > >>> > >>>>>>>>>>
> > > >>> > >>>>>>>>
> > > >>> >
> > > >>>
> > >
> >
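
The op_1/op_2 example quoted above (two operators needing 100 MB each, and
two TMs with one 100 MB slot each) can be checked with a small sketch. This
is only an illustration under simplifying assumptions: memory is the only
resource, a slot hosts either the whole group or exactly one operator, and
the class and method names are hypothetical, not Flink APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SlotFitSketch {

    /** SSG-level requirement: some single slot must hold the sum of all operators. */
    static boolean fitsAsGroup(List<Integer> operatorMb, List<Integer> slotMb) {
        int total = 0;
        for (int mb : operatorMb) {
            total += mb;
        }
        for (int slot : slotMb) {
            if (slot >= total) {
                return true;
            }
        }
        return false;
    }

    /** Operator-level requirement: every operator gets a slot of its own. */
    static boolean fitsPerOperator(List<Integer> operatorMb, List<Integer> slotMb) {
        if (operatorMb.size() > slotMb.size()) {
            return false;
        }
        List<Integer> ops = new ArrayList<>(operatorMb);
        List<Integer> slots = new ArrayList<>(slotMb);
        // Pair the largest operator with the largest slot, and so on.
        ops.sort(Collections.reverseOrder());
        slots.sort(Collections.reverseOrder());
        for (int i = 0; i < ops.size(); i++) {
            if (ops.get(i) > slots.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> ops = Arrays.asList(100, 100);   // op_1, op_2
        List<Integer> slots = Arrays.asList(100, 100); // TM_1, TM_2
        System.out.println("one SSG of 200 MB: " + fitsAsGroup(ops, slots));     // false
        System.out.println("per operator:      " + fitsPerOperator(ops, slots)); // true
    }
}
```

Putting op_1 and op_2 into two separate SSGs of 100 MB each, as suggested in
the reply, corresponds to the per-operator check in this sketch.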

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Yangze Guo <ka...@gmail.com>.

Thanks for the replies, Till and Xintong!

I have updated the FLIP. The changes include:
- Edited the JavaDoc of the proposed
StreamGraphGenerator#setSlotSharingGroupResource.
- Added a "Future Plan" section, which contains the potential follow-up
issues and the limitations to be documented when fine-grained resource
management is exposed to users.

I'll start a vote in another thread.

Best,
Yangze Guo
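
For readers following the thread, the two ways of deriving per-task
resources from an SSG requirement that are mentioned in the discussion
quoted below (give every task the resources of the whole group, or
distribute the resources equally among the tasks) could look roughly like
this. It is only a sketch: it assumes a single memory dimension in MB, and
the class and method names are hypothetical rather than part of the FLIP or
of Flink.

```java
import java.util.Arrays;

public class SsgHintSketch {

    /** Strategy 1: every task gets the same resources as the whole group. */
    static int[] sameAsGroup(int groupMb, int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        int[] perTask = new int[numTasks];
        Arrays.fill(perTask, groupMb);
        return perTask;
    }

    /** Strategy 2: split the group resources evenly; leftover MB go to the first tasks. */
    static int[] evenSplit(int groupMb, int numTasks) {
        if (numTasks <= 0) {
            throw new IllegalArgumentException("numTasks must be positive");
        }
        int[] perTask = new int[numTasks];
        int base = groupMb / numTasks;
        int remainder = groupMb % numTasks;
        for (int i = 0; i < numTasks; i++) {
            perTask[i] = base + (i < remainder ? 1 : 0);
        }
        return perTask;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sameAsGroup(200, 2))); // [200, 200]
        System.out.println(Arrays.toString(evenSplit(200, 2)));   // [100, 100]
    }
}
```

The first strategy never under-provisions a task but can reserve far more
than the group asked for; the second keeps the total equal to the SSG
requirement but may starve a heavy task.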

On Fri, Jan 29, 2021 at 10:07 PM Till Rohrmann <tr...@apache.org> wrote:
>
> Thanks for summarizing the discussion, Yangze. I agree that setting
> resource requirements per operator is not very user friendly. Moreover, I
> couldn't come up with a different proposal which would be as easy to use
> and wouldn't expose internal scheduling details. In fact, following this
> argument then we shouldn't have exposed the slot sharing groups in the
> first place.
>
> What is important for the user is that we properly document the limitations
> and constraints the fine grained resource specification has. For example,
> we should explain how optimizations like chaining are affected by it and
> how different execution modes (batch vs. streaming) affect the execution of
> operators which have specified resources. These things shouldn't become
> part of the contract of this feature and are more caused by internal
> implementation details but it will be important to understand these things
> properly in order to use this feature effectively.
>
> Hence, +1 for starting the vote for this FLIP.
>
> Cheers,
> Till
>
> On Tue, Jan 26, 2021 at 4:37 AM Xintong Song <to...@gmail.com> wrote:
>
> > Thanks for the summary, Yangze.
> >
> > The changes and follow-up issues LGTM. Let's wait for responses from the
> > others before starting a vote.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Tue, Jan 26, 2021 at 11:08 AM Yangze Guo <ka...@gmail.com> wrote:
> >
> > > Thanks everyone for the lively discussion. I'd like to try to
> > > summarize the current convergence in the discussion. Please let me
> > > know if I got things wrong or missed something crucial here.
> > >
> > > Change of this FLIP:
> > > - Treat the SSG resource requirements as a hint instead of a
> > > restriction for the runtime. That should be explicitly explained in
> > > the JavaDocs.
> > >
> > > Potential follow-up issues if needed:
> > > - Provide operator-level resource configuration interface.
> > > - Provide multiple options for deciding resources for SSGs whose
> > > requirement is not specified:
> > >     ** Default slot resource.
> > >     ** Default operator resource times number of operators.
> > >
> > > If there are no other issues, I'll update the FLIP accordingly and
> > > start a vote thread. Thanks all for the valuable feedback again.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > >
> > > On Fri, Jan 22, 2021 at 11:30 AM Xintong Song <to...@gmail.com>
> > > wrote:
> > > >
> > > >
> > > >  FGRuntimeInterface.png
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <to...@gmail.com>
> > > wrote:
> > > >>
> > > >> I think Chesnay's proposal could actually work. IIUC, the key point is
> > > >> to derive operator requirements from SSG requirements on the API side,
> > > >> so that the runtime only deals with operator requirements. It's
> > > >> debatable how the deriving should be done though. E.g., an alternative
> > > >> could be to evenly divide the SSG requirement into requirements of
> > > >> operators in the group.
> > > >>
> > > >>
> > > >> However, I'm not entirely sure which option is more desirable. The
> > > >> following figure illustrates my understanding: Chesnay's proposal is
> > > >> on the top and the SSG-based proposal in this FLIP is on the bottom.
> > > >>
> > > >>
> > > >>
> > > >> I think the major difference between the two approaches is where
> > > >> deriving operator requirements from SSG requirements happens.
> > > >>
> > > >> - Chesnay's proposal simplifies the runtime logic and the interface to
> > > >> expose, at the price of moving more complexity (i.e. the deriving) to
> > > >> the API side. The question is, where do we prefer to keep the
> > > >> complexity? I'm slightly leaning towards having a thin API and keeping
> > > >> the complexity in the runtime if possible.
> > > >>
> > > >> - Notice that the dashed arrows represent optional steps that are
> > > >> needed only for schedulers that do not respect SSGs, which we don't
> > > >> have at the moment. If we only look at the solid arrows, then the
> > > >> SSG-based approach is much simpler, without needing to derive and
> > > >> aggregate the requirements back and forth. I'm not sure about
> > > >> complicating the current design only for the potential future needs.
> > > >>
> > > >>
> > > >> Thank you~
> > > >>
> > > >> Xintong Song
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <ch...@apache.org>
> > > wrote:
> > > >>>
> > > >>> You're raising a good point, but I think I can rectify that with a
> > > >>> minor adjustment.
> > > >>>
> > > >>> Default requirements are whatever the default requirements are;
> > > >>> setting the requirements for one operator has no effect on other
> > > >>> operators.
> > > >>>
> > > >>> With these rules, and some API enhancements, the following mockup
> > > >>> would replicate the SSG-based behavior:
> > > >>>
> > > >>> Map<SlotSharingGroupId, Requirements> requirements = ...
> > > >>> for slotSharingGroup in env.getSlotSharingGroups() {
> > > >>>      vertices = slotSharingGroup.getVertices()
> > > >>>      vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
> > > >>>      vertices.remaining().setRequirements(ZERO)
> > > >>> }
> > > >>>
> > > >>> We could even allow setting requirements on slot sharing groups or
> > > >>> colocation groups and internally translate them accordingly.
> > > >>> I can't help but feel this is a plain API issue.
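
The mockup quoted above can be turned into a small runnable sketch. The
types here are stand-ins chosen for illustration only: groups and vertices
are plain strings, requirements are a single MB value, and derive() is a
hypothetical helper, not an existing Flink method. Groups without a
specified requirement are skipped, on the assumption that defaults are
handled elsewhere.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OperatorRequirementSketch {

    static final int ZERO = 0;

    /**
     * For each group with a specified requirement, assign the whole requirement
     * to the first vertex and ZERO to the remaining vertices of that group.
     */
    static Map<String, Integer> derive(
            Map<String, List<String>> groupToVertices,
            Map<String, Integer> groupRequirementMb) {
        Map<String, Integer> perVertex = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : groupToVertices.entrySet()) {
            Integer requirement = groupRequirementMb.get(entry.getKey());
            if (requirement == null) {
                continue; // unspecified group: left to the default requirements
            }
            List<String> vertices = entry.getValue();
            for (int i = 0; i < vertices.size(); i++) {
                perVertex.put(vertices.get(i), i == 0 ? requirement : ZERO);
            }
        }
        return perVertex;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("ssg_1", Arrays.asList("op_1", "op_2"));
        Map<String, Integer> requirements = new LinkedHashMap<>();
        requirements.put("ssg_1", 200);
        System.out.println(derive(groups, requirements)); // {op_1=200, op_2=0}
    }
}
```

Summing the derived per-vertex values within a group gives back the
original SSG requirement, which is why this replicates the SSG-based
behavior.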
> > > >>>
> > > >>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
> > > >>> > If I understand you correctly Chesnay, then you want to decouple
> > the
> > > >>> > resource requirement specification from the slot sharing group
> > > >>> > assignment. Hence, per default all operators would be in the same
> > > slot
> > > >>> > sharing group. If there is no operator with a resource
> > specification,
> > > >>> > then the system would allocate a default slot for it. If there is
> > at
> > > >>> > least one operator, then the system would sum up all the specified
> > > >>> > resources and allocate a slot of this size. This effectively means
> > > >>> > that all unspecified operators will implicitly have a zero resource
> > > >>> > requirement. Did I understand your idea correctly?
> > > >>> >
> > > >>> > I am wondering whether this wouldn't lead to a surprising behaviour
> > > >>> > for the user. If the user specifies the resource requirements for a
> > > >>> > single operator, then he probably will assume that the other
> > > operators
> > > >>> > will get the default share of resources and not nothing.
> > > >>> >
> > > >>> > Cheers,
> > > >>> > Till
> > > >>> >
> > > >>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <
> > chesnay@apache.org
> > > >>> > <ma...@apache.org>> wrote:
> > > >>> >
> > > >>> >     Is there even a functional difference between specifying the
> > > >>> >     requirements for an SSG vs specifying the same requirements on
> > a
> > > >>> >     single
> > > >>> >     operator within that group (ideally a colocation group to avoid
> > > this
> > > >>> >     whole hint business)?
> > > >>> >
> > > >>> >     Wouldn't we get the best of both worlds in the latter case?
> > > >>> >
> > > >>> >     Users can take shortcuts to define shared requirements,
> > > >>> >     but refine them further as needed on a per-operator basis,
> > > >>> >     without changing semantics of slotsharing groups
> > > >>> >     nor the runtime being locked into SSG-based requirements.
> > > >>> >
> > > >>> >     (And before anyone argues what happens if slotsharing groups
> > > >>> >     change or
> > > >>> >     whatnot, that's a plain API issue that we could surely solve.
> > (A
> > > >>> >     plain
> > > >>> >     iteration over slotsharing groups and therein contained
> > operators
> > > >>> >     would
> > > >>> >     suffice)).
> > > >>> >
> > > >>> >     On 1/20/2021 6:48 PM, Till Rohrmann wrote:
> > > >>> >     > Maybe a different minor idea: Would it be possible to treat
> > > the SSG
> > > >>> >     > resource requirements as a hint for the runtime similar to
> > how
> > > >>> >     slot sharing
> > > >>> >     > groups are designed at the moment? Meaning that we don't give
> > > >>> >     the guarantee
> > > >>> >     > that Flink will always deploy this set of tasks together no
> > > >>> >     matter what
> > > >>> >     > comes. If, for example, the runtime can derive by some means
> > > the
> > > >>> >     resource
> > > >>> >     > requirements for each task based on the requirements for the
> > > >>> >     SSG, this
> > > >>> >     > could be possible. One easy strategy would be to give every
> > > task
> > > >>> >     the same
> > > >>> >     > resources as the whole slot sharing group. Another one could
> > be
> > > >>> >     > distributing the resources equally among the tasks. This does
> > > >>> >     not even have
> > > >>> >     > to be implemented but we would give ourselves the freedom to
> > > change
> > > >>> >     > scheduling if need should arise.
> > > >>> >     >
> > > >>> >     > Cheers,
> > > >>> >     > Till
> > > >>> >     >
> > > >>> >     > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <
> > karmagyz@gmail.com
> > > >>> >     <ma...@gmail.com>> wrote:
> > > >>> >     >
> > > >>> >     >> Thanks for the responses, Till and Xintong.
> > > >>> >     >>
> > > >>> >     >> I second Xintong's comment that SSG-based runtime interface
> > > >>> >     will give
> > > >>> >     >> us the flexibility to achieve op/task-based approach. That's
> > > one of
> > > >>> >     >> the most important reasons for our design choice.
> > > >>> >     >>
> > > >>> >     >> Some cents regarding the default operator resource:
> > > >>> >     >> - It might be good for the scenario of DataStream jobs.
> > > >>> >     >>     ** For light-weight operators, the accumulative
> > > >>> >     configuration error
> > > >>> >     >> will not be significant. Then, the resource of a task used
> > is
> > > >>> >     >> proportional to the number of operators it contains.
> > > >>> >     >>     ** For heavy operators like join and window or operators
> > > >>> >     using the
> > > >>> >     >> external resources, user will turn to the fine-grained
> > > resource
> > > >>> >     >> configuration.
> > > >>> >     >> - It can increase the stability for the standalone cluster
> > > >>> >     where task
> > > >>> >     >> executors registered are heterogeneous (with different
> > default
> > > slot
> > > >>> >     >> resources).
> > > >>> >     >> - It might not be good for SQL users. The operators that SQL
> > > >>> >     will be
> > > >>> >     >> transferred to is a black box to the user. We also do not
> > > guarantee
> > > >>> >     >> the cross-version consistency of the transformation so
> > far.
> > > >>> >     >>
> > > >>> >     >> I think it can be treated as a follow-up work when the
> > > fine-grained
> > > >>> >     >> resource management is end-to-end ready.
> > > >>> >     >>
> > > >>> >     >> Best,
> > > >>> >     >> Yangze Guo
> > > >>> >     >>
> > > >>> >     >>
> > > >>> >     >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song
> > > >>> >     <tonysong820@gmail.com <ma...@gmail.com>>
> > > >>> >     >> wrote:
> > > >>> >     >>> Thanks for the feedback, Till.
> > > >>> >     >>>
> > > >>> >     >>> ## I feel that what you proposed (operator-based + default
> > > >>> >     value) might
> > > >>> >     >> be
> > > >>> >     >>> subsumed by the SSG-based approach.
> > > >>> >     >>> Thinking of op_1 -> op_2, there are the following 4 cases,
> > > >>> >     categorized by
> > > >>> >     >>> whether the resource requirements are known to the users.
> > > >>> >     >>>
> > > >>> >     >>>     1. *Both known.* As previously mentioned, there's no
> > > >>> >     reason to put
> > > >>> >     >>>     multiple operators whose individual resource
> > requirements
> > > >>> >     are already
> > > >>> >     >> known
> > > >>> >     >>>     into the same group in fine-grained resource
> > management.
> > > >>> >     And if op_1
> > > >>> >     >> and
> > > >>> >     >>>     op_2 are in different groups, there should be no
> > problem
> > > >>> >     switching
> > > >>> >     >> data
> > > >>> >     >>>     exchange mode from pipelined to blocking. This is
> > > >>> >     equivalent to
> > > >>> >     >> specifying
> > > >>> >     >>>     operator resource requirements in your proposal.
> > > >>> >     >>>     2. *op_1 known, op_2 unknown.* Similar to 1), except
> > that
> > > >>> >     op_2 is in a
> > > >>> >     >>>     SSG whose resource is not specified thus would have the
> > > >>> >     default slot
> > > >>> >     >>>     resource. This is equivalent to having default operator
> > > >>> >     resources in
> > > >>> >     >> your
> > > >>> >     >>>     proposal.
> > > >>> >     >>>     3. *Both unknown*. The user can either set op_1 and
> > op_2
> > > >>> >     to the same
> > > >>> >     >> SSG
> > > >>> >     >>>     or separate SSGs.
> > > >>> >     >>>        - If op_1 and op_2 are in the same SSG, it will be
> > > >>> >     equivalent to
> > > >>> >     >> the
> > > >>> >     >>>        coarse-grained resource management, where op_1 and
> > > op_2
> > > >>> >     share a
> > > >>> >     >> default
> > > >>> >     >>>        size slot no matter which data exchange mode is
> > used.
> > > >>> >     >>>        - If op_1 and op_2 are in different SSGs, then each
> > of
> > > >>> >     them will
> > > >>> >     >> use
> > > >>> >     >>>        a default size slot. This is equivalent to setting
> > > them
> > > >>> >     with
> > > >>> >     >> default
> > > >>> >     >>>        operator resources in your proposal.
> > > > >>> >     >>>     4. *Total (pipelined) or max (blocking) of op_1 and op_2
> > > is
> > > >>> >     known.*
> > > >>> >     >>>        - It is possible that the user learns the total /
> > max
> > > >>> >     resource
> > > >>> >     >>>        requirement from executing and monitoring the job,
> > > >>> >     while not
> > > >>> >     >>> being aware of
> > > >>> >     >>>        individual operator requirements.
> > > >>> >     >>>        - I believe this is the case your proposal does not
> > > >>> >     cover. And TBH,
> > > >>> >     >>>        this is probably how most users learn the resource
> > > >>> >     requirements,
> > > >>> >     >>> according
> > > > >>> >     >>>        to my experience.
> > > >>> >     >>>        - In this case, the user might need to specify
> > > >>> >     different resources
> > > >>> >     >> if
> > > >>> >     >>>        he wants to switch the execution mode, which should
> > > not
> > > >>> >     be worse
> > > >>> >     >> than not
> > > >>> >     >>>        being able to use fine-grained resource management.
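
Case 4 above can be made concrete with a small sketch. It assumes a simplified model: operators connected by pipelined exchanges all run at the same time, so the group needs the sum of their requirements, while operators separated by blocking exchanges run one after another, so the group needs only the largest one. The function name and the megabyte figures are hypothetical and are not part of the FLIP.

```python
def ssg_requirement_mb(op_mb, exchange):
    """Memory a slot sharing group needs for its operators (simplified model).

    op_mb    -- per-operator memory requirements in MB (illustrative values)
    exchange -- "pipelined" (operators run concurrently) or
                "blocking" (operators run one after another)
    """
    if exchange == "pipelined":
        # All operators are alive at once, so requirements add up.
        return sum(op_mb)
    if exchange == "blocking":
        # Only one operator runs at a time, so the largest one dominates.
        return max(op_mb)
    raise ValueError(f"unknown exchange mode: {exchange}")


if __name__ == "__main__":
    ops = [100, 300]  # hypothetical op_1 and op_2
    print(ssg_requirement_mb(ops, "pipelined"))
    print(ssg_requirement_mb(ops, "blocking"))
```

Under this model, a user who measured only the pipelined total would have to supply a different number after switching to blocking exchanges, which is the cost acknowledged in the last bullet of case 4.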
> > > >>> >     >>>
> > > >>> >     >>>
> > > >>> >     >>> ## An additional idea inspired by your proposal.
> > > >>> >     >>> We may provide multiple options for deciding resources for
> > > >>> >     SSGs whose
> > > >>> >     >>> requirement is not specified, if needed.
> > > >>> >     >>>
> > > >>> >     >>>     - Default slot resource (current design)
> > > >>> >     >>>     - Default operator resource times number of operators
> > > >>> >     (equivalent to
> > > >>> >     >>>     your proposal)
> > > >>> >     >>>
> > > >>> >     >>>
> > > >>> >     >>> ## Exposing internal runtime strategies
> > > > >>> >     >>> Theoretically, yes. Being tied to the SSGs, the resource
> > > >>> >     requirements might be
> > > >>> >     >>> affected if how SSGs are internally handled changes in
> > > future.
> > > >>> >     >> Practically,
> > > >>> >     >>> I do not concretely see at the moment what kind of changes
> > we
> > > >>> >     may want in
> > > >>> >     >>> future that might conflict with this FLIP proposal, as the
> > > >>> >     question of
> > > >>> >     >>> switching data exchange mode answered above. I'd suggest to
> > > >>> >     not give up
> > > >>> >     >> the
> > > >>> >     >>> user friendliness we may gain now for the future problems
> > > that
> > > >>> >     may or may
> > > >>> >     >>> not exist.
> > > >>> >     >>>
> > > >>> >     >>> Moreover, the SSG-based approach has the flexibility to
> > > >>> >     achieve the
> > > >>> >     >>> equivalent behavior as the operator-based approach, if we
> > > set each
> > > >>> >     >> operator
> > > >>> >     >>> (or task) to a separate SSG. We can even provide a shortcut
> > > >>> >     option to
> > > >>> >     >>> automatically do that for users, if needed.
> > > >>> >     >>>
> > > >>> >     >>>
> > > >>> >     >>> Thank you~
> > > >>> >     >>>
> > > >>> >     >>> Xintong Song
> > > >>> >     >>>
> > > >>> >     >>>
> > > >>> >     >>>
> > > >>> >     >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann
> > > >>> >     <trohrmann@apache.org <ma...@apache.org>>
> > > >>> >     >> wrote:
> > > >>> >     >>>> Thanks for the responses Xintong and Stephan,
> > > >>> >     >>>>
> > > >>> >     >>>> I agree that being able to define the resource
> > requirements
> > > for a
> > > >>> >     >> group of
> > > >>> >     >>>> operators is more user friendly. However, my concern is
> > that
> > > >>> >     we are
> > > >>> >     >>>> exposing thereby internal runtime strategies which might
> > > >>> >     limit our
> > > >>> >     >>>> flexibility to execute a given job. Moreover, the
> > semantics
> > > of
> > > >>> >     >> configuring
> > > >>> >     >>>> resource requirements for SSGs could break if switching
> > from
> > > >>> >     streaming
> > > >>> >     >> to
> > > >>> >     >>>> batch execution. If one defines the resource requirements
> > > for
> > > >>> >     op_1 ->
> > > >>> >     >> op_2
> > > >>> >     >>>> which run in pipelined mode when using the streaming
> > > >>> >     execution, then
> > > >>> >     >> how do
> > > >>> >     >>>> we interpret these requirements when op_1 -> op_2 are
> > > >>> >     executed with a
> > > >>> >     >>>> blocking data exchange in batch execution mode?
> > > Consequently,
> > > >>> >     I am
> > > >>> >     >> still
> > > >>> >     >>>> leaning towards Stephan's proposal to set the resource
> > > >>> >     requirements per
> > > >>> >     >>>> operator.
> > > >>> >     >>>>
> > > >>> >     >>>> Maybe the following proposal makes the configuration
> > easier:
> > > >>> >     If the
> > > >>> >     >> user
> > > >>> >     >>>> wants to use fine-grained resource requirements, then she
> > > >>> >     needs to
> > > >>> >     >> specify
> > > >>> >     >>>> the default size which is used for operators which have no
> > > >>> >     explicit
> > > >>> >     >>>> resource annotation. If this holds true, then every
> > operator
> > > >>> >     would
> > > >>> >     >> have a
> > > >>> >     >>>> resource requirement and the system can try to execute the
> > > >>> >     operators
> > > >>> >     >> in the
> > > >>> >     >>>> best possible manner w/o being constrained by how the user
> > > >>> >     set the SSG
> > > >>> >     >>>> requirements.
> > > >>> >     >>>>
> > > >>> >     >>>> Cheers,
> > > >>> >     >>>> Till
> > > >>> >     >>>>
> > > >>> >     >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song
> > > >>> >     <tonysong820@gmail.com <ma...@gmail.com>>
> > > >>> >     >>>> wrote:
> > > >>> >     >>>>
> > > >>> >     >>>>> Thanks for the feedback, Stephan.
> > > >>> >     >>>>>
> > > >>> >     >>>>> Actually, your proposal has also come to my mind at some
> > > >>> >     point. And I
> > > >>> >     >>>> have
> > > >>> >     >>>>> some concerns about it.
> > > >>> >     >>>>>
> > > >>> >     >>>>>
> > > >>> >     >>>>> 1. It does not give users the same control as the
> > SSG-based
> > > >>> >     approach.
> > > >>> >     >>>>>
> > > >>> >     >>>>>
> > > > >>> >     >>>>> While neither approach requires specifying resources for each
> > > >>> >     operator,
> > > >>> >     >>>>> SSG-based approach supports the semantic that "some
> > > operators
> > > >>> >     >> together
> > > >>> >     >>>> use
> > > >>> >     >>>>> this much resource" while the operator-based approach
> > > doesn't.
> > > >>> >     >>>>>
> > > >>> >     >>>>>
> > > >>> >     >>>>> Think of a long pipeline with m operators (o_1, o_2, ...,
> > > >>> >     o_m), and
> > > >>> >     >> at
> > > >>> >     >>>> some
> > > >>> >     >>>>> point there's an agg o_n (1 < n < m) which significantly
> > > >>> >     reduces the
> > > >>> >     >> data
> > > >>> >     >>>>> amount. One can separate the pipeline into 2 groups SSG_1
> > > >>> >     (o_1, ...,
> > > >>> >     >> o_n)
> > > >>> >     >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much
> > higher
> > > >>> >     >> parallelisms
> > > >>> >     >>>>> for operators in SSG_1 than for operators in SSG_2 won't
> > > >>> >     lead to too
> > > >>> >     >> much
> > > >>> >     >>>>> wasting of resources. If the two SSGs end up needing
> > > different
> > > >>> >     >> resources,
> > > >>> >     >>>>> with the SSG-based approach one can directly specify
> > > >>> >     resources for
> > > >>> >     >> the
> > > >>> >     >>>> two
> > > >>> >     >>>>> groups. However, with the operator-based approach, the
> > > user will
> > > >>> >     >> have to
> > > >>> >     >>>>> specify resources for each operator in one of the two
> > > >>> >     groups, and
> > > >>> >     >> tune
> > > >>> >     >>>> the
> > > >>> >     >>>>> default slot resource via configurations to fit the other
> > > group.
> > > >>> >     >>>>>
> > > >>> >     >>>>>
> > > >>> >     >>>>> 2. It increases the chance of breaking operator chains.
> > > >>> >     >>>>>
> > > >>> >     >>>>>
> > > > >>> >     >>>>> Setting chainable operators into different slot sharing
> > > >>> >     groups will
> > > >>> >     >>>>> prevent them from being chained. In the current
> > > implementation,
> > > >>> >     >>>> downstream
> > > >>> >     >>>>> operators, if SSG not explicitly specified, will be set
> > to
> > > >>> >     the same
> > > >>> >     >> group
> > > >>> >     >>>>> as the chainable upstream operators (unless multiple
> > > upstream
> > > >>> >     >> operators
> > > >>> >     >>>> in
> > > >>> >     >>>>> different groups), to reduce the chance of breaking
> > chains.
> > > >>> >     >>>>>
> > > >>> >     >>>>>
> > > > >>> >     >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4,
> > > >>> >     deciding
> > > >>> >     >> SSGs
> > > >>> >     >>>>> based on whether resource is specified we will easily get
> > > >>> >     groups like
> > > >>> >     >>>> (o_1,
> > > >>> >     >>>>> o_3) & (o_2, o_4), where none of the operators can be
> > > >>> >     chained. This
> > > >>> >     >> is
> > > >>> >     >>>> also
> > > >>> >     >>>>> possible for the SSG-based approach, but I believe the
> > > >>> >     chance is much
> > > >>> >     >>>>> smaller because there's no strong reason for users to
> > > >>> >     specify the
> > > >>> >     >> groups
> > > >>> >     >>>>> with alternate operators like that. We are more likely to
> > > >>> >     get groups
> > > >>> >     >> like
> > > >>> >     >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks only
> > > between
> > > >>> >     o_2 and
> > > >>> >     >> o_3.
> > > >>> >     >>>>>
> > > >>> >     >>>>> 3. It complicates the system by having two different
> > > >>> >     mechanisms for
> > > >>> >     >>>> sharing
> > > > >>> >     >>>>> managed memory in a slot.
> > > >>> >     >>>>>
> > > >>> >     >>>>>
> > > >>> >     >>>>> - In FLIP-141, we introduced the intra-slot managed
> > memory
> > > >>> >     sharing
> > > >>> >     >>>>> mechanism, where managed memory is first distributed
> > > >>> >     according to the
> > > >>> >     >>>>> consumer type, then further distributed across operators
> > > of that
> > > >>> >     >> consumer
> > > >>> >     >>>>> type.
> > > >>> >     >>>>>
> > > >>> >     >>>>> - With the operator-based approach, managed memory size
> > > >>> >     specified
> > > >>> >     >> for an
> > > >>> >     >>>>> operator should account for all the consumer types of
> > that
> > > >>> >     operator.
> > > >>> >     >> That
> > > >>> >     >>>>> means the managed memory is first distributed across
> > > >>> >     operators, then
> > > >>> >     >>>>> distributed to different consumer types of each operator.
> > > >>> >     >>>>>
> > > >>> >     >>>>>
> > > >>> >     >>>>> Unfortunately, the different order of the two calculation
> > > >>> >     steps can
> > > >>> >     >> lead
> > > >>> >     >>>> to
> > > >>> >     >>>>> different results. To be specific, the semantic of the
> > > >>> >     configuration
> > > >>> >     >>>> option
> > > > >>> >     >>>>> `consumer-weights` changes (within a slot vs. within an
> > > >>> >     operator).
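
A worked example may help show why the order of the two steps matters. The sketch below is a simplified model, not Flink's implementation: the consumer type names, the 70/30 weights, the even split among operators of the same consumer type, and the 50 MB per-operator figures are all assumptions made for illustration.

```python
def slot_first(slot_mb, weights, op_consumers):
    """Split slot memory by consumer type, then evenly across the operators of that type."""
    used = {c for consumers in op_consumers.values() for c in consumers}
    total_weight = sum(weights[c] for c in used)
    result = {op: {} for op in op_consumers}
    for consumer in sorted(used):
        share = slot_mb * weights[consumer] / total_weight
        users = [op for op, consumers in op_consumers.items() if consumer in consumers]
        for op in users:
            result[op][consumer] = share / len(users)
    return result


def operator_first(op_mb, weights, op_consumers):
    """Give each operator its own budget, then split it by that operator's consumer types."""
    result = {}
    for op, consumers in op_consumers.items():
        total_weight = sum(weights[c] for c in consumers)
        result[op] = {c: op_mb[op] * weights[c] / total_weight for c in consumers}
    return result


if __name__ == "__main__":
    weights = {"state_backend": 70, "python": 30}      # hypothetical weights
    op_consumers = {"A": ["state_backend"],            # A has one consumer type
                    "B": ["state_backend", "python"]}  # B has two
    print(slot_first(100, weights, op_consumers))
    print(operator_first({"A": 50, "B": 50}, weights, op_consumers))
```

With the same 100 MB in total, the slot-first order gives operator B 30 MB for its second consumer type, while the operator-first order gives it 15 MB, so the same weights mean different things in the two schemes.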
> > > >>> >     >>>>>
> > > >>> >     >>>>>
> > > >>> >     >>>>>
> > > >>> >     >>>>> To sum up things:
> > > >>> >     >>>>>
> > > >>> >     >>>>> While (3) might be a bit more implementation related, I
> > > >>> >     think (1)
> > > >>> >     >> and (2)
> > > >>> >     >>>>> somehow suggest that, the price for the proposed approach
> > > to
> > > >>> >     avoid
> > > >>> >     >>>>> specifying resource for every operator is that it's not
> > as
> > > >>> >     >> independent
> > > >>> >     >>>> from
> > > >>> >     >>>>> operator chaining and slot sharing as the operator-based
> > > >>> >     approach
> > > >>> >     >>>> discussed
> > > >>> >     >>>>> in the FLIP.
> > > >>> >     >>>>>
> > > >>> >     >>>>>
> > > >>> >     >>>>> Thank you~
> > > >>> >     >>>>>
> > > >>> >     >>>>> Xintong Song
> > > >>> >     >>>>>
> > > >>> >     >>>>>
> > > >>> >     >>>>>
> > > >>> >     >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen
> > > >>> >     <sewen@apache.org <ma...@apache.org>>
> > > >>> >     >> wrote:
> > > >>> >     >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
> > > >>> >     >>>>>>
> > > >>> >     >>>>>> I want to say, first of all, that this is super well
> > > >>> >     written. And
> > > >>> >     >> the
> > > >>> >     >>>>>> points that the FLIP makes about how to expose the
> > > >>> >     configuration to
> > > >>> >     >>>> users
> > > >>> >     >>>>>> is exactly the right thing to figure out first.
> > > >>> >     >>>>>> So good job here!
> > > >>> >     >>>>>>
> > > >>> >     >>>>>> About how to let users specify the resource profiles.
> > If I
> > > >>> >     can sum
> > > >>> >     >> the
> > > >>> >     >>>>> FLIP
> > > >>> >     >>>>>> and previous discussion up in my own words, the problem
> > > is the
> > > >>> >     >>>> following:
> > > >>> >     >>>>>> Operator-level specification is the simplest and
> > cleanest
> > > >>> >     approach,
> > > >>> >     >>>>> because
> > > >>> >     >>>>>>> it avoids mixing operator configuration (resource) and
> > > >>> >     >> scheduling. No
> > > >>> >     >>>>>>> matter what other parameters change (chaining, slot
> > > sharing,
> > > >>> >     >>>> switching
> > > >>> >     >>>>>>> pipelined and blocking shuffles), the resource profiles
> > > >>> >     stay the
> > > >>> >     >>>> same.
> > > >>> >     >>>>>>> But it would require that a user specifies resources on
> > > all
> > > >>> >     >>>> operators,
> > > >>> >     >>>>>>> which makes it hard to use. That's why the FLIP
> > suggests
> > > going
> > > >>> >     >> with
> > > >>> >     >>>>>>> specifying resources on a Sharing-Group.
> > > >>> >     >>>>>>
> > > >>> >     >>>>>> I think both thoughts are important, so can we find a
> > > solution
> > > >>> >     >> where
> > > >>> >     >>>> the
> > > >>> >     >>>>>> Resource Profiles are specified on an Operator, but we
> > > >>> >     still avoid
> > > >>> >     >> that
> > > >>> >     >>>>> we
> > > >>> >     >>>>>> need to specify a resource profile on every operator?
> > > >>> >     >>>>>>
> > > >>> >     >>>>>> What do you think about something like the following:
> > > >>> >     >>>>>>    - Resource Profiles are specified on an operator
> > level.
> > > >>> >     >>>>>>    - Not all operators need profiles
> > > > >>> >     >>>>>>    - All Operators without a Resource Profile end up
> > in
> > > the
> > > >>> >     >> default
> > > >>> >     >>>> slot
> > > >>> >     >>>>>> sharing group with a default profile (will get a default
> > > slot).
> > > >>> >     >>>>>>    - All Operators with a Resource Profile will go into
> > > >>> >     another slot
> > > >>> >     >>>>> sharing
> > > >>> >     >>>>>> group (the resource-specified-group).
> > > >>> >     >>>>>>    - Users can define different slot sharing groups for
> > > >>> >     operators
> > > >>> >     >> like
> > > >>> >     >>>>> they
> > > >>> >     >>>>>> do now, with the exception that you cannot mix operators
> > > >>> >     that have
> > > >>> >     >> a
> > > >>> >     >>>>>> resource profile and operators that have no resource
> > > profile.
> > > >>> >     >>>>>>    - The default case where no operator has a resource
> > > >>> >     profile is
> > > >>> >     >> just a
> > > >>> >     >>>>>> special case of this model
> > > >>> >     >>>>>>    - The chaining logic sums up the profiles per
> > operator,
> > > >>> >     like it
> > > >>> >     >> does
> > > >>> >     >>>>> now,
> > > >>> >     >>>>>> and the scheduler sums up the profiles of the tasks that
> > > it
> > > >>> >     >> schedules
> > > >>> >     >>>>>> together.
> > > >>> >     >>>>>>
> > > >>> >     >>>>>>
> > > >>> >     >>>>>> There is another question about reactive scaling raised
> > > in the
> > > >>> >     >> FLIP. I
> > > >>> >     >>>>> need
> > > >>> >     >>>>>> to think a bit about that. That is indeed a bit more
> > > tricky
> > > >>> >     once we
> > > >>> >     >>>> have
> > > >>> >     >>>>>> slots of different sizes.
> > > >>> >     >>>>>> It is not clear then which of the different slot
> > requests
> > > the
> > > >>> >     >>>>>> ResourceManager should fulfill when new resources (TMs)
> > > >>> >     show up,
> > > >>> >     >> or how
> > > >>> >     >>>>> the
> > > > >>> >     >>>>>> JobManager redistributes the slot resources when
> > > resources
> > > >>> >     (TMs)
> > > > >>> >     >>>>> disappear.
> > > >>> >     >>>>>> This question is pretty orthogonal, though, to the "how
> > to
> > > >>> >     specify
> > > >>> >     >> the
> > > >>> >     >>>>>> resources".
> > > >>> >     >>>>>>
> > > >>> >     >>>>>>
> > > >>> >     >>>>>> Best,
> > > >>> >     >>>>>> Stephan
> > > >>> >     >>>>>>
> > > >>> >     >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song
> > > >>> >     <tonysong820@gmail.com <ma...@gmail.com>
> > > >>> >     >>>>> wrote:
> > > >>> >     >>>>>>> Thanks for drafting the FLIP and driving the
> > discussion,
> > > >>> >     Yangze.
> > > > >>> >     >>>>>>> And thanks for the feedback, Till and Chesnay.
> > > >>> >     >>>>>>>
> > > >>> >     >>>>>>> @Till,
> > > >>> >     >>>>>>>
> > > >>> >     >>>>>>> I agree that specifying requirements for SSGs means
> > that
> > > SSGs
> > > >>> >     >> need to
> > > >>> >     >>>>> be
> > > >>> >     >>>>>>> supported in fine-grained resource management,
> > otherwise
> > > each
> > > >>> >     >>>> operator
> > > >>> >     >>>>>>> might use as many resources as the whole group.
> > However,
> > > I
> > > >>> >     cannot
> > > >>> >     >>>> think
> > > >>> >     >>>>>> of
> > > >>> >     >>>>>>> a strong reason for not supporting SSGs in fine-grained
> > > >>> >     resource
> > > >>> >     >>>>>>> management.
> > > >>> >     >>>>>>>
> > > >>> >     >>>>>>>
> > > >>> >     >>>>>>>> Interestingly, if all operators have their resources
> > > properly
> > > >>> >     >>>>>> specified,
> > > >>> >     >>>>>>>> then slot sharing is no longer needed because Flink
> > > could
> > > >>> >     >> slice off
> > > >>> >     >>>>> the
> > > >>> >     >>>>>>>> appropriately sized slots for every Task individually.
> > > >>> >     >>>>>>>>
> > > > >>> >     >>>>>>>> So for example, if we have a job consisting of two
> > > > >>> >     operators op_1
> > > >>> >     >> and
> > > >>> >     >>>>> op_2
> > > >>> >     >>>>>>>> where each op needs 100 MB of memory, we would then
> > say
> > > that
> > > >>> >     >> the
> > > >>> >     >>>> slot
> > > >>> >     >>>>>>>> sharing group needs 200 MB of memory to run. If we
> > have
> > > a
> > > >>> >     >> cluster
> > > >>> >     >>>>> with
> > > >>> >     >>>>>> 2
> > > >>> >     >>>>>>>> TMs with one slot of 100 MB each, then the system
> > > cannot run
> > > >>> >     >> this
> > > >>> >     >>>>> job.
> > > >>> >     >>>>>> If
> > > >>> >     >>>>>>>> the resources were specified on an operator level,
> > then
> > > the
> > > >>> >     >> system
> > > >>> >     >>>>>> could
> > > >>> >     >>>>>>>> still make the decision to deploy op_1 to TM_1 and
> > op_2
> > > to
> > > >>> >     >> TM_2.
> > > >>> >     >>>>>>>
> > > >>> >     >>>>>>> Couldn't agree more that if all operators' requirements
> > > are
> > > >>> >     >> properly
> > > > >>> >     >>>>>>> specified, slot sharing should no longer be needed. I
> > > >>> >     think this
> > > >>> >     >>>>> exactly
> > > >>> >     >>>>>>> disproves the example. If we already know op_1 and op_2
> > > each
> > > > >>> >     >> need
> > > >>> >     >>>> 100
> > > >>> >     >>>>> MB
> > > >>> >     >>>>>>> of memory, why would we put them in the same group? If
> > > >>> >     they are
> > > >>> >     >> in
> > > >>> >     >>>>>> separate
> > > >>> >     >>>>>>> groups, with the proposed approach the system can
> > freely
> > > >>> >     deploy
> > > >>> >     >> them
> > > >>> >     >>>> to
> > > >>> >     >>>>>>> either a 200 MB TM or two 100 MB TMs.
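
The 100 MB / 200 MB example can be replayed with a small placement check. This is a sketch under stated assumptions: each TaskManager is modeled only by its free memory, slots are sliced out of that memory on demand, and placement is a greedy first-fit over requests sorted from largest to smallest, which is a heuristic and not Flink's scheduler. The function name is hypothetical.

```python
def can_place(requests_mb, tm_free_mb):
    """Greedy first-fit check: can every slot request be sliced out of some TaskManager?

    requests_mb -- one entry per slot request (per SSG, or per operator)
    tm_free_mb  -- free memory of each TaskManager
    """
    free = list(tm_free_mb)
    for request in sorted(requests_mb, reverse=True):
        for i, capacity in enumerate(free):
            if capacity >= request:
                free[i] -= request  # slice the slot out of this TaskManager
                break
        else:
            return False  # no TaskManager can host this request
    return True


if __name__ == "__main__":
    print(can_place([200], [100, 100]))       # one SSG of 200 MB, two 100 MB TMs
    print(can_place([100, 100], [100, 100]))  # op_1 and op_2 in separate groups
    print(can_place([100, 100], [200]))       # separate groups on one 200 MB TM
```

Putting op_1 and op_2 into separate groups of 100 MB each lets the requests fit on either cluster shape, which is the point made in the reply above.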
> > > >>> >     >>>>>>>
> > > >>> >     >>>>>>> Moreover, the precondition for not needing slot sharing
> > > is
> > > >>> >     having
> > > >>> >     >>>>>> resource
> > > >>> >     >>>>>>> requirements properly specified for all operators. This
> > > is not
> > > >>> >     >> always
> > > > >>> >     >>>>>>> possible, and usually requires tremendous effort. One
> > > of the
> > > >>> >     >>>> benefits
> > > >>> >     >>>>>> for
> > > >>> >     >>>>>>> SSG-based requirements is that it allows the user to
> > > freely
> > > >>> >     >> decide
> > > >>> >     >>>> the
> > > > >>> >     >>>>>>> granularity, and thus the effort they want to invest. I would
> > > >>> >     consider SSG
> > > >>> >     >> in
> > > >>> >     >>>>>>> fine-grained resource management as a group of
> > operators
> > > >>> >     that the
> > > >>> >     >>>> user
> > > >>> >     >>>>>>> would like to specify the total resource for. There can
> > > be
> > > >>> >     only
> > > >>> >     >> one
> > > >>> >     >>>>> group
> > > >>> >     >>>>>>> in the job, 2~3 groups dividing the job into a few
> > major
> > > >>> >     parts,
> > > >>> >     >> or as
> > > >>> >     >>>>>> many
> > > >>> >     >>>>>>> groups as the number of tasks/operators, depending on
> > how
> > > >>> >     >>>> fine-grained
> > > >>> >     >>>>>> the
> > > >>> >     >>>>>>> user is able to specify the resources.
> > > >>> >     >>>>>>>
> > > >>> >     >>>>>>> Having to support SSGs might be a constraint. But given
> > > >>> >     that all
> > > >>> >     >> the
> > > >>> >     >>>>>>> current scheduler implementations already support
> > SSGs, I
> > > >>> >     tend to
> > > >>> >     >>>> think
> > > >>> >     >>>>>>> that as an acceptable price for the above discussed
> > > >>> >     usability and
> > > >>> >     >>>>>>> flexibility.
> > > >>> >     >>>>>>>
> > > >>> >     >>>>>>> @Chesnay
> > > >>> >     >>>>>>>
> > > > >>> >     >>>>>>>> Will declaring them on slot sharing groups not also
> > waste
> > > >>> >     >> resources
> > > >>> >     >>>> if
> > > >>> >     >>>>>> the
> > > > >>> >     >>>>>>>> parallelism of operators within that group is
> > > different?
> > > >>> >     >>>>>>>>
> > > >>> >     >>>>>>> Yes. It's a trade-off between usability and resource
> > > >>> >     >> utilization. To
> > > >>> >     >>>>>> avoid
> > > >>> >     >>>>>>> such wasting, the user can define more groups, so that
> > > >>> >     each group
> > > >>> >     >>>>>> contains
> > > > >>> >     >>>>>>> fewer operators and the chance of having operators with
> > > >>> >     different
> > > >>> >     >>>>>>> parallelism will be reduced. The price is to have more
> > > >>> >     resource
> > > >>> >     >>>>>>> requirements to specify.
> > > >>> >     >>>>>>>
> > > > >>> >     >>>>>>>> It also seems like quite a hassle for users having to
> > > >>> >     >> recalculate the
> > > >>> >     >>>>>>>> resource requirements if they change the slot sharing.
> > > >>> >     >>>>>>>> I'd think that it's not really workable for users that
> > > create
> > > >>> >     >> a set
> > > >>> >     >>>>> of
> > > >>> >     >>>>>>>> re-usable operators which are mixed and matched in
> > their
> > > >>> >     >>>>> applications;
> > > >>> >     >>>>>>>> managing the resources requirements in such a setting
> > > >>> >     would be
> > > >>> >     >> a
> > > >>> >     >>>>>>>> nightmare, and in the end would require operator-level
> > > >>> >     >> requirements
> > > >>> >     >>>>> any
> > > >>> >     >>>>>>>> way.
> > > >>> >     >>>>>>>> In that sense, I'm not even sure whether it really
> > > increases
> > > >>> >     >>>>> usability.
> > > >>> >     >>>>>>>     - As mentioned in my reply to Till's comment,
> > > there's no
> > > >>> >     >> reason to
> > > >>> >     >>>>> put
> > > >>> >     >>>>>>>     multiple operators whose individual resource
> > > >>> >     requirements are
> > > >>> >     >>>>> already
> > > >>> >     >>>>>>> known
> > > >>> >     >>>>>>>     into the same group in fine-grained resource
> > > management.
> > > > >>> >     >>>>>>>     - Even if an operator implementation is reused for
> > > multiple
> > > >>> >     >>>>> applications,
> > > >>> >     >>>>>>>     it does not guarantee the same resource
> > requirements.
> > > >>> >     During
> > > >>> >     >> our
> > > >>> >     >>>>> years
> > > >>> >     >>>>>>> of
> > > > >>> >     >>>>>>>     practice at Alibaba, with per-operator
> > requirements
> > > >>> >     >> specified for
> > > >>> >     >>>>>>> Blink's
> > > >>> >     >>>>>>>     fine-grained resource management, very few users
> > > >>> >     (including
> > > >>> >     >> our
> > > >>> >     >>>>>>> specialists
> > > >>> >     >>>>>>>     who are dedicated to supporting Blink users) are as
> > > >>> >     >> experienced as
> > > >>> >     >>>>> to
> > > >>> >     >>>>>>>     accurately predict/estimate the operator resource
> > > >>> >     >> requirements.
> > > >>> >     >>>> Most
> > > >>> >     >>>>>>> people
> > > >>> >     >>>>>>>     rely on the execution-time metrics (throughput,
> > > delay, cpu
> > > >>> >     >> load,
> > > >>> >     >>>>>> memory
> > > >>> >     >>>>>>>     usage, GC pressure, etc.) to improve the
> > > specification.
> > > >>> >     >>>>>>>
> > > >>> >     >>>>>>> To sum up:
> > > >>> >     >>>>>>> If the user is capable of providing proper resource
> > > >>> >     requirements
> > > >>> >     >> for
> > > >>> >     >>>>>> every
> > > >>> >     >>>>>>> operator, that's definitely a good thing and we would
> > not
> > > >>> >     need to
> > > >>> >     >>>> rely
> > > >>> >     >>>>> on
> > > >>> >     >>>>>>> the SSGs. However, that shouldn't be a *must* for the
> > > >>> >     >> fine-grained
> > > >>> >     >>>>>> resource
> > > >>> >     >>>>>>> management to work. For those users who are capable and
> > > do not
> > > >>> >     >> like
> > > >>> >     >>>>>> having
> > > >>> >     >>>>>>> to set each operator to a separate SSG, I would be ok
> > to
> > > have
> > > >>> >     >> both
> > > >>> >     >>>>>>> SSG-based and operator-based runtime interfaces and to
> > > only
> > > > >>> >     >> fall back
> > > >>> >     >>>> to
> > > >>> >     >>>>>> the
> > > >>> >     >>>>>>> SSG requirements when the operator requirements are not
> > > >>> >     >> specified.
> > > >>> >     >>>>>> However,
> > > >>> >     >>>>>>> as the first step, I think we should prioritise the use
> > > cases
> > > >>> >     >> where
> > > >>> >     >>>>> users
> > > >>> >     >>>>>>> are not that experienced.
> > > >>> >     >>>>>>>
> > > >>> >     >>>>>>> Thank you~
> > > >>> >     >>>>>>>
> > > >>> >     >>>>>>> Xintong Song
> > > >>> >     >>>>>>>
> > > >>> >     >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
> > > >>> >     >> chesnay@apache.org <ma...@apache.org>>
> > > >>> >     >>>>>>> wrote:
> > > >>> >     >>>>>>>
> > > >>> >     >>>>>>>> Will declaring them on slot sharing groups not also
> > > waste
> > > >>> >     >> resources
> > > >>> >     >>>>> if
> > > > >>> >     >>>>>>>> the parallelism of operators within that group is
> > > different?
> > > >>> >     >>>>>>>>
> > > >>> >     >>>>>>>> It also seems like quite a hassle for users having to
> > > >>> >     >> recalculate
> > > >>> >     >>>> the
> > > >>> >     >>>>>>>> resource requirements if they change the slot sharing.
> > > >>> >     >>>>>>>> I'd think that it's not really workable for users that
> > > create
> > > >>> >     >> a set
> > > >>> >     >>>>> of
> > > >>> >     >>>>>>>> re-usable operators which are mixed and matched in
> > their
> > > >>> >     >>>>> applications;
> > > >>> >     >>>>>>>> managing the resources requirements in such a setting
> > > >>> >     would be
> > > >>> >     >> a
> > > >>> >     >>>>>>>> nightmare, and in the end would require operator-level
> > > >>> >     >> requirements
> > > >>> >     >>>>> any
> > > >>> >     >>>>>>>> way.
> > > >>> >     >>>>>>>> In that sense, I'm not even sure whether it really
> > > increases
> > > >>> >     >>>>> usability.
> > > > >>> >     >>>>>>>> My main worry is that if we wire the runtime to
> > work
> > > >>> >     on SSGs
> > > >>> >     >>>> it's
> > > >>> >     >>>>>>>> gonna be difficult to implement more fine-grained
> > > approaches,
> > > >>> >     >> which
> > > >>> >     >>>>>>>> would not be the case if, for the runtime, they are
> > > always
> > > >>> >     >> defined
> > > >>> >     >>>> on
> > > >>> >     >>>>>> an
> > > >>> >     >>>>>>>> operator-level.
> > > >>> >     >>>>>>>>
> > > >>> >     >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > > >>> >     >>>>>>>>> Thanks for drafting this FLIP and starting this
> > > discussion
> > > >>> >     >>>> Yangze.
> > > >>> >     >>>>>>>>> I like that defining resource requirements on a slot
> > > sharing
> > > >>> >     >>>> group
> > > >>> >     >>>>>>> makes
> > > >>> >     >>>>>>>>> the overall setup easier and improves usability of
> > > resource
> > > >>> >     >>>>>>> requirements.
> > > >>> >     >>>>>>>>> What I do not like about it is that it changes slot
> > > sharing
> > > >>> >     >>>> groups
> > > >>> >     >>>>>> from
> > > >>> >     >>>>>>>>> being a scheduling hint to something which needs to
> > be
> > > >>> >     >> supported
> > > >>> >     >>>> in
> > > >>> >     >>>>>>> order
> > > > >>> >     >>>>>>>>> to support fine-grained resource requirements. So
> > far,
> > > the
> > > >>> >     >> idea
> > > >>> >     >>>> of
> > > >>> >     >>>>>> slot
> > > >>> >     >>>>>>>>> sharing groups was that it tells the system that a
> > set
> > > of
> > > >>> >     >>>> operators
> > > >>> >     >>>>>> can
> > > >>> >     >>>>>>>> be
> > > >>> >     >>>>>>>>> deployed in the same slot. But the system still had
> > the
> > > >>> >     >> freedom
> > > >>> >     >>>> to
> > > >>> >     >>>>>> say
> > > >>> >     >>>>>>>> that
> > > >>> >     >>>>>>>>> it would rather place these tasks in different slots
> > > if it
> > > >>> >     >>>> wanted.
> > > >>> >     >>>>> If
> > > >>> >     >>>>>>> we
> > > >>> >     >>>>>>>>> now specify resource requirements on a per slot
> > sharing
> > > >>> >     >> group,
> > > >>> >     >>>> then
> > > >>> >     >>>>>> the
> > > >>> >     >>>>>>>>> only option for a scheduler which does not support
> > slot
> > > >>> >     >> sharing
> > > >>> >     >>>>>> groups
> > > >>> >     >>>>>>> is
> > > >>> >     >>>>>>>>> to say that every operator in this slot sharing group
> > > >>> >     needs a
> > > >>> >     >>>> slot
> > > >>> >     >>>>>> with
> > > >>> >     >>>>>>>> the
> > > >>> >     >>>>>>>>> same resources as the whole group.
> > > >>> >     >>>>>>>>>
> > > >>> >     >>>>>>>>> So for example, if we have a job consisting of two
> > > >>> >     >>>>>>>>> operators op_1 and op_2
> > > >>> >     >>>>>>>>> where each op needs 100 MB of memory, we would then
> > > say that
> > > >>> >     >> the
> > > >>> >     >>>>> slot
> > > >>> >     >>>>>>>>> sharing group needs 200 MB of memory to run. If we
> > > have a
> > > >>> >     >> cluster
> > > >>> >     >>>>>> with
> > > >>> >     >>>>>>> 2
> > > >>> >     >>>>>>>>> TMs with one slot of 100 MB each, then the system
> > > cannot run
> > > >>> >     >> this
> > > >>> >     >>>>>> job.
> > > >>> >     >>>>>>> If
> > > >>> >     >>>>>>>>> the resources were specified on an operator level,
> > > then the
> > > >>> >     >>>> system
> > > >>> >     >>>>>>> could
> > > >>> >     >>>>>>>>> still make the decision to deploy op_1 to TM_1 and
> > > op_2 to
> > > >>> >     >> TM_2.
> > > >>> >     >>>>>>>>> Originally, one of the primary goals of slot sharing
> > > groups
> > > >>> >     >> was
> > > >>> >     >>>> to
> > > >>> >     >>>>>> make
> > > >>> >     >>>>>>>> it
> > > >>> >     >>>>>>>>> easier for the user to reason about how many slots a
> > > job
> > > >>> >     >> needs
> > > >>> >     >>>>>>>> independent
> > > >>> >     >>>>>>>>> of the actual number of operators in the job.
> > > Interestingly,
> > > >>> >     >> if
> > > >>> >     >>>> all
> > > >>> >     >>>>>>>>> operators have their resources properly specified,
> > > then slot
> > > >>> >     >>>>> sharing
> > > >>> >     >>>>>> is
> > > >>> >     >>>>>>>> no
> > > >>> >     >>>>>>>>> longer needed because Flink could slice off the
> > > >>> >     appropriately
> > > >>> >     >>>> sized
> > > >>> >     >>>>>>> slots
> > > >>> >     >>>>>>>>> for every Task individually. What matters is whether
> > > the
> > > >>> >     >> whole
> > > >>> >     >>>>>> cluster
> > > >>> >     >>>>>>>> has
> > > >>> >     >>>>>>>>> enough resources to run all tasks or not.
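
The 200 MB example above can be checked with a small sketch. It is only an illustration under simplifying assumptions: each request must be placed in its own slot, memory is the only resource, and the class and method names (SlotFitSketch, fits) are made up for this example and are not Flink APIs.

```java
import java.util.Arrays;

final class SlotFitSketch {

    // Returns true if every request can be placed in its own slot.
    // Pairing the k-th largest request with the k-th largest slot is
    // sufficient to decide feasibility when each slot holds one request.
    static boolean fits(int[] requestsMb, int[] slotsMb) {
        if (requestsMb.length > slotsMb.length) {
            return false;
        }
        int[] requests = requestsMb.clone();
        int[] slots = slotsMb.clone();
        Arrays.sort(requests);
        Arrays.sort(slots);
        for (int i = 0; i < requests.length; i++) {
            int request = requests[requests.length - 1 - i];
            int slot = slots[slots.length - 1 - i];
            if (request > slot) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] cluster = {100, 100}; // two TMs with one 100 MB slot each

        // SSG-level requirement: one 200 MB request for op_1 + op_2.
        System.out.println("SSG-level fits: " + fits(new int[] {200}, cluster));

        // Operator-level requirements: two independent 100 MB requests.
        System.out.println("Operator-level fits: " + fits(new int[] {100, 100}, cluster));
    }
}
```

Under these assumptions the single 200 MB request cannot be placed, while the two 100 MB requests can, which is the asymmetry the example points at.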
> > > >>> >     >>>>>>>>>
> > > >>> >     >>>>>>>>> Cheers,
> > > >>> >     >>>>>>>>> Till
> > > >>> >     >>>>>>>>>
> > > >>> >     >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
> > > >>> >     >> karmagyz@gmail.com <ma...@gmail.com>>
> > > >>> >     >>>>>> wrote:
> > > >>> >     >>>>>>>>>> Hi, there,
> > > >>> >     >>>>>>>>>>
> > > >>> >     >>>>>>>>>> We would like to start a discussion thread on
> > > "FLIP-156:
> > > >>> >     >> Runtime
> > > >>> >     >>>>>>>>>> Interfaces for Fine-Grained Resource
> > Requirements"[1],
> > > >>> >     >> where we
> > > >>> >     >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime
> > > interfaces
> > > >>> >     >> for
> > > >>> >     >>>>>>>>>> specifying fine-grained resource requirements.
> > > >>> >     >>>>>>>>>>
> > > >>> >     >>>>>>>>>> In this FLIP:
> > > >>> >     >>>>>>>>>> - Expound the user story of fine-grained resource
> > > >>> >     >> management.
> > > >>> >     >>>>>>>>>> - Propose runtime interfaces for specifying
> > SSG-based
> > > >>> >     >> resource
> > > >>> >     >>>>>>>>>> requirements.
> > > >>> >     >>>>>>>>>> - Discuss the pros and cons of the three potential
> > > >>> >     >> granularities
> > > >>> >     >>>>> for
> > > >>> >     >>>>>>>>>> specifying the resource requirements (op, task and
> > > slot
> > > >>> >     >> sharing
> > > >>> >     >>>>>> group)
> > > >>> >     >>>>>>>>>> and explain why we choose the slot sharing group.
> > > >>> >     >>>>>>>>>>
> > > >>> >     >>>>>>>>>> Please find more details in the FLIP wiki document
> > > [1].
> > > >>> >     >> Looking
> > > >>> >     >>>>>>>>>> forward to your feedback.
> > > >>> >     >>>>>>>>>>
> > > >>> >     >>>>>>>>>> [1]
> > > >>> >     >>>>>>>>>>
> > > >>> >     >>
> > > >>> >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > >>> >     <
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > >
> > > >>> >     >>>>>>>>>> Best,
> > > >>> >     >>>>>>>>>> Yangze Guo
> > > >>> >     >>>>>>>>>>
> > > >>> >     >>>>>>>>
> > > >>> >
> > > >>>
> > >
> >

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Till Rohrmann <tr...@apache.org>.

Thanks for summarizing the discussion, Yangze. I agree that setting
resource requirements per operator is not very user friendly. Moreover, I
couldn't come up with a different proposal which would be as easy to use
and wouldn't expose internal scheduling details. In fact, following this
argument, we shouldn't have exposed the slot sharing groups in the
first place.

What is important for the user is that we properly document the limitations
and constraints of the fine-grained resource specification. For example,
we should explain how optimizations like chaining are affected by it and
how different execution modes (batch vs. streaming) affect the execution of
operators which have specified resources. These things shouldn't become
part of the contract of this feature, since they stem from internal
implementation details, but it will be important to understand them
properly in order to use this feature effectively.

Hence, +1 for starting the vote for this FLIP.

Cheers,
Till

On Tue, Jan 26, 2021 at 4:37 AM Xintong Song <to...@gmail.com> wrote:

> Thanks for the summary, Yangze.
>
> The changes and follow-up issues LGTM. Let's wait for responses from the
> others before starting a vote.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Tue, Jan 26, 2021 at 11:08 AM Yangze Guo <ka...@gmail.com> wrote:
>
> > Thanks everyone for the lively discussion. I'd like to try to
> > summarize the current convergence in the discussion. Please let me
> > know if I got things wrong or missed something crucial here.
> >
> > Change of this FLIP:
> > - Treat the SSG resource requirements as a hint instead of a
> > restriction for the runtime. That should be explicitly explained in
> > the JavaDocs.
> >
> > Potential follow-up issues if needed:
> > - Provide operator-level resource configuration interface.
> > - Provide multiple options for deciding resources for SSGs whose
> > requirement is not specified:
> >     ** Default slot resource.
> >     ** Default operator resource times number of operators.
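
The two follow-up options for SSGs without a specified requirement can be sketched as follows. This is a hypothetical illustration only: the enum, the method name and the use of a single memory figure in MB are assumptions made for the example, not part of the FLIP or of any Flink interface.

```java
final class UnspecifiedSsgSketch {

    // The two candidate policies named in the summary above.
    enum Policy { DEFAULT_SLOT, DEFAULT_OPERATOR_TIMES_COUNT }

    // Resolves the resource (here: memory in MB) of an SSG whose requirement
    // was not specified by the user.
    static int resolveMb(Policy policy, int defaultSlotMb, int defaultOperatorMb, int operatorCount) {
        switch (policy) {
            case DEFAULT_SLOT:
                // The group gets one default-sized slot, regardless of its size.
                return defaultSlotMb;
            case DEFAULT_OPERATOR_TIMES_COUNT:
                // The group grows with the number of operators it contains.
                return defaultOperatorMb * operatorCount;
            default:
                throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }

    public static void main(String[] args) {
        // Illustrative numbers: 1024 MB default slot, 128 MB default operator, 5 operators.
        for (Policy policy : Policy.values()) {
            System.out.println(policy + " -> " + resolveMb(policy, 1024, 128, 5) + " MB");
        }
    }
}
```

The second policy scales with the size of the group, while the first keeps the current coarse-grained behaviour.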
> >
> > If there are no other issues, I'll update the FLIP accordingly and
> > start a vote thread. Thanks all for the valuable feedback again.
> >
> > Best,
> > Yangze Guo
> >
> >
> > On Fri, Jan 22, 2021 at 11:30 AM Xintong Song <to...@gmail.com>
> > wrote:
> > >
> > >
> > >  FGRuntimeInterface.png
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <to...@gmail.com>
> > wrote:
> > >>
> > >> I think Chesnay's proposal could actually work. IIUC, the key point is
> > >> to derive operator requirements from SSG requirements on the API side, so
> > >> that the runtime only deals with operator requirements. It's debatable how
> > >> the deriving should be done though. E.g., an alternative could be to evenly
> > >> divide the SSG requirement into requirements of operators in the group.
> > >>
> > >>
> > >> However, I'm not entirely sure which option is more desirable. My
> > >> understanding is illustrated in the following figure, with Chesnay's
> > >> proposal on the top and the SSG-based proposal in this FLIP on the bottom.
> > >>
> > >>
> > >>
> > >> I think the major difference between the two approaches is where
> > deriving operator requirements from SSG requirements happens.
> > >>
> > >> - Chesnay's proposal simplifies the runtime logic and the interface to
> > >> expose, at the price of moving more complexity (i.e. the deriving) to the
> > >> API side. The question is, where do we prefer to keep the complexity? I'm
> > >> slightly leaning towards having a thin API and keeping the complexity in
> > >> the runtime if possible.
> > >>
> > >> - Notice that the dashed arrows represent optional steps that are
> > >> needed only for schedulers that do not respect SSGs, which we don't have at
> > >> the moment. If we only look at the solid arrows, then the SSG-based
> > >> approach is much simpler, without needing to derive and aggregate the
> > >> requirements back and forth. I'm not sure about complicating the current
> > >> design only for potential future needs.
> > >>
> > >>
> > >> Thank you~
> > >>
> > >> Xintong Song
> > >>
> > >>
> > >>
> > >>
> > >> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <ch...@apache.org>
> > wrote:
> > >>>
> > >>> You're raising a good point, but I think I can rectify that with a
> > minor
> > >>> adjustment.
> > >>>
> > >>> Default requirements are whatever the default requirements are,
> setting
> > >>> the requirements for one operator has no effect on other operators.
> > >>>
> > >>> With these rules, and some API enhancements, the following mockup
> would
> > >>> replicate the SSG-based behavior:
> > >>>
> > >>> Map<SlotSharingGroupId, Requirements> requirements = ...
> > >>> for slotSharingGroup in env.getSlotSharingGroups() {
> > >>>      vertices = slotSharingGroup.getVertices()
> > >>>      vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
> > >>>      vertices.remaining().setRequirements(ZERO)
> > >>> }
> > >>>
> > >>> We could even allow setting requirements on slotsharing-groups
> > >>> colocation-groups and internally translate them accordingly.
> > >>> I can't help but feel this is a plain API issue.
> > >>>
> > >>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
> > >>> > If I understand you correctly Chesnay, then you want to decouple
> the
> > >>> > resource requirement specification from the slot sharing group
> > >>> > assignment. Hence, per default all operators would be in the same
> > slot
> > >>> > sharing group. If there is no operator with a resource
> specification,
> > >>> > then the system would allocate a default slot for it. If there is
> at
> > >>> > least one operator, then the system would sum up all the specified
> > >>> > resources and allocate a slot of this size. This effectively means
> > >>> > that all unspecified operators will implicitly have a zero resource
> > >>> > requirement. Did I understand your idea correctly?
> > >>> >
> > >>> > I am wondering whether this wouldn't lead to a surprising behaviour
> > >>> > for the user. If the user specifies the resource requirements for a
> > >>> > single operator, then he probably will assume that the other
> > operators
> > >>> > will get the default share of resources and not nothing.
> > >>> >
> > >>> > Cheers,
> > >>> > Till
> > >>> >
> > >>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <
> chesnay@apache.org
> > >>> > <ma...@apache.org>> wrote:
> > >>> >
> > >>> >     Is there even a functional difference between specifying the
> > >>> >     requirements for an SSG vs specifying the same requirements on
> a
> > >>> >     single
> > >>> >     operator within that group (ideally a colocation group to avoid
> > this
> > >>> >     whole hint business)?
> > >>> >
> > >>> >     Wouldn't we get the best of both worlds in the latter case?
> > >>> >
> > >>> >     Users can take shortcuts to define shared requirements,
> > >>> >     but refine them further as needed on a per-operator basis,
> > >>> >     without changing semantics of slotsharing groups
> > >>> >     nor the runtime being locked into SSG-based requirements.
> > >>> >
> > >>> >     (And before anyone argues what happens if slotsharing groups
> > >>> >     change or
> > >>> >     whatnot, that's a plain API issue that we could surely solve.
> (A
> > >>> >     plain
> > >>> >     iteration over slotsharing groups and therein contained
> operators
> > >>> >     would
> > >>> >     suffice)).
> > >>> >
> > >>> >     On 1/20/2021 6:48 PM, Till Rohrmann wrote:
> > >>> >     > Maybe a different minor idea: Would it be possible to treat
> > the SSG
> > >>> >     > resource requirements as a hint for the runtime similar to
> how
> > >>> >     slot sharing
> > >>> >     > groups are designed at the moment? Meaning that we don't give
> > >>> >     the guarantee
> > >>> >     > that Flink will always deploy this set of tasks together no
> > >>> >     matter what
> > >>> >     > comes. If, for example, the runtime can derive by some means
> > the
> > >>> >     resource
> > >>> >     > requirements for each task based on the requirements for the
> > >>> >     SSG, this
> > >>> >     > could be possible. One easy strategy would be to give every
> > task
> > >>> >     the same
> > >>> >     > resources as the whole slot sharing group. Another one could
> be
> > >>> >     > distributing the resources equally among the tasks. This does
> > >>> >     not even have
> > >>> >     > to be implemented but we would give ourselves the freedom to
> > change
> > >>> >     > scheduling if need should arise.
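
The two derivation strategies mentioned above (every task gets the whole group's resources, or the group's resources are split equally) can be sketched like this. It is a minimal illustration with made-up names and a single memory figure in MB; how a real scheduler would derive per-task requirements is left open in the discussion.

```java
import java.util.Arrays;

final class SsgToTaskSketch {

    // Strategy 1: every task is given the same resources as the whole SSG.
    static int[] sameAsGroup(int groupMb, int taskCount) {
        requirePositive(taskCount);
        int[] perTask = new int[taskCount];
        Arrays.fill(perTask, groupMb);
        return perTask;
    }

    // Strategy 2: the SSG resources are distributed equally among the tasks.
    // Any remainder is handed out one MB at a time to the first tasks so that
    // the per-task values always add up to the group value.
    static int[] evenSplit(int groupMb, int taskCount) {
        requirePositive(taskCount);
        int[] perTask = new int[taskCount];
        int base = groupMb / taskCount;
        int remainder = groupMb % taskCount;
        for (int i = 0; i < taskCount; i++) {
            perTask[i] = base + (i < remainder ? 1 : 0);
        }
        return perTask;
    }

    private static void requirePositive(int taskCount) {
        if (taskCount <= 0) {
            throw new IllegalArgumentException("taskCount must be positive");
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sameAsGroup(200, 2)));
        System.out.println(Arrays.toString(evenSplit(200, 2)));
    }
}
```

The first strategy is safe but over-provisions by a factor of the task count; the second preserves the total but assumes the tasks are similar in size.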
> > >>> >     >
> > >>> >     > Cheers,
> > >>> >     > Till
> > >>> >     >
> > >>> >     > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <
> karmagyz@gmail.com
> > >>> >     <ma...@gmail.com>> wrote:
> > >>> >     >
> > >>> >     >> Thanks for the responses, Till and Xintong.
> > >>> >     >>
> > >>> >     >> I second Xintong's comment that SSG-based runtime interface
> > >>> >     will give
> > >>> >     >> us the flexibility to achieve op/task-based approach. That's
> > one of
> > >>> >     >> the most important reasons for our design choice.
> > >>> >     >>
> > >>> >     >> Some cents regarding the default operator resource:
> > >>> >     >> - It might be good for the scenario of DataStream jobs.
> > >>> >     >>     ** For light-weight operators, the accumulative
> > >>> >     configuration error
> > >>> >     >> will not be significant. Then, the resource of a task used
> is
> > >>> >     >> proportional to the number of operators it contains.
> > >>> >     >>     ** For heavy operators like join and window or operators
> > >>> >     using the
> > >>> >     >> external resources, user will turn to the fine-grained
> > resource
> > >>> >     >> configuration.
> > >>> >     >> - It can increase the stability for the standalone cluster
> > >>> >     where task
> > >>> >     >> executors registered are heterogeneous(with different
> default
> > slot
> > >>> >     >> resources).
> > >>> >     >> - It might not be good for SQL users. The operators that SQL
> > >>> >     >> will be translated to are a black box to the user. We also do
> > >>> >     >> not guarantee cross-version consistency of the transformation
> > >>> >     >> so far.
> > >>> >     >>
> > >>> >     >> I think it can be treated as a follow-up work when the
> > fine-grained
> > >>> >     >> resource management is end-to-end ready.
> > >>> >     >>
> > >>> >     >> Best,
> > >>> >     >> Yangze Guo
> > >>> >     >>
> > >>> >     >>
> > >>> >     >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song
> > >>> >     <tonysong820@gmail.com <ma...@gmail.com>>
> > >>> >     >> wrote:
> > >>> >     >>> Thanks for the feedback, Till.
> > >>> >     >>>
> > >>> >     >>> ## I feel that what you proposed (operator-based + default
> > >>> >     value) might
> > >>> >     >> be
> > >>> >     >>> subsumed by the SSG-based approach.
> > >>> >     >>> Thinking of op_1 -> op_2, there are the following 4 cases,
> > >>> >     categorized by
> > >>> >     >>> whether the resource requirements are known to the users.
> > >>> >     >>>
> > >>> >     >>>     1. *Both known.* As previously mentioned, there's no
> > >>> >     reason to put
> > >>> >     >>>     multiple operators whose individual resource
> requirements
> > >>> >     are already
> > >>> >     >> known
> > >>> >     >>>     into the same group in fine-grained resource
> management.
> > >>> >     And if op_1
> > >>> >     >> and
> > >>> >     >>>     op_2 are in different groups, there should be no
> problem
> > >>> >     switching
> > >>> >     >> data
> > >>> >     >>>     exchange mode from pipelined to blocking. This is
> > >>> >     equivalent to
> > >>> >     >> specifying
> > >>> >     >>>     operator resource requirements in your proposal.
> > >>> >     >>>     2. *op_1 known, op_2 unknown.* Similar to 1), except
> that
> > >>> >     op_2 is in a
> > >>> >     >>>     SSG whose resource is not specified thus would have the
> > >>> >     default slot
> > >>> >     >>>     resource. This is equivalent to having default operator
> > >>> >     resources in
> > >>> >     >> your
> > >>> >     >>>     proposal.
> > >>> >     >>>     3. *Both unknown*. The user can either set op_1 and
> op_2
> > >>> >     to the same
> > >>> >     >> SSG
> > >>> >     >>>     or separate SSGs.
> > >>> >     >>>        - If op_1 and op_2 are in the same SSG, it will be
> > >>> >     equivalent to
> > >>> >     >> the
> > >>> >     >>>        coarse-grained resource management, where op_1 and
> > op_2
> > >>> >     share a
> > >>> >     >> default
> > >>> >     >>>        size slot no matter which data exchange mode is
> used.
> > >>> >     >>>        - If op_1 and op_2 are in different SSGs, then each
> of
> > >>> >     them will
> > >>> >     >> use
> > >>> >     >>>        a default size slot. This is equivalent to setting
> > them
> > >>> >     with
> > >>> >     >> default
> > >>> >     >>>        operator resources in your proposal.
> > >>> >     >>>     4. *Total (pipeline) or max (blocking) of op_1 and op_2
> > is
> > >>> >     known.*
> > >>> >     >>>        - It is possible that the user learns the total /
> max
> > >>> >     resource
> > >>> >     >>>        requirement from executing and monitoring the job,
> > >>> >     while not
> > >>> >     >>> being aware of
> > >>> >     >>>        individual operator requirements.
> > >>> >     >>>        - I believe this is the case your proposal does not
> > >>> >     cover. And TBH,
> > >>> >     >>>        this is probably how most users learn the resource
> > >>> >     requirements,
> > >>> >     >>> according
> > >>> >     >>>        to my experiences.
> > >>> >     >>>        - In this case, the user might need to specify
> > >>> >     different resources
> > >>> >     >> if
> > >>> >     >>>        he wants to switch the execution mode, which should
> > not
> > >>> >     be worse
> > >>> >     >> than not
> > >>> >     >>>        being able to use fine-grained resource management.
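
Case 4 above ("total (pipeline) or max (blocking)") can be made concrete with a small sketch. It assumes, as a simplification, that with a pipelined exchange all operators of the group run at the same time in the slot, while with a blocking exchange they run one after another; the names are invented for the example.

```java
final class ExchangeModeSketch {

    // Resource (memory in MB) a slot needs for a group of operators.
    // Pipelined: operators run concurrently, so the slot needs the sum.
    // Blocking: operators run one after another, so the slot needs the maximum.
    static int groupRequirementMb(int[] operatorMb, boolean pipelined) {
        int sum = 0;
        int max = 0;
        for (int mb : operatorMb) {
            sum += mb;
            max = Math.max(max, mb);
        }
        return pipelined ? sum : max;
    }

    public static void main(String[] args) {
        int[] ops = {100, 150}; // illustrative values for op_1 and op_2
        System.out.println("pipelined: " + groupRequirementMb(ops, true) + " MB");
        System.out.println("blocking:  " + groupRequirementMb(ops, false) + " MB");
    }
}
```

This is why a user who only knows the aggregate figure may need to specify a different value when switching the execution mode.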
> > >>> >     >>>
> > >>> >     >>>
> > >>> >     >>> ## An additional idea inspired by your proposal.
> > >>> >     >>> We may provide multiple options for deciding resources for
> > >>> >     SSGs whose
> > >>> >     >>> requirement is not specified, if needed.
> > >>> >     >>>
> > >>> >     >>>     - Default slot resource (current design)
> > >>> >     >>>     - Default operator resource times number of operators
> > >>> >     (equivalent to
> > >>> >     >>>     your proposal)
> > >>> >     >>>
> > >>> >     >>>
> > >>> >     >>> ## Exposing internal runtime strategies
> > >>> >     >>> Theoretically, yes. Tying to the SSGs, the resource
> > >>> >     requirements might be
> > >>> >     >>> affected if how SSGs are internally handled changes in
> > future.
> > >>> >     >> Practically,
> > >>> >     >>> I do not concretely see at the moment what kind of changes
> we
> > >>> >     may want in
> > >>> >     >>> future that might conflict with this FLIP proposal, as the
> > >>> >     question of
> > >>> >     >>> switching data exchange mode answered above. I'd suggest to
> > >>> >     not give up
> > >>> >     >> the
> > >>> >     >>> user friendliness we may gain now for the future problems
> > that
> > >>> >     may or may
> > >>> >     >>> not exist.
> > >>> >     >>>
> > >>> >     >>> Moreover, the SSG-based approach has the flexibility to
> > >>> >     achieve the
> > >>> >     >>> equivalent behavior as the operator-based approach, if we
> > set each
> > >>> >     >> operator
> > >>> >     >>> (or task) to a separate SSG. We can even provide a shortcut
> > >>> >     option to
> > >>> >     >>> automatically do that for users, if needed.
> > >>> >     >>>
> > >>> >     >>>
> > >>> >     >>> Thank you~
> > >>> >     >>>
> > >>> >     >>> Xintong Song
> > >>> >     >>>
> > >>> >     >>>
> > >>> >     >>>
> > >>> >     >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann
> > >>> >     <trohrmann@apache.org <ma...@apache.org>>
> > >>> >     >> wrote:
> > >>> >     >>>> Thanks for the responses Xintong and Stephan,
> > >>> >     >>>>
> > >>> >     >>>> I agree that being able to define the resource
> requirements
> > for a
> > >>> >     >> group of
> > >>> >     >>>> operators is more user friendly. However, my concern is
> that
> > >>> >     we are
> > >>> >     >>>> exposing thereby internal runtime strategies which might
> > >>> >     limit our
> > >>> >     >>>> flexibility to execute a given job. Moreover, the
> semantics
> > of
> > >>> >     >> configuring
> > >>> >     >>>> resource requirements for SSGs could break if switching
> from
> > >>> >     streaming
> > >>> >     >> to
> > >>> >     >>>> batch execution. If one defines the resource requirements
> > for
> > >>> >     op_1 ->
> > >>> >     >> op_2
> > >>> >     >>>> which run in pipelined mode when using the streaming
> > >>> >     execution, then
> > >>> >     >> how do
> > >>> >     >>>> we interpret these requirements when op_1 -> op_2 are
> > >>> >     executed with a
> > >>> >     >>>> blocking data exchange in batch execution mode?
> > Consequently,
> > >>> >     I am
> > >>> >     >> still
> > >>> >     >>>> leaning towards Stephan's proposal to set the resource
> > >>> >     requirements per
> > >>> >     >>>> operator.
> > >>> >     >>>>
> > >>> >     >>>> Maybe the following proposal makes the configuration
> easier:
> > >>> >     If the
> > >>> >     >> user
> > >>> >     >>>> wants to use fine-grained resource requirements, then she
> > >>> >     needs to
> > >>> >     >> specify
> > >>> >     >>>> the default size which is used for operators which have no
> > >>> >     explicit
> > >>> >     >>>> resource annotation. If this holds true, then every
> operator
> > >>> >     would
> > >>> >     >> have a
> > >>> >     >>>> resource requirement and the system can try to execute the
> > >>> >     operators
> > >>> >     >> in the
> > >>> >     >>>> best possible manner w/o being constrained by how the user
> > >>> >     set the SSG
> > >>> >     >>>> requirements.
> > >>> >     >>>>
> > >>> >     >>>> Cheers,
> > >>> >     >>>> Till
> > >>> >     >>>>
> > >>> >     >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song
> > >>> >     <tonysong820@gmail.com <ma...@gmail.com>>
> > >>> >     >>>> wrote:
> > >>> >     >>>>
> > >>> >     >>>>> Thanks for the feedback, Stephan.
> > >>> >     >>>>>
> > >>> >     >>>>> Actually, your proposal has also come to my mind at some
> > >>> >     point. And I
> > >>> >     >>>> have
> > >>> >     >>>>> some concerns about it.
> > >>> >     >>>>>
> > >>> >     >>>>>
> > >>> >     >>>>> 1. It does not give users the same control as the
> SSG-based
> > >>> >     approach.
> > >>> >     >>>>>
> > >>> >     >>>>>
> > >>> >     >>>>> While both approaches do not require specifying for each
> > >>> >     operator,
> > >>> >     >>>>> SSG-based approach supports the semantic that "some
> > operators
> > >>> >     >> together
> > >>> >     >>>> use
> > >>> >     >>>>> this much resource" while the operator-based approach
> > doesn't.
> > >>> >     >>>>>
> > >>> >     >>>>>
> > >>> >     >>>>> Think of a long pipeline with m operators (o_1, o_2, ...,
> > >>> >     o_m), and
> > >>> >     >> at
> > >>> >     >>>> some
> > >>> >     >>>>> point there's an agg o_n (1 < n < m) which significantly
> > >>> >     reduces the
> > >>> >     >> data
> > >>> >     >>>>> amount. One can separate the pipeline into 2 groups SSG_1
> > >>> >     (o_1, ...,
> > >>> >     >> o_n)
> > >>> >     >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much
> higher
> > >>> >     >> parallelisms
> > >>> >     >>>>> for operators in SSG_1 than for operators in SSG_2 won't
> > >>> >     lead to too
> > >>> >     >> much
> > >>> >     >>>>> wasting of resources. If the two SSGs end up needing
> > different
> > >>> >     >> resources,
> > >>> >     >>>>> with the SSG-based approach one can directly specify
> > >>> >     resources for
> > >>> >     >> the
> > >>> >     >>>> two
> > >>> >     >>>>> groups. However, with the operator-based approach, the
> > user will
> > >>> >     >> have to
> > >>> >     >>>>> specify resources for each operator in one of the two
> > >>> >     groups, and
> > >>> >     >> tune
> > >>> >     >>>> the
> > >>> >     >>>>> default slot resource via configurations to fit the other
> > group.
> > >>> >     >>>>>
> > >>> >     >>>>>
> > >>> >     >>>>> 2. It increases the chance of breaking operator chains.
> > >>> >     >>>>>
> > >>> >     >>>>>
> > >>> >     >>>>> Setting chainable operators into different slot sharing
> > >>> >     groups will
> > >>> >     >>>>> prevent them from being chained. In the current
> > implementation,
> > >>> >     >>>> downstream
> > >>> >     >>>>> operators, if SSG not explicitly specified, will be set
> to
> > >>> >     the same
> > >>> >     >> group
> > >>> >     >>>>> as the chainable upstream operators (unless multiple
> > upstream
> > >>> >     >> operators
> > >>> >     >>>> in
> > >>> >     >>>>> different groups), to reduce the chance of breaking
> chains.
> > >>> >     >>>>>
> > >>> >     >>>>>
> > >>> >     >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4,
> > >>> >     deciding
> > >>> >     >> SSGs
> > >>> >     >>>>> based on whether resource is specified we will easily get
> > >>> >     groups like
> > >>> >     >>>> (o_1,
> > >>> >     >>>>> o_3) & (o_2, o_4), where none of the operators can be
> > >>> >     chained. This
> > >>> >     >> is
> > >>> >     >>>> also
> > >>> >     >>>>> possible for the SSG-based approach, but I believe the
> > >>> >     chance is much
> > >>> >     >>>>> smaller because there's no strong reason for users to
> > >>> >     specify the
> > >>> >     >> groups
> > >>> >     >>>>> with alternate operators like that. We are more likely to
> > >>> >     get groups
> > >>> >     >> like
> > >>> >     >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks only
> > between
> > >>> >     o_2 and
> > >>> >     >> o_3.
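
The effect of group assignment on chaining in a linear pipeline of four chainable operators can be counted with a short sketch. It models only the one rule stated above (operators in different slot sharing groups cannot be chained) and ignores every other chaining condition; the class and method names are hypothetical.

```java
final class ChainBreakSketch {

    // groups[i] is the slot sharing group id of the i-th operator in a linear
    // pipeline whose neighbours would otherwise all be chainable.
    // Returns the number of neighbouring pairs that can still be chained.
    static int chainableEdges(int[] groups) {
        int count = 0;
        for (int i = 0; i + 1 < groups.length; i++) {
            if (groups[i] == groups[i + 1]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Alternating assignment: (o_1, o_3) and (o_2, o_4).
        System.out.println("alternating: " + chainableEdges(new int[] {0, 1, 0, 1}));
        // Contiguous assignment: (o_1, o_2) and (o_3, o_4).
        System.out.println("contiguous:  " + chainableEdges(new int[] {0, 0, 1, 1}));
    }
}
```

With the alternating assignment no pair can be chained, whereas the contiguous assignment keeps two of the three pairs chained and breaks only the middle one.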
> > >>> >     >>>>>
> > >>> >     >>>>> 3. It complicates the system by having two different
> > >>> >     >>>>> mechanisms for sharing managed memory in a slot.
> > >>> >     >>>>>
> > >>> >     >>>>>
> > >>> >     >>>>> - In FLIP-141, we introduced the intra-slot managed
> memory
> > >>> >     sharing
> > >>> >     >>>>> mechanism, where managed memory is first distributed
> > >>> >     according to the
> > >>> >     >>>>> consumer type, then further distributed across operators
> > of that
> > >>> >     >> consumer
> > >>> >     >>>>> type.
> > >>> >     >>>>>
> > >>> >     >>>>> - With the operator-based approach, managed memory size
> > >>> >     specified
> > >>> >     >> for an
> > >>> >     >>>>> operator should account for all the consumer types of
> that
> > >>> >     operator.
> > >>> >     >> That
> > >>> >     >>>>> means the managed memory is first distributed across
> > >>> >     operators, then
> > >>> >     >>>>> distributed to different consumer types of each operator.
> > >>> >     >>>>>
> > >>> >     >>>>>
> > >>> >     >>>>> Unfortunately, the different order of the two calculation
> > >>> >     steps can
> > >>> >     >> lead
> > >>> >     >>>> to
> > >>> >     >>>>> different results. To be specific, the semantic of the
> > >>> >     configuration
> > >>> >     >>>> option
> > >>> >     >>>>> `consumer-weights` changed (within a slot vs. within an
> > >>> >     operator).
> > >>> >     >>>>>
> > >>> >     >>>>>
> > >>> >     >>>>>
> > >>> >     >>>>> To sum up things:
> > >>> >     >>>>>
> > >>> >     >>>>> While (3) might be a bit more implementation related, I
> > >>> >     think (1)
> > >>> >     >> and (2)
> > >>> >     >>>>> somehow suggest that, the price for the proposed approach
> > to
> > >>> >     avoid
> > >>> >     >>>>> specifying resource for every operator is that it's not
> as
> > >>> >     >> independent
> > >>> >     >>>> from
> > >>> >     >>>>> operator chaining and slot sharing as the operator-based
> > >>> >     approach
> > >>> >     >>>> discussed
> > >>> >     >>>>> in the FLIP.
> > >>> >     >>>>>
> > >>> >     >>>>>
> > >>> >     >>>>> Thank you~
> > >>> >     >>>>>
> > >>> >     >>>>> Xintong Song
> > >>> >     >>>>>
> > >>> >     >>>>>
> > >>> >     >>>>>
> > >>> >     >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen
> > >>> >     <sewen@apache.org <ma...@apache.org>>
> > >>> >     >> wrote:
> > >>> >     >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
> > >>> >     >>>>>>
> > >>> >     >>>>>> I want to say, first of all, that this is super well
> > >>> >     written. And
> > >>> >     >> the
> > >>> >     >>>>>> points that the FLIP makes about how to expose the
> > >>> >     configuration to
> > >>> >     >>>> users
> > >>> >     >>>>>> is exactly the right thing to figure out first.
> > >>> >     >>>>>> So good job here!
> > >>> >     >>>>>>
> > >>> >     >>>>>> About how to let users specify the resource profiles.
> If I
> > >>> >     can sum
> > >>> >     >> the
> > >>> >     >>>>> FLIP
> > >>> >     >>>>>> and previous discussion up in my own words, the problem
> > is the
> > >>> >     >>>> following:
> > >>> >     >>>>>> Operator-level specification is the simplest and
> cleanest
> > >>> >     approach,
> > >>> >     >>>>> because
> > >>> >     >>>>>>> it avoids mixing operator configuration (resource) and
> > >>> >     >> scheduling. No
> > >>> >     >>>>>>> matter what other parameters change (chaining, slot
> > sharing,
> > >>> >     >>>> switching
> > >>> >     >>>>>>> pipelined and blocking shuffles), the resource profiles
> > >>> >     stay the
> > >>> >     >>>> same.
> > >>> >     >>>>>>> But it would require that a user specifies resources on
> > all
> > >>> >     >>>> operators,
> > >>> >     >>>>>>> which makes it hard to use. That's why the FLIP
> suggests
> > going
> > >>> >     >> with
> > >>> >     >>>>>>> specifying resources on a Sharing-Group.
> > >>> >     >>>>>>
> > >>> >     >>>>>> I think both thoughts are important, so can we find a
> > solution
> > >>> >     >> where
> > >>> >     >>>> the
> > >>> >     >>>>>> Resource Profiles are specified on an Operator, but we
> > >>> >     still avoid
> > >>> >     >> that
> > >>> >     >>>>> we
> > >>> >     >>>>>> need to specify a resource profile on every operator?
> > >>> >     >>>>>>
> > >>> >     >>>>>> What do you think about something like the following:
> > >>> >     >>>>>>    - Resource Profiles are specified on an operator
> level.
> > >>> >     >>>>>>    - Not all operators need profiles
> > >>> >     >>>>>>    - All Operators without a Resource Profile end up in the
> > >>> >     >>>>>> default slot sharing group with a default profile (will get a
> > >>> >     >>>>>> default slot).
> > >>> >     >>>>>>    - All Operators with a Resource Profile will go into
> > >>> >     another slot
> > >>> >     >>>>> sharing
> > >>> >     >>>>>> group (the resource-specified-group).
> > >>> >     >>>>>>    - Users can define different slot sharing groups for
> > >>> >     operators
> > >>> >     >> like
> > >>> >     >>>>> they
> > >>> >     >>>>>> do now, with the exception that you cannot mix operators
> > >>> >     that have
> > >>> >     >> a
> > >>> >     >>>>>> resource profile and operators that have no resource
> > profile.
> > >>> >     >>>>>>    - The default case where no operator has a resource
> > >>> >     profile is
> > >>> >     >> just a
> > >>> >     >>>>>> special case of this model
> > >>> >     >>>>>>    - The chaining logic sums up the profiles per
> operator,
> > >>> >     like it
> > >>> >     >> does
> > >>> >     >>>>> now,
> > >>> >     >>>>>> and the scheduler sums up the profiles of the tasks that
> > it
> > >>> >     >> schedules
> > >>> >     >>>>>> together.
> > >>> >     >>>>>>
> > >>> >     >>>>>>
> > >>> >     >>>>>> There is another question about reactive scaling raised
> > in the
> > >>> >     >> FLIP. I
> > >>> >     >>>>> need
> > >>> >     >>>>>> to think a bit about that. That is indeed a bit more
> > tricky
> > >>> >     once we
> > >>> >     >>>> have
> > >>> >     >>>>>> slots of different sizes.
> > >>> >     >>>>>> It is not clear then which of the different slot
> requests
> > the
> > >>> >     >>>>>> ResourceManager should fulfill when new resources (TMs)
> > >>> >     show up,
> > >>> >     >> or how
> > >>> >     >>>>> the
> > >>> >     >>>>>> JobManager redistributes the slot resources when
> > resources
> > >>> >     (TMs)
> > >>> >     >>>>> disappear
> > >>> >     >>>>>> This question is pretty orthogonal, though, to the "how
> to
> > >>> >     specify
> > >>> >     >> the
> > >>> >     >>>>>> resources".
> > >>> >     >>>>>>
> > >>> >     >>>>>>
> > >>> >     >>>>>> Best,
> > >>> >     >>>>>> Stephan
> > >>> >     >>>>>>
> > >>> >     >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song
> > >>> >     <tonysong820@gmail.com <ma...@gmail.com>
> > >>> >     >>>>> wrote:
> > >>> >     >>>>>>> Thanks for drafting the FLIP and driving the
> discussion,
> > >>> >     Yangze.
> > >>> >     >>>>>>> And Thanks for the feedback, Till and Chesnay.
> > >>> >     >>>>>>>
> > >>> >     >>>>>>> @Till,
> > >>> >     >>>>>>>
> > >>> >     >>>>>>> I agree that specifying requirements for SSGs means
> that
> > SSGs
> > >>> >     >> need to
> > >>> >     >>>>> be
> > >>> >     >>>>>>> supported in fine-grained resource management,
> otherwise
> > each
> > >>> >     >>>> operator
> > >>> >     >>>>>>> might use as many resources as the whole group.
> However,
> > I
> > >>> >     cannot
> > >>> >     >>>> think
> > >>> >     >>>>>> of
> > >>> >     >>>>>>> a strong reason for not supporting SSGs in fine-grained
> > >>> >     resource
> > >>> >     >>>>>>> management.
> > >>> >     >>>>>>>
> > >>> >     >>>>>>>
> > >>> >     >>>>>>>> Interestingly, if all operators have their resources
> > properly
> > >>> >     >>>>>> specified,
> > >>> >     >>>>>>>> then slot sharing is no longer needed because Flink
> > could
> > >>> >     >> slice off
> > >>> >     >>>>> the
> > >>> >     >>>>>>>> appropriately sized slots for every Task individually.
> > >>> >     >>>>>>>>
> > >>> >     >>>>>>> So for example, if we have a job consisting of two
> > >>> >     operators op_1
> > >>> >     >> and
> > >>> >     >>>>> op_2
> > >>> >     >>>>>>>> where each op needs 100 MB of memory, we would then
> say
> > that
> > >>> >     >> the
> > >>> >     >>>> slot
> > >>> >     >>>>>>>> sharing group needs 200 MB of memory to run. If we
> have
> > a
> > >>> >     >> cluster
> > >>> >     >>>>> with
> > >>> >     >>>>>> 2
> > >>> >     >>>>>>>> TMs with one slot of 100 MB each, then the system
> > cannot run
> > >>> >     >> this
> > >>> >     >>>>> job.
> > >>> >     >>>>>> If
> > >>> >     >>>>>>>> the resources were specified on an operator level,
> then
> > the
> > >>> >     >> system
> > >>> >     >>>>>> could
> > >>> >     >>>>>>>> still make the decision to deploy op_1 to TM_1 and
> op_2
> > to
> > >>> >     >> TM_2.
> > >>> >     >>>>>>>
> > >>> >     >>>>>>> Couldn't agree more that if all operators' requirements
> > are
> > >>> >     >> properly
> > >>> >     >>>>>>> specified, slot sharing should be no longer needed. I
> > >>> >     think this
> > >>> >     >>>>> exactly
> > >>> >     >>>>>>> disproves the example. If we already know op_1 and op_2
> > each
> > >>> >     >> need
> > >>> >     >>>> 100
> > >>> >     >>>>> MB
> > >>> >     >>>>>>> of memory, why would we put them in the same group? If
> > >>> >     they are
> > >>> >     >> in
> > >>> >     >>>>>> separate
> > >>> >     >>>>>>> groups, with the proposed approach the system can
> freely
> > >>> >     deploy
> > >>> >     >> them
> > >>> >     >>>> to
> > >>> >     >>>>>>> either a 200 MB TM or two 100 MB TMs.
> > >>> >     >>>>>>>
> > >>> >     >>>>>>> Moreover, the precondition for not needing slot sharing
> > is
> > >>> >     having
> > >>> >     >>>>>> resource
> > >>> >     >>>>>>> requirements properly specified for all operators. This
> > is not
> > >>> >     >> always
> > >>> >     >>>>>>> possible, and usually requires tremendous effort. One
> > of the
> > >>> >     >>>> benefits
> > >>> >     >>>>>> for
> > >>> >     >>>>>>> SSG-based requirements is that it allows the user to
> > freely
> > >>> >     >> decide
> > >>> >     >>>> the
> > >>> >     >>>>>>> granularity, and thus the effort they want to invest. I would
> > >>> >     consider SSG
> > >>> >     >> in
> > >>> >     >>>>>>> fine-grained resource management as a group of
> operators
> > >>> >     that the
> > >>> >     >>>> user
> > >>> >     >>>>>>> would like to specify the total resource for. There can
> > be
> > >>> >     only
> > >>> >     >> one
> > >>> >     >>>>> group
> > >>> >     >>>>>>> in the job, 2~3 groups dividing the job into a few
> major
> > >>> >     parts,
> > >>> >     >> or as
> > >>> >     >>>>>> many
> > >>> >     >>>>>>> groups as the number of tasks/operators, depending on
> how
> > >>> >     >>>> fine-grained
> > >>> >     >>>>>> the
> > >>> >     >>>>>>> user is able to specify the resources.
> > >>> >     >>>>>>>
> > >>> >     >>>>>>> Having to support SSGs might be a constraint. But given
> > >>> >     that all
> > >>> >     >> the
> > >>> >     >>>>>>> current scheduler implementations already support
> SSGs, I
> > >>> >     tend to
> > >>> >     >>>> think
> > >>> >     >>>>>>> that as an acceptable price for the above discussed
> > >>> >     usability and
> > >>> >     >>>>>>> flexibility.
> > >>> >     >>>>>>>
> > >>> >     >>>>>>> @Chesnay
> > >>> >     >>>>>>>
> > >>> >     >>>>>>> Will declaring them on slot sharing groups not also
> waste
> > >>> >     >> resources
> > >>> >     >>>> if
> > >>> >     >>>>>> the
> > >>> >     >>>>>>>> parallelism of operators within that group are
> > different?
> > >>> >     >>>>>>>>
> > >>> >     >>>>>>> Yes. It's a trade-off between usability and resource
> > >>> >     >> utilization. To
> > >>> >     >>>>>> avoid
> > >>> >     >>>>>>> such wasting, the user can define more groups, so that
> > >>> >     each group
> > >>> >     >>>>>> contains
> > >>> >     >>>>>>> fewer operators and the chance of having operators with
> > >>> >     different
> > >>> >     >>>>>>> parallelism will be reduced. The price is to have more
> > >>> >     resource
> > >>> >     >>>>>>> requirements to specify.
> > >>> >     >>>>>>>
> > >>> >     >>>>>>> It also seems like quite a hassle for users having to
> > >>> >     >> recalculate the
> > >>> >     >>>>>>>> resource requirements if they change the slot sharing.
> > >>> >     >>>>>>>> I'd think that it's not really workable for users that
> > create
> > >>> >     >> a set
> > >>> >     >>>>> of
> > >>> >     >>>>>>>> re-usable operators which are mixed and matched in
> their
> > >>> >     >>>>> applications;
> > >>> >     >>>>>>>> managing the resources requirements in such a setting
> > >>> >     would be
> > >>> >     >> a
> > >>> >     >>>>>>>> nightmare, and in the end would require operator-level
> > >>> >     >> requirements
> > >>> >     >>>>> any
> > >>> >     >>>>>>>> way.
> > >>> >     >>>>>>>> In that sense, I'm not even sure whether it really
> > increases
> > >>> >     >>>>> usability.
> > >>> >     >>>>>>>     - As mentioned in my reply to Till's comment,
> > there's no
> > >>> >     >> reason to
> > >>> >     >>>>> put
> > >>> >     >>>>>>>     multiple operators whose individual resource
> > >>> >     requirements are
> > >>> >     >>>>> already
> > >>> >     >>>>>>> known
> > >>> >     >>>>>>>     into the same group in fine-grained resource
> > management.
> > >>> >     >>>>>>>     - Even if an operator implementation is reused for
> > multiple
> > >>> >     >>>>> applications,
> > >>> >     >>>>>>>     it does not guarantee the same resource
> requirements.
> > >>> >     During
> > >>> >     >> our
> > >>> >     >>>>> years
> > >>> >     >>>>>>> of
> > >>> >     >>>>>>>     practice at Alibaba, with per-operator
> requirements
> > >>> >     >> specified for
> > >>> >     >>>>>>> Blink's
> > >>> >     >>>>>>>     fine-grained resource management, very few users
> > >>> >     (including
> > >>> >     >> our
> > >>> >     >>>>>>> specialists
> > >>> >     >>>>>>>     who are dedicated to supporting Blink users) are
> > >>> >     >>>>>>>     experienced enough to
> > >>> >     >>>>>>>     accurately predict/estimate the operator resource
> > >>> >     >> requirements.
> > >>> >     >>>> Most
> > >>> >     >>>>>>> people
> > >>> >     >>>>>>>     rely on the execution-time metrics (throughput,
> > delay, cpu
> > >>> >     >> load,
> > >>> >     >>>>>> memory
> > >>> >     >>>>>>>     usage, GC pressure, etc.) to improve the
> > specification.
> > >>> >     >>>>>>>
> > >>> >     >>>>>>> To sum up:
> > >>> >     >>>>>>> If the user is capable of providing proper resource
> > >>> >     requirements
> > >>> >     >> for
> > >>> >     >>>>>> every
> > >>> >     >>>>>>> operator, that's definitely a good thing and we would
> not
> > >>> >     need to
> > >>> >     >>>> rely
> > >>> >     >>>>> on
> > >>> >     >>>>>>> the SSGs. However, that shouldn't be a *must* for the
> > >>> >     >> fine-grained
> > >>> >     >>>>>> resource
> > >>> >     >>>>>>> management to work. For those users who are capable and
> > do not
> > >>> >     >> like
> > >>> >     >>>>>> having
> > >>> >     >>>>>>> to set each operator to a separate SSG, I would be ok
> to
> > have
> > >>> >     >> both
> > >>> >     >>>>>>> SSG-based and operator-based runtime interfaces and to
> > only
> > >>> >     >> fall back
> > >>> >     >>>> to
> > >>> >     >>>>>> the
> > >>> >     >>>>>>> SSG requirements when the operator requirements are not
> > >>> >     >> specified.
> > >>> >     >>>>>> However,
> > >>> >     >>>>>>> as the first step, I think we should prioritise the use
> > cases
> > >>> >     >> where
> > >>> >     >>>>> users
> > >>> >     >>>>>>> are not that experienced.
> > >>> >     >>>>>>>
> > >>> >     >>>>>>> Thank you~
> > >>> >     >>>>>>>
> > >>> >     >>>>>>> Xintong Song
> > >>> >     >>>>>>>
> > >>> >     >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
> > >>> >     >> chesnay@apache.org <ma...@apache.org>>
> > >>> >     >>>>>>> wrote:
> > >>> >     >>>>>>>
> > >>> >     >>>>>>>> Will declaring them on slot sharing groups not also
> > waste
> > >>> >     >> resources
> > >>> >     >>>>> if
> > >>> >     >>>>>>>> the parallelism of operators within that group are
> > different?
> > >>> >     >>>>>>>>
> > >>> >     >>>>>>>> It also seems like quite a hassle for users having to
> > >>> >     >> recalculate
> > >>> >     >>>> the
> > >>> >     >>>>>>>> resource requirements if they change the slot sharing.
> > >>> >     >>>>>>>> I'd think that it's not really workable for users that
> > create
> > >>> >     >> a set
> > >>> >     >>>>> of
> > >>> >     >>>>>>>> re-usable operators which are mixed and matched in
> their
> > >>> >     >>>>> applications;
> > >>> >     >>>>>>>> managing the resources requirements in such a setting
> > >>> >     would be
> > >>> >     >> a
> > >>> >     >>>>>>>> nightmare, and in the end would require operator-level
> > >>> >     >> requirements
> > >>> >     >>>>> any
> > >>> >     >>>>>>>> way.
> > >>> >     >>>>>>>> In that sense, I'm not even sure whether it really
> > increases
> > >>> >     >>>>> usability.
> > >>> >     >>>>>>>> My main worry is that if we wire the runtime to
> work
> > >>> >     on SSGs
> > >>> >     >>>> it's
> > >>> >     >>>>>>>> gonna be difficult to implement more fine-grained
> > approaches,
> > >>> >     >> which
> > >>> >     >>>>>>>> would not be the case if, for the runtime, they are
> > always
> > >>> >     >> defined
> > >>> >     >>>> on
> > >>> >     >>>>>> an
> > >>> >     >>>>>>>> operator-level.
> > >>> >     >>>>>>>>
> > >>> >     >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > >>> >     >>>>>>>>> Thanks for drafting this FLIP and starting this
> > discussion
> > >>> >     >>>> Yangze.
> > >>> >     >>>>>>>>> I like that defining resource requirements on a slot
> > sharing
> > >>> >     >>>> group
> > >>> >     >>>>>>> makes
> > >>> >     >>>>>>>>> the overall setup easier and improves usability of
> > resource
> > >>> >     >>>>>>> requirements.
> > >>> >     >>>>>>>>> What I do not like about it is that it changes slot
> > sharing
> > >>> >     >>>> groups
> > >>> >     >>>>>> from
> > >>> >     >>>>>>>>> being a scheduling hint to something which needs to
> be
> > >>> >     >> supported
> > >>> >     >>>> in
> > >>> >     >>>>>>> order
> > >>> >     >>>>>>>>> to support fine grained resource requirements. So
> far,
> > the
> > >>> >     >> idea
> > >>> >     >>>> of
> > >>> >     >>>>>> slot
> > >>> >     >>>>>>>>> sharing groups was that it tells the system that a
> set
> > of
> > >>> >     >>>> operators
> > >>> >     >>>>>> can
> > >>> >     >>>>>>>> be
> > >>> >     >>>>>>>>> deployed in the same slot. But the system still had
> the
> > >>> >     >> freedom
> > >>> >     >>>> to
> > >>> >     >>>>>> say
> > >>> >     >>>>>>>> that
> > >>> >     >>>>>>>>> it would rather place these tasks in different slots
> > if it
> > >>> >     >>>> wanted.
> > >>> >     >>>>> If
> > >>> >     >>>>>>> we
> > >>> >     >>>>>>>>> now specify resource requirements on a per slot
> sharing
> > >>> >     >> group,
> > >>> >     >>>> then
> > >>> >     >>>>>> the
> > >>> >     >>>>>>>>> only option for a scheduler which does not support
> slot
> > >>> >     >> sharing
> > >>> >     >>>>>> groups
> > >>> >     >>>>>>> is
> > >>> >     >>>>>>>>> to say that every operator in this slot sharing group
> > >>> >     needs a
> > >>> >     >>>> slot
> > >>> >     >>>>>> with
> > >>> >     >>>>>>>> the
> > >>> >     >>>>>>>>> same resources as the whole group.
> > >>> >     >>>>>>>>>
> > >>> >     >>>>>>>>> So for example, if we have a job consisting of two
> > operators
> > >>> >     >> op_1
> > >>> >     >>>>> and
> > >>> >     >>>>>>> op_2
> > >>> >     >>>>>>>>> where each op needs 100 MB of memory, we would then
> > say that
> > >>> >     >> the
> > >>> >     >>>>> slot
> > >>> >     >>>>>>>>> sharing group needs 200 MB of memory to run. If we
> > have a
> > >>> >     >> cluster
> > >>> >     >>>>>> with
> > >>> >     >>>>>>> 2
> > >>> >     >>>>>>>>> TMs with one slot of 100 MB each, then the system
> > cannot run
> > >>> >     >> this
> > >>> >     >>>>>> job.
> > >>> >     >>>>>>> If
> > >>> >     >>>>>>>>> the resources were specified on an operator level,
> > then the
> > >>> >     >>>> system
> > >>> >     >>>>>>> could
> > >>> >     >>>>>>>>> still make the decision to deploy op_1 to TM_1 and
> > op_2 to
> > >>> >     >> TM_2.
> > >>> >     >>>>>>>>> Originally, one of the primary goals of slot sharing
> > groups
> > >>> >     >> was
> > >>> >     >>>> to
> > >>> >     >>>>>> make
> > >>> >     >>>>>>>> it
> > >>> >     >>>>>>>>> easier for the user to reason about how many slots a
> > job
> > >>> >     >> needs
> > >>> >     >>>>>>>> independent
> > >>> >     >>>>>>>>> of the actual number of operators in the job.
> > Interestingly,
> > >>> >     >> if
> > >>> >     >>>> all
> > >>> >     >>>>>>>>> operators have their resources properly specified,
> > then slot
> > >>> >     >>>>> sharing
> > >>> >     >>>>>> is
> > >>> >     >>>>>>>> no
> > >>> >     >>>>>>>>> longer needed because Flink could slice off the
> > >>> >     appropriately
> > >>> >     >>>> sized
> > >>> >     >>>>>>> slots
> > >>> >     >>>>>>>>> for every Task individually. What matters is whether
> > the
> > >>> >     >> whole
> > >>> >     >>>>>> cluster
> > >>> >     >>>>>>>> has
> > >>> >     >>>>>>>>> enough resources to run all tasks or not.
> > >>> >     >>>>>>>>>
> > >>> >     >>>>>>>>> Cheers,
> > >>> >     >>>>>>>>> Till
> > >>> >     >>>>>>>>>
> > >>> >     >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
> > >>> >     >> karmagyz@gmail.com <ma...@gmail.com>>
> > >>> >     >>>>>> wrote:
> > >>> >     >>>>>>>>>> Hi, there,
> > >>> >     >>>>>>>>>>
> > >>> >     >>>>>>>>>> We would like to start a discussion thread on
> > "FLIP-156:
> > >>> >     >> Runtime
> > >>> >     >>>>>>>>>> Interfaces for Fine-Grained Resource
> Requirements"[1],
> > >>> >     >> where we
> > >>> >     >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime
> > interfaces
> > >>> >     >> for
> > >>> >     >>>>>>>>>> specifying fine-grained resource requirements.
> > >>> >     >>>>>>>>>>
> > >>> >     >>>>>>>>>> In this FLIP:
> > >>> >     >>>>>>>>>> - Expound the user story of fine-grained resource
> > >>> >     >> management.
> > >>> >     >>>>>>>>>> - Propose runtime interfaces for specifying
> SSG-based
> > >>> >     >> resource
> > >>> >     >>>>>>>>>> requirements.
> > >>> >     >>>>>>>>>> - Discuss the pros and cons of the three potential
> > >>> >     >> granularities
> > >>> >     >>>>> for
> > >>> >     >>>>>>>>>> specifying the resource requirements (op, task and
> > slot
> > >>> >     >> sharing
> > >>> >     >>>>>> group)
> > >>> >     >>>>>>>>>> and explain why we choose the slot sharing group.
> > >>> >     >>>>>>>>>>
> > >>> >     >>>>>>>>>> Please find more details in the FLIP wiki document
> > [1].
> > >>> >     >> Looking
> > >>> >     >>>>>>>>>> forward to your feedback.
> > >>> >     >>>>>>>>>>
> > >>> >     >>>>>>>>>> [1]
> > >>> >     >>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > >>> >     >>>>>>>>>>
> > >>> >     >>>>>>>>>> Best,
> > >>> >     >>>>>>>>>> Yangze Guo
> > >>> >     >>>>>>>>>>
> > >>> >     >>>>>>>>
> > >>> >
> > >>>
> >
>
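
As a small illustration of the op_1/op_2 example quoted above: the sketch below checks whether a set of slot requests fits into a set of fixed-size slots. It is only a simplified model under stated assumptions, not Flink's scheduler: each slot hosts at most one request, slots are not sliced dynamically, and the class and method names (PlacementSketch, fits) are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper for the op_1/op_2 example; not part of Flink. */
final class PlacementSketch {

    /**
     * Returns true if every request (in MB) can be assigned its own slot (in MB).
     * Assumption: one request per slot, no dynamic slicing of slots.
     * After sorting, the k-th largest request is matched with the k-th largest slot.
     */
    static boolean fits(int[] requestsMb, int[] slotsMb) {
        if (requestsMb.length > slotsMb.length) {
            return false;
        }
        int[] requests = requestsMb.clone();
        int[] slots = slotsMb.clone();
        Arrays.sort(requests);
        Arrays.sort(slots);
        for (int k = 1; k <= requests.length; k++) {
            if (requests[requests.length - k] > slots[slots.length - k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] slots = {100, 100}; // two TMs with one 100 MB slot each

        // op_1 and op_2 in one SSG: a single 200 MB request.
        System.out.println("one SSG of 200 MB fits: " + fits(new int[] {200}, slots));

        // op_1 and op_2 specified separately (or placed in separate SSGs).
        System.out.println("two requests of 100 MB fit: " + fits(new int[] {100, 100}, slots));
    }
}
```

In this model the single 200 MB request cannot be placed while the two 100 MB requests can, which is the situation Till describes. Putting op_1 and op_2 into separate SSGs, as Xintong suggests, produces the same two 100 MB requests.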

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Xintong Song <to...@gmail.com>.

Thanks for the summary, Yangze.

The changes and follow-up issues LGTM. Let's wait for responses from the
others before starting a vote.
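
To make the two follow-up options from the summary quoted below a bit more concrete, here is a minimal sketch of how the resources for an SSG without a specified requirement could be decided. Everything in it is an assumption for illustration only: the names (Resources, Fallback, SsgFallbackSketch.resolve) and the numbers are hypothetical and do not correspond to existing Flink classes or defaults.

```java
import java.util.Optional;

/** Hypothetical resource description; not Flink's actual resource classes. */
final class Resources {
    final double cpuCores;
    final int memoryMb;

    Resources(double cpuCores, int memoryMb) {
        this.cpuCores = cpuCores;
        this.memoryMb = memoryMb;
    }

    /** Scales both dimensions by the given factor. */
    Resources times(int factor) {
        return new Resources(cpuCores * factor, memoryMb * factor);
    }
}

/** The two options listed in the summary for SSGs without a requirement. */
enum Fallback {
    DEFAULT_SLOT,
    DEFAULT_OPERATOR_TIMES_OPERATOR_COUNT
}

final class SsgFallbackSketch {

    /** A specified SSG requirement always wins; otherwise the chosen fallback applies. */
    static Resources resolve(
            Optional<Resources> specified,
            int operatorCount,
            Fallback fallback,
            Resources defaultSlot,
            Resources defaultOperator) {
        if (specified.isPresent()) {
            return specified.get();
        }
        if (fallback == Fallback.DEFAULT_SLOT) {
            return defaultSlot;
        }
        return defaultOperator.times(operatorCount);
    }

    public static void main(String[] args) {
        Resources defaultSlot = new Resources(1.0, 1024);    // assumed value
        Resources defaultOperator = new Resources(0.5, 256); // assumed value

        Resources r = resolve(
                Optional.empty(),
                3,
                Fallback.DEFAULT_OPERATOR_TIMES_OPERATOR_COUNT,
                defaultSlot,
                defaultOperator);
        System.out.println(r.cpuCores + " cores, " + r.memoryMb + " MB");
    }
}
```

With the first option the result is independent of how many operators the group contains; with the second it grows with the operator count, which is closer to the operator-based proposal.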

Thank you~

Xintong Song



On Tue, Jan 26, 2021 at 11:08 AM Yangze Guo <ka...@gmail.com> wrote:

> Thanks everyone for the lively discussion. I'd like to try to
> summarize the current convergence in the discussion. Please let me
> know if I got things wrong or missed something crucial here.
>
> Change of this FLIP:
> - Treat the SSG resource requirements as a hint instead of a
> restriction for the runtime. That should be explicitly explained in
> the JavaDocs.
>
> Potential follow-up issues if needed:
> - Provide operator-level resource configuration interface.
> - Provide multiple options for deciding resources for SSGs whose
> requirement is not specified:
>     ** Default slot resource.
>     ** Default operator resource times number of operators.
>
> If there are no other issues, I'll update the FLIP accordingly and
> start a vote thread. Thanks all for the valuable feedback again.
>
> Best,
> Yangze Guo
>
>
> On Fri, Jan 22, 2021 at 11:30 AM Xintong Song <to...@gmail.com>
> wrote:
> >
> >
> >  FGRuntimeInterface.png
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <to...@gmail.com>
> wrote:
> >>
> >> I think Chesnay's proposal could actually work. IIUC, the key point is
> to derive operator requirements from SSG requirements on the API side, so
> that the runtime only deals with operator requirements. It's debatable how
> the deriving should be done though. E.g., an alternative could be to evenly
> divide the SSG requirement into requirements of operators in the group.
> >>
> >>
> >> However, I'm not entirely sure which option is more desired.
> Illustrating my understanding in the following figure, in which on the top
> is Chesnay's proposal and on the bottom is the SSG-based proposal in this
> FLIP.
> >>
> >>
> >>
> >> I think the major difference between the two approaches is where
> deriving operator requirements from SSG requirements happens.
> >>
> >> - Chesnay's proposal simplifies the runtime logic and the interface to
> expose, at the price of moving more complexity (i.e. the deriving) to the
> API side. The question is, where do we prefer to keep the complexity? I'm
> slightly leaning towards having a thin API and keep the complexity in
> runtime if possible.
> >>
> >> - Notice that the dashed-line arrows represent optional steps that are
> needed only for schedulers that do not respect SSGs, which we don't have at
> the moment. If we only look at the solid line arrows, then the SSG-based
> approach is much simpler, without needing to derive and aggregate the
> requirements back and forth. I'm not sure about complicating the current
> design only for the potential future needs.
> >>
> >>
> >> Thank you~
> >>
> >> Xintong Song
> >>
> >>
> >>
> >>
> >> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <ch...@apache.org>
> wrote:
> >>>
> >>> You're raising a good point, but I think I can rectify that with a
> minor
> >>> adjustment.
> >>>
> >>> Default requirements are whatever the default requirements are, setting
> >>> the requirements for one operator has no effect on other operators.
> >>>
> >>> With these rules, and some API enhancements, the following mockup would
> >>> replicate the SSG-based behavior:
> >>>
> >>> Map<SlotSharingGroupId, Requirements> requirements = ...
> >>> for slotSharingGroup in env.getSlotSharingGroups() {
> >>>      vertices = slotSharingGroup.getVertices()
> >>>      // the first vertex carries the requirements of the whole group
> >>>      vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
> >>>      // all remaining vertices explicitly require nothing
> >>>      vertices.remaining().setRequirements(ZERO)
> >>> }
> >>>
> >>> We could even allow setting requirements on slotsharing-groups or
> >>> colocation-groups and internally translate them accordingly.
> >>> I can't help but feel this is a plain API issue.
> >>>
> >>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
> >>> > If I understand you correctly Chesnay, then you want to decouple the
> >>> > resource requirement specification from the slot sharing group
> >>> > assignment. Hence, per default all operators would be in the same
> slot
> >>> > sharing group. If there is no operator with a resource specification,
> >>> > then the system would allocate a default slot for it. If there is at
> >>> > least one operator, then the system would sum up all the specified
> >>> > resources and allocate a slot of this size. This effectively means
> >>> > that all unspecified operators will implicitly have a zero resource
> >>> > requirement. Did I understand your idea correctly?
> >>> >
> >>> > I am wondering whether this wouldn't lead to a surprising behaviour
> >>> > for the user. If the user specifies the resource requirements for a
> >>> > single operator, then he probably will assume that the other
> operators
> >>> > will get the default share of resources and not nothing.
> >>> >
> >>> > Cheers,
> >>> > Till
> >>> >
> >>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <chesnay@apache.org
> >>> > <ma...@apache.org>> wrote:
> >>> >
> >>> >     Is there even a functional difference between specifying the
> >>> >     requirements for an SSG vs specifying the same requirements on a
> >>> >     single
> >>> >     operator within that group (ideally a colocation group to avoid
> this
> >>> >     whole hint business)?
> >>> >
> >>> >     Wouldn't we get the best of both worlds in the latter case?
> >>> >
> >>> >     Users can take shortcuts to define shared requirements,
> >>> >     but refine them further as needed on a per-operator basis,
> >>> >     without changing semantics of slotsharing groups
> >>> >     nor the runtime being locked into SSG-based requirements.
> >>> >
> >>> >     (And before anyone argues what happens if slotsharing groups
> >>> >     change or
> >>> >     whatnot, that's a plain API issue that we could surely solve. (A
> >>> >     plain
> >>> >     iteration over slotsharing groups and therein contained operators
> >>> >     would
> >>> >     suffice)).
> >>> >
> >>> >     On 1/20/2021 6:48 PM, Till Rohrmann wrote:
> >>> >     > Maybe a different minor idea: Would it be possible to treat
> the SSG
> >>> >     > resource requirements as a hint for the runtime similar to how
> >>> >     slot sharing
> >>> >     > groups are designed at the moment? Meaning that we don't give
> >>> >     the guarantee
> >>> >     > that Flink will always deploy this set of tasks together no
> >>> >     matter what
> >>> >     > comes. If, for example, the runtime can derive by some means
> the
> >>> >     resource
> >>> >     > requirements for each task based on the requirements for the
> >>> >     SSG, this
> >>> >     > could be possible. One easy strategy would be to give every
> task
> >>> >     the same
> >>> >     > resources as the whole slot sharing group. Another one could be
> >>> >     > distributing the resources equally among the tasks. This does
> >>> >     not even have
> >>> >     > to be implemented but we would give ourselves the freedom to
> change
> >>> >     > scheduling if need should arise.
> >>> >     >
> >>> >     > Cheers,
> >>> >     > Till
> >>> >     >
> >>> >     > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <karmagyz@gmail.com
> >>> >     <ma...@gmail.com>> wrote:
> >>> >     >
> >>> >     >> Thanks for the responses, Till and Xintong.
> >>> >     >>
> >>> >     >> I second Xintong's comment that SSG-based runtime interface
> >>> >     will give
> >>> >     >> us the flexibility to achieve op/task-based approach. That's
> one of
> >>> >     >> the most important reasons for our design choice.
> >>> >     >>
> >>> >     >> Some cents regarding the default operator resource:
> >>> >     >> - It might be good for the scenario of DataStream jobs.
> >>> >     >>     ** For light-weight operators, the accumulative
> >>> >     configuration error
> >>> >     >> will not be significant. Then, the resources used by a task are
> >>> >     >> proportional to the number of operators it contains.
> >>> >     >>     ** For heavy operators like join and window or operators
> >>> >     using the
> >>> >     >> external resources, users will turn to the fine-grained
> resource
> >>> >     >> configuration.
> >>> >     >> - It can increase the stability for the standalone cluster
> >>> >     where task
> >>> >     >> executors registered are heterogeneous (with different default
> slot
> >>> >     >> resources).
> >>> >     >> - It might not be good for SQL users. The operators that SQL
> >>> >     >> is translated to are a black box to the user. We also do not
> >>> >     >> guarantee cross-version consistency of the transformation so far.
> >>> >     >>
> >>> >     >> I think it can be treated as a follow-up work when the
> fine-grained
> >>> >     >> resource management is end-to-end ready.
> >>> >     >>
> >>> >     >> Best,
> >>> >     >> Yangze Guo
> >>> >     >>
> >>> >     >>
> >>> >     >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song
> >>> >     <tonysong820@gmail.com <ma...@gmail.com>>
> >>> >     >> wrote:
> >>> >     >>> Thanks for the feedback, Till.
> >>> >     >>>
> >>> >     >>> ## I feel that what you proposed (operator-based + default
> >>> >     value) might
> >>> >     >> be
> >>> >     >>> subsumed by the SSG-based approach.
> >>> >     >>> Thinking of op_1 -> op_2, there are the following 4 cases,
> >>> >     categorized by
> >>> >     >>> whether the resource requirements are known to the users.
> >>> >     >>>
> >>> >     >>>     1. *Both known.* As previously mentioned, there's no
> >>> >     reason to put
> >>> >     >>>     multiple operators whose individual resource requirements
> >>> >     are already
> >>> >     >> known
> >>> >     >>>     into the same group in fine-grained resource management.
> >>> >     And if op_1
> >>> >     >> and
> >>> >     >>>     op_2 are in different groups, there should be no problem
> >>> >     switching
> >>> >     >> data
> >>> >     >>>     exchange mode from pipelined to blocking. This is
> >>> >     equivalent to
> >>> >     >> specifying
> >>> >     >>>     operator resource requirements in your proposal.
> >>> >     >>>     2. *op_1 known, op_2 unknown.* Similar to 1), except that
> >>> >     op_2 is in a
> >>> >     >>>     SSG whose resource is not specified thus would have the
> >>> >     default slot
> >>> >     >>>     resource. This is equivalent to having default operator
> >>> >     resources in
> >>> >     >> your
> >>> >     >>>     proposal.
> >>> >     >>>     3. *Both unknown*. The user can either set op_1 and op_2
> >>> >     to the same
> >>> >     >> SSG
> >>> >     >>>     or separate SSGs.
> >>> >     >>>        - If op_1 and op_2 are in the same SSG, it will be
> >>> >     equivalent to
> >>> >     >> the
> >>> >     >>>        coarse-grained resource management, where op_1 and
> op_2
> >>> >     share a
> >>> >     >> default
> >>> >     >>>        size slot no matter which data exchange mode is used.
> >>> >     >>>        - If op_1 and op_2 are in different SSGs, then each of
> >>> >     them will
> >>> >     >> use
> >>> >     >>>        a default size slot. This is equivalent to setting
> them
> >>> >     with
> >>> >     >> default
> >>> >     >>>        operator resources in your proposal.
> >>> >     >>>     4. *Total (pipeline) or max (blocking) of op_1 and op_2
> is
> >>> >     known.*
> >>> >     >>>        - It is possible that the user learns the total / max
> >>> >     resource
> >>> >     >>>        requirement from executing and monitoring the job,
> >>> >     while not
> >>> >     >>> being aware of
> >>> >     >>>        individual operator requirements.
> >>> >     >>>        - I believe this is the case your proposal does not
> >>> >     cover. And TBH,
> >>> >     >>>        this is probably how most users learn the resource
> >>> >     requirements,
> >>> >     >>> according
> >>> >     >>>        to my experiences.
> >>> >     >>>        - In this case, the user might need to specify
> >>> >     different resources
> >>> >     >> if
> >>> >     >>>        he wants to switch the execution mode, which should
> not
> >>> >     be worse
> >>> >     >> than not
> >>> >     >>>        being able to use fine-grained resource management.
> >>> >     >>>
> >>> >     >>>
> >>> >     >>> ## An additional idea inspired by your proposal.
> >>> >     >>> We may provide multiple options for deciding resources for
> >>> >     SSGs whose
> >>> >     >>> requirement is not specified, if needed.
> >>> >     >>>
> >>> >     >>>     - Default slot resource (current design)
> >>> >     >>>     - Default operator resource times number of operators
> >>> >     (equivalent to
> >>> >     >>>     your proposal)
> >>> >     >>>
> >>> >     >>>
> >>> >     >>> ## Exposing internal runtime strategies
> >>> >     >>> Theoretically, yes. Tying to the SSGs, the resource
> >>> >     requirements might be
> >>> >     >>> affected if how SSGs are internally handled changes in
> future.
> >>> >     >> Practically,
> >>> >     >>> I do not concretely see at the moment what kind of changes we
> >>> >     may want in
> >>> >     >>> future that might conflict with this FLIP proposal, as the
> >>> >     question of
> >>> >     >>> switching data exchange mode answered above. I'd suggest to
> >>> >     not give up
> >>> >     >> the
> >>> >     >>> user friendliness we may gain now for the future problems
> that
> >>> >     may or may
> >>> >     >>> not exist.
> >>> >     >>>
> >>> >     >>> Moreover, the SSG-based approach has the flexibility to
> >>> >     achieve the
> >>> >     >>> equivalent behavior as the operator-based approach, if we
> set each
> >>> >     >> operator
> >>> >     >>> (or task) to a separate SSG. We can even provide a shortcut
> >>> >     option to
> >>> >     >>> automatically do that for users, if needed.
> >>> >     >>>
> >>> >     >>>
> >>> >     >>> Thank you~
> >>> >     >>>
> >>> >     >>> Xintong Song
> >>> >     >>>
> >>> >     >>>
> >>> >     >>>
> >>> >     >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann
> >>> >     <trohrmann@apache.org <ma...@apache.org>>
> >>> >     >> wrote:
> >>> >     >>>> Thanks for the responses Xintong and Stephan,
> >>> >     >>>>
> >>> >     >>>> I agree that being able to define the resource requirements
> for a
> >>> >     >> group of
> >>> >     >>>> operators is more user friendly. However, my concern is that
> >>> >     we are
> >>> >     >>>> exposing thereby internal runtime strategies which might
> >>> >     limit our
> >>> >     >>>> flexibility to execute a given job. Moreover, the semantics
> of
> >>> >     >> configuring
> >>> >     >>>> resource requirements for SSGs could break if switching from
> >>> >     streaming
> >>> >     >> to
> >>> >     >>>> batch execution. If one defines the resource requirements
> for
> >>> >     op_1 ->
> >>> >     >> op_2
> >>> >     >>>> which run in pipelined mode when using the streaming
> >>> >     execution, then
> >>> >     >> how do
> >>> >     >>>> we interpret these requirements when op_1 -> op_2 are
> >>> >     executed with a
> >>> >     >>>> blocking data exchange in batch execution mode?
> Consequently,
> >>> >     I am
> >>> >     >> still
> >>> >     >>>> leaning towards Stephan's proposal to set the resource
> >>> >     requirements per
> >>> >     >>>> operator.
> >>> >     >>>>
> >>> >     >>>> Maybe the following proposal makes the configuration easier:
> >>> >     If the
> >>> >     >> user
> >>> >     >>>> wants to use fine-grained resource requirements, then she
> >>> >     needs to
> >>> >     >> specify
> >>> >     >>>> the default size which is used for operators which have no
> >>> >     explicit
> >>> >     >>>> resource annotation. If this holds true, then every operator
> >>> >     would
> >>> >     >> have a
> >>> >     >>>> resource requirement and the system can try to execute the
> >>> >     operators
> >>> >     >> in the
> >>> >     >>>> best possible manner w/o being constrained by how the user
> >>> >     set the SSG
> >>> >     >>>> requirements.
> >>> >     >>>>
> >>> >     >>>> Cheers,
> >>> >     >>>> Till
> >>> >     >>>>
> >>> >     >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song
> >>> >     <tonysong820@gmail.com <ma...@gmail.com>>
> >>> >     >>>> wrote:
> >>> >     >>>>
> >>> >     >>>>> Thanks for the feedback, Stephan.
> >>> >     >>>>>
> >>> >     >>>>> Actually, your proposal has also come to my mind at some
> >>> >     point. And I
> >>> >     >>>> have
> >>> >     >>>>> some concerns about it.
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> 1. It does not give users the same control as the SSG-based
> >>> >     approach.
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> While both approaches do not require specifying for each
> >>> >     operator,
> >>> >     >>>>> SSG-based approach supports the semantic that "some
> operators
> >>> >     >> together
> >>> >     >>>> use
> >>> >     >>>>> this much resource" while the operator-based approach
> doesn't.
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> Think of a long pipeline with m operators (o_1, o_2, ...,
> >>> >     o_m), and
> >>> >     >> at
> >>> >     >>>> some
> >>> >     >>>>> point there's an agg o_n (1 < n < m) which significantly
> >>> >     reduces the
> >>> >     >> data
> >>> >     >>>>> amount. One can separate the pipeline into 2 groups SSG_1
> >>> >     (o_1, ...,
> >>> >     >> o_n)
> >>> >     >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much higher
> >>> >     >> parallelisms
> >>> >     >>>>> for operators in SSG_1 than for operators in SSG_2 won't
> >>> >     lead to too
> >>> >     >> much
> >>> >     >>>>> wasting of resources. If the two SSGs end up needing
> different
> >>> >     >> resources,
> >>> >     >>>>> with the SSG-based approach one can directly specify
> >>> >     resources for
> >>> >     >> the
> >>> >     >>>> two
> >>> >     >>>>> groups. However, with the operator-based approach, the
> user will
> >>> >     >> have to
> >>> >     >>>>> specify resources for each operator in one of the two
> >>> >     groups, and
> >>> >     >> tune
> >>> >     >>>> the
> >>> >     >>>>> default slot resource via configurations to fit the other
> group.
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> 2. It increases the chance of breaking operator chains.
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> Setting chainable operators into different slot sharing
> >>> >     groups will
> >>> >     >>>>> prevent them from being chained. In the current
> implementation,
> >>> >     >>>> downstream
> >>> >     >>>>> operators, if SSG not explicitly specified, will be set to
> >>> >     the same
> >>> >     >> group
> >>> >     >>>>> as the chainable upstream operators (unless multiple
> upstream
> >>> >     >> operators
> >>> >     >>>> in
> >>> >     >>>>> different groups), to reduce the chance of breaking chains.
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4,
> >>> >     deciding
> >>> >     >> SSGs
> >>> >     >>>>> based on whether resource is specified we will easily get
> >>> >     groups like
> >>> >     >>>> (o_1,
> >>> >     >>>>> o_3) & (o_2, o_4), where none of the operators can be
> >>> >     chained. This
> >>> >     >> is
> >>> >     >>>> also
> >>> >     >>>>> possible for the SSG-based approach, but I believe the
> >>> >     chance is much
> >>> >     >>>>> smaller because there's no strong reason for users to
> >>> >     specify the
> >>> >     >> groups
> >>> >     >>>>> with alternate operators like that. We are more likely to
> >>> >     get groups
> >>> >     >> like
> >>> >     >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks only
> between
> >>> >     o_2 and
> >>> >     >> o_3.
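The chaining argument above can be illustrated with a small sketch. It assumes a simplified model in which an edge between two adjacent chainable operators can be chained only if both operators are in the same slot sharing group. The function `count_chain_breaks` and the group labels "A" and "B" are hypothetical names for illustration, not Flink APIs.

```python
def count_chain_breaks(pipeline, group_of):
    """Count adjacent operator pairs that fall into different slot sharing groups.

    Simplified assumption: an edge between two chainable operators can only
    be chained when both operators share the same slot sharing group.
    """
    return sum(
        1
        for upstream, downstream in zip(pipeline, pipeline[1:])
        if group_of[upstream] != group_of[downstream]
    )


pipeline = ["o_1", "o_2", "o_3", "o_4"]

# Alternating groups, e.g. when groups follow "resource specified or not".
alternating = {"o_1": "A", "o_2": "B", "o_3": "A", "o_4": "B"}
# Contiguous groups, as a user would more likely define them by hand.
contiguous = {"o_1": "A", "o_2": "A", "o_3": "B", "o_4": "B"}

print(count_chain_breaks(pipeline, alternating))  # 3: no edge can be chained
print(count_chain_breaks(pipeline, contiguous))   # 1: only o_2 -> o_3 breaks
```

With alternating groups every one of the three edges is broken, while contiguous groups break only the edge between o_2 and o_3.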
> >>> >     >>>>>
> >>> >     >>>>> 3. It complicates the system by having two different
> >>> >     mechanisms for
> >>> >     >>>> sharing
> >>> >     >>>>> managed memory in  a slot.
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> - In FLIP-141, we introduced the intra-slot managed memory
> >>> >     sharing
> >>> >     >>>>> mechanism, where managed memory is first distributed
> >>> >     according to the
> >>> >     >>>>> consumer type, then further distributed across operators
> of that
> >>> >     >> consumer
> >>> >     >>>>> type.
> >>> >     >>>>>
> >>> >     >>>>> - With the operator-based approach, managed memory size
> >>> >     specified
> >>> >     >> for an
> >>> >     >>>>> operator should account for all the consumer types of that
> >>> >     operator.
> >>> >     >> That
> >>> >     >>>>> means the managed memory is first distributed across
> >>> >     operators, then
> >>> >     >>>>> distributed to different consumer types of each operator.
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> Unfortunately, the different order of the two calculation
> >>> >     steps can
> >>> >     >> lead
> >>> >     >>>> to
> >>> >     >>>>> different results. To be specific, the semantic of the
> >>> >     configuration
> >>> >     >>>> option
> >>> >     >>>>> `consumer-weights` changed (within a slot vs. within an
> >>> >     operator).
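A worked example may make the ordering problem concrete. The sketch below is a simplified model, not Flink's actual implementation: the consumer type names, the weights, the 100 MB slot size, and the even split across operators of the same consumer type are all assumptions chosen for illustration.

```python
def slot_first(slot_mb, weights, consumers_of):
    """Split managed memory by consumer type within the slot, then across
    the operators that use that consumer type (FLIP-141 style ordering)."""
    present = sorted({t for types in consumers_of.values() for t in types})
    total_weight = sum(weights[t] for t in present)
    result = {op: {} for op in consumers_of}
    for consumer_type in present:
        type_mb = slot_mb * weights[consumer_type] / total_weight
        ops = [op for op, types in consumers_of.items() if consumer_type in types]
        for op in ops:
            # Assumption: even split across operators of the same consumer type.
            result[op][consumer_type] = type_mb / len(ops)
    return result


def operator_first(operator_mb, weights, consumers_of):
    """Split each operator's own managed memory across its consumer types
    (the ordering implied by operator-level requirements)."""
    result = {}
    for op, types in consumers_of.items():
        total_weight = sum(weights[t] for t in types)
        result[op] = {t: operator_mb[op] * weights[t] / total_weight for t in types}
    return result


# Hypothetical weights and consumer types, for illustration only.
weights = {"STATE_BACKEND": 70, "PYTHON": 30}
consumers_of = {"op_1": ["STATE_BACKEND"], "op_2": ["STATE_BACKEND", "PYTHON"]}

by_slot = slot_first(100, weights, consumers_of)
by_operator = operator_first({"op_1": 50, "op_2": 50}, weights, consumers_of)

print(by_slot)
print(by_operator)
```

Both variants hand out 100 MB in total, yet in this model the Python consumer of op_2 receives 30 MB when the weights apply within the slot and 15 MB when they apply within the operator.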
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> To sum up things:
> >>> >     >>>>>
> >>> >     >>>>> While (3) might be a bit more implementation related, I
> >>> >     think (1)
> >>> >     >> and (2)
> >>> >     >>>>> somehow suggest that, the price for the proposed approach
> to
> >>> >     avoid
> >>> >     >>>>> specifying resource for every operator is that it's not as
> >>> >     >> independent
> >>> >     >>>> from
> >>> >     >>>>> operator chaining and slot sharing as the operator-based
> >>> >     approach
> >>> >     >>>> discussed
> >>> >     >>>>> in the FLIP.
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> Thank you~
> >>> >     >>>>>
> >>> >     >>>>> Xintong Song
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>>
> >>> >     >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen
> >>> >     <sewen@apache.org <ma...@apache.org>>
> >>> >     >> wrote:
> >>> >     >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
> >>> >     >>>>>>
> >>> >     >>>>>> I want to say, first of all, that this is super well
> >>> >     written. And
> >>> >     >> the
> >>> >     >>>>>> points that the FLIP makes about how to expose the
> >>> >     configuration to
> >>> >     >>>> users
> >>> >     >>>>>> is exactly the right thing to figure out first.
> >>> >     >>>>>> So good job here!
> >>> >     >>>>>>
> >>> >     >>>>>> About how to let users specify the resource profiles. If I
> >>> >     can sum
> >>> >     >> the
> >>> >     >>>>> FLIP
> >>> >     >>>>>> and previous discussion up in my own words, the problem
> is the
> >>> >     >>>> following:
> >>> >     >>>>>> Operator-level specification is the simplest and cleanest
> >>> >     approach,
> >>> >     >>>>> because
> >>> >     >>>>>>> it avoids mixing operator configuration (resource) and
> >>> >     >> scheduling. No
> >>> >     >>>>>>> matter what other parameters change (chaining, slot
> sharing,
> >>> >     >>>> switching
> >>> >     >>>>>>> pipelined and blocking shuffles), the resource profiles
> >>> >     stay the
> >>> >     >>>> same.
> >>> >     >>>>>>> But it would require that a user specifies resources on
> all
> >>> >     >>>> operators,
> >>> >     >>>>>>> which makes it hard to use. That's why the FLIP suggests
> going
> >>> >     >> with
> >>> >     >>>>>>> specifying resources on a Sharing-Group.
> >>> >     >>>>>>
> >>> >     >>>>>> I think both thoughts are important, so can we find a
> solution
> >>> >     >> where
> >>> >     >>>> the
> >>> >     >>>>>> Resource Profiles are specified on an Operator, but we
> >>> >     still avoid
> >>> >     >> that
> >>> >     >>>>> we
> >>> >     >>>>>> need to specify a resource profile on every operator?
> >>> >     >>>>>>
> >>> >     >>>>>> What do you think about something like the following:
> >>> >     >>>>>>    - Resource Profiles are specified on an operator level.
> >>> >     >>>>>>    - Not all operators need profiles
> >>> >     >>>>>>    - All Operators without a Resource Profile end up in
> the
> >>> >     >> default
> >>> >     >>>> slot
> >>> >     >>>>>> sharing group with a default profile (will get a default
> slot).
> >>> >     >>>>>>    - All Operators with a Resource Profile will go into
> >>> >     another slot
> >>> >     >>>>> sharing
> >>> >     >>>>>> group (the resource-specified-group).
> >>> >     >>>>>>    - Users can define different slot sharing groups for
> >>> >     operators
> >>> >     >> like
> >>> >     >>>>> they
> >>> >     >>>>>> do now, with the exception that you cannot mix operators
> >>> >     that have
> >>> >     >> a
> >>> >     >>>>>> resource profile and operators that have no resource
> profile.
> >>> >     >>>>>>    - The default case where no operator has a resource
> >>> >     profile is
> >>> >     >> just a
> >>> >     >>>>>> special case of this model
> >>> >     >>>>>>    - The chaining logic sums up the profiles per operator,
> >>> >     like it
> >>> >     >> does
> >>> >     >>>>> now,
> >>> >     >>>>>> and the scheduler sums up the profiles of the tasks that
> it
> >>> >     >> schedules
> >>> >     >>>>>> together.
> >>> >     >>>>>>
> >>> >     >>>>>>
> >>> >     >>>>>> There is another question about reactive scaling raised
> in the
> >>> >     >> FLIP. I
> >>> >     >>>>> need
> >>> >     >>>>>> to think a bit about that. That is indeed a bit more
> tricky
> >>> >     once we
> >>> >     >>>> have
> >>> >     >>>>>> slots of different sizes.
> >>> >     >>>>>> It is not clear then which of the different slot requests
> the
> >>> >     >>>>>> ResourceManager should fulfill when new resources (TMs)
> >>> >     show up,
> >>> >     >> or how
> >>> >     >>>>> the
> >>> >     >>>>>> JobManager redistributes the slots resources when
> resources
> >>> >     (TMs)
> >>> >     >>>>> disappear
> >>> >     >>>>>> This question is pretty orthogonal, though, to the "how to
> >>> >     specify
> >>> >     >> the
> >>> >     >>>>>> resources".
> >>> >     >>>>>>
> >>> >     >>>>>>
> >>> >     >>>>>> Best,
> >>> >     >>>>>> Stephan
> >>> >     >>>>>>
> >>> >     >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song
> >>> >     <tonysong820@gmail.com <ma...@gmail.com>
> >>> >     >>>>> wrote:
> >>> >     >>>>>>> Thanks for drafting the FLIP and driving the discussion,
> >>> >     Yangze.
> >>> >     >>>>>>> And Thanks for the feedback, Till and Chesnay.
> >>> >     >>>>>>>
> >>> >     >>>>>>> @Till,
> >>> >     >>>>>>>
> >>> >     >>>>>>> I agree that specifying requirements for SSGs means that
> SSGs
> >>> >     >> need to
> >>> >     >>>>> be
> >>> >     >>>>>>> supported in fine-grained resource management, otherwise
> each
> >>> >     >>>> operator
> >>> >     >>>>>>> might use as many resources as the whole group. However,
> I
> >>> >     cannot
> >>> >     >>>> think
> >>> >     >>>>>> of
> >>> >     >>>>>>> a strong reason for not supporting SSGs in fine-grained
> >>> >     resource
> >>> >     >>>>>>> management.
> >>> >     >>>>>>>
> >>> >     >>>>>>>
> >>> >     >>>>>>>> Interestingly, if all operators have their resources
> properly
> >>> >     >>>>>> specified,
> >>> >     >>>>>>>> then slot sharing is no longer needed because Flink
> could
> >>> >     >> slice off
> >>> >     >>>>> the
> >>> >     >>>>>>>> appropriately sized slots for every Task individually.
> >>> >     >>>>>>>>
> >>> >     >>>>>>> So for example, if we have a job consisting of two
> >>> >     operator op_1
> >>> >     >> and
> >>> >     >>>>> op_2
> >>> >     >>>>>>>> where each op needs 100 MB of memory, we would then say
> that
> >>> >     >> the
> >>> >     >>>> slot
> >>> >     >>>>>>>> sharing group needs 200 MB of memory to run. If we have
> a
> >>> >     >> cluster
> >>> >     >>>>> with
> >>> >     >>>>>> 2
> >>> >     >>>>>>>> TMs with one slot of 100 MB each, then the system
> cannot run
> >>> >     >> this
> >>> >     >>>>> job.
> >>> >     >>>>>> If
> >>> >     >>>>>>>> the resources were specified on an operator level, then
> the
> >>> >     >> system
> >>> >     >>>>>> could
> >>> >     >>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2
> to
> >>> >     >> TM_2.
> >>> >     >>>>>>>
> >>> >     >>>>>>> Couldn't agree more that if all operators' requirements
> are
> >>> >     >> properly
> >>> >     >>>>>>> specified, slot sharing should be no longer needed. I
> >>> >     think this
> >>> >     >>>>> exactly
> >>> >     >>>>>>> disproves the example. If we already know op_1 and op_2
> each
> >>> >     >> needs
> >>> >     >>>> 100
> >>> >     >>>>> MB
> >>> >     >>>>>>> of memory, why would we put them in the same group? If
> >>> >     they are
> >>> >     >> in
> >>> >     >>>>>> separate
> >>> >     >>>>>>> groups, with the proposed approach the system can freely
> >>> >     deploy
> >>> >     >> them
> >>> >     >>>> to
> >>> >     >>>>>>> either a 200 MB TM or two 100 MB TMs.
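The 100 MB / 200 MB example can be checked with a small placement sketch. It assumes a simplified model where each TaskManager is a pool of memory from which slots are sliced on demand, and it uses a first-fit-decreasing heuristic; the function name and the model are hypothetical and do not reflect the actual scheduler or ResourceManager logic.

```python
def first_fit_decreasing(requests_mb, tm_free_mb):
    """Return True if the heuristic places every slot request on some TM.

    Heuristic only: first-fit decreasing can miss a valid packing in
    general, but it is sufficient for the small cases discussed here.
    """
    free = list(tm_free_mb)
    for request in sorted(requests_mb, reverse=True):
        for i, available in enumerate(free):
            if available >= request:
                free[i] -= request  # slice a slot of this size off the TM
                break
        else:
            return False  # no TM has enough free memory for this request
    return True


# One SSG holding op_1 and op_2 (200 MB) on two 100 MB TMs.
print(first_fit_decreasing([200], [100, 100]))
# op_1 and op_2 in separate groups (100 MB each) on two 100 MB TMs.
print(first_fit_decreasing([100, 100], [100, 100]))
# The same two groups on a single 200 MB TM.
print(first_fit_decreasing([100, 100], [200]))
```

Under these assumptions, only the single 200 MB group fails to fit on the two 100 MB TaskManagers; the two separate 100 MB groups fit on either cluster layout.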
> >>> >     >>>>>>>
> >>> >     >>>>>>> Moreover, the precondition for not needing slot sharing
> is
> >>> >     having
> >>> >     >>>>>> resource
> >>> >     >>>>>>> requirements properly specified for all operators. This
> is not
> >>> >     >> always
> >>> >     >>>>>>> possible, and usually requires tremendous efforts. One
> of the
> >>> >     >>>> benefits
> >>> >     >>>>>> for
> >>> >     >>>>>>> SSG-based requirements is that it allows the user to
> freely
> >>> >     >> decide
> >>> >     >>>> the
> >>> >     >>>>>>> granularity, thus efforts they want to pay. I would
> >>> >     consider SSG
> >>> >     >> in
> >>> >     >>>>>>> fine-grained resource management as a group of operators
> >>> >     that the
> >>> >     >>>> user
> >>> >     >>>>>>> would like to specify the total resource for. There can
> be
> >>> >     only
> >>> >     >> one
> >>> >     >>>>> group
> >>> >     >>>>>>> in the job, 2~3 groups dividing the job into a few major
> >>> >     parts,
> >>> >     >> or as
> >>> >     >>>>>> many
> >>> >     >>>>>>> groups as the number of tasks/operators, depending on how
> >>> >     >>>> fine-grained
> >>> >     >>>>>> the
> >>> >     >>>>>>> user is able to specify the resources.
> >>> >     >>>>>>>
> >>> >     >>>>>>> Having to support SSGs might be a constraint. But given
> >>> >     that all
> >>> >     >> the
> >>> >     >>>>>>> current scheduler implementations already support SSGs, I
> >>> >     tend to
> >>> >     >>>> think
> >>> >     >>>>>>> that as an acceptable price for the above discussed
> >>> >     usability and
> >>> >     >>>>>>> flexibility.
> >>> >     >>>>>>>
> >>> >     >>>>>>> @Chesnay
> >>> >     >>>>>>>
> >>> >     >>>>>>> Will declaring them on slot sharing groups not also waste
> >>> >     >> resources
> >>> >     >>>> if
> >>> >     >>>>>> the
> >>> >     >>>>>>>> parallelism of operators within that group are
> different?
> >>> >     >>>>>>>>
> >>> >     >>>>>>> Yes. It's a trade-off between usability and resource
> >>> >     >> utilization. To
> >>> >     >>>>>> avoid
> >>> >     >>>>>>> such wasting, the user can define more groups, so that
> >>> >     each group
> >>> >     >>>>>> contains
> >>> >     >>>>>>> less operators and the chance of having operators with
> >>> >     different
> >>> >     >>>>>>> parallelism will be reduced. The price is to have more
> >>> >     resource
> >>> >     >>>>>>> requirements to specify.
> >>> >     >>>>>>>
> >>> >     >>>>>>> It also seems like quite a hassle for users having to
> >>> >     >> recalculate the
> >>> >     >>>>>>>> resource requirements if they change the slot sharing.
> >>> >     >>>>>>>> I'd think that it's not really workable for users that
> create
> >>> >     >> a set
> >>> >     >>>>> of
> >>> >     >>>>>>>> re-usable operators which are mixed and matched in their
> >>> >     >>>>> applications;
> >>> >     >>>>>>>> managing the resources requirements in such a setting
> >>> >     would be
> >>> >     >> a
> >>> >     >>>>>>>> nightmare, and in the end would require operator-level
> >>> >     >> requirements
> >>> >     >>>>> any
> >>> >     >>>>>>>> way.
> >>> >     >>>>>>>> In that sense, I'm not even sure whether it really
> increases
> >>> >     >>>>> usability.
> >>> >     >>>>>>>     - As mentioned in my reply to Till's comment,
> there's no
> >>> >     >> reason to
> >>> >     >>>>> put
> >>> >     >>>>>>>     multiple operators whose individual resource
> >>> >     requirements are
> >>> >     >>>>> already
> >>> >     >>>>>>> known
> >>> >     >>>>>>>     into the same group in fine-grained resource
> management.
> >>> >     >>>>>>>     - Even an operator implementation is reused for
> multiple
> >>> >     >>>>> applications,
> >>> >     >>>>>>>     it does not guarantee the same resource requirements.
> >>> >     During
> >>> >     >> our
> >>> >     >>>>> years
> >>> >     >>>>>>> of
> >>> >     >>>>>>>     practices in Alibaba, with per-operator requirements
> >>> >     >> specified for
> >>> >     >>>>>>> Blink's
> >>> >     >>>>>>>     fine-grained resource management, very few users
> >>> >     (including
> >>> >     >> our
> >>> >     >>>>>>> specialists
> >>> >     >>>>>>>     who are dedicated to supporting Blink users) are as
> >>> >     >> experienced as
> >>> >     >>>>> to
> >>> >     >>>>>>>     accurately predict/estimate the operator resource
> >>> >     >> requirements.
> >>> >     >>>> Most
> >>> >     >>>>>>> people
> >>> >     >>>>>>>     rely on the execution-time metrics (throughput,
> delay, cpu
> >>> >     >> load,
> >>> >     >>>>>> memory
> >>> >     >>>>>>>     usage, GC pressure, etc.) to improve the
> specification.
> >>> >     >>>>>>>
> >>> >     >>>>>>> To sum up:
> >>> >     >>>>>>> If the user is capable of providing proper resource
> >>> >     requirements
> >>> >     >> for
> >>> >     >>>>>> every
> >>> >     >>>>>>> operator, that's definitely a good thing and we would not
> >>> >     need to
> >>> >     >>>> rely
> >>> >     >>>>> on
> >>> >     >>>>>>> the SSGs. However, that shouldn't be a *must* for the
> >>> >     >> fine-grained
> >>> >     >>>>>> resource
> >>> >     >>>>>>> management to work. For those users who are capable and
> do not
> >>> >     >> like
> >>> >     >>>>>> having
> >>> >     >>>>>>> to set each operator to a separate SSG, I would be ok to
> have
> >>> >     >> both
> >>> >     >>>>>>> SSG-based and operator-based runtime interfaces and to
> only
> >>> >     >> fallback
> >>> >     >>>> to
> >>> >     >>>>>> the
> >>> >     >>>>>>> SSG requirements when the operator requirements are not
> >>> >     >> specified.
> >>> >     >>>>>> However,
> >>> >     >>>>>>> as the first step, I think we should prioritise the use
> cases
> >>> >     >> where
> >>> >     >>>>> users
> >>> >     >>>>>>> are not that experienced.
> >>> >     >>>>>>>
> >>> >     >>>>>>> Thank you~
> >>> >     >>>>>>>
> >>> >     >>>>>>> Xintong Song
> >>> >     >>>>>>>
> >>> >     >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
> >>> >     >> chesnay@apache.org <ma...@apache.org>>
> >>> >     >>>>>>> wrote:
> >>> >     >>>>>>>
> >>> >     >>>>>>>> Will declaring them on slot sharing groups not also
> waste
> >>> >     >> resources
> >>> >     >>>>> if
> >>> >     >>>>>>>> the parallelism of operators within that group are
> different?
> >>> >     >>>>>>>>
> >>> >     >>>>>>>> It also seems like quite a hassle for users having to
> >>> >     >> recalculate
> >>> >     >>>> the
> >>> >     >>>>>>>> resource requirements if they change the slot sharing.
> >>> >     >>>>>>>> I'd think that it's not really workable for users that
> create
> >>> >     >> a set
> >>> >     >>>>> of
> >>> >     >>>>>>>> re-usable operators which are mixed and matched in their
> >>> >     >>>>> applications;
> >>> >     >>>>>>>> managing the resources requirements in such a setting
> >>> >     would be
> >>> >     >> a
> >>> >     >>>>>>>> nightmare, and in the end would require operator-level
> >>> >     >> requirements
> >>> >     >>>>> any
> >>> >     >>>>>>>> way.
> >>> >     >>>>>>>> In that sense, I'm not even sure whether it really
> increases
> >>> >     >>>>> usability.
> >>> >     >>>>>>>> My main worry is that if we wire the runtime to work
> >>> >     on SSGs
> >>> >     >>>> it's
> >>> >     >>>>>>>> gonna be difficult to implement more fine-grained
> approaches,
> >>> >     >> which
> >>> >     >>>>>>>> would not be the case if, for the runtime, they are
> always
> >>> >     >> defined
> >>> >     >>>> on
> >>> >     >>>>>> an
> >>> >     >>>>>>>> operator-level.
> >>> >     >>>>>>>>
> >>> >     >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> >>> >     >>>>>>>>> Thanks for drafting this FLIP and starting this
> discussion
> >>> >     >>>> Yangze.
> >>> >     >>>>>>>>> I like that defining resource requirements on a slot
> sharing
> >>> >     >>>> group
> >>> >     >>>>>>> makes
> >>> >     >>>>>>>>> the overall setup easier and improves usability of
> resource
> >>> >     >>>>>>> requirements.
> >>> >     >>>>>>>>> What I do not like about it is that it changes slot
> sharing
> >>> >     >>>> groups
> >>> >     >>>>>> from
> >>> >     >>>>>>>>> being a scheduling hint to something which needs to be
> >>> >     >> supported
> >>> >     >>>> in
> >>> >     >>>>>>> order
> >>> >     >>>>>>>>> to support fine grained resource requirements. So far,
> the
> >>> >     >> idea
> >>> >     >>>> of
> >>> >     >>>>>> slot
> >>> >     >>>>>>>>> sharing groups was that it tells the system that a set
> of
> >>> >     >>>> operators
> >>> >     >>>>>> can
> >>> >     >>>>>>>> be
> >>> >     >>>>>>>>> deployed in the same slot. But the system still had the
> >>> >     >> freedom
> >>> >     >>>> to
> >>> >     >>>>>> say
> >>> >     >>>>>>>> that
> >>> >     >>>>>>>>> it would rather place these tasks in different slots
> if it
> >>> >     >>>> wanted.
> >>> >     >>>>> If
> >>> >     >>>>>>> we
> >>> >     >>>>>>>>> now specify resource requirements on a per slot sharing
> >>> >     >> group,
> >>> >     >>>> then
> >>> >     >>>>>> the
> >>> >     >>>>>>>>> only option for a scheduler which does not support slot
> >>> >     >> sharing
> >>> >     >>>>>> groups
> >>> >     >>>>>>> is
> >>> >     >>>>>>>>> to say that every operator in this slot sharing group
> >>> >     needs a
> >>> >     >>>> slot
> >>> >     >>>>>> with
> >>> >     >>>>>>>> the
> >>> >     >>>>>>>>> same resources as the whole group.
> >>> >     >>>>>>>>>
> >>> >     >>>>>>>>> So for example, if we have a job consisting of two
> operator
> >>> >     >> op_1
> >>> >     >>>>> and
> >>> >     >>>>>>> op_2
> >>> >     >>>>>>>>> where each op needs 100 MB of memory, we would then
> say that
> >>> >     >> the
> >>> >     >>>>> slot
> >>> >     >>>>>>>>> sharing group needs 200 MB of memory to run. If we
> have a
> >>> >     >> cluster
> >>> >     >>>>>> with
> >>> >     >>>>>>> 2
> >>> >     >>>>>>>>> TMs with one slot of 100 MB each, then the system
> cannot run
> >>> >     >> this
> >>> >     >>>>>> job.
> >>> >     >>>>>>> If
> >>> >     >>>>>>>>> the resources were specified on an operator level,
> then the
> >>> >     >>>> system
> >>> >     >>>>>>> could
> >>> >     >>>>>>>>> still make the decision to deploy op_1 to TM_1 and
> op_2 to
> >>> >     >> TM_2.
> >>> >     >>>>>>>>> Originally, one of the primary goals of slot sharing
> groups
> >>> >     >> was
> >>> >     >>>> to
> >>> >     >>>>>> make
> >>> >     >>>>>>>> it
> >>> >     >>>>>>>>> easier for the user to reason about how many slots a
> job
> >>> >     >> needs
> >>> >     >>>>>>>> independent
> >>> >     >>>>>>>>> of the actual number of operators in the job.
> Interestingly,
> >>> >     >> if
> >>> >     >>>> all
> >>> >     >>>>>>>>> operators have their resources properly specified,
> then slot
> >>> >     >>>>> sharing
> >>> >     >>>>>> is
> >>> >     >>>>>>>> no
> >>> >     >>>>>>>>> longer needed because Flink could slice off the
> >>> >     appropriately
> >>> >     >>>> sized
> >>> >     >>>>>>> slots
> >>> >     >>>>>>>>> for every Task individually. What matters is whether
> the
> >>> >     >> whole
> >>> >     >>>>>> cluster
> >>> >     >>>>>>>> has
> >>> >     >>>>>>>>> enough resources to run all tasks or not.
> >>> >     >>>>>>>>>
> >>> >     >>>>>>>>> Cheers,
> >>> >     >>>>>>>>> Till
> >>> >     >>>>>>>>>
> >>> >     >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
> >>> >     >> karmagyz@gmail.com <ma...@gmail.com>>
> >>> >     >>>>>> wrote:
> >>> >     >>>>>>>>>> Hi, there,
> >>> >     >>>>>>>>>>
> >>> >     >>>>>>>>>> We would like to start a discussion thread on
> "FLIP-156:
> >>> >     >> Runtime
> >>> >     >>>>>>>>>> Interfaces for Fine-Grained Resource Requirements"[1],
> >>> >     >> where we
> >>> >     >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime
> interfaces
> >>> >     >> for
> >>> >     >>>>>>>>>> specifying fine-grained resource requirements.
> >>> >     >>>>>>>>>>
> >>> >     >>>>>>>>>> In this FLIP:
> >>> >     >>>>>>>>>> - Expound the user story of fine-grained resource
> >>> >     >> management.
> >>> >     >>>>>>>>>> - Propose runtime interfaces for specifying SSG-based
> >>> >     >> resource
> >>> >     >>>>>>>>>> requirements.
> >>> >     >>>>>>>>>> - Discuss the pros and cons of the three potential
> >>> >     >> granularities
> >>> >     >>>>> for
> >>> >     >>>>>>>>>> specifying the resource requirements (op, task and
> slot
> >>> >     >> sharing
> >>> >     >>>>>> group)
> >>> >     >>>>>>>>>> and explain why we choose the slot sharing group.
> >>> >     >>>>>>>>>>
> >>> >     >>>>>>>>>> Please find more details in the FLIP wiki document
> [1].
> >>> >     >> Looking
> >>> >     >>>>>>>>>> forward to your feedback.
> >>> >     >>>>>>>>>>
> >>> >     >>>>>>>>>> [1]
> >>> >     >>>>>>>>>>
> >>> >     >>
> >>> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> >>> >     >>>>>>>>>> Best,
> >>> >     >>>>>>>>>> Yangze Guo
> >>> >     >>>>>>>>>>
> >>> >     >>>>>>>>
> >>> >
> >>>
>

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Yangze Guo <ka...@gmail.com>.

Thanks everyone for the lively discussion. I'd like to try to
summarize the current convergence in the discussion. Please let me
know if I got things wrong or missed something crucial here.

Changes to this FLIP:
- Treat the SSG resource requirements as a hint instead of a
restriction for the runtime. That should be explicitly explained in
the JavaDocs.

Potential follow-up issues if needed:
- Provide an operator-level resource configuration interface.
- Provide multiple options for deciding resources for SSGs whose
requirement is not specified:
    ** Default slot resource.
    ** Default operator resource times number of operators.
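The two options for SSGs without a specified requirement can be sketched as follows. This is only an illustration of the arithmetic: the strategy names, the function `default_ssg_resource_mb`, and the example sizes are hypothetical and are not part of the FLIP or of Flink's configuration.

```python
def default_ssg_resource_mb(num_operators, strategy, default_slot_mb, default_operator_mb):
    """Pick the resource for a slot sharing group with no explicit requirement."""
    if strategy == "default-slot":
        # The group gets one default-sized slot regardless of its contents.
        return default_slot_mb
    if strategy == "default-operator-times-count":
        # The group gets the default operator resource for each operator in it.
        return default_operator_mb * num_operators
    raise ValueError(f"unknown strategy: {strategy}")


# A group of 3 operators, with assumed defaults of 1024 MB per slot
# and 256 MB per operator.
print(default_ssg_resource_mb(3, "default-slot", 1024, 256))                  # 1024
print(default_ssg_resource_mb(3, "default-operator-times-count", 1024, 256))  # 768
```

The second strategy scales with the size of the group, so adding an operator to an unspecified group changes the slot size, whereas the first does not.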

If there are no other issues, I'll update the FLIP accordingly and
start a vote thread. Thanks all for the valuable feedback again.

Best,
Yangze Guo



On Fri, Jan 22, 2021 at 11:30 AM Xintong Song <to...@gmail.com> wrote:
>
>
>  FGRuntimeInterface.png
>
> Thank you~
>
> Xintong Song
>
>
>
> On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <to...@gmail.com> wrote:
>>
>> I think Chesnay's proposal could actually work. IIUC, the key point is to derive operator requirements from SSG requirements on the API side, so that the runtime only deals with operator requirements. How the derivation should be done is debatable, though. E.g., an alternative could be to evenly divide the SSG requirement among the operators in the group.
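The two ways of deriving operator requirements from an SSG requirement can be compared in a short sketch. The function names and the 200 MB figure are hypothetical; the first function mirrors the mockup quoted further down in this thread (the first vertex carries the whole requirement, the rest get zero), and the second is the even split mentioned as an alternative.

```python
def derive_first_gets_all(ssg_mb, operators):
    """The first operator carries the whole group requirement, the rest get zero."""
    return {op: (ssg_mb if i == 0 else 0) for i, op in enumerate(operators)}


def derive_even_split(ssg_mb, operators):
    """Every operator in the group gets an equal share of the group requirement."""
    share = ssg_mb / len(operators)
    return {op: share for op in operators}


def aggregate(operator_mb):
    """What a scheduler that respects the SSG would sum back up for the slot."""
    return sum(operator_mb.values())


operators = ["op_1", "op_2", "op_3", "op_4"]
first = derive_first_gets_all(200, operators)
even = derive_even_split(200, operators)

print(first, aggregate(first))
print(even, aggregate(even))
```

Both derivations aggregate back to the same 200 MB slot when the group is kept together. They differ only if operators of the group are placed in separate slots: under this model the first variant leaves all but one operator with a zero requirement, while the even split gives each operator a quarter of the total.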
>>
>>
>> However, I'm not entirely sure which option is more desirable. The following figure illustrates my understanding: Chesnay's proposal is on the top and the SSG-based proposal in this FLIP is on the bottom.
>>
>>
>>
>> I think the major difference between the two approaches is where deriving operator requirements from SSG requirements happens.
>>
>> - Chesnay's proposal simplifies the runtime logic and the interface to expose, at the price of moving more complexity (i.e. the deriving) to the API side. The question is, where do we prefer to keep the complexity? I'm slightly leaning towards having a thin API and keeping the complexity in the runtime if possible.
>>
>> - Notice that the dashed arrows represent optional steps that are needed only for schedulers that do not respect SSGs, which we don't have at the moment. If we only look at the solid arrows, then the SSG-based approach is much simpler, without needing to derive and aggregate the requirements back and forth. I'm not sure about complicating the current design only for potential future needs.
>>
>>
>> Thank you~
>>
>> Xintong Song
>>
>>
>>
>>
>> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <ch...@apache.org> wrote:
>>>
>>> You're raising a good point, but I think I can rectify that with a minor
>>> adjustment.
>>>
>>> Default requirements are whatever the default requirements are, setting
>>> the requirements for one operator has no effect on other operators.
>>>
>>> With these rules, and some API enhancements, the following mockup would
>>> replicate the SSG-based behavior:
>>>
>>> Map<SlotSharingGroupId, Requirements> requirements = ...
>>> for slotSharingGroup in env.getSlotSharingGroups() {
>>>      vertices = slotSharingGroup.getVertices()
>>>      vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
>>>      vertices.remaining().setRequirements(ZERO)
>>> }
>>>
>>> We could even allow setting requirements on slot sharing groups or
>>> colocation groups and internally translate them accordingly.
>>> I can't help but feel this is a plain API issue.
>>>
>>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
>>> > If I understand you correctly Chesnay, then you want to decouple the
>>> > resource requirement specification from the slot sharing group
>>> > assignment. Hence, per default all operators would be in the same slot
>>> > sharing group. If there is no operator with a resource specification,
>>> > then the system would allocate a default slot for it. If there is at
>>> > least one operator, then the system would sum up all the specified
>>> > resources and allocate a slot of this size. This effectively means
>>> > that all unspecified operators will implicitly have a zero resource
>>> > requirement. Did I understand your idea correctly?
>>> >
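The slot sizing rule described in this paragraph can be written down as a small sketch. It assumes memory in megabytes as the only resource dimension and uses null to mark an operator without a specification; the class and method names are hypothetical and not part of Flink.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical helper, not a Flink API.
final class SharedSlotSizeSketch {

    // operatorMb holds one entry per operator; null means "not specified".
    static long slotSizeMb(List<Long> operatorMb, long defaultSlotMb) {
        boolean anySpecified = operatorMb.stream().anyMatch(Objects::nonNull);
        if (!anySpecified) {
            // No operator carries a specification: allocate a default slot.
            return defaultSlotMb;
        }
        // Otherwise sum the specified values; unspecified operators count as zero.
        return operatorMb.stream()
                .filter(Objects::nonNull)
                .mapToLong(Long::longValue)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(slotSizeMb(Arrays.<Long>asList(null, null), 1024));
        System.out.println(slotSizeMb(Arrays.asList(300L, null), 1024));
    }
}
```

With one operator specified at 300 MB and one left unspecified, the shared slot is sized at 300 MB rather than 300 MB plus a default share, which is the behaviour the following paragraph flags as potentially surprising.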
>>> > I am wondering whether this wouldn't lead to a surprising behaviour
>>> > for the user. If the user specifies the resource requirements for a
>>> > single operator, then he probably will assume that the other operators
>>> > will get the default share of resources and not nothing.
>>> >
>>> > Cheers,
>>> > Till
>>> >
>>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <chesnay@apache.org> wrote:
>>> >
>>> >     Is there even a functional difference between specifying the
>>> >     requirements for an SSG vs specifying the same requirements on a
>>> >     single
>>> >     operator within that group (ideally a colocation group to avoid this
>>> >     whole hint business)?
>>> >
>>> >     Wouldn't we get the best of both worlds in the latter case?
>>> >
>>> >     Users can take shortcuts to define shared requirements,
>>> >     but refine them further as needed on a per-operator basis,
>>> >     without changing semantics of slotsharing groups
>>> >     nor the runtime being locked into SSG-based requirements.
>>> >
>>> >     (And before anyone argues what happens if slotsharing groups
>>> >     change or
>>> >     whatnot, that's a plain API issue that we could surely solve. (A
>>> >     plain
>>> >     iteration over slotsharing groups and therein contained operators
>>> >     would
>>> >     suffice)).
>>> >
>>> >     On 1/20/2021 6:48 PM, Till Rohrmann wrote:
>>> >     > Maybe a different minor idea: Would it be possible to treat the SSG
>>> >     > resource requirements as a hint for the runtime similar to how
>>> >     slot sharing
>>> >     > groups are designed at the moment? Meaning that we don't give
>>> >     the guarantee
>>> >     > that Flink will always deploy this set of tasks together no
>>> >     matter what
>>> >     > comes. If, for example, the runtime can derive by some means the
>>> >     resource
>>> >     > requirements for each task based on the requirements for the
>>> >     SSG, this
>>> >     > could be possible. One easy strategy would be to give every task
>>> >     the same
>>> >     > resources as the whole slot sharing group. Another one could be
>>> >     > distributing the resources equally among the tasks. This does
>>> >     not even have
>>> >     > to be implemented but we would give ourselves the freedom to change
>>> >     > scheduling if need should arise.
>>> >     >
>>> >     > Cheers,
>>> >     > Till
>>> >     >
>>> >     > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <karmagyz@gmail.com> wrote:
>>> >     >
>>> >     >> Thanks for the responses, Till and Xintong.
>>> >     >>
>>> >     >> I second Xintong's comment that SSG-based runtime interface
>>> >     will give
>>> >     >> us the flexibility to achieve op/task-based approach. That's one of
>>> >     >> the most important reasons for our design choice.
>>> >     >>
>>> >     >> Some cents regarding the default operator resource:
>>> >     >> - It might be good for the scenario of DataStream jobs.
>>> >     >>     ** For light-weight operators, the accumulative
>>> >     configuration error
>>> >     >> will not be significant. Then, the resource of a task used is
>>> >     >> proportional to the number of operators it contains.
>>> >     >>     ** For heavy operators like join and window or operators
>>> >     using the
>>> >     >> external resources, user will turn to the fine-grained resource
>>> >     >> configuration.
>>> >     >> - It can increase the stability for the standalone cluster
>>> >     where task
>>> >     >> executors registered are heterogeneous(with different default slot
>>> >     >> resources).
>>> >     >> - It might not be good for SQL users. The operators that SQL
>>> >     >> will be translated to are a black box to the user. We also do
>>> >     >> not guarantee cross-version consistency of the transformation
>>> >     >> so far.
>>> >     >>
>>> >     >> I think it can be treated as a follow-up work when the fine-grained
>>> >     >> resource management is end-to-end ready.
>>> >     >>
>>> >     >> Best,
>>> >     >> Yangze Guo
>>> >     >>
>>> >     >>
>>> >     >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song <tonysong820@gmail.com> wrote:
>>> >     >>> Thanks for the feedback, Till.
>>> >     >>>
>>> >     >>> ## I feel that what you proposed (operator-based + default
>>> >     value) might
>>> >     >> be
>>> >     >>> subsumed by the SSG-based approach.
>>> >     >>> Thinking of op_1 -> op_2, there are the following 4 cases,
>>> >     categorized by
>>> >     >>> whether the resource requirements are known to the users.
>>> >     >>>
>>> >     >>>     1. *Both known.* As previously mentioned, there's no
>>> >     reason to put
>>> >     >>>     multiple operators whose individual resource requirements
>>> >     are already
>>> >     >> known
>>> >     >>>     into the same group in fine-grained resource management.
>>> >     And if op_1
>>> >     >> and
>>> >     >>>     op_2 are in different groups, there should be no problem
>>> >     switching
>>> >     >> data
>>> >     >>>     exchange mode from pipelined to blocking. This is
>>> >     equivalent to
>>> >     >> specifying
>>> >     >>>     operator resource requirements in your proposal.
>>> >     >>>     2. *op_1 known, op_2 unknown.* Similar to 1), except that
>>> >     op_2 is in a
>>> >     >>>     SSG whose resource is not specified thus would have the
>>> >     default slot
>>> >     >>>     resource. This is equivalent to having default operator
>>> >     resources in
>>> >     >> your
>>> >     >>>     proposal.
>>> >     >>>     3. *Both unknown*. The user can either set op_1 and op_2
>>> >     to the same
>>> >     >> SSG
>>> >     >>>     or separate SSGs.
>>> >     >>>        - If op_1 and op_2 are in the same SSG, it will be
>>> >     equivalent to
>>> >     >> the
>>> >     >>>        coarse-grained resource management, where op_1 and op_2
>>> >     share a
>>> >     >> default
>>> >     >>>        size slot no matter which data exchange mode is used.
>>> >     >>>        - If op_1 and op_2 are in different SSGs, then each of
>>> >     them will
>>> >     >> use
>>> >     >>>        a default size slot. This is equivalent to setting them
>>> >     with
>>> >     >> default
>>> >     >>>        operator resources in your proposal.
>>> >     >>>     4. *Total (pipeline) or max (blocking) of op_1 and op_2 is
>>> >     known.*
>>> >     >>>        - It is possible that the user learns the total / max
>>> >     resource
>>> >     >>>        requirement from executing and monitoring the job,
>>> >     while not
>>> >     >>> being aware of
>>> >     >>>        individual operator requirements.
>>> >     >>>        - I believe this is the case your proposal does not
>>> >     cover. And TBH,
>>> >     >>>        this is probably how most users learn the resource
>>> >     requirements,
>>> >     >>> according
>>> >     >>>        to my experiences.
>>> >     >>>        - In this case, the user might need to specify
>>> >     different resources
>>> >     >> if
>>> >     >>>        he wants to switch the execution mode, which should not
>>> >     be worse
>>> >     >> than not
>>> >     >>>        being able to use fine-grained resource management.
>>> >     >>>
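Case 4 above distinguishes the total (pipelined) from the max (blocking) of the operator requirements. A minimal sketch of that distinction, assuming memory in megabytes as the only dimension and using hypothetical names (GroupRequirementSketch, ExchangeMode, groupRequirementMb) that are not part of Flink:

```java
import java.util.stream.LongStream;

// Hypothetical helper, not a Flink API.
final class GroupRequirementSketch {

    enum ExchangeMode { PIPELINED, BLOCKING }

    // Pipelined: all operators of the group run at the same time, so the
    // group needs the sum. Blocking: they run one after another, so the
    // largest single requirement is enough.
    static long groupRequirementMb(ExchangeMode mode, long... operatorMb) {
        LongStream values = LongStream.of(operatorMb);
        return mode == ExchangeMode.PIPELINED
                ? values.sum()
                : values.max().orElse(0L);
    }

    public static void main(String[] args) {
        System.out.println(groupRequirementMb(ExchangeMode.PIPELINED, 100, 150));
        System.out.println(groupRequirementMb(ExchangeMode.BLOCKING, 100, 150));
    }
}
```

A user who only observed the group-level number while running in one mode would have to specify a different value after switching modes, as the last bullet of case 4 notes.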
>>> >     >>>
>>> >     >>> ## An additional idea inspired by your proposal.
>>> >     >>> We may provide multiple options for deciding resources for
>>> >     SSGs whose
>>> >     >>> requirement is not specified, if needed.
>>> >     >>>
>>> >     >>>     - Default slot resource (current design)
>>> >     >>>     - Default operator resource times number of operators
>>> >     (equivalent to
>>> >     >>>     your proposal)
>>> >     >>>
>>> >     >>>
>>> >     >>> ## Exposing internal runtime strategies
>>> >     >>> Theoretically, yes. Tying to the SSGs, the resource
>>> >     requirements might be
>>> >     >>> affected if how SSGs are internally handled changes in future.
>>> >     >> Practically,
>>> >     >>> I do not concretely see at the moment what kind of changes we
>>> >     may want in
>>> >     >>> future that might conflict with this FLIP proposal, as the
>>> >     question of
>>> >     >>> switching data exchange mode answered above. I'd suggest to
>>> >     not give up
>>> >     >> the
>>> >     >>> user friendliness we may gain now for the future problems that
>>> >     may or may
>>> >     >>> not exist.
>>> >     >>>
>>> >     >>> Moreover, the SSG-based approach has the flexibility to
>>> >     achieve the
>>> >     >>> equivalent behavior as the operator-based approach, if we set each
>>> >     >> operator
>>> >     >>> (or task) to a separate SSG. We can even provide a shortcut
>>> >     option to
>>> >     >>> automatically do that for users, if needed.
>>> >     >>>
>>> >     >>>
>>> >     >>> Thank you~
>>> >     >>>
>>> >     >>> Xintong Song
>>> >     >>>
>>> >     >>>
>>> >     >>>
>>> >     >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann <trohrmann@apache.org> wrote:
>>> >     >>>> Thanks for the responses Xintong and Stephan,
>>> >     >>>>
>>> >     >>>> I agree that being able to define the resource requirements for a
>>> >     >> group of
>>> >     >>>> operators is more user friendly. However, my concern is that
>>> >     we are
>>> >     >>>> exposing thereby internal runtime strategies which might
>>> >     limit our
>>> >     >>>> flexibility to execute a given job. Moreover, the semantics of
>>> >     >> configuring
>>> >     >>>> resource requirements for SSGs could break if switching from
>>> >     streaming
>>> >     >> to
>>> >     >>>> batch execution. If one defines the resource requirements for
>>> >     op_1 ->
>>> >     >> op_2
>>> >     >>>> which run in pipelined mode when using the streaming
>>> >     execution, then
>>> >     >> how do
>>> >     >>>> we interpret these requirements when op_1 -> op_2 are
>>> >     executed with a
>>> >     >>>> blocking data exchange in batch execution mode? Consequently,
>>> >     I am
>>> >     >> still
>>> >     >>>> leaning towards Stephan's proposal to set the resource
>>> >     requirements per
>>> >     >>>> operator.
>>> >     >>>>
>>> >     >>>> Maybe the following proposal makes the configuration easier:
>>> >     If the
>>> >     >> user
>>> >     >>>> wants to use fine-grained resource requirements, then she
>>> >     needs to
>>> >     >> specify
>>> >     >>>> the default size which is used for operators which have no
>>> >     explicit
>>> >     >>>> resource annotation. If this holds true, then every operator
>>> >     would
>>> >     >> have a
>>> >     >>>> resource requirement and the system can try to execute the
>>> >     operators
>>> >     >> in the
>>> >     >>>> best possible manner w/o being constrained by how the user
>>> >     set the SSG
>>> >     >>>> requirements.
>>> >     >>>>
>>> >     >>>> Cheers,
>>> >     >>>> Till
>>> >     >>>>
>>> >     >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song <tonysong820@gmail.com> wrote:
>>> >     >>>>
>>> >     >>>>> Thanks for the feedback, Stephan.
>>> >     >>>>>
>>> >     >>>>> Actually, your proposal has also come to my mind at some
>>> >     point. And I
>>> >     >>>> have
>>> >     >>>>> some concerns about it.
>>> >     >>>>>
>>> >     >>>>>
>>> >     >>>>> 1. It does not give users the same control as the SSG-based
>>> >     approach.
>>> >     >>>>>
>>> >     >>>>>
>>> >     >>>>> While both approaches do not require specifying for each
>>> >     operator,
>>> >     >>>>> SSG-based approach supports the semantic that "some operators
>>> >     >> together
>>> >     >>>> use
>>> >     >>>>> this much resource" while the operator-based approach doesn't.
>>> >     >>>>>
>>> >     >>>>>
>>> >     >>>>> Think of a long pipeline with m operators (o_1, o_2, ...,
>>> >     o_m), and
>>> >     >> at
>>> >     >>>> some
>>> >     >>>>> point there's an agg o_n (1 < n < m) which significantly
>>> >     reduces the
>>> >     >> data
>>> >     >>>>> amount. One can separate the pipeline into 2 groups SSG_1
>>> >     (o_1, ...,
>>> >     >> o_n)
>>> >     >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much higher
>>> >     >> parallelisms
>>> >     >>>>> for operators in SSG_1 than for operators in SSG_2 won't
>>> >     lead to too
>>> >     >> much
>>> >     >>>>> wasting of resources. If the two SSGs end up needing different
>>> >     >> resources,
>>> >     >>>>> with the SSG-based approach one can directly specify
>>> >     resources for
>>> >     >> the
>>> >     >>>> two
>>> >     >>>>> groups. However, with the operator-based approach, the user will
>>> >     >> have to
>>> >     >>>>> specify resources for each operator in one of the two
>>> >     groups, and
>>> >     >> tune
>>> >     >>>> the
>>> >     >>>>> default slot resource via configurations to fit the other group.
>>> >     >>>>>
>>> >     >>>>>
>>> >     >>>>> 2. It increases the chance of breaking operator chains.
>>> >     >>>>>
>>> >     >>>>>
>>> >     >>>>> Setting chainable operators into different slot sharing
>>> >     groups will
>>> >     >>>>> prevent them from being chained. In the current implementation,
>>> >     >>>> downstream
>>> >     >>>>> operators, if SSG not explicitly specified, will be set to
>>> >     the same
>>> >     >> group
>>> >     >>>>> as the chainable upstream operators (unless multiple upstream
>>> >     >> operators
>>> >     >>>> in
>>> >     >>>>> different groups), to reduce the chance of breaking chains.
>>> >     >>>>>
>>> >     >>>>>
>>> >     >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4,
>>> >     deciding
>>> >     >> SSGs
>>> >     >>>>> based on whether resource is specified we will easily get
>>> >     groups like
>>> >     >>>> (o_1,
>>> >     >>>>> o_3) & (o_2, o_4), where none of the operators can be
>>> >     chained. This
>>> >     >> is
>>> >     >>>> also
>>> >     >>>>> possible for the SSG-based approach, but I believe the
>>> >     chance is much
>>> >     >>>>> smaller because there's no strong reason for users to
>>> >     specify the
>>> >     >> groups
>>> >     >>>>> with alternate operators like that. We are more likely to
>>> >     get groups
>>> >     >> like
>>> >     >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks only between
>>> >     o_2 and
>>> >     >> o_3.
>>> >     >>>>>
>>> >     >>>>> 3. It complicates the system by having two different
>>> >     mechanisms for
>>> >     >>>> sharing
>>> >     >>>>> managed memory in  a slot.
>>> >     >>>>>
>>> >     >>>>>
>>> >     >>>>> - In FLIP-141, we introduced the intra-slot managed memory
>>> >     sharing
>>> >     >>>>> mechanism, where managed memory is first distributed
>>> >     according to the
>>> >     >>>>> consumer type, then further distributed across operators of that
>>> >     >> consumer
>>> >     >>>>> type.
>>> >     >>>>>
>>> >     >>>>> - With the operator-based approach, managed memory size
>>> >     specified
>>> >     >> for an
>>> >     >>>>> operator should account for all the consumer types of that
>>> >     operator.
>>> >     >> That
>>> >     >>>>> means the managed memory is first distributed across
>>> >     operators, then
>>> >     >>>>> distributed to different consumer types of each operator.
>>> >     >>>>>
>>> >     >>>>>
>>> >     >>>>> Unfortunately, the different order of the two calculation
>>> >     steps can
>>> >     >> lead
>>> >     >>>> to
>>> >     >>>>> different results. To be specific, the semantic of the
>>> >     configuration
>>> >     >>>> option
>>> >     >>>>> `consumer-weights` changed (within a slot vs. within an
>>> >     operator).
>>> >     >>>>>
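A small worked example may help to see why the order of the two distribution steps matters. It is a deliberately simplified model and not the FLIP-141 implementation: two hypothetical consumer types X and Y with weights 70 and 30, operator A using both types, operator B using only X, and an equal split among the operators of a type are all assumptions made for illustration.

```java
// Hypothetical model, not Flink code. Results are {A's X, A's Y, B's X} in MB.
final class ManagedMemoryOrderSketch {

    static final int WEIGHT_X = 70;
    static final int WEIGHT_Y = 30;

    // Slot first: split the slot by consumer type, then share each type's
    // portion equally among the operators that use that type.
    static long[] slotFirst(long slotMb) {
        long x = slotMb * WEIGHT_X / (WEIGHT_X + WEIGHT_Y);
        long y = slotMb - x;
        long aX = x / 2;          // A and B both use X
        return new long[] {aX, y, x - aX};
    }

    // Operator first: each operator has its own budget, which is then split
    // by weight among the consumer types that operator uses.
    static long[] operatorFirst(long aMb, long bMb) {
        long aX = aMb * WEIGHT_X / (WEIGHT_X + WEIGHT_Y);
        return new long[] {aX, aMb - aX, bMb}; // B uses only X
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(slotFirst(100)));
        System.out.println(java.util.Arrays.toString(operatorFirst(50, 50)));
    }
}
```

With a 100 MB slot, the slot-first order gives consumer type Y 30 MB, while the operator-first order with 50 MB per operator gives it only 15 MB, although the weights are the same in both cases.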
>>> >     >>>>>
>>> >     >>>>>
>>> >     >>>>> To sum up things:
>>> >     >>>>>
>>> >     >>>>> While (3) might be a bit more implementation related, I
>>> >     think (1)
>>> >     >> and (2)
>>> >     >>>>> somehow suggest that, the price for the proposed approach to
>>> >     avoid
>>> >     >>>>> specifying resource for every operator is that it's not as
>>> >     >> independent
>>> >     >>>> from
>>> >     >>>>> operator chaining and slot sharing as the operator-based
>>> >     approach
>>> >     >>>> discussed
>>> >     >>>>> in the FLIP.
>>> >     >>>>>
>>> >     >>>>>
>>> >     >>>>> Thank you~
>>> >     >>>>>
>>> >     >>>>> Xintong Song
>>> >     >>>>>
>>> >     >>>>>
>>> >     >>>>>
>>> >     >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen <sewen@apache.org> wrote:
>>> >     >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
>>> >     >>>>>>
>>> >     >>>>>> I want to say, first of all, that this is super well
>>> >     written. And
>>> >     >> the
>>> >     >>>>>> points that the FLIP makes about how to expose the
>>> >     configuration to
>>> >     >>>> users
>>> >     >>>>>> is exactly the right thing to figure out first.
>>> >     >>>>>> So good job here!
>>> >     >>>>>>
>>> >     >>>>>> About how to let users specify the resource profiles. If I
>>> >     can sum
>>> >     >> the
>>> >     >>>>> FLIP
>>> >     >>>>>> and previous discussion up in my own words, the problem is the
>>> >     >>>> following:
>>> >     >>>>>> Operator-level specification is the simplest and cleanest
>>> >     approach,
>>> >     >>>>> because
>>> >     >>>>>>> it avoids mixing operator configuration (resource) and
>>> >     >> scheduling. No
>>> >     >>>>>>> matter what other parameters change (chaining, slot sharing,
>>> >     >>>> switching
>>> >     >>>>>>> pipelined and blocking shuffles), the resource profiles
>>> >     stay the
>>> >     >>>> same.
>>> >     >>>>>>> But it would require that a user specifies resources on all
>>> >     >>>> operators,
>>> >     >>>>>>> which makes it hard to use. That's why the FLIP suggests going
>>> >     >> with
>>> >     >>>>>>> specifying resources on a Sharing-Group.
>>> >     >>>>>>
>>> >     >>>>>> I think both thoughts are important, so can we find a solution
>>> >     >> where
>>> >     >>>> the
>>> >     >>>>>> Resource Profiles are specified on an Operator, but we
>>> >     still avoid
>>> >     >> that
>>> >     >>>>> we
>>> >     >>>>>> need to specify a resource profile on every operator?
>>> >     >>>>>>
>>> >     >>>>>> What do you think about something like the following:
>>> >     >>>>>>    - Resource Profiles are specified on an operator level.
>>> >     >>>>>>    - Not all operators need profiles
>>> >     >>>>>>    - All Operators without a Resource Profile end up in the
>>> >     >> default
>>> >     >>>> slot
>>> >     >>>>>> sharing group with a default profile (will get a default slot).
>>> >     >>>>>>    - All Operators with a Resource Profile will go into
>>> >     another slot
>>> >     >>>>> sharing
>>> >     >>>>>> group (the resource-specified-group).
>>> >     >>>>>>    - Users can define different slot sharing groups for
>>> >     operators
>>> >     >> like
>>> >     >>>>> they
>>> >     >>>>>> do now, with the exception that you cannot mix operators
>>> >     that have
>>> >     >> a
>>> >     >>>>>> resource profile and operators that have no resource profile.
>>> >     >>>>>>    - The default case where no operator has a resource
>>> >     profile is
>>> >     >> just a
>>> >     >>>>>> special case of this model
>>> >     >>>>>>    - The chaining logic sums up the profiles per operator,
>>> >     like it
>>> >     >> does
>>> >     >>>>> now,
>>> >     >>>>>> and the scheduler sums up the profiles of the tasks that it
>>> >     >> schedules
>>> >     >>>>>> together.
>>> >     >>>>>>
>>> >     >>>>>>
>>> >     >>>>>> There is another question about reactive scaling raised in the
>>> >     >> FLIP. I
>>> >     >>>>> need
>>> >     >>>>>> to think a bit about that. That is indeed a bit more tricky
>>> >     once we
>>> >     >>>> have
>>> >     >>>>>> slots of different sizes.
>>> >     >>>>>> It is not clear then which of the different slot requests the
>>> >     >>>>>> ResourceManager should fulfill when new resources (TMs)
>>> >     show up,
>>> >     >> or how
>>> >     >>>>> the
>>> >     >>>>>> JobManager redistributes the slots resources when resources
>>> >     (TMs)
>>> >     >>>>> disappear
>>> >     >>>>>> This question is pretty orthogonal, though, to the "how to
>>> >     specify
>>> >     >> the
>>> >     >>>>>> resources".
>>> >     >>>>>>
>>> >     >>>>>>
>>> >     >>>>>> Best,
>>> >     >>>>>> Stephan
>>> >     >>>>>>
>>> >     >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song <tonysong820@gmail.com> wrote:
>>> >     >>>>>>> Thanks for drafting the FLIP and driving the discussion,
>>> >     Yangze.
>>> >     >>>>>>> And Thanks for the feedback, Till and Chesnay.
>>> >     >>>>>>>
>>> >     >>>>>>> @Till,
>>> >     >>>>>>>
>>> >     >>>>>>> I agree that specifying requirements for SSGs means that SSGs
>>> >     >> need to
>>> >     >>>>> be
>>> >     >>>>>>> supported in fine-grained resource management, otherwise each
>>> >     >>>> operator
>>> >     >>>>>>> might use as many resources as the whole group. However, I
>>> >     cannot
>>> >     >>>> think
>>> >     >>>>>> of
>>> >     >>>>>>> a strong reason for not supporting SSGs in fine-grained
>>> >     resource
>>> >     >>>>>>> management.
>>> >     >>>>>>>
>>> >     >>>>>>>
>>> >     >>>>>>>> Interestingly, if all operators have their resources properly
>>> >     >>>>>> specified,
>>> >     >>>>>>>> then slot sharing is no longer needed because Flink could
>>> >     >> slice off
>>> >     >>>>> the
>>> >     >>>>>>>> appropriately sized slots for every Task individually.
>>> >     >>>>>>>>
>>> >     >>>>>>> So for example, if we have a job consisting of two
>>> >     operator op_1
>>> >     >> and
>>> >     >>>>> op_2
>>> >     >>>>>>>> where each op needs 100 MB of memory, we would then say that
>>> >     >> the
>>> >     >>>> slot
>>> >     >>>>>>>> sharing group needs 200 MB of memory to run. If we have a
>>> >     >> cluster
>>> >     >>>>> with
>>> >     >>>>>> 2
>>> >     >>>>>>>> TMs with one slot of 100 MB each, then the system cannot run
>>> >     >> this
>>> >     >>>>> job.
>>> >     >>>>>> If
>>> >     >>>>>>>> the resources were specified on an operator level, then the
>>> >     >> system
>>> >     >>>>>> could
>>> >     >>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to
>>> >     >> TM_2.
>>> >     >>>>>>>
>>> >     >>>>>>> Couldn't agree more that if all operators' requirements are
>>> >     >> properly
>>> >     >>>>>>> specified, slot sharing should be no longer needed. I
>>> >     think this
>>> >     >>>>> exactly
>>> >     >>>>>>> disproves the example. If we already know op_1 and op_2 each
>>> >     >> needs
>>> >     >>>> 100
>>> >     >>>>> MB
>>> >     >>>>>>> of memory, why would we put them in the same group? If
>>> >     they are
>>> >     >> in
>>> >     >>>>>> separate
>>> >     >>>>>>> groups, with the proposed approach the system can freely
>>> >     deploy
>>> >     >> them
>>> >     >>>> to
>>> >     >>>>>>> either a 200 MB TM or two 100 MB TMs.
>>> >     >>>>>>>
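The 100 MB / 200 MB example can be checked with a short sketch. It assumes that every request occupies exactly one slot, that slots are not split or merged, and that memory in megabytes is the only dimension; SlotFitSketch and fits are hypothetical names, not Flink APIs.

```java
import java.util.Arrays;

// Hypothetical helper, not a Flink API.
final class SlotFitSketch {

    // Returns true if every request can get its own slot that is large enough.
    // Sorting both sides and pairing largest with largest is sufficient for
    // this one-request-per-slot model.
    static boolean fits(long[] requestsMb, long[] slotsMb) {
        if (requestsMb.length > slotsMb.length) {
            return false;
        }
        long[] requests = requestsMb.clone();
        long[] slots = slotsMb.clone();
        Arrays.sort(requests);
        Arrays.sort(slots);
        for (int i = 1; i <= requests.length; i++) {
            if (requests[requests.length - i] > slots[slots.length - i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] cluster = {100, 100};                          // two TMs, one 100 MB slot each
        System.out.println(fits(new long[] {200}, cluster));      // one SSG of 200 MB
        System.out.println(fits(new long[] {100, 100}, cluster)); // two groups of 100 MB
    }
}
```

In this model a single 200 MB group does not fit the two 100 MB slots, whereas two separate 100 MB groups do, which matches the argument for putting op_1 and op_2 into separate groups when their individual requirements are known.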
>>> >     >>>>>>> Moreover, the precondition for not needing slot sharing is
>>> >     having
>>> >     >>>>>> resource
>>> >     >>>>>>> requirements properly specified for all operators. This is not
>>> >     >> always
>>> >     >>>>>>> possible, and usually requires tremendous efforts. One of the
>>> >     >>>> benefits
>>> >     >>>>>> for
>>> >     >>>>>>> SSG-based requirements is that it allows the user to freely
>>> >     >> decide
>>> >     >>>> the
>>> >     >>>>>>> granularity, thus efforts they want to pay. I would
>>> >     consider SSG
>>> >     >> in
>>> >     >>>>>>> fine-grained resource management as a group of operators
>>> >     that the
>>> >     >>>> user
>>> >     >>>>>>> would like to specify the total resource for. There can be
>>> >     only
>>> >     >> one
>>> >     >>>>> group
>>> >     >>>>>>> in the job, 2~3 groups dividing the job into a few major
>>> >     parts,
>>> >     >> or as
>>> >     >>>>>> many
>>> >     >>>>>>> groups as the number of tasks/operators, depending on how
>>> >     >>>> fine-grained
>>> >     >>>>>> the
>>> >     >>>>>>> user is able to specify the resources.
>>> >     >>>>>>>
>>> >     >>>>>>> Having to support SSGs might be a constraint. But given
>>> >     that all
>>> >     >> the
>>> >     >>>>>>> current scheduler implementations already support SSGs, I
>>> >     tend to
>>> >     >>>> think
>>> >     >>>>>>> that as an acceptable price for the above discussed
>>> >     usability and
>>> >     >>>>>>> flexibility.
>>> >     >>>>>>>
>>> >     >>>>>>> @Chesnay
>>> >     >>>>>>>
>>> >     >>>>>>> Will declaring them on slot sharing groups not also waste
>>> >     >> resources
>>> >     >>>> if
>>> >     >>>>>> the
>>> >     >>>>>>>> parallelism of operators within that group are different?
>>> >     >>>>>>>>
>>> >     >>>>>>> Yes. It's a trade-off between usability and resource
>>> >     >> utilization. To
>>> >     >>>>>> avoid
>>> >     >>>>>>> such wasting, the user can define more groups, so that
>>> >     each group
>>> >     >>>>>> contains
>>> >     >>>>>>> fewer operators and the chance of having operators with
>>> >     different
>>> >     >>>>>>> parallelism will be reduced. The price is to have more
>>> >     resource
>>> >     >>>>>>> requirements to specify.
>>> >     >>>>>>>
>>> >     >>>>>>> It also seems like quite a hassle for users having to
>>> >     >> recalculate the
>>> >     >>>>>>>> resource requirements if they change the slot sharing.
>>> >     >>>>>>>> I'd think that it's not really workable for users that create
>>> >     >> a set
>>> >     >>>>> of
>>> >     >>>>>>>> re-usable operators which are mixed and matched in their
>>> >     >>>>> applications;
>>> >     >>>>>>>> managing the resources requirements in such a setting
>>> >     would be
>>> >     >> a
>>> >     >>>>>>>> nightmare, and in the end would require operator-level
>>> >     >> requirements
>>> >     >>>>> any
>>> >     >>>>>>>> way.
>>> >     >>>>>>>> In that sense, I'm not even sure whether it really increases
>>> >     >>>>> usability.
>>> >     >>>>>>>     - As mentioned in my reply to Till's comment, there's no
>>> >     >> reason to
>>> >     >>>>> put
>>> >     >>>>>>>     multiple operators whose individual resource
>>> >     requirements are
>>> >     >>>>> already
>>> >     >>>>>>> known
>>> >     >>>>>>>     into the same group in fine-grained resource management.
>>> >     >>>>>>>     - Even if an operator implementation is reused for multiple
>>> >     >>>>> applications,
>>> >     >>>>>>>     it does not guarantee the same resource requirements.
>>> >     During
>>> >     >> our
>>> >     >>>>> years
>>> >     >>>>>>> of
>>> >     >>>>>>>     practices in Alibaba, with per-operator requirements
>>> >     >> specified for
>>> >     >>>>>>> Blink's
>>> >     >>>>>>>     fine-grained resource management, very few users
>>> >     (including
>>> >     >> our
>>> >     >>>>>>> specialists
>>> >     >>>>>>>     who are dedicated to supporting Blink users) are as
>>> >     >> experienced as
>>> >     >>>>> to
>>> >     >>>>>>>     accurately predict/estimate the operator resource
>>> >     >> requirements.
>>> >     >>>> Most
>>> >     >>>>>>> people
>>> >     >>>>>>>     rely on the execution-time metrics (throughput, delay, cpu
>>> >     >> load,
>>> >     >>>>>> memory
>>> >     >>>>>>>     usage, GC pressure, etc.) to improve the specification.
>>> >     >>>>>>>
>>> >     >>>>>>> To sum up:
>>> >     >>>>>>> If the user is capable of providing proper resource
>>> >     requirements
>>> >     >> for
>>> >     >>>>>> every
>>> >     >>>>>>> operator, that's definitely a good thing and we would not
>>> >     need to
>>> >     >>>> rely
>>> >     >>>>> on
>>> >     >>>>>>> the SSGs. However, that shouldn't be a *must* for the
>>> >     >> fine-grained
>>> >     >>>>>> resource
>>> >     >>>>>>> management to work. For those users who are capable and do not
>>> >     >> like
>>> >     >>>>>> having
>>> >     >>>>>>> to set each operator to a separate SSG, I would be ok to have
>>> >     >> both
>>> >     >>>>>>> SSG-based and operator-based runtime interfaces and to only
>>> >     >> fallback
>>> >     >>>> to
>>> >     >>>>>> the
>>> >     >>>>>>> SSG requirements when the operator requirements are not
>>> >     >> specified.
>>> >     >>>>>> However,
>>> >     >>>>>>> as the first step, I think we should prioritise the use cases
>>> >     >> where
>>> >     >>>>> users
>>> >     >>>>>>> are not that experienced.
>>> >     >>>>>>>
>>> >     >>>>>>> Thank you~
>>> >     >>>>>>>
>>> >     >>>>>>> Xintong Song
>>> >     >>>>>>>
>>> >     >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <chesnay@apache.org> wrote:
>>> >     >>>>>>>
>>> >     >>>>>>>> Will declaring them on slot sharing groups not also waste
>>> >     >> resources
>>> >     >>>>> if
>>> >     >>>>>>>> the parallelism of operators within that group are different?
>>> >     >>>>>>>>
>>> >     >>>>>>>> It also seems like quite a hassle for users having to
>>> >     >> recalculate
>>> >     >>>> the
>>> >     >>>>>>>> resource requirements if they change the slot sharing.
>>> >     >>>>>>>> I'd think that it's not really workable for users that create
>>> >     >> a set
>>> >     >>>>> of
>>> >     >>>>>>>> re-usable operators which are mixed and matched in their
>>> >     >>>>> applications;
>>> >     >>>>>>>> managing the resources requirements in such a setting
>>> >     would be
>>> >     >> a
>>> >     >>>>>>>> nightmare, and in the end would require operator-level
>>> >     >> requirements
>>> >     >>>>> any
>>> >     >>>>>>>> way.
>>> >     >>>>>>>> In that sense, I'm not even sure whether it really increases
>>> >     >>>>> usability.
>>> >     >>>>>>>> My main worry is that if we wire the runtime to work
>>> >     on SSGs
>>> >     >>>> it's
>>> >     >>>>>>>> gonna be difficult to implement more fine-grained approaches,
>>> >     >> which
>>> >     >>>>>>>> would not be the case if, for the runtime, they are always
>>> >     >> defined
>>> >     >>>> on
>>> >     >>>>>> an
>>> >     >>>>>>>> operator-level.
>>> >     >>>>>>>>
>>> >     >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
>>> >     >>>>>>>>> Thanks for drafting this FLIP and starting this discussion
>>> >     >>>> Yangze.
>>> >     >>>>>>>>> I like that defining resource requirements on a slot sharing
>>> >     >>>> group
>>> >     >>>>>>> makes
>>> >     >>>>>>>>> the overall setup easier and improves usability of resource
>>> >     >>>>>>> requirements.
>>> >     >>>>>>>>> What I do not like about it is that it changes slot sharing
>>> >     >>>> groups
>>> >     >>>>>> from
>>> >     >>>>>>>>> being a scheduling hint to something which needs to be
>>> >     >> supported
>>> >     >>>> in
>>> >     >>>>>>> order
>>> >     >>>>>>>>> to support fine grained resource requirements. So far, the
>>> >     >> idea
>>> >     >>>> of
>>> >     >>>>>> slot
>>> >     >>>>>>>>> sharing groups was that it tells the system that a set of
>>> >     >>>> operators
>>> >     >>>>>> can
>>> >     >>>>>>>> be
>>> >     >>>>>>>>> deployed in the same slot. But the system still had the
>>> >     >> freedom
>>> >     >>>> to
>>> >     >>>>>> say
>>> >     >>>>>>>> that
>>> >     >>>>>>>>> it would rather place these tasks in different slots if it
>>> >     >>>> wanted.
>>> >     >>>>> If
>>> >     >>>>>>> we
>>> >     >>>>>>>>> now specify resource requirements on a per slot sharing
>>> >     >> group,
>>> >     >>>> then
>>> >     >>>>>> the
>>> >     >>>>>>>>> only option for a scheduler which does not support slot
>>> >     >> sharing
>>> >     >>>>>> groups
>>> >     >>>>>>> is
>>> >     >>>>>>>>> to say that every operator in this slot sharing group
>>> >     needs a
>>> >     >>>> slot
>>> >     >>>>>> with
>>> >     >>>>>>>> the
>>> >     >>>>>>>>> same resources as the whole group.
>>> >     >>>>>>>>>
>>> >     >>>>>>>>> So for example, if we have a job consisting of two operators
>>> >     >> op_1
>>> >     >>>>> and
>>> >     >>>>>>> op_2
>>> >     >>>>>>>>> where each op needs 100 MB of memory, we would then say that
>>> >     >> the
>>> >     >>>>> slot
>>> >     >>>>>>>>> sharing group needs 200 MB of memory to run. If we have a
>>> >     >> cluster
>>> >     >>>>>> with
>>> >     >>>>>>> 2
>>> >     >>>>>>>>> TMs with one slot of 100 MB each, then the system cannot run
>>> >     >> this
>>> >     >>>>>> job.
>>> >     >>>>>>> If
>>> >     >>>>>>>>> the resources were specified on an operator level, then the
>>> >     >>>> system
>>> >     >>>>>>> could
>>> >     >>>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to
>>> >     >> TM_2.
>>> >     >>>>>>>>> Originally, one of the primary goals of slot sharing groups
>>> >     >> was
>>> >     >>>> to
>>> >     >>>>>> make
>>> >     >>>>>>>> it
>>> >     >>>>>>>>> easier for the user to reason about how many slots a job
>>> >     >> needs
>>> >     >>>>>>>> independent
>>> >     >>>>>>>>> of the actual number of operators in the job. Interestingly,
>>> >     >> if
>>> >     >>>> all
>>> >     >>>>>>>>> operators have their resources properly specified, then slot
>>> >     >>>>> sharing
>>> >     >>>>>> is
>>> >     >>>>>>>> no
>>> >     >>>>>>>>> longer needed because Flink could slice off the
>>> >     appropriately
>>> >     >>>> sized
>>> >     >>>>>>> slots
>>> >     >>>>>>>>> for every Task individually. What matters is whether the
>>> >     >> whole
>>> >     >>>>>> cluster
>>> >     >>>>>>>> has
>>> >     >>>>>>>>> enough resources to run all tasks or not.
>>> >     >>>>>>>>>
>>> >     >>>>>>>>> Cheers,
>>> >     >>>>>>>>> Till
>>> >     >>>>>>>>>
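
The two-operator example above can be checked with a small sketch. It is only an illustration under simplifying assumptions: memory is the only resource, each TM offers exactly one slot, and the helper names (fits_as_group, fits_per_operator) are made up here and are not Flink APIs.

```python
def fits_as_group(group_mb, slots_mb):
    """SSG-level requirement: the whole group must fit into a single slot."""
    return any(slot >= group_mb for slot in slots_mb)


def fits_per_operator(op_mbs, slots_mb):
    """Operator-level requirements: give each operator its own slot.

    Simple greedy placement: largest operator first, smallest slot that fits.
    """
    free = sorted(slots_mb)
    for need in sorted(op_mbs, reverse=True):
        slot = next((s for s in free if s >= need), None)
        if slot is None:
            return False
        free.remove(slot)
    return True


ops = [100, 100]    # op_1 and op_2, 100 MB each
slots = [100, 100]  # TM_1 and TM_2, one 100 MB slot each

print(fits_as_group(sum(ops), slots))  # False: no single slot holds 200 MB
print(fits_per_operator(ops, slots))   # True: op_1 -> TM_1, op_2 -> TM_2
```

With a single 200 MB slot the group-level check would succeed as well, which is the case the SSG-based proposal targets.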
>>> >     >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
>>> >     >> karmagyz@gmail.com <ma...@gmail.com>>
>>> >     >>>>>> wrote:
>>> >     >>>>>>>>>> Hi, there,
>>> >     >>>>>>>>>>
>>> >     >>>>>>>>>> We would like to start a discussion thread on "FLIP-156:
>>> >     >> Runtime
>>> >     >>>>>>>>>> Interfaces for Fine-Grained Resource Requirements"[1],
>>> >     >> where we
>>> >     >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime interfaces
>>> >     >> for
>>> >     >>>>>>>>>> specifying fine-grained resource requirements.
>>> >     >>>>>>>>>>
>>> >     >>>>>>>>>> In this FLIP:
>>> >     >>>>>>>>>> - Expound the user story of fine-grained resource
>>> >     >> management.
>>> >     >>>>>>>>>> - Propose runtime interfaces for specifying SSG-based
>>> >     >> resource
>>> >     >>>>>>>>>> requirements.
>>> >     >>>>>>>>>> - Discuss the pros and cons of the three potential
>>> >     >> granularities
>>> >     >>>>> for
>>> >     >>>>>>>>>> specifying the resource requirements (op, task and slot
>>> >     >> sharing
>>> >     >>>>>> group)
>>> >     >>>>>>>>>> and explain why we choose the slot sharing group.
>>> >     >>>>>>>>>>
>>> >     >>>>>>>>>> Please find more details in the FLIP wiki document [1].
>>> >     >> Looking
>>> >     >>>>>>>>>> forward to your feedback.
>>> >     >>>>>>>>>>
>>> >     >>>>>>>>>> [1]
>>> >     >>>>>>>>>>
>>> >     >>
>>> >     https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
>>> >     >>>>>>>>>> Best,
>>> >     >>>>>>>>>> Yangze Guo
>>> >     >>>>>>>>>>
>>> >     >>>>>>>>
>>> >
>>>

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Xintong Song <to...@gmail.com>.

 FGRuntimeInterface.png
<https://drive.google.com/file/d/13nYCLBd1HjdfYVWUjxzdNa8wo5n2GKbi/view?usp=drive_web>

Thank you~

Xintong Song



On Fri, Jan 22, 2021 at 11:11 AM Xintong Song <to...@gmail.com> wrote:

> I think Chesnay's proposal could actually work. IIUC, the key point is to
> derive operator requirements from SSG requirements on the API side, so that
> the runtime only deals with operator requirements. It's debatable how the
> deriving should be done though. E.g., an alternative could be to evenly
> divide the SSG requirement into requirements of operators in the group.
>
>
> However, I'm not entirely sure which option is more desired. Illustrating
> my understanding in the following figure, in which on the top is
> Chesnay's proposal and on the bottom is the SSG-based proposal in this FLIP.
>
>
> [image: FGRuntimeInterface.png]
>
>
> I think the major difference between the two approaches is where deriving
> operator requirements from SSG requirements happens.
>
> - Chesnay's proposal simplifies the runtime logic and the interface to
> expose, at the price of moving more complexity (i.e. the deriving) to the
> API side. The question is, where do we prefer to keep the complexity? I'm
> slightly leaning towards having a thin API and keeping the complexity in
> the runtime if possible.
>
> - Notice that the dashed-line arrows represent optional steps that are
> needed only for schedulers that do not respect SSGs, which we don't have at
> the moment. If we only look at the solid line arrows, then the SSG-based
> approach is much simpler, without needing to derive and aggregate the
> requirements back and forth. I'm not sure about complicating the current
> design only for the potential future needs.
>
>
> Thank you~
>
> Xintong Song
>
>
>
>
> On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <ch...@apache.org>
> wrote:
>
>> You're raising a good point, but I think I can rectify that with a minor
>> adjustment.
>>
>> Default requirements are whatever the default requirements are, setting
>> the requirements for one operator has no effect on other operators.
>>
>> With these rules, and some API enhancements, the following mockup would
>> replicate the SSG-based behavior:
>>
>> Map<SlotSharingGroupId, Requirements> requirements = ...
>> for slotSharingGroup in env.getSlotSharingGroups() {
>>      vertices = slotSharingGroup.getVertices()
>>      // the first vertex carries the group's requirement, the rest carry none
>>      vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
>>      vertices.remaining().setRequirements(ZERO)
>> }
>>
>> We could even allow setting requirements on slot sharing groups or
>> colocation groups and internally translate them accordingly.
>> I can't help but feel this is a plain API issue.
>>
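
For readers who want to try the mockup above, here is a minimal runnable sketch of the same idea. Everything in it is hypothetical: Vertex, SlotSharingGroup and push_down are stand-ins invented for illustration and do not correspond to Flink classes, and a requirement is reduced to a single number of MB.

```python
from dataclasses import dataclass, field

ZERO = 0  # stand-in for an "empty" requirement, in MB


@dataclass
class Vertex:
    name: str
    requirement_mb: int = ZERO


@dataclass
class SlotSharingGroup:
    group_id: str
    vertices: list = field(default_factory=list)


def push_down(groups, requirements):
    """Attach each group's requirement to its first vertex and ZERO to the rest."""
    for group in groups:
        if not group.vertices:
            continue  # nothing to annotate in an empty group
        first, *rest = group.vertices
        first.requirement_mb = requirements[group.group_id]
        for vertex in rest:
            vertex.requirement_mb = ZERO


groups = [
    SlotSharingGroup("ssg_1", [Vertex("source"), Vertex("map")]),
    SlotSharingGroup("ssg_2", [Vertex("agg"), Vertex("sink")]),
]
push_down(groups, {"ssg_1": 200, "ssg_2": 100})

for group in groups:
    print(group.group_id, [(v.name, v.requirement_mb) for v in group.vertices])
```

Summing the per-vertex values of a group gives back the group-level requirement, which is what lets a runtime that only understands operator requirements reproduce the SSG-based behavior.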
>> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
>> > If I understand you correctly Chesnay, then you want to decouple the
>> > resource requirement specification from the slot sharing group
>> > assignment. Hence, per default all operators would be in the same slot
>> > sharing group. If there is no operator with a resource specification,
>> > then the system would allocate a default slot for it. If there is at
>> > least one operator, then the system would sum up all the specified
>> > resources and allocate a slot of this size. This effectively means
>> > that all unspecified operators will implicitly have a zero resource
>> > requirement. Did I understand your idea correctly?
>> >
>> > I am wondering whether this wouldn't lead to a surprising behaviour
>> > for the user. If the user specifies the resource requirements for a
>> > single operator, then he probably will assume that the other operators
>> > will get the default share of resources and not nothing.
>> >
>> > Cheers,
>> > Till
>> >
>> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <chesnay@apache.org
>> > <ma...@apache.org>> wrote:
>> >
>> >     Is there even a functional difference between specifying the
>> >     requirements for an SSG vs specifying the same requirements on a
>> >     single
>> >     operator within that group (ideally a colocation group to avoid this
>> >     whole hint business)?
>> >
>> >     Wouldn't we get the best of both worlds in the latter case?
>> >
>> >     Users can take shortcuts to define shared requirements,
>> >     but refine them further as needed on a per-operator basis,
>> >     without changing semantics of slotsharing groups
>> >     nor the runtime being locked into SSG-based requirements.
>> >
>> >     (And before anyone argues what happens if slotsharing groups
>> >     change or
>> >     whatnot, that's a plain API issue that we could surely solve. (A
>> >     plain
>> >     iteration over slotsharing groups and therein contained operators
>> >     would
>> >     suffice)).
>> >
>> >     On 1/20/2021 6:48 PM, Till Rohrmann wrote:
>> >     > Maybe a different minor idea: Would it be possible to treat the
>> SSG
>> >     > resource requirements as a hint for the runtime similar to how
>> >     slot sharing
>> >     > groups are designed at the moment? Meaning that we don't give
>> >     the guarantee
>> >     > that Flink will always deploy this set of tasks together no
>> >     matter what
>> >     > comes. If, for example, the runtime can derive by some means the
>> >     resource
>> >     > requirements for each task based on the requirements for the
>> >     SSG, this
>> >     > could be possible. One easy strategy would be to give every task
>> >     the same
>> >     > resources as the whole slot sharing group. Another one could be
>> >     > distributing the resources equally among the tasks. This does
>> >     not even have
>> >     > to be implemented but we would give ourselves the freedom to
>> change
>> >     > scheduling if need should arise.
>> >     >
>> >     > Cheers,
>> >     > Till
>> >     >
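
The two derivation strategies mentioned above can be sketched as follows. This is an illustration only: the function names are invented here, a requirement is a single number of MB, and nothing in the thread says either strategy would actually be implemented this way.

```python
def same_as_group(group_mb, tasks):
    """Every task gets the full requirement of its slot sharing group."""
    return {task: group_mb for task in tasks}


def split_evenly(group_mb, tasks):
    """The group requirement is divided equally among its tasks."""
    share = group_mb / len(tasks)
    return {task: share for task in tasks}


tasks = ["op_1", "op_2"]
print(same_as_group(200, tasks))
print(split_evenly(200, tasks))
```

The first strategy is safe but over-provisions (the derived total is the group requirement times the number of tasks), while the second keeps the total equal to the group requirement at the risk of under-provisioning a heavy task.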
>> >     > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <karmagyz@gmail.com
>> >     <ma...@gmail.com>> wrote:
>> >     >
>> >     >> Thanks for the responses, Till and Xintong.
>> >     >>
>> >     >> I second Xintong's comment that SSG-based runtime interface
>> >     will give
>> >     >> us the flexibility to achieve op/task-based approach. That's one
>> of
>> >     >> the most important reasons for our design choice.
>> >     >>
>> >     >> Some cents regarding the default operator resource:
>> >     >> - It might be good for the scenario of DataStream jobs.
>> >     >>     ** For light-weight operators, the accumulative
>> >     configuration error
>> >     >> will not be significant. Then, the resource of a task used is
>> >     >> proportional to the number of operators it contains.
>> >     >>     ** For heavy operators like join and window or operators
>> >     using the
>> >     >> external resources, user will turn to the fine-grained resource
>> >     >> configuration.
>> >     >> - It can increase the stability for the standalone cluster
>> >     where task
>> >     >> executors registered are heterogeneous (with different default
>> slot
>> >     >> resources).
>> >     >> - It might not be good for SQL users. The operators that a SQL
>> >     >> query is translated into are a black box to the user. We also do
>> >     >> not guarantee cross-version consistency of that translation so far.
>> >     >>
>> >     >> I think it can be treated as a follow-up work when the
>> fine-grained
>> >     >> resource management is end-to-end ready.
>> >     >>
>> >     >> Best,
>> >     >> Yangze Guo
>> >     >>
>> >     >>
>> >     >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song
>> >     <tonysong820@gmail.com <ma...@gmail.com>>
>> >     >> wrote:
>> >     >>> Thanks for the feedback, Till.
>> >     >>>
>> >     >>> ## I feel that what you proposed (operator-based + default
>> >     value) might
>> >     >> be
>> >     >>> subsumed by the SSG-based approach.
>> >     >>> Thinking of op_1 -> op_2, there are the following 4 cases,
>> >     categorized by
>> >     >>> whether the resource requirements are known to the users.
>> >     >>>
>> >     >>>     1. *Both known.* As previously mentioned, there's no
>> >     reason to put
>> >     >>>     multiple operators whose individual resource requirements
>> >     are already
>> >     >> known
>> >     >>>     into the same group in fine-grained resource management.
>> >     And if op_1
>> >     >> and
>> >     >>>     op_2 are in different groups, there should be no problem
>> >     switching
>> >     >> data
>> >     >>>     exchange mode from pipelined to blocking. This is
>> >     equivalent to
>> >     >> specifying
>> >     >>>     operator resource requirements in your proposal.
>> >     >>>     2. *op_1 known, op_2 unknown.* Similar to 1), except that
>> >     op_2 is in a
>> >     >>>     SSG whose resource is not specified thus would have the
>> >     default slot
>> >     >>>     resource. This is equivalent to having default operator
>> >     resources in
>> >     >> your
>> >     >>>     proposal.
>> >     >>>     3. *Both unknown*. The user can either set op_1 and op_2
>> >     to the same
>> >     >> SSG
>> >     >>>     or separate SSGs.
>> >     >>>        - If op_1 and op_2 are in the same SSG, it will be
>> >     equivalent to
>> >     >> the
>> >     >>>        coarse-grained resource management, where op_1 and op_2
>> >     share a
>> >     >> default
>> >     >>>        size slot no matter which data exchange mode is used.
>> >     >>>        - If op_1 and op_2 are in different SSGs, then each of
>> >     them will
>> >     >> use
>> >     >>>        a default size slot. This is equivalent to setting them
>> >     with
>> >     >> default
>> >     >>>        operator resources in your proposal.
>> >     >>>     4. *Total (pipeline) or max (blocking) of op_1 and op_2 is
>> >     known.*
>> >     >>>        - It is possible that the user learns the total / max
>> >     resource
>> >     >>>        requirement from executing and monitoring the job,
>> >     while not
>> >     >>> being aware of
>> >     >>>        individual operator requirements.
>> >     >>>        - I believe this is the case your proposal does not
>> >     cover. And TBH,
>> >     >>>        this is probably how most users learn the resource
>> >     requirements,
>> >     >>> according
>> >     >>>        to my experiences.
>> >     >>>        - In this case, the user might need to specify
>> >     different resources
>> >     >> if
>> >     >>>        he wants to switch the execution mode, which should not
>> >     be worse
>> >     >> than not
>> >     >>>        being able to use fine-grained resource management.
>> >     >>>
>> >     >>>
>> >     >>> ## An additional idea inspired by your proposal.
>> >     >>> We may provide multiple options for deciding resources for
>> >     SSGs whose
>> >     >>> requirement is not specified, if needed.
>> >     >>>
>> >     >>>     - Default slot resource (current design)
>> >     >>>     - Default operator resource times number of operators
>> >     (equivalent to
>> >     >>>     your proposal)
>> >     >>>
>> >     >>>
>> >     >>> ## Exposing internal runtime strategies
>> >     >>> Theoretically, yes. Tying to the SSGs, the resource
>> >     requirements might be
>> >     >>> affected if how SSGs are internally handled changes in future.
>> >     >> Practically,
>> >     >>> I do not concretely see at the moment what kind of changes we
>> >     may want in
>> >     >>> future that might conflict with this FLIP proposal, as the
>> >     question of
>> >     >>> switching data exchange mode answered above. I'd suggest to
>> >     not give up
>> >     >> the
>> >     >>> user friendliness we may gain now for the future problems that
>> >     may or may
>> >     >>> not exist.
>> >     >>>
>> >     >>> Moreover, the SSG-based approach has the flexibility to
>> >     achieve the
>> >     >>> equivalent behavior as the operator-based approach, if we set
>> each
>> >     >> operator
>> >     >>> (or task) to a separate SSG. We can even provide a shortcut
>> >     option to
>> >     >>> automatically do that for users, if needed.
>> >     >>>
>> >     >>>
>> >     >>> Thank you~
>> >     >>>
>> >     >>> Xintong Song
>> >     >>>
>> >     >>>
>> >     >>>
>> >     >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann
>> >     <trohrmann@apache.org <ma...@apache.org>>
>> >     >> wrote:
>> >     >>>> Thanks for the responses Xintong and Stephan,
>> >     >>>>
>> >     >>>> I agree that being able to define the resource requirements
>> for a
>> >     >> group of
>> >     >>>> operators is more user friendly. However, my concern is that
>> >     we are
>> >     >>>> exposing thereby internal runtime strategies which might
>> >     limit our
>> >     >>>> flexibility to execute a given job. Moreover, the semantics of
>> >     >> configuring
>> >     >>>> resource requirements for SSGs could break if switching from
>> >     streaming
>> >     >> to
>> >     >>>> batch execution. If one defines the resource requirements for
>> >     op_1 ->
>> >     >> op_2
>> >     >>>> which run in pipelined mode when using the streaming
>> >     execution, then
>> >     >> how do
>> >     >>>> we interpret these requirements when op_1 -> op_2 are
>> >     executed with a
>> >     >>>> blocking data exchange in batch execution mode? Consequently,
>> >     I am
>> >     >> still
>> >     >>>> leaning towards Stephan's proposal to set the resource
>> >     requirements per
>> >     >>>> operator.
>> >     >>>>
>> >     >>>> Maybe the following proposal makes the configuration easier:
>> >     If the
>> >     >> user
>> >     >>>> wants to use fine-grained resource requirements, then she
>> >     needs to
>> >     >> specify
>> >     >>>> the default size which is used for operators which have no
>> >     explicit
>> >     >>>> resource annotation. If this holds true, then every operator
>> >     would
>> >     >> have a
>> >     >>>> resource requirement and the system can try to execute the
>> >     operators
>> >     >> in the
>> >     >>>> best possible manner w/o being constrained by how the user
>> >     set the SSG
>> >     >>>> requirements.
>> >     >>>>
>> >     >>>> Cheers,
>> >     >>>> Till
>> >     >>>>
>> >     >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song
>> >     <tonysong820@gmail.com <ma...@gmail.com>>
>> >     >>>> wrote:
>> >     >>>>
>> >     >>>>> Thanks for the feedback, Stephan.
>> >     >>>>>
>> >     >>>>> Actually, your proposal has also come to my mind at some
>> >     point. And I
>> >     >>>> have
>> >     >>>>> some concerns about it.
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> 1. It does not give users the same control as the SSG-based
>> >     approach.
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> While neither approach requires specifying resources for each
>> >     >>>>> operator, the SSG-based approach supports the semantic that "some
>> >     >>>>> operators together use this much resource", while the
>> >     >>>>> operator-based approach doesn't.
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> Think of a long pipeline with m operators (o_1, o_2, ...,
>> >     o_m), and
>> >     >> at
>> >     >>>> some
>> >     >>>>> point there's an agg o_n (1 < n < m) which significantly
>> >     reduces the
>> >     >> data
>> >     >>>>> amount. One can separate the pipeline into 2 groups SSG_1
>> >     (o_1, ...,
>> >     >> o_n)
>> >     >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much higher
>> >     >> parallelisms
>> >     >>>>> for operators in SSG_1 than for operators in SSG_2 won't
>> >     lead to too
>> >     >> much
>> >     >>>>> wasting of resources. If the two SSGs end up needing different
>> >     >> resources,
>> >     >>>>> with the SSG-based approach one can directly specify
>> >     resources for
>> >     >> the
>> >     >>>> two
>> >     >>>>> groups. However, with the operator-based approach, the user
>> will
>> >     >> have to
>> >     >>>>> specify resources for each operator in one of the two
>> >     groups, and
>> >     >> tune
>> >     >>>> the
>> >     >>>>> default slot resource via configurations to fit the other
>> group.
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> 2. It increases the chance of breaking operator chains.
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> Setting chainable operators into different slot sharing
>> >     groups will
>> >     >>>>> prevent them from being chained. In the current
>> implementation,
>> >     >>>> downstream
>> >     >>>>> operators, if SSG not explicitly specified, will be set to
>> >     the same
>> >     >> group
>> >     >>>>> as the chainable upstream operators (unless multiple upstream
>> >     >> operators
>> >     >>>> in
>> >     >>>>> different groups), to reduce the chance of breaking chains.
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4,
>> >     deciding
>> >     >> SSGs
>> >     >>>>> based on whether resource is specified we will easily get
>> >     groups like
>> >     >>>> (o_1,
>> >     >>>>> o_3) & (o_2, o_4), where none of the operators can be
>> >     chained. This
>> >     >> is
>> >     >>>> also
>> >     >>>>> possible for the SSG-based approach, but I believe the
>> >     chance is much
>> >     >>>>> smaller because there's no strong reason for users to
>> >     specify the
>> >     >> groups
>> >     >>>>> with alternate operators like that. We are more likely to
>> >     get groups
>> >     >> like
>> >     >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks only between
>> >     o_2 and
>> >     >> o_3.
>> >     >>>>>
>> >     >>>>> 3. It complicates the system by having two different
>> >     mechanisms for
>> >     >>>> sharing
>> >     >>>>> managed memory in a slot.
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> - In FLIP-141, we introduced the intra-slot managed memory
>> >     sharing
>> >     >>>>> mechanism, where managed memory is first distributed
>> >     according to the
>> >     >>>>> consumer type, then further distributed across operators of
>> that
>> >     >> consumer
>> >     >>>>> type.
>> >     >>>>>
>> >     >>>>> - With the operator-based approach, managed memory size
>> >     specified
>> >     >> for an
>> >     >>>>> operator should account for all the consumer types of that
>> >     operator.
>> >     >> That
>> >     >>>>> means the managed memory is first distributed across
>> >     operators, then
>> >     >>>>> distributed to different consumer types of each operator.
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> Unfortunately, the different order of the two calculation
>> >     steps can
>> >     >> lead
>> >     >>>> to
>> >     >>>>> different results. To be specific, the semantic of the
>> >     configuration
>> >     >>>> option
>> >     >>>>> `consumer-weights` changed (within a slot vs. within an
>> >     operator).
>> >     >>>>>
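
A small worked example may help show why the order of the two steps matters. It is a simplified model, not the FLIP-141 implementation: the weights, the consumer type names, the equal split among operators of one consumer type, and all function names are assumptions made for illustration.

```python
def split(total, weights):
    """Split `total` proportionally to `weights` (a dict of name -> weight)."""
    weight_sum = sum(weights.values())
    return {name: total * w / weight_sum for name, w in weights.items()}


CONSUMER_WEIGHTS = {"state_backend": 70, "python": 30}  # illustrative values
USES = {"op_1": ["state_backend"], "op_2": ["state_backend", "python"]}


def slot_first(slot_mb):
    """Consumer types first (within the slot), then operators of each type."""
    result = {}
    for ctype, mb in split(slot_mb, CONSUMER_WEIGHTS).items():
        users = [op for op, types in USES.items() if ctype in types]
        for op in users:
            result[(op, ctype)] = mb / len(users)
    return result


def operator_first(per_operator_mb):
    """Operators first, then the consumer types each operator actually uses."""
    result = {}
    for op, mb in per_operator_mb.items():
        weights = {t: CONSUMER_WEIGHTS[t] for t in USES[op]}
        for ctype, share in split(mb, weights).items():
            result[(op, ctype)] = share
    return result


print(slot_first(100))
print(operator_first({"op_1": 50, "op_2": 50}))
```

With the same 100 MB in total, the Python consumer of op_2 receives 30 MB when the weights apply within the slot but only 15 MB when they apply within the operator, which is the change in meaning of `consumer-weights` described above.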
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> To sum up things:
>> >     >>>>>
>> >     >>>>> While (3) might be a bit more implementation related, I
>> >     think (1)
>> >     >> and (2)
>> >     >>>>> somehow suggest that, the price for the proposed approach to
>> >     avoid
>> >     >>>>> specifying resource for every operator is that it's not as
>> >     >> independent
>> >     >>>> from
>> >     >>>>> operator chaining and slot sharing as the operator-based
>> >     approach
>> >     >>>> discussed
>> >     >>>>> in the FLIP.
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> Thank you~
>> >     >>>>>
>> >     >>>>> Xintong Song
>> >     >>>>>
>> >     >>>>>
>> >     >>>>>
>> >     >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen
>> >     <sewen@apache.org <ma...@apache.org>>
>> >     >> wrote:
>> >     >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
>> >     >>>>>>
>> >     >>>>>> I want to say, first of all, that this is super well
>> >     written. And
>> >     >> the
>> >     >>>>>> points that the FLIP makes about how to expose the
>> >     configuration to
>> >     >>>> users
>> >     >>>>>> is exactly the right thing to figure out first.
>> >     >>>>>> So good job here!
>> >     >>>>>>
>> >     >>>>>> About how to let users specify the resource profiles. If I
>> >     can sum
>> >     >> the
>> >     >>>>> FLIP
>> >     >>>>>> and previous discussion up in my own words, the problem is
>> the
>> >     >>>> following:
>> >     >>>>>> Operator-level specification is the simplest and cleanest
>> >     approach,
>> >     >>>>> because
>> >     >>>>>>> it avoids mixing operator configuration (resource) and
>> >     >> scheduling. No
>> >     >>>>>>> matter what other parameters change (chaining, slot sharing,
>> >     >>>> switching
>> >     >>>>>>> pipelined and blocking shuffles), the resource profiles
>> >     stay the
>> >     >>>> same.
>> >     >>>>>>> But it would require that a user specifies resources on all
>> >     >>>> operators,
>> >     >>>>>>> which makes it hard to use. That's why the FLIP suggests
>> going
>> >     >> with
>> >     >>>>>>> specifying resources on a Sharing-Group.
>> >     >>>>>>
>> >     >>>>>> I think both thoughts are important, so can we find a
>> solution
>> >     >> where
>> >     >>>> the
>> >     >>>>>> Resource Profiles are specified on an Operator, but we
>> >     still avoid
>> >     >> that
>> >     >>>>> we
>> >     >>>>>> need to specify a resource profile on every operator?
>> >     >>>>>>
>> >     >>>>>> What do you think about something like the following:
>> >     >>>>>>    - Resource Profiles are specified on an operator level.
>> >     >>>>>>    - Not all operators need profiles
>> >     >>>>>>    - All Operators without a Resource Profile end up in the
>> >     >> default
>> >     >>>> slot
>> >     >>>>>> sharing group with a default profile (will get a default
>> slot).
>> >     >>>>>>    - All Operators with a Resource Profile will go into
>> >     another slot
>> >     >>>>> sharing
>> >     >>>>>> group (the resource-specified-group).
>> >     >>>>>>    - Users can define different slot sharing groups for
>> >     operators
>> >     >> like
>> >     >>>>> they
>> >     >>>>>> do now, with the exception that you cannot mix operators
>> >     that have
>> >     >> a
>> >     >>>>>> resource profile and operators that have no resource profile.
>> >     >>>>>>    - The default case where no operator has a resource
>> >     profile is
>> >     >> just a
>> >     >>>>>> special case of this model
>> >     >>>>>>    - The chaining logic sums up the profiles per operator,
>> >     like it
>> >     >> does
>> >     >>>>> now,
>> >     >>>>>> and the scheduler sums up the profiles of the tasks that it
>> >     >> schedules
>> >     >>>>>> together.
>> >     >>>>>>
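
The grouping rules listed above can be sketched roughly as follows. This is one possible reading, not a specification: profiles are reduced to a single number of MB, None stands for "no profile", user-defined groups are left out, and the names assign_groups and slot_sizes are invented for this illustration.

```python
DEFAULT_GROUP = "default"
RESOURCE_GROUP = "resource-specified"


def assign_groups(profiles_mb):
    """Split operators by whether a profile (in MB) was given; None means unset."""
    groups = {DEFAULT_GROUP: [], RESOURCE_GROUP: []}
    for op, mb in profiles_mb.items():
        key = DEFAULT_GROUP if mb is None else RESOURCE_GROUP
        groups[key].append(op)
    return groups


def slot_sizes(profiles_mb, default_slot_mb):
    """The default group gets the default slot; the other group sums its profiles."""
    groups = assign_groups(profiles_mb)
    sizes = {}
    if groups[DEFAULT_GROUP]:
        sizes[DEFAULT_GROUP] = default_slot_mb
    if groups[RESOURCE_GROUP]:
        sizes[RESOURCE_GROUP] = sum(profiles_mb[op] for op in groups[RESOURCE_GROUP])
    return sizes


profiles = {"source": None, "map": None, "join": 512, "window": 256}
print(slot_sizes(profiles, default_slot_mb=128))
# {'default': 128, 'resource-specified': 768}
```

When no operator has a profile, only the default group exists and the result is a single default slot, which matches the "special case" mentioned in the list.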
>> >     >>>>>>
>> >     >>>>>> There is another question about reactive scaling raised in
>> the
>> >     >> FLIP. I
>> >     >>>>> need
>> >     >>>>>> to think a bit about that. That is indeed a bit more tricky
>> >     once we
>> >     >>>> have
>> >     >>>>>> slots of different sizes.
>> >     >>>>>> It is not clear then which of the different slot requests the
>> >     >>>>>> ResourceManager should fulfill when new resources (TMs)
>> >     show up,
>> >     >> or how
>> >     >>>>> the
>> >     >>>>>> JobManager redistributes the slot resources when resources (TMs)
>> >     >>>>>> disappear.
>> >     >>>>>> This question is pretty orthogonal, though, to the "how to
>> >     specify
>> >     >> the
>> >     >>>>>> resources".
>> >     >>>>>>
>> >     >>>>>>
>> >     >>>>>> Best,
>> >     >>>>>> Stephan
>> >     >>>>>>
>> >     >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song
>> >     <tonysong820@gmail.com <ma...@gmail.com>
>> >     >>>>> wrote:
>> >     >>>>>>> Thanks for drafting the FLIP and driving the discussion,
>> >     Yangze.
>> >     >>>>>>> And Thanks for the feedback, Till and Chesnay.
>> >     >>>>>>>
>> >     >>>>>>> @Till,
>> >     >>>>>>>
>> >     >>>>>>> I agree that specifying requirements for SSGs means that
>> SSGs
>> >     >> need to
>> >     >>>>> be
>> >     >>>>>>> supported in fine-grained resource management, otherwise
>> each
>> >     >>>> operator
>> >     >>>>>>> might use as many resources as the whole group. However, I
>> >     cannot
>> >     >>>> think
>> >     >>>>>> of
>> >     >>>>>>> a strong reason for not supporting SSGs in fine-grained
>> >     resource
>> >     >>>>>>> management.
>> >     >>>>>>>
>> >     >>>>>>>
>> >     >>>>>>>> Interestingly, if all operators have their resources
>> properly
>> >     >>>>>> specified,
>> >     >>>>>>>> then slot sharing is no longer needed because Flink could
>> >     >> slice off
>> >     >>>>> the
>> >     >>>>>>>> appropriately sized slots for every Task individually.
>> >     >>>>>>>>
>> >     >>>>>>> So for example, if we have a job consisting of two
>> >     operators op_1
>> >     >> and
>> >     >>>>> op_2
>> >     >>>>>>>> where each op needs 100 MB of memory, we would then say
>> that
>> >     >> the
>> >     >>>> slot
>> >     >>>>>>>> sharing group needs 200 MB of memory to run. If we have a
>> >     >> cluster
>> >     >>>>> with
>> >     >>>>>> 2
>> >     >>>>>>>> TMs with one slot of 100 MB each, then the system cannot
>> run
>> >     >> this
>> >     >>>>> job.
>> >     >>>>>> If
>> >     >>>>>>>> the resources were specified on an operator level, then the
>> >     >> system
>> >     >>>>>> could
>> >     >>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to
>> >     >> TM_2.
>> >     >>>>>>>
>> >     >>>>>>> Couldn't agree more that if all operators' requirements are
>> >     >> properly
>> >     >>>>>>> specified, slot sharing should be no longer needed. I
>> >     think this
>> >     >>>>> exactly
>> >     >>>>>>> disproves the example. If we already know op_1 and op_2 each
>> >     >> needs
>> >     >>>> 100
>> >     >>>>> MB
>> >     >>>>>>> of memory, why would we put them in the same group? If
>> >     they are
>> >     >> in
>> >     >>>>>> separate
>> >     >>>>>>> groups, with the proposed approach the system can freely
>> >     deploy
>> >     >> them
>> >     >>>> to
>> >     >>>>>>> either a 200 MB TM or two 100 MB TMs.
>> >     >>>>>>>
>> >     >>>>>>> Moreover, the precondition for not needing slot sharing is
>> >     having
>> >     >>>>>> resource
>> >     >>>>>>> requirements properly specified for all operators. This is
>> not
>> >     >> always
>> >     >>>>>>> possible, and usually requires tremendous efforts. One of
>> the
>> >     >>>> benefits
>> >     >>>>>> for
>> >     >>>>>>> SSG-based requirements is that it allows the user to freely
>> >     >> decide
>> >     >>>> the
>> >     >>>>>>> granularity, thus efforts they want to pay. I would
>> >     consider SSG
>> >     >> in
>> >     >>>>>>> fine-grained resource management as a group of operators
>> >     that the
>> >     >>>> user
>> >     >>>>>>> would like to specify the total resource for. There can be
>> >     only
>> >     >> one
>> >     >>>>> group
>> >     >>>>>>> in the job, 2~3 groups dividing the job into a few major
>> >     parts,
>> >     >> or as
>> >     >>>>>> many
>> >     >>>>>>> groups as the number of tasks/operators, depending on how
>> >     >>>> fine-grained
>> >     >>>>>> the
>> >     >>>>>>> user is able to specify the resources.
>> >     >>>>>>>
>> >     >>>>>>> Having to support SSGs might be a constraint. But given
>> >     that all
>> >     >> the
>> >     >>>>>>> current scheduler implementations already support SSGs, I
>> >     tend to
>> >     >>>> think
>> >     >>>>>>> that as an acceptable price for the above discussed
>> >     usability and
>> >     >>>>>>> flexibility.
>> >     >>>>>>>
>> >     >>>>>>> @Chesnay
>> >     >>>>>>>
>> >     >>>>>>> Will declaring them on slot sharing groups not also waste
>> >     >> resources
>> >     >>>> if
>> >     >>>>>> the
>> >     >>>>>>>> parallelism of operators within that group are different?
>> >     >>>>>>>>
>> >     >>>>>>> Yes. It's a trade-off between usability and resource
>> >     >> utilization. To
>> >     >>>>>> avoid
>> >     >>>>>>> such wasting, the user can define more groups, so that
>> >     each group
>> >     >>>>>> contains
>> >     >>>>>>> fewer operators and the chance of having operators with
>> >     different
>> >     >>>>>>> parallelism will be reduced. The price is to have more
>> >     resource
>> >     >>>>>>> requirements to specify.
>> >     >>>>>>>
>> >     >>>>>>> It also seems like quite a hassle for users having to
>> >     >> recalculate the
>> >     >>>>>>>> resource requirements if they change the slot sharing.
>> >     >>>>>>>> I'd think that it's not really workable for users that
>> create
>> >     >> a set
>> >     >>>>> of
>> >     >>>>>>>> re-usable operators which are mixed and matched in their
>> >     >>>>> applications;
>> >     >>>>>>>> managing the resources requirements in such a setting
>> >     would be
>> >     >> a
>> >     >>>>>>>> nightmare, and in the end would require operator-level
>> >     >> requirements
>> >     >>>>> any
>> >     >>>>>>>> way.
>> >     >>>>>>>> In that sense, I'm not even sure whether it really
>> increases
>> >     >>>>> usability.
>> >     >>>>>>>     - As mentioned in my reply to Till's comment, there's no
>> >     >> reason to
>> >     >>>>> put
>> >     >>>>>>>     multiple operators whose individual resource
>> >     requirements are
>> >     >>>>> already
>> >     >>>>>>> known
>> >     >>>>>>>     into the same group in fine-grained resource management.
>> >     >>>>>>>     - Even if an operator implementation is reused for multiple
>> >     >>>>> applications,
>> >     >>>>>>>     it does not guarantee the same resource requirements.
>> >     During
>> >     >> our
>> >     >>>>> years
>> >     >>>>>>> of
>> >     >>>>>>>     practices in Alibaba, with per-operator requirements
>> >     >> specified for
>> >     >>>>>>> Blink's
>> >     >>>>>>>     fine-grained resource management, very few users
>> >     (including
>> >     >> our
>> >     >>>>>>> specialists
>> >     >>>>>>>     who are dedicated to supporting Blink users) are as
>> >     >> experienced as
>> >     >>>>> to
>> >     >>>>>>>     accurately predict/estimate the operator resource
>> >     >> requirements.
>> >     >>>> Most
>> >     >>>>>>> people
>> >     >>>>>>>     rely on the execution-time metrics (throughput, delay,
>> cpu
>> >     >> load,
>> >     >>>>>> memory
>> >     >>>>>>>     usage, GC pressure, etc.) to improve the specification.
>> >     >>>>>>>
>> >     >>>>>>> To sum up:
>> >     >>>>>>> If the user is capable of providing proper resource
>> >     requirements
>> >     >> for
>> >     >>>>>> every
>> >     >>>>>>> operator, that's definitely a good thing and we would not
>> >     need to
>> >     >>>> rely
>> >     >>>>> on
>> >     >>>>>>> the SSGs. However, that shouldn't be a *must* for the
>> >     >> fine-grained
>> >     >>>>>> resource
>> >     >>>>>>> management to work. For those users who are capable and do
>> not
>> >     >> like
>> >     >>>>>> having
>> >     >>>>>>> to set each operator to a separate SSG, I would be ok to
>> have
>> >     >> both
>> >     >>>>>>> SSG-based and operator-based runtime interfaces and to only
>> >     >> fallback
>> >     >>>> to
>> >     >>>>>> the
>> >     >>>>>>> SSG requirements when the operator requirements are not
>> >     >> specified.
>> >     >>>>>> However,
>> >     >>>>>>> as the first step, I think we should prioritise the use
>> cases
>> >     >> where
>> >     >>>>> users
>> >     >>>>>>> are not that experienced.
>> >     >>>>>>>
>> >     >>>>>>> Thank you~
>> >     >>>>>>>
>> >     >>>>>>> Xintong Song
>> >     >>>>>>>
>> >     >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
>> >     >> chesnay@apache.org <ma...@apache.org>>
>> >     >>>>>>> wrote:
>> >     >>>>>>>
>> >     >>>>>>>> Will declaring them on slot sharing groups not also waste
>> >     >> resources
>> >     >>>>> if
>> >     >>>>>>>> the parallelism of operators within that group are
>> different?
>> >     >>>>>>>>
>> >     >>>>>>>> It also seems like quite a hassle for users having to
>> >     >> recalculate
>> >     >>>> the
>> >     >>>>>>>> resource requirements if they change the slot sharing.
>> >     >>>>>>>> I'd think that it's not really workable for users that
>> create
>> >     >> a set
>> >     >>>>> of
>> >     >>>>>>>> re-usable operators which are mixed and matched in their
>> >     >>>>> applications;
>> >     >>>>>>>> managing the resources requirements in such a setting
>> >     would be
>> >     >> a
>> >     >>>>>>>> nightmare, and in the end would require operator-level
>> >     >> requirements
>> >     >>>>> any
>> >     >>>>>>>> way.
>> >     >>>>>>>> In that sense, I'm not even sure whether it really
>> increases
>> >     >>>>> usability.
>> >     >>>>>>>> My main worry is that if we wire the runtime to work
>> >     on SSGs
>> >     >>>> it's
>> >     >>>>>>>> gonna be difficult to implement more fine-grained
>> approaches,
>> >     >> which
>> >     >>>>>>>> would not be the case if, for the runtime, they are always
>> >     >> defined
>> >     >>>> on
>> >     >>>>>> an
>> >     >>>>>>>> operator-level.
>> >     >>>>>>>>
>> >     >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
>> >     >>>>>>>>> Thanks for drafting this FLIP and starting this discussion
>> >     >>>> Yangze.
>> >     >>>>>>>>> I like that defining resource requirements on a slot
>> sharing
>> >     >>>> group
>> >     >>>>>>> makes
>> >     >>>>>>>>> the overall setup easier and improves usability of
>> resource
>> >     >>>>>>> requirements.
>> >     >>>>>>>>> What I do not like about it is that it changes slot
>> sharing
>> >     >>>> groups
>> >     >>>>>> from
>> >     >>>>>>>>> being a scheduling hint to something which needs to be
>> >     >> supported
>> >     >>>> in
>> >     >>>>>>> order
>> >     >>>>>>>>> to support fine grained resource requirements. So far, the
>> >     >> idea
>> >     >>>> of
>> >     >>>>>> slot
>> >     >>>>>>>>> sharing groups was that it tells the system that a set of
>> >     >>>> operators
>> >     >>>>>> can
>> >     >>>>>>>> be
>> >     >>>>>>>>> deployed in the same slot. But the system still had the
>> >     >> freedom
>> >     >>>> to
>> >     >>>>>> say
>> >     >>>>>>>> that
>> >     >>>>>>>>> it would rather place these tasks in different slots if it
>> >     >>>> wanted.
>> >     >>>>> If
>> >     >>>>>>> we
>> >     >>>>>>>>> now specify resource requirements on a per slot sharing
>> >     >> group,
>> >     >>>> then
>> >     >>>>>> the
>> >     >>>>>>>>> only option for a scheduler which does not support slot
>> >     >> sharing
>> >     >>>>>> groups
>> >     >>>>>>> is
>> >     >>>>>>>>> to say that every operator in this slot sharing group
>> >     needs a
>> >     >>>> slot
>> >     >>>>>> with
>> >     >>>>>>>> the
>> >     >>>>>>>>> same resources as the whole group.
>> >     >>>>>>>>>
>> >     >>>>>>>>> So for example, if we have a job consisting of two
>> operators
>> >     >> op_1
>> >     >>>>> and
>> >     >>>>>>> op_2
>> >     >>>>>>>>> where each op needs 100 MB of memory, we would then say
>> that
>> >     >> the
>> >     >>>>> slot
>> >     >>>>>>>>> sharing group needs 200 MB of memory to run. If we have a
>> >     >> cluster
>> >     >>>>>> with
>> >     >>>>>>> 2
>> >     >>>>>>>>> TMs with one slot of 100 MB each, then the system cannot
>> run
>> >     >> this
>> >     >>>>>> job.
>> >     >>>>>>> If
>> >     >>>>>>>>> the resources were specified on an operator level, then
>> the
>> >     >>>> system
>> >     >>>>>>> could
>> >     >>>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to
>> >     >> TM_2.
>> >     >>>>>>>>> Originally, one of the primary goals of slot sharing
>> groups
>> >     >> was
>> >     >>>> to
>> >     >>>>>> make
>> >     >>>>>>>> it
>> >     >>>>>>>>> easier for the user to reason about how many slots a job
>> >     >> needs
>> >     >>>>>>>> independent
>> >     >>>>>>>>> of the actual number of operators in the job.
>> Interestingly,
>> >     >> if
>> >     >>>> all
>> >     >>>>>>>>> operators have their resources properly specified, then
>> slot
>> >     >>>>> sharing
>> >     >>>>>> is
>> >     >>>>>>>> no
>> >     >>>>>>>>> longer needed because Flink could slice off the
>> >     appropriately
>> >     >>>> sized
>> >     >>>>>>> slots
>> >     >>>>>>>>> for every Task individually. What matters is whether the
>> >     >> whole
>> >     >>>>>> cluster
>> >     >>>>>>>> has
>> >     >>>>>>>>> enough resources to run all tasks or not.
>> >     >>>>>>>>>
>> >     >>>>>>>>> Cheers,
>> >     >>>>>>>>> Till
>> >     >>>>>>>>>
>> >     >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
>> >     >> karmagyz@gmail.com <ma...@gmail.com>>
>> >     >>>>>> wrote:
>> >     >>>>>>>>>> Hi, there,
>> >     >>>>>>>>>>
>> >     >>>>>>>>>> We would like to start a discussion thread on "FLIP-156:
>> >     >> Runtime
>> >     >>>>>>>>>> Interfaces for Fine-Grained Resource Requirements"[1],
>> >     >> where we
>> >     >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime interfaces
>> >     >> for
>> >     >>>>>>>>>> specifying fine-grained resource requirements.
>> >     >>>>>>>>>>
>> >     >>>>>>>>>> In this FLIP:
>> >     >>>>>>>>>> - Expound the user story of fine-grained resource
>> >     >> management.
>> >     >>>>>>>>>> - Propose runtime interfaces for specifying SSG-based
>> >     >> resource
>> >     >>>>>>>>>> requirements.
>> >     >>>>>>>>>> - Discuss the pros and cons of the three potential
>> >     >> granularities
>> >     >>>>> for
>> >     >>>>>>>>>> specifying the resource requirements (op, task and slot
>> >     >> sharing
>> >     >>>>>> group)
>> >     >>>>>>>>>> and explain why we choose the slot sharing group.
>> >     >>>>>>>>>>
>> >     >>>>>>>>>> Please find more details in the FLIP wiki document [1].
>> >     >> Looking
>> >     >>>>>>>>>> forward to your feedback.
>> >     >>>>>>>>>>
>> >     >>>>>>>>>> [1]
>> >     >>>>>>>>>>
>> >     >>
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
>> >     >>>>>>>>>> Best,
>> >     >>>>>>>>>> Yangze Guo
>> >     >>>>>>>>>>
>> >     >>>>>>>>
>> >
>>
>>

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Xintong Song <to...@gmail.com>.

I think Chesnay's proposal could actually work. IIUC, the key point is to
derive operator requirements from SSG requirements on the API side, so that
the runtime only deals with operator requirements. How the derivation
should be done is debatable, though. E.g., an alternative could be to divide
the SSG requirement evenly among the operators in the group.
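
To make the two derivation options concrete, here is a minimal sketch in
plain Python. It only models the arithmetic and is not Flink API: the
function names, the operator names, and the use of a single memory figure in
MB are assumptions made for illustration. The first function mirrors the
quoted mockup (one operator carries the whole SSG requirement, the others
get zero); the second divides the requirement evenly.

```python
from typing import Dict, List


def derive_first_takes_all(ssg_mb: int, operators: List[str]) -> Dict[str, int]:
    """The first operator carries the whole SSG requirement; the rest get zero."""
    if not operators:
        return {}
    derived = {op: 0 for op in operators}
    derived[operators[0]] = ssg_mb
    return derived


def derive_even_split(ssg_mb: int, operators: List[str]) -> Dict[str, int]:
    """Divide the SSG requirement evenly; leftover MBs go to the first operators."""
    if not operators:
        return {}
    share, remainder = divmod(ssg_mb, len(operators))
    return {
        op: share + (1 if index < remainder else 0)
        for index, op in enumerate(operators)
    }


ops = ["op_1", "op_2"]
print(derive_first_takes_all(200, ops))  # {'op_1': 200, 'op_2': 0}
print(derive_even_split(200, ops))       # {'op_1': 100, 'op_2': 100}
```

Both strategies keep the per-group total unchanged; they differ only in what
a scheduler that places operators individually would see for each operator.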


However, I'm not entirely sure which option is more desirable. The following
figure illustrates my understanding: Chesnay's proposal is on the top and the
SSG-based proposal in this FLIP is on the bottom.


[image: FGRuntimeInterface.png]


I think the major difference between the two approaches is where deriving
operator requirements from SSG requirements happens.

- Chesnay's proposal simplifies the runtime logic and the interface to
expose, at the price of moving more complexity (i.e. the derivation) to the
API side. The question is, where do we prefer to keep the complexity? I'm
slightly leaning towards having a thin API and keeping the complexity in
the runtime if possible.

- Notice that the dashed arrows represent optional steps that are needed
only for schedulers that do not respect SSGs, which we don't have at the
moment. If we only look at the solid arrows, then the SSG-based
approach is much simpler, without needing to derive and aggregate the
requirements back and forth. I'm not sure about complicating the current
design only for potential future needs.
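
The "derive and aggregate back and forth" round trip can be sketched as
follows. This is a toy model in plain Python, not Flink code: the helper
names, group names, and MB figures are hypothetical, and the derive step
simply follows the first-operator-takes-all rule from the quoted mockup. For
a scheduler that respects SSGs, summing the derived operator requirements
per group gives back exactly the SSG requirements the user specified.

```python
from collections import defaultdict
from typing import Dict


def derive_per_operator(ssg_mb: Dict[str, int], group_of: Dict[str, str]) -> Dict[str, int]:
    """API-side step: the first operator seen in each group carries the
    group's whole requirement; every other operator in that group gets zero."""
    derived: Dict[str, int] = {}
    seen_groups = set()
    for op, group in group_of.items():
        derived[op] = 0 if group in seen_groups else ssg_mb[group]
        seen_groups.add(group)
    return derived


def aggregate_per_group(op_mb: Dict[str, int], group_of: Dict[str, str]) -> Dict[str, int]:
    """Runtime-side step: sum operator requirements back into one value per SSG."""
    totals: Dict[str, int] = defaultdict(int)
    for op, mb in op_mb.items():
        totals[group_of[op]] += mb
    return dict(totals)


ssg_mb = {"ssg_a": 200, "ssg_b": 300}
group_of = {"op_1": "ssg_a", "op_2": "ssg_a", "op_3": "ssg_b"}

derived = derive_per_operator(ssg_mb, group_of)
print(derived)                                           # {'op_1': 200, 'op_2': 0, 'op_3': 300}
print(aggregate_per_group(derived, group_of) == ssg_mb)  # True
```

In this model the two steps cancel out whenever every group contains at
least one operator, which is the sense in which they are only needed for
schedulers that do not respect SSGs.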


Thank you~

Xintong Song




On Fri, Jan 22, 2021 at 7:35 AM Chesnay Schepler <ch...@apache.org> wrote:

> You're raising a good point, but I think I can rectify that with a minor
> adjustment.
>
> Default requirements are whatever the default requirements are; setting
> the requirements for one operator has no effect on other operators.
>
> With these rules, and some API enhancements, the following mockup would
> replicate the SSG-based behavior:
>
> Map<SlotSharingGroupId, Requirements> requirements = ...
> for slotSharingGroup in env.getSlotSharingGroups() {
>      vertices = slotSharingGroup.getVertices()
>      // the first vertex carries the whole group's requirement
>      vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
>      // all other vertices in the group explicitly require nothing
>      vertices.remaining().setRequirements(ZERO)
> }
>
> We could even allow setting requirements on slot sharing groups or
> colocation groups and internally translate them accordingly.
> I can't help but feel this is a plain API issue.
>
> On 1/21/2021 9:44 AM, Till Rohrmann wrote:
> > If I understand you correctly Chesnay, then you want to decouple the
> > resource requirement specification from the slot sharing group
> > assignment. Hence, per default all operators would be in the same slot
> > sharing group. If there is no operator with a resource specification,
> > then the system would allocate a default slot for it. If there is at
> > least one operator, then the system would sum up all the specified
> > resources and allocate a slot of this size. This effectively means
> > that all unspecified operators will implicitly have a zero resource
> > requirement. Did I understand your idea correctly?
> >
> > I am wondering whether this wouldn't lead to a surprising behaviour
> > for the user. If the user specifies the resource requirements for a
> > single operator, then he probably will assume that the other operators
> > will get the default share of resources and not nothing.
> >
> > Cheers,
> > Till
> >
> > On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <chesnay@apache.org
> > <ma...@apache.org>> wrote:
> >
> >     Is there even a functional difference between specifying the
> >     requirements for an SSG vs specifying the same requirements on a
> >     single
> >     operator within that group (ideally a colocation group to avoid this
> >     whole hint business)?
> >
> >     Wouldn't we get the best of both worlds in the latter case?
> >
> >     Users can take shortcuts to define shared requirements,
> >     but refine them further as needed on a per-operator basis,
> >     without changing semantics of slotsharing groups
> >     nor the runtime being locked into SSG-based requirements.
> >
> >     (And before anyone argues what happens if slotsharing groups
> >     change or
> >     whatnot, that's a plain API issue that we could surely solve. (A
> >     plain
> >     iteration over slotsharing groups and therein contained operators
> >     would
> >     suffice)).
> >
> >     On 1/20/2021 6:48 PM, Till Rohrmann wrote:
> >     > Maybe a different minor idea: Would it be possible to treat the SSG
> >     > resource requirements as a hint for the runtime similar to how
> >     slot sharing
> >     > groups are designed at the moment? Meaning that we don't give
> >     the guarantee
> >     > that Flink will always deploy this set of tasks together no
> >     matter what
> >     > comes. If, for example, the runtime can derive by some means the
> >     resource
> >     > requirements for each task based on the requirements for the
> >     SSG, this
> >     > could be possible. One easy strategy would be to give every task
> >     the same
> >     > resources as the whole slot sharing group. Another one could be
> >     > distributing the resources equally among the tasks. This does
> >     not even have
> >     > to be implemented but we would give ourselves the freedom to change
> >     > scheduling if need should arise.
> >     >
> >     > Cheers,
> >     > Till
> >     >
> >     > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <karmagyz@gmail.com
> >     <ma...@gmail.com>> wrote:
> >     >
> >     >> Thanks for the responses, Till and Xintong.
> >     >>
> >     >> I second Xintong's comment that SSG-based runtime interface
> >     will give
> >     >> us the flexibility to achieve op/task-based approach. That's one
> of
> >     >> the most important reasons for our design choice.
> >     >>
> >     >> Some cents regarding the default operator resource:
> >     >> - It might be good for the scenario of DataStream jobs.
> >     >>     ** For light-weight operators, the accumulative
> >     configuration error
> >     >> will not be significant. Then, the resource of a task used is
> >     >> proportional to the number of operators it contains.
> >     >>     ** For heavy operators like join and window or operators
> >     using the
> >     >> external resources, user will turn to the fine-grained resource
> >     >> configuration.
> >     >> - It can increase the stability for the standalone cluster
> >     where task
> >     >> executors registered are heterogeneous(with different default slot
> >     >> resources).
> >     >> - It might not be good for SQL users. The operators that SQL
> >     will be
> >     >> translated to are a black box to the user. We also do not
> guarantee
> >     >> the cross-version consistency of the transformation so far.
> >     >>
> >     >> I think it can be treated as a follow-up work when the
> fine-grained
> >     >> resource management is end-to-end ready.
> >     >>
> >     >> Best,
> >     >> Yangze Guo
> >     >>
> >     >>
> >     >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song
> >     <tonysong820@gmail.com <ma...@gmail.com>>
> >     >> wrote:
> >     >>> Thanks for the feedback, Till.
> >     >>>
> >     >>> ## I feel that what you proposed (operator-based + default
> >     value) might
> >     >> be
> >     >>> subsumed by the SSG-based approach.
> >     >>> Thinking of op_1 -> op_2, there are the following 4 cases,
> >     categorized by
> >     >>> whether the resource requirements are known to the users.
> >     >>>
> >     >>>     1. *Both known.* As previously mentioned, there's no
> >     reason to put
> >     >>>     multiple operators whose individual resource requirements
> >     are already
> >     >> known
> >     >>>     into the same group in fine-grained resource management.
> >     And if op_1
> >     >> and
> >     >>>     op_2 are in different groups, there should be no problem
> >     switching
> >     >> data
> >     >>>     exchange mode from pipelined to blocking. This is
> >     equivalent to
> >     >> specifying
> >     >>>     operator resource requirements in your proposal.
> >     >>>     2. *op_1 known, op_2 unknown.* Similar to 1), except that
> >     op_2 is in a
> >     >>>     SSG whose resource is not specified thus would have the
> >     default slot
> >     >>>     resource. This is equivalent to having default operator
> >     resources in
> >     >> your
> >     >>>     proposal.
> >     >>>     3. *Both unknown*. The user can either set op_1 and op_2
> >     to the same
> >     >> SSG
> >     >>>     or separate SSGs.
> >     >>>        - If op_1 and op_2 are in the same SSG, it will be
> >     equivalent to
> >     >> the
> >     >>>        coarse-grained resource management, where op_1 and op_2
> >     share a
> >     >> default
> >     >>>        size slot no matter which data exchange mode is used.
> >     >>>        - If op_1 and op_2 are in different SSGs, then each of
> >     them will
> >     >> use
> >     >>>        a default size slot. This is equivalent to setting them
> >     with
> >     >> default
> >     >>>        operator resources in your proposal.
> >     >>>     4. *Total (pipeline) or max (blocking) of op_1 and op_2 is
> >     known.*
> >     >>>        - It is possible that the user learns the total / max
> >     resource
> >     >>>        requirement from executing and monitoring the job,
> >     while not
> >     >>> being aware of
> >     >>>        individual operator requirements.
> >     >>>        - I believe this is the case your proposal does not
> >     cover. And TBH,
> >     >>>        this is probably how most users learn the resource
> >     requirements,
> >     >>> according
> >     >>>        to my experiences.
> >     >>>        - In this case, the user might need to specify
> >     different resources
> >     >> if
> >     >>>        he wants to switch the execution mode, which should not
> >     be worse
> >     >> than not
> >     >>>        being able to use fine-grained resource management.
> >     >>>
> >     >>>
> >     >>> ## An additional idea inspired by your proposal.
> >     >>> We may provide multiple options for deciding resources for
> >     SSGs whose
> >     >>> requirement is not specified, if needed.
> >     >>>
> >     >>>     - Default slot resource (current design)
> >     >>>     - Default operator resource times number of operators
> >     (equivalent to
> >     >>>     your proposal)
> >     >>>
> >     >>>
> >     >>> ## Exposing internal runtime strategies
> >     >>> Theoretically, yes. Tying to the SSGs, the resource
> >     requirements might be
> >     >>> affected if how SSGs are internally handled changes in future.
> >     >> Practically,
> >     >>> I do not concretely see at the moment what kind of changes we
> >     may want in
> >     >>> future that might conflict with this FLIP proposal, as the
> >     question of
> >     >>> switching data exchange mode answered above. I'd suggest to
> >     not give up
> >     >> the
> >     >>> user friendliness we may gain now for the future problems that
> >     may or may
> >     >>> not exist.
> >     >>>
> >     >>> Moreover, the SSG-based approach has the flexibility to
> >     achieve the
> >     >>> equivalent behavior as the operator-based approach, if we set
> each
> >     >> operator
> >     >>> (or task) to a separate SSG. We can even provide a shortcut
> >     option to
> >     >>> automatically do that for users, if needed.
> >     >>>
> >     >>>
> >     >>> Thank you~
> >     >>>
> >     >>> Xintong Song
> >     >>>
> >     >>>
> >     >>>
> >     >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann
> >     <trohrmann@apache.org <ma...@apache.org>>
> >     >> wrote:
> >     >>>> Thanks for the responses Xintong and Stephan,
> >     >>>>
> >     >>>> I agree that being able to define the resource requirements for
> a
> >     >> group of
> >     >>>> operators is more user friendly. However, my concern is that
> >     we are
> >     >>>> exposing thereby internal runtime strategies which might
> >     limit our
> >     >>>> flexibility to execute a given job. Moreover, the semantics of
> >     >> configuring
> >     >>>> resource requirements for SSGs could break if switching from
> >     streaming
> >     >> to
> >     >>>> batch execution. If one defines the resource requirements for
> >     op_1 ->
> >     >> op_2
> >     >>>> which run in pipelined mode when using the streaming
> >     execution, then
> >     >> how do
> >     >>>> we interpret these requirements when op_1 -> op_2 are
> >     executed with a
> >     >>>> blocking data exchange in batch execution mode? Consequently,
> >     I am
> >     >> still
> >     >>>> leaning towards Stephan's proposal to set the resource
> >     requirements per
> >     >>>> operator.
> >     >>>>
> >     >>>> Maybe the following proposal makes the configuration easier:
> >     If the
> >     >> user
> >     >>>> wants to use fine-grained resource requirements, then she
> >     needs to
> >     >> specify
> >     >>>> the default size which is used for operators which have no
> >     explicit
> >     >>>> resource annotation. If this holds true, then every operator
> >     would
> >     >> have a
> >     >>>> resource requirement and the system can try to execute the
> >     operators
> >     >> in the
> >     >>>> best possible manner w/o being constrained by how the user
> >     set the SSG
> >     >>>> requirements.
> >     >>>>
> >     >>>> Cheers,
> >     >>>> Till
> >     >>>>
> >     >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song
> >     <tonysong820@gmail.com <ma...@gmail.com>>
> >     >>>> wrote:
> >     >>>>
> >     >>>>> Thanks for the feedback, Stephan.
> >     >>>>>
> >     >>>>> Actually, your proposal has also come to my mind at some
> >     point. And I
> >     >>>> have
> >     >>>>> some concerns about it.
> >     >>>>>
> >     >>>>>
> >     >>>>> 1. It does not give users the same control as the SSG-based
> >     approach.
> >     >>>>>
> >     >>>>>
> >     >>>>> While both approaches do not require specifying for each
> >     operator,
> >     >>>>> SSG-based approach supports the semantic that "some operators
> >     >> together
> >     >>>> use
> >     >>>>> this much resource" while the operator-based approach doesn't.
> >     >>>>>
> >     >>>>>
> >     >>>>> Think of a long pipeline with m operators (o_1, o_2, ...,
> >     o_m), and
> >     >> at
> >     >>>> some
> >     >>>>> point there's an agg o_n (1 < n < m) which significantly
> >     reduces the
> >     >> data
> >     >>>>> amount. One can separate the pipeline into 2 groups SSG_1
> >     (o_1, ...,
> >     >> o_n)
> >     >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much higher
> >     >> parallelisms
> >     >>>>> for operators in SSG_1 than for operators in SSG_2 won't
> >     lead to too
> >     >> much
> >     >>>>> wasting of resources. If the two SSGs end up needing different
> >     >> resources,
> >     >>>>> with the SSG-based approach one can directly specify
> >     resources for
> >     >> the
> >     >>>> two
> >     >>>>> groups. However, with the operator-based approach, the user
> will
> >     >> have to
> >     >>>>> specify resources for each operator in one of the two
> >     groups, and
> >     >> tune
> >     >>>> the
> >     >>>>> default slot resource via configurations to fit the other
> group.
> >     >>>>>
> >     >>>>>
> >     >>>>> 2. It increases the chance of breaking operator chains.
> >     >>>>>
> >     >>>>>
> >     >>>>> Setting chainable operators into different slot sharing
> >     groups will
> >     >>>>> prevent them from being chained. In the current implementation,
> >     >>>> downstream
> >     >>>>> operators, if SSG not explicitly specified, will be set to
> >     the same
> >     >> group
> >     >>>>> as the chainable upstream operators (unless multiple upstream
> >     >> operators
> >     >>>> in
> >     >>>>> different groups), to reduce the chance of breaking chains.
> >     >>>>>
> >     >>>>>
> >     >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4,
> >     deciding
> >     >> SSGs
> >     >>>>> based on whether resource is specified we will easily get
> >     groups like
> >     >>>> (o_1,
> >     >>>>> o_3) & (o_2, o_4), where none of the operators can be
> >     chained. This
> >     >> is
> >     >>>> also
> >     >>>>> possible for the SSG-based approach, but I believe the
> >     chance is much
> >     >>>>> smaller because there's no strong reason for users to
> >     specify the
> >     >> groups
> >     >>>>> with alternate operators like that. We are more likely to
> >     get groups
> >     >> like
> >     >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks only between
> >     o_2 and
> >     >> o_3.
> >     >>>>>
> >     >>>>> 3. It complicates the system by having two different
> >     mechanisms for
> >     >>>> sharing
> >     >>>>> managed memory in a slot.
> >     >>>>>
> >     >>>>>
> >     >>>>> - In FLIP-141, we introduced the intra-slot managed memory
> >     sharing
> >     >>>>> mechanism, where managed memory is first distributed
> >     according to the
> >     >>>>> consumer type, then further distributed across operators of
> that
> >     >> consumer
> >     >>>>> type.
> >     >>>>>
> >     >>>>> - With the operator-based approach, managed memory size
> >     specified
> >     >> for an
> >     >>>>> operator should account for all the consumer types of that
> >     operator.
> >     >> That
> >     >>>>> means the managed memory is first distributed across
> >     operators, then
> >     >>>>> distributed to different consumer types of each operator.
> >     >>>>>
> >     >>>>>
> >     >>>>> Unfortunately, the different order of the two calculation
> >     steps can
> >     >> lead
> >     >>>> to
> >     >>>>> different results. To be specific, the semantic of the
> >     configuration
> >     >>>> option
> >     >>>>> `consumer-weights` changed (within a slot vs. within an
> >     operator).
> >     >>>>>
> >     >>>>>
> >     >>>>>
> >     >>>>> To sum up things:
> >     >>>>>
> >     >>>>> While (3) might be a bit more implementation related, I
> >     think (1)
> >     >> and (2)
> >     >>>>> somehow suggest that, the price for the proposed approach to
> >     avoid
> >     >>>>> specifying resource for every operator is that it's not as
> >     >> independent
> >     >>>> from
> >     >>>>> operator chaining and slot sharing as the operator-based
> >     approach
> >     >>>> discussed
> >     >>>>> in the FLIP.
> >     >>>>>
> >     >>>>>
> >     >>>>> Thank you~
> >     >>>>>
> >     >>>>> Xintong Song
> >     >>>>>
> >     >>>>>
> >     >>>>>
> >     >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen
> >     <sewen@apache.org <ma...@apache.org>>
> >     >> wrote:
> >     >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
> >     >>>>>>
> >     >>>>>> I want to say, first of all, that this is super well
> >     written. And
> >     >> the
> >     >>>>>> points that the FLIP makes about how to expose the
> >     configuration to
> >     >>>> users
> >     >>>>>> is exactly the right thing to figure out first.
> >     >>>>>> So good job here!
> >     >>>>>>
> >     >>>>>> About how to let users specify the resource profiles. If I
> >     can sum
> >     >> the
> >     >>>>> FLIP
> >     >>>>>> and previous discussion up in my own words, the problem is the
> >     >>>> following:
> >     >>>>>> Operator-level specification is the simplest and cleanest
> >     approach,
> >     >>>>> because
> >     >>>>>>> it avoids mixing operator configuration (resource) and
> >     >> scheduling. No
> >     >>>>>>> matter what other parameters change (chaining, slot sharing,
> >     >>>> switching
> >     >>>>>>> pipelined and blocking shuffles), the resource profiles
> >     stay the
> >     >>>> same.
> >     >>>>>>> But it would require that a user specifies resources on all
> >     >>>> operators,
> >     >>>>>>> which makes it hard to use. That's why the FLIP suggests
> going
> >     >> with
> >     >>>>>>> specifying resources on a Sharing-Group.
> >     >>>>>>
> >     >>>>>> I think both thoughts are important, so can we find a solution
> >     >> where
> >     >>>> the
> >     >>>>>> Resource Profiles are specified on an Operator, but we
> >     still avoid
> >     >> that
> >     >>>>> we
> >     >>>>>> need to specify a resource profile on every operator?
> >     >>>>>>
> >     >>>>>> What do you think about something like the following:
> >     >>>>>>    - Resource Profiles are specified on an operator level.
> >     >>>>>>    - Not all operators need profiles
> >     >>>>>>    - All Operators without a Resource Profile end up in the
> >     >> default
> >     >>>> slot
> >     >>>>>> sharing group with a default profile (will get a default
> slot).
> >     >>>>>>    - All Operators with a Resource Profile will go into
> >     another slot
> >     >>>>> sharing
> >     >>>>>> group (the resource-specified-group).
> >     >>>>>>    - Users can define different slot sharing groups for
> >     operators
> >     >> like
> >     >>>>> they
> >     >>>>>> do now, with the exception that you cannot mix operators
> >     that have
> >     >> a
> >     >>>>>> resource profile and operators that have no resource profile.
> >     >>>>>>    - The default case where no operator has a resource
> >     profile is
> >     >> just a
> >     >>>>>> special case of this model
> >     >>>>>>    - The chaining logic sums up the profiles per operator,
> >     like it
> >     >> does
> >     >>>>> now,
> >     >>>>>> and the scheduler sums up the profiles of the tasks that it
> >     >> schedules
> >     >>>>>> together.
> >     >>>>>>
> >     >>>>>>
> >     >>>>>> There is another question about reactive scaling raised in the
> >     >> FLIP. I
> >     >>>>> need
> >     >>>>>> to think a bit about that. That is indeed a bit more tricky
> >     once we
> >     >>>> have
> >     >>>>>> slots of different sizes.
> >     >>>>>> It is not clear then which of the different slot requests the
> >     >>>>>> ResourceManager should fulfill when new resources (TMs)
> >     show up,
> >     >> or how
> >     >>>>> the
> >     >>>>>> JobManager redistributes the slots resources when resources
> >     (TMs)
> >     >>>>> disappear
> >     >>>>>> This question is pretty orthogonal, though, to the "how to
> >     specify
> >     >> the
> >     >>>>>> resources".
> >     >>>>>>
> >     >>>>>>
> >     >>>>>> Best,
> >     >>>>>> Stephan
> >     >>>>>>
> >     >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song
> >     <tonysong820@gmail.com <ma...@gmail.com>
> >     >>>>> wrote:
> >     >>>>>>> Thanks for drafting the FLIP and driving the discussion,
> >     Yangze.
> >     >>>>>>> And Thanks for the feedback, Till and Chesnay.
> >     >>>>>>>
> >     >>>>>>> @Till,
> >     >>>>>>>
> >     >>>>>>> I agree that specifying requirements for SSGs means that SSGs
> >     >> need to
> >     >>>>> be
> >     >>>>>>> supported in fine-grained resource management, otherwise each
> >     >>>> operator
> >     >>>>>>> might use as many resources as the whole group. However, I
> >     cannot
> >     >>>> think
> >     >>>>>> of
> >     >>>>>>> a strong reason for not supporting SSGs in fine-grained
> >     resource
> >     >>>>>>> management.
> >     >>>>>>>
> >     >>>>>>>
> >     >>>>>>>> Interestingly, if all operators have their resources
> properly
> >     >>>>>> specified,
> >     >>>>>>>> then slot sharing is no longer needed because Flink could
> >     >> slice off
> >     >>>>> the
> >     >>>>>>>> appropriately sized slots for every Task individually.
> >     >>>>>>>>
> >     >>>>>>> So for example, if we have a job consisting of two
> >     operator op_1
> >     >> and
> >     >>>>> op_2
> >     >>>>>>>> where each op needs 100 MB of memory, we would then say that
> >     >> the
> >     >>>> slot
> >     >>>>>>>> sharing group needs 200 MB of memory to run. If we have a
> >     >> cluster
> >     >>>>> with
> >     >>>>>> 2
> >     >>>>>>>> TMs with one slot of 100 MB each, then the system cannot run
> >     >> this
> >     >>>>> job.
> >     >>>>>> If
> >     >>>>>>>> the resources were specified on an operator level, then the
> >     >> system
> >     >>>>>> could
> >     >>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to
> >     >> TM_2.
> >     >>>>>>>
> >     >>>>>>> Couldn't agree more that if all operators' requirements are
> >     >> properly
> >     >>>>>>> specified, slot sharing should be no longer needed. I
> >     think this
> >     >>>>> exactly
> >     >>>>>>> disproves the example. If we already know op_1 and op_2 each
> >     >> needs
> >     >>>> 100
> >     >>>>> MB
> >     >>>>>>> of memory, why would we put them in the same group? If
> >     they are
> >     >> in
> >     >>>>>> separate
> >     >>>>>>> groups, with the proposed approach the system can freely
> >     deploy
> >     >> them
> >     >>>> to
> >     >>>>>>> either a 200 MB TM or two 100 MB TMs.
> >     >>>>>>>
> >     >>>>>>> Moreover, the precondition for not needing slot sharing is
> >     having
> >     >>>>>> resource
> >     >>>>>>> requirements properly specified for all operators. This is
> not
> >     >> always
> >     >>>>>>> possible, and usually requires tremendous efforts. One of the
> >     >>>> benefits
> >     >>>>>> for
> >     >>>>>>> SSG-based requirements is that it allows the user to freely
> >     >> decide
> >     >>>> the
> >     >>>>>>> granularity, thus efforts they want to pay. I would
> >     consider SSG
> >     >> in
> >     >>>>>>> fine-grained resource management as a group of operators
> >     that the
> >     >>>> user
> >     >>>>>>> would like to specify the total resource for. There can be
> >     only
> >     >> one
> >     >>>>> group
> >     >>>>>>> in the job, 2~3 groups dividing the job into a few major
> >     parts,
> >     >> or as
> >     >>>>>> many
> >     >>>>>>> groups as the number of tasks/operators, depending on how
> >     >>>> fine-grained
> >     >>>>>> the
> >     >>>>>>> user is able to specify the resources.
> >     >>>>>>>
> >     >>>>>>> Having to support SSGs might be a constraint. But given
> >     that all
> >     >> the
> >     >>>>>>> current scheduler implementations already support SSGs, I
> >     tend to
> >     >>>> think
> >     >>>>>>> that as an acceptable price for the above discussed
> >     usability and
> >     >>>>>>> flexibility.
> >     >>>>>>>
> >     >>>>>>> @Chesnay
> >     >>>>>>>
> >     >>>>>>> Will declaring them on slot sharing groups not also waste
> >     >> resources
> >     >>>> if
> >     >>>>>> the
> >     >>>>>>>> parallelism of operators within that group are different?
> >     >>>>>>>>
> >     >>>>>>> Yes. It's a trade-off between usability and resource
> >     >> utilization. To
> >     >>>>>> avoid
> >     >>>>>>> such wasting, the user can define more groups, so that
> >     each group
> >     >>>>>> contains
> >     >>>>>>> less operators and the chance of having operators with
> >     different
> >     >>>>>>> parallelism will be reduced. The price is to have more
> >     resource
> >     >>>>>>> requirements to specify.
> >     >>>>>>>
> >     >>>>>>> It also seems like quite a hassle for users having to
> >     >> recalculate the
> >     >>>>>>>> resource requirements if they change the slot sharing.
> >     >>>>>>>> I'd think that it's not really workable for users that
> create
> >     >> a set
> >     >>>>> of
> >     >>>>>>>> re-usable operators which are mixed and matched in their
> >     >>>>> applications;
> >     >>>>>>>> managing the resources requirements in such a setting
> >     would be
> >     >> a
> >     >>>>>>>> nightmare, and in the end would require operator-level
> >     >> requirements
> >     >>>>> any
> >     >>>>>>>> way.
> >     >>>>>>>> In that sense, I'm not even sure whether it really increases
> >     >>>>> usability.
> >     >>>>>>>     - As mentioned in my reply to Till's comment, there's no
> >     >> reason to
> >     >>>>> put
> >     >>>>>>>     multiple operators whose individual resource
> >     requirements are
> >     >>>>> already
> >     >>>>>>> known
> >     >>>>>>>     into the same group in fine-grained resource management.
> >     >>>>>>>     - Even an operator implementation is reused for multiple
> >     >>>>> applications,
> >     >>>>>>>     it does not guarantee the same resource requirements.
> >     During
> >     >> our
> >     >>>>> years
> >     >>>>>>> of
> >     >>>>>>>     practices in Alibaba, with per-operator requirements
> >     >> specified for
> >     >>>>>>> Blink's
> >     >>>>>>>     fine-grained resource management, very few users
> >     (including
> >     >> our
> >     >>>>>>> specialists
> >     >>>>>>>     who are dedicated to supporting Blink users) are as
> >     >> experienced as
> >     >>>>> to
> >     >>>>>>>     accurately predict/estimate the operator resource
> >     >> requirements.
> >     >>>> Most
> >     >>>>>>> people
> >     >>>>>>>     rely on the execution-time metrics (throughput, delay,
> cpu
> >     >> load,
> >     >>>>>> memory
> >     >>>>>>>     usage, GC pressure, etc.) to improve the specification.
> >     >>>>>>>
> >     >>>>>>> To sum up:
> >     >>>>>>> If the user is capable of providing proper resource
> >     requirements
> >     >> for
> >     >>>>>> every
> >     >>>>>>> operator, that's definitely a good thing and we would not
> >     need to
> >     >>>> rely
> >     >>>>> on
> >     >>>>>>> the SSGs. However, that shouldn't be a *must* for the
> >     >> fine-grained
> >     >>>>>> resource
> >     >>>>>>> management to work. For those users who are capable and do
> not
> >     >> like
> >     >>>>>> having
> >     >>>>>>> to set each operator to a separate SSG, I would be ok to have
> >     >> both
> >     >>>>>>> SSG-based and operator-based runtime interfaces and to only
> >     >> fallback
> >     >>>> to
> >     >>>>>> the
> >     >>>>>>> SSG requirements when the operator requirements are not
> >     >> specified.
> >     >>>>>> However,
> >     >>>>>>> as the first step, I think we should prioritise the use cases
> >     >> where
> >     >>>>> users
> >     >>>>>>> are not that experienced.
> >     >>>>>>>
> >     >>>>>>> Thank you~
> >     >>>>>>>
> >     >>>>>>> Xintong Song
> >     >>>>>>>
> >     >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
> >     >> chesnay@apache.org <ma...@apache.org>>
> >     >>>>>>> wrote:
> >     >>>>>>>
> >     >>>>>>>> Will declaring them on slot sharing groups not also waste
> >     >> resources
> >     >>>>> if
> >     >>>>>>>> the parallelism of operators within that group are
> different?
> >     >>>>>>>>
> >     >>>>>>>> It also seems like quite a hassle for users having to
> >     >> recalculate
> >     >>>> the
> >     >>>>>>>> resource requirements if they change the slot sharing.
> >     >>>>>>>> I'd think that it's not really workable for users that
> create
> >     >> a set
> >     >>>>> of
> >     >>>>>>>> re-usable operators which are mixed and matched in their
> >     >>>>> applications;
> >     >>>>>>>> managing the resources requirements in such a setting
> >     would be
> >     >> a
> >     >>>>>>>> nightmare, and in the end would require operator-level
> >     >> requirements
> >     >>>>> any
> >     >>>>>>>> way.
> >     >>>>>>>> In that sense, I'm not even sure whether it really increases
> >     >>>>> usability.
> >     >>>>>>>> My main worry is that it if we wire the runtime to work
> >     on SSGs
> >     >>>> it's
> >     >>>>>>>> gonna be difficult to implement more fine-grained
> approaches,
> >     >> which
> >     >>>>>>>> would not be the case if, for the runtime, they are always
> >     >> defined
> >     >>>> on
> >     >>>>>> an
> >     >>>>>>>> operator-level.
> >     >>>>>>>>
> >     >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> >     >>>>>>>>> Thanks for drafting this FLIP and starting this discussion
> >     >>>> Yangze.
> >     >>>>>>>>> I like that defining resource requirements on a slot
> sharing
> >     >>>> group
> >     >>>>>>> makes
> >     >>>>>>>>> the overall setup easier and improves usability of resource
> >     >>>>>>> requirements.
> >     >>>>>>>>> What I do not like about it is that it changes slot sharing
> >     >>>> groups
> >     >>>>>> from
> >     >>>>>>>>> being a scheduling hint to something which needs to be
> >     >> supported
> >     >>>> in
> >     >>>>>>> order
> >     >>>>>>>>> to support fine grained resource requirements. So far, the
> >     >> idea
> >     >>>> of
> >     >>>>>> slot
> >     >>>>>>>>> sharing groups was that it tells the system that a set of
> >     >>>> operators
> >     >>>>>> can
> >     >>>>>>>> be
> >     >>>>>>>>> deployed in the same slot. But the system still had the
> >     >> freedom
> >     >>>> to
> >     >>>>>> say
> >     >>>>>>>> that
> >     >>>>>>>>> it would rather place these tasks in different slots if it
> >     >>>> wanted.
> >     >>>>> If
> >     >>>>>>> we
> >     >>>>>>>>> now specify resource requirements on a per slot sharing
> >     >> group,
> >     >>>> then
> >     >>>>>> the
> >     >>>>>>>>> only option for a scheduler which does not support slot
> >     >> sharing
> >     >>>>>> groups
> >     >>>>>>> is
> >     >>>>>>>>> to say that every operator in this slot sharing group
> >     needs a
> >     >>>> slot
> >     >>>>>> with
> >     >>>>>>>> the
> >     >>>>>>>>> same resources as the whole group.
> >     >>>>>>>>>
> >     >>>>>>>>> So for example, if we have a job consisting of two operator
> >     >> op_1
> >     >>>>> and
> >     >>>>>>> op_2
> >     >>>>>>>>> where each op needs 100 MB of memory, we would then say
> that
> >     >> the
> >     >>>>> slot
> >     >>>>>>>>> sharing group needs 200 MB of memory to run. If we have a
> >     >> cluster
> >     >>>>>> with
> >     >>>>>>> 2
> >     >>>>>>>>> TMs with one slot of 100 MB each, then the system cannot
> run
> >     >> this
> >     >>>>>> job.
> >     >>>>>>> If
> >     >>>>>>>>> the resources were specified on an operator level, then the
> >     >>>> system
> >     >>>>>>> could
> >     >>>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to
> >     >> TM_2.
> >     >>>>>>>>> Originally, one of the primary goals of slot sharing groups
> >     >> was
> >     >>>> to
> >     >>>>>> make
> >     >>>>>>>> it
> >     >>>>>>>>> easier for the user to reason about how many slots a job
> >     >> needs
> >     >>>>>>>> independent
> >     >>>>>>>>> of the actual number of operators in the job.
> Interestingly,
> >     >> if
> >     >>>> all
> >     >>>>>>>>> operators have their resources properly specified, then
> slot
> >     >>>>> sharing
> >     >>>>>> is
> >     >>>>>>>> no
> >     >>>>>>>>> longer needed because Flink could slice off the
> >     appropriately
> >     >>>> sized
> >     >>>>>>> slots
> >     >>>>>>>>> for every Task individually. What matters is whether the
> >     >> whole
> >     >>>>>> cluster
> >     >>>>>>>> has
> >     >>>>>>>>> enough resources to run all tasks or not.
> >     >>>>>>>>>
> >     >>>>>>>>> Cheers,
> >     >>>>>>>>> Till
> >     >>>>>>>>>
> >     >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
> >     >> karmagyz@gmail.com <ma...@gmail.com>>
> >     >>>>>> wrote:
> >     >>>>>>>>>> Hi, there,
> >     >>>>>>>>>>
> >     >>>>>>>>>> We would like to start a discussion thread on "FLIP-156:
> >     >> Runtime
> >     >>>>>>>>>> Interfaces for Fine-Grained Resource Requirements"[1],
> >     >> where we
> >     >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime interfaces
> >     >> for
> >     >>>>>>>>>> specifying fine-grained resource requirements.
> >     >>>>>>>>>>
> >     >>>>>>>>>> In this FLIP:
> >     >>>>>>>>>> - Expound the user story of fine-grained resource
> >     >> management.
> >     >>>>>>>>>> - Propose runtime interfaces for specifying SSG-based
> >     >> resource
> >     >>>>>>>>>> requirements.
> >     >>>>>>>>>> - Discuss the pros and cons of the three potential
> >     >> granularities
> >     >>>>> for
> >     >>>>>>>>>> specifying the resource requirements (op, task and slot
> >     >> sharing
> >     >>>>>> group)
> >     >>>>>>>>>> and explain why we choose the slot sharing group.
> >     >>>>>>>>>>
> >     >>>>>>>>>> Please find more details in the FLIP wiki document [1].
> >     >> Looking
> >     >>>>>>>>>> forward to your feedback.
> >     >>>>>>>>>>
> >     >>>>>>>>>> [1]
> >     >>>>>>>>>>
> >     >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> >     <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> >
> >     >>>>>>>>>> Best,
> >     >>>>>>>>>> Yangze Guo
> >     >>>>>>>>>>
> >     >>>>>>>>
> >
>
>

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Chesnay Schepler <ch...@apache.org>.

You're raising a good point, but I think I can rectify that with a minor 
adjustment.

Default requirements are whatever the default requirements are; setting
the requirements for one operator has no effect on other operators.

With these rules, and some API enhancements, the following mockup would 
replicate the SSG-based behavior:

Map<SlotSharingGroupId, Requirements> requirements = ...
for slotSharingGroup in env.getSlotSharingGroups() {
     vertices = slotSharingGroup.getVertices()
     // the first vertex carries the requirements of the whole group
     vertices.first().setRequirements(requirements.get(slotSharingGroup.getID()))
     // all remaining vertices explicitly request nothing
     vertices.remaining().setRequirements(ZERO)
}

We could even allow setting requirements on slot sharing groups or
colocation groups and internally translate them accordingly.
I can't help but feel this is a plain API issue.
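
For illustration only, the mockup can be sketched in plain Java. This is a sketch under stated assumptions, not Flink code: Requirements, Vertex, applyGroupRequirements and totalMemoryMb are hypothetical stand-ins invented here, groups are keyed by a plain String, and only memory in MB is modelled. It shows that putting the group's requirements on the first vertex and ZERO on the rest sums back to the group-level value.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SsgRequirementsSketch {

    // Hypothetical stand-in for a resource requirement; only memory (MB) is modelled.
    static final class Requirements {
        static final Requirements ZERO = new Requirements(0);
        final int memoryMb;
        Requirements(int memoryMb) { this.memoryMb = memoryMb; }
    }

    // Hypothetical stand-in for a job vertex; requirements == null means "unspecified".
    static final class Vertex {
        final String name;
        Requirements requirements;
        Vertex(String name) { this.name = name; }
    }

    // The first vertex of each group carries the group's requirements; the rest get ZERO.
    static void applyGroupRequirements(
            Map<String, List<Vertex>> groups, Map<String, Requirements> groupRequirements) {
        for (Map.Entry<String, List<Vertex>> entry : groups.entrySet()) {
            Requirements groupReq = groupRequirements.get(entry.getKey());
            List<Vertex> vertices = entry.getValue();
            if (groupReq == null || vertices.isEmpty()) {
                continue; // no requirement given for this group: leave vertices unspecified
            }
            vertices.get(0).requirements = groupReq;
            for (Vertex v : vertices.subList(1, vertices.size())) {
                v.requirements = Requirements.ZERO;
            }
        }
    }

    // Sums the per-vertex requirements of vertices that share a slot.
    static int totalMemoryMb(List<Vertex> vertices) {
        int total = 0;
        for (Vertex v : vertices) {
            if (v.requirements != null) {
                total += v.requirements.memoryMb;
            }
        }
        return total;
    }

    // Builds a two-group example and returns the summed memory of group "ssg-1".
    static int demoTotalMemoryMb() {
        Map<String, List<Vertex>> groups = new LinkedHashMap<>();
        groups.put("ssg-1", List.of(new Vertex("source"), new Vertex("map")));
        groups.put("ssg-2", List.of(new Vertex("sink")));
        Map<String, Requirements> groupRequirements = Map.of("ssg-1", new Requirements(200));
        applyGroupRequirements(groups, groupRequirements);
        return totalMemoryMb(groups.get("ssg-1"));
    }

    public static void main(String[] args) {
        System.out.println("ssg-1 total memory (MB): " + demoTotalMemoryMb());
    }
}
```

In this sketch, vertices of a group without a specified requirement (here "ssg-2") are left untouched, which corresponds to the rule that setting requirements for one operator has no effect on others.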

On 1/21/2021 9:44 AM, Till Rohrmann wrote:
> If I understand you correctly Chesnay, then you want to decouple the 
> resource requirement specification from the slot sharing group 
> assignment. Hence, per default all operators would be in the same slot 
> sharing group. If there is no operator with a resource specification, 
> then the system would allocate a default slot for it. If there is at 
> least one operator, then the system would sum up all the specified 
> resources and allocate a slot of this size. This effectively means 
> that all unspecified operators will implicitly have a zero resource 
> requirement. Did I understand your idea correctly?
>
> I am wondering whether this wouldn't lead to a surprising behaviour 
> for the user. If the user specifies the resource requirements for a 
> single operator, then he probably will assume that the other operators 
> will get the default share of resources and not nothing.
>
> Cheers,
> Till
>
> On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <chesnay@apache.org 
> <ma...@apache.org>> wrote:
>
>     Is there even a functional difference between specifying the
>     requirements for an SSG vs specifying the same requirements on a
>     single
>     operator within that group (ideally a colocation group to avoid this
>     whole hint business)?
>
>     Wouldn't we get the best of both worlds in the latter case?
>
>     Users can take shortcuts to define shared requirements,
>     but refine them further as needed on a per-operator basis,
>     without changing semantics of slotsharing groups
>     nor the runtime being locked into SSG-based requirements.
>
>     (And before anyone argues what happens if slotsharing groups
>     change or
>     whatnot, that's a plain API issue that we could surely solve. (A
>     plain
>     iteration over slotsharing groups and therein contained operators
>     would
>     suffice)).
>
>     On 1/20/2021 6:48 PM, Till Rohrmann wrote:
>     > Maybe a different minor idea: Would it be possible to treat the SSG
>     > resource requirements as a hint for the runtime similar to how
>     slot sharing
>     > groups are designed at the moment? Meaning that we don't give
>     the guarantee
>     > that Flink will always deploy this set of tasks together no
>     matter what
>     > comes. If, for example, the runtime can derive by some means the
>     resource
>     > requirements for each task based on the requirements for the
>     SSG, this
>     > could be possible. One easy strategy would be to give every task
>     the same
>     > resources as the whole slot sharing group. Another one could be
>     > distributing the resources equally among the tasks. This does
>     not even have
>     > to be implemented but we would give ourselves the freedom to change
>     > scheduling if need should arise.
>     >
>     > Cheers,
>     > Till
>     >
>     > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <karmagyz@gmail.com
>     <ma...@gmail.com>> wrote:
>     >
>     >> Thanks for the responses, Till and Xintong.
>     >>
>     >> I second Xintong's comment that SSG-based runtime interface
>     will give
>     >> us the flexibility to achieve op/task-based approach. That's one of
>     >> the most important reasons for our design choice.
>     >>
>     >> Some cents regarding the default operator resource:
>     >> - It might be good for the scenario of DataStream jobs.
>     >>     ** For light-weight operators, the accumulative
>     configuration error
>     >> will not be significant. Then, the resource of a task used is
>     >> proportional to the number of operators it contains.
>     >>     ** For heavy operators like join and window or operators
>     using the
>     >> external resources, user will turn to the fine-grained resource
>     >> configuration.
>     >> - It can increase the stability for the standalone cluster
>     where task
>     >> executors registered are heterogeneous(with different default slot
>     >> resources).
>     >> - It might not be good for SQL users. The operators that SQL
>     will be
>     >> transferred to is a black box to the user. We also do not guarantee
>     >> the cross-version consistency of the transformation so far.
>     >>
>     >> I think it can be treated as a follow-up work when the fine-grained
>     >> resource management is end-to-end ready.
>     >>
>     >> Best,
>     >> Yangze Guo
>     >>
>     >>
>     >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song
>     <tonysong820@gmail.com <ma...@gmail.com>>
>     >> wrote:
>     >>> Thanks for the feedback, Till.
>     >>>
>     >>> ## I feel that what you proposed (operator-based + default
>     value) might
>     >> be
>     >>> subsumed by the SSG-based approach.
>     >>> Thinking of op_1 -> op_2, there are the following 4 cases,
>     categorized by
>     >>> whether the resource requirements are known to the users.
>     >>>
>     >>>     1. *Both known.* As previously mentioned, there's no
>     reason to put
>     >>>     multiple operators whose individual resource requirements
>     are already
>     >> known
>     >>>     into the same group in fine-grained resource management.
>     And if op_1
>     >> and
>     >>>     op_2 are in different groups, there should be no problem
>     switching
>     >> data
>     >>>     exchange mode from pipelined to blocking. This is
>     equivalent to
>     >> specifying
>     >>>     operator resource requirements in your proposal.
>     >>>     2. *op_1 known, op_2 unknown.* Similar to 1), except that
>     op_2 is in a
>     >>>     SSG whose resource is not specified thus would have the
>     default slot
>     >>>     resource. This is equivalent to having default operator
>     resources in
>     >> your
>     >>>     proposal.
>     >>>     3. *Both unknown*. The user can either set op_1 and op_2
>     to the same
>     >> SSG
>     >>>     or separate SSGs.
>     >>>        - If op_1 and op_2 are in the same SSG, it will be
>     equivalent to
>     >> the
>     >>>        coarse-grained resource management, where op_1 and op_2
>     share a
>     >> default
>     >>>        size slot no matter which data exchange mode is used.
>     >>>        - If op_1 and op_2 are in different SSGs, then each of
>     them will
>     >> use
>     >>>        a default size slot. This is equivalent to setting them
>     with
>     >> default
>     >>>        operator resources in your proposal.
>     >>>     4. *Total (pipeline) or max (blocking) of op_1 and op_2 is
>     known.*
>     >>>        - It is possible that the user learns the total / max
>     resource
>     >>>        requirement from executing and monitoring the job,
>     while not
>     >>> being aware of
>     >>>        individual operator requirements.
>     >>>        - I believe this is the case your proposal does not
>     cover. And TBH,
>     >>>        this is probably how most users learn the resource
>     requirements,
>     >>> according
>     >>>        to my experiences.
>     >>>        - In this case, the user might need to specify
>     different resources
>     >> if
>     >>>        he wants to switch the execution mode, which should not
>     be worse
>     >> than not
>     >>>        being able to use fine-grained resource management.
>     >>>
>     >>>
>     >>> ## An additional idea inspired by your proposal.
>     >>> We may provide multiple options for deciding resources for
>     SSGs whose
>     >>> requirement is not specified, if needed.
>     >>>
>     >>>     - Default slot resource (current design)
>     >>>     - Default operator resource times number of operators
>     (equivalent to
>     >>>     your proposal)
>     >>>
>     >>>
>     >>> ## Exposing internal runtime strategies
>     >>> Theoretically, yes. Tying to the SSGs, the resource
>     requirements might be
>     >>> affected if how SSGs are internally handled changes in future.
>     >> Practically,
>     >>> I do not concretely see at the moment what kind of changes we
>     may want in
>     >>> future that might conflict with this FLIP proposal, as the
>     question of
>     >>> switching data exchange mode answered above. I'd suggest to
>     not give up
>     >> the
>     >>> user friendliness we may gain now for the future problems that
>     may or may
>     >>> not exist.
>     >>>
>     >>> Moreover, the SSG-based approach has the flexibility to
>     achieve the
>     >>> equivalent behavior as the operator-based approach, if we set each
>     >> operator
>     >>> (or task) to a separate SSG. We can even provide a shortcut
>     option to
>     >>> automatically do that for users, if needed.
>     >>>
>     >>>
>     >>> Thank you~
>     >>>
>     >>> Xintong Song
>     >>>
>     >>>
>     >>>
>     >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann
>     <trohrmann@apache.org <ma...@apache.org>>
>     >> wrote:
>     >>>> Thanks for the responses Xintong and Stephan,
>     >>>>
>     >>>> I agree that being able to define the resource requirements for a
>     >> group of
>     >>>> operators is more user friendly. However, my concern is that
>     we are
>     >>>> exposing thereby internal runtime strategies which might
>     limit our
>     >>>> flexibility to execute a given job. Moreover, the semantics of
>     >> configuring
>     >>>> resource requirements for SSGs could break if switching from
>     streaming
>     >> to
>     >>>> batch execution. If one defines the resource requirements for
>     op_1 ->
>     >> op_2
>     >>>> which run in pipelined mode when using the streaming
>     execution, then
>     >> how do
>     >>>> we interpret these requirements when op_1 -> op_2 are
>     executed with a
>     >>>> blocking data exchange in batch execution mode? Consequently,
>     I am
>     >> still
>     >>>> leaning towards Stephan's proposal to set the resource
>     requirements per
>     >>>> operator.
>     >>>>
>     >>>> Maybe the following proposal makes the configuration easier:
>     If the
>     >> user
>     >>>> wants to use fine-grained resource requirements, then she
>     needs to
>     >> specify
>     >>>> the default size which is used for operators which have no
>     explicit
>     >>>> resource annotation. If this holds true, then every operator
>     would
>     >> have a
>     >>>> resource requirement and the system can try to execute the
>     operators
>     >> in the
>     >>>> best possible manner w/o being constrained by how the user
>     set the SSG
>     >>>> requirements.
>     >>>>
>     >>>> Cheers,
>     >>>> Till
>     >>>>
>     >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song
>     <tonysong820@gmail.com <ma...@gmail.com>>
>     >>>> wrote:
>     >>>>
>     >>>>> Thanks for the feedback, Stephan.
>     >>>>>
>     >>>>> Actually, your proposal has also come to my mind at some
>     point. And I
>     >>>> have
>     >>>>> some concerns about it.
>     >>>>>
>     >>>>>
>     >>>>> 1. It does not give users the same control as the SSG-based
>     approach.
>     >>>>>
>     >>>>>
>     >>>>> While both approaches do not require specifying for each
>     operator,
>     >>>>> SSG-based approach supports the semantic that "some operators
>     >> together
>     >>>> use
>     >>>>> this much resource" while the operator-based approach doesn't.
>     >>>>>
>     >>>>>
>     >>>>> Think of a long pipeline with m operators (o_1, o_2, ...,
>     o_m), and
>     >> at
>     >>>> some
>     >>>>> point there's an agg o_n (1 < n < m) which significantly
>     reduces the
>     >> data
>     >>>>> amount. One can separate the pipeline into 2 groups SSG_1
>     (o_1, ...,
>     >> o_n)
>     >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much higher
>     >> parallelisms
>     >>>>> for operators in SSG_1 than for operators in SSG_2 won't
>     lead to too
>     >> much
>     >>>>> wasting of resources. If the two SSGs end up needing different
>     >> resources,
>     >>>>> with the SSG-based approach one can directly specify
>     resources for
>     >> the
>     >>>> two
>     >>>>> groups. However, with the operator-based approach, the user will
>     >> have to
>     >>>>> specify resources for each operator in one of the two
>     groups, and
>     >> tune
>     >>>> the
>     >>>>> default slot resource via configurations to fit the other group.
>     >>>>>
>     >>>>>
>     >>>>> 2. It increases the chance of breaking operator chains.
>     >>>>>
>     >>>>>
>     >>>>> Setting chainable operators into different slot sharing
>     groups will
>     >>>>> prevent them from being chained. In the current implementation,
>     >>>> downstream
>     >>>>> operators, if SSG not explicitly specified, will be set to
>     the same
>     >> group
>     >>>>> as the chainable upstream operators (unless multiple upstream
>     >> operators
>     >>>> in
>     >>>>> different groups), to reduce the chance of breaking chains.
>     >>>>>
>     >>>>>
>     >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4,
>     deciding
>     >> SSGs
>     >>>>> based on whether resource is specified we will easily get
>     groups like
>     >>>> (o_1,
>     >>>>> o_3) & (o_2, o_4), where none of the operators can be
>     chained. This
>     >> is
>     >>>> also
>     >>>>> possible for the SSG-based approach, but I believe the
>     chance is much
>     >>>>> smaller because there's no strong reason for users to
>     specify the
>     >> groups
>     >>>>> with alternate operators like that. We are more likely to
>     get groups
>     >> like
>     >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks only between
>     o_2 and
>     >> o_3.
>     >>>>>
>     >>>>> 3. It complicates the system by having two different
>     mechanisms for
>     >>>> sharing
>     >>>>> managed memory in a slot.
>     >>>>>
>     >>>>>
>     >>>>> - In FLIP-141, we introduced the intra-slot managed memory
>     sharing
>     >>>>> mechanism, where managed memory is first distributed
>     according to the
>     >>>>> consumer type, then further distributed across operators of that
>     >> consumer
>     >>>>> type.
>     >>>>>
>     >>>>> - With the operator-based approach, managed memory size
>     specified
>     >> for an
>     >>>>> operator should account for all the consumer types of that
>     operator.
>     >> That
>     >>>>> means the managed memory is first distributed across
>     operators, then
>     >>>>> distributed to different consumer types of each operator.
>     >>>>>
>     >>>>>
>     >>>>> Unfortunately, the different order of the two calculation
>     steps can
>     >> lead
>     >>>> to
>     >>>>> different results. To be specific, the semantic of the
>     configuration
>     >>>> option
>     >>>>> `consumer-weights` changed (within a slot vs. within an
>     operator).
>     >>>>>
>     >>>>>
>     >>>>>
>     >>>>> To sum up things:
>     >>>>>
>     >>>>> While (3) might be a bit more implementation related, I
>     think (1)
>     >> and (2)
>     >>>>> somehow suggest that, the price for the proposed approach to
>     avoid
>     >>>>> specifying resource for every operator is that it's not as
>     >> independent
>     >>>> from
>     >>>>> operator chaining and slot sharing as the operator-based
>     approach
>     >>>> discussed
>     >>>>> in the FLIP.
>     >>>>>
>     >>>>>
>     >>>>> Thank you~
>     >>>>>
>     >>>>> Xintong Song
>     >>>>>
>     >>>>>
>     >>>>>
>     >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen
>     <sewen@apache.org>
>     >> wrote:
>     >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
>     >>>>>>
>     >>>>>> I want to say, first of all, that this is super well
>     written. And
>     >> the
>     >>>>>> points that the FLIP makes about how to expose the
>     configuration to
>     >>>> users
>     >>>>>> is exactly the right thing to figure out first.
>     >>>>>> So good job here!
>     >>>>>>
>     >>>>>> About how to let users specify the resource profiles. If I
>     can sum
>     >> the
>     >>>>> FLIP
>     >>>>>> and previous discussion up in my own words, the problem is the
>     >>>> following:
>     >>>>>> Operator-level specification is the simplest and cleanest
>     approach,
>     >>>>> because
>     >>>>>>> it avoids mixing operator configuration (resource) and
>     >> scheduling. No
>     >>>>>>> matter what other parameters change (chaining, slot sharing,
>     >>>> switching
>     >>>>>>> pipelined and blocking shuffles), the resource profiles
>     stay the
>     >>>> same.
>     >>>>>>> But it would require that a user specifies resources on all
>     >>>> operators,
>     >>>>>>> which makes it hard to use. That's why the FLIP suggests going
>     >> with
>     >>>>>>> specifying resources on a Sharing-Group.
>     >>>>>>
>     >>>>>> I think both thoughts are important, so can we find a solution
>     >> where
>     >>>> the
>     >>>>>> Resource Profiles are specified on an Operator, but we
>     still avoid
>     >> that
>     >>>>> we
>     >>>>>> need to specify a resource profile on every operator?
>     >>>>>>
>     >>>>>> What do you think about something like the following:
>     >>>>>>    - Resource Profiles are specified on an operator level.
>     >>>>>>    - Not all operators need profiles
>     >>>>>>    - All Operators without a Resource Profile end up in the
>     >> default
>     >>>> slot
>     >>>>>> sharing group with a default profile (will get a default slot).
>     >>>>>>    - All Operators with a Resource Profile will go into
>     another slot
>     >>>>> sharing
>     >>>>>> group (the resource-specified-group).
>     >>>>>>    - Users can define different slot sharing groups for
>     operators
>     >> like
>     >>>>> they
>     >>>>>> do now, with the exception that you cannot mix operators
>     that have
>     >> a
>     >>>>>> resource profile and operators that have no resource profile.
>     >>>>>>    - The default case where no operator has a resource
>     profile is
>     >> just a
>     >>>>>> special case of this model
>     >>>>>>    - The chaining logic sums up the profiles per operator,
>     like it
>     >> does
>     >>>>> now,
>     >>>>>> and the scheduler sums up the profiles of the tasks that it
>     >> schedules
>     >>>>>> together.
>     >>>>>>
>     >>>>>>
>     >>>>>> There is another question about reactive scaling raised in the
>     >> FLIP. I
>     >>>>> need
>     >>>>>> to think a bit about that. That is indeed a bit more tricky
>     once we
>     >>>> have
>     >>>>>> slots of different sizes.
>     >>>>>> It is not clear then which of the different slot requests the
>     >>>>>> ResourceManager should fulfill when new resources (TMs)
>     show up,
>     >> or how
>     >>>>> the
>     >>>>>> JobManager redistributes the slot resources when resources
>     (TMs)
>     >>>>> disappear.
>     >>>>>> This question is pretty orthogonal, though, to the "how to
>     specify
>     >> the
>     >>>>>> resources".
>     >>>>>>
>     >>>>>>
>     >>>>>> Best,
>     >>>>>> Stephan
>     >>>>>>
>     >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song
>     <tonysong820@gmail.com>
>     >>>>> wrote:
>     >>>>>>> Thanks for drafting the FLIP and driving the discussion,
>     Yangze.
>     >>>>>>> And Thanks for the feedback, Till and Chesnay.
>     >>>>>>>
>     >>>>>>> @Till,
>     >>>>>>>
>     >>>>>>> I agree that specifying requirements for SSGs means that SSGs
>     >> need to
>     >>>>> be
>     >>>>>>> supported in fine-grained resource management, otherwise each
>     >>>> operator
>     >>>>>>> might use as many resources as the whole group. However, I
>     cannot
>     >>>> think
>     >>>>>> of
>     >>>>>>> a strong reason for not supporting SSGs in fine-grained
>     resource
>     >>>>>>> management.
>     >>>>>>>
>     >>>>>>>
>     >>>>>>>> Interestingly, if all operators have their resources properly
>     >>>>>> specified,
>     >>>>>>>> then slot sharing is no longer needed because Flink could
>     >> slice off
>     >>>>> the
>     >>>>>>>> appropriately sized slots for every Task individually.
>     >>>>>>>>
>     >>>>>>> So for example, if we have a job consisting of two
>     operators op_1
>     >> and
>     >>>>> op_2
>     >>>>>>>> where each op needs 100 MB of memory, we would then say that
>     >> the
>     >>>> slot
>     >>>>>>>> sharing group needs 200 MB of memory to run. If we have a
>     >> cluster
>     >>>>> with
>     >>>>>> 2
>     >>>>>>>> TMs with one slot of 100 MB each, then the system cannot run
>     >> this
>     >>>>> job.
>     >>>>>> If
>     >>>>>>>> the resources were specified on an operator level, then the
>     >> system
>     >>>>>> could
>     >>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to
>     >> TM_2.
>     >>>>>>>
>     >>>>>>> Couldn't agree more that if all operators' requirements are
>     >> properly
>     >>>>>>> specified, slot sharing should be no longer needed. I
>     think this
>     >>>>> exactly
>     >>>>>>> disproves the example. If we already know op_1 and op_2 each
>     >> needs
>     >>>> 100
>     >>>>> MB
>     >>>>>>> of memory, why would we put them in the same group? If
>     they are
>     >> in
>     >>>>>> separate
>     >>>>>>> groups, with the proposed approach the system can freely
>     deploy
>     >> them
>     >>>> to
>     >>>>>>> either a 200 MB TM or two 100 MB TMs.
>     >>>>>>>
>     >>>>>>> Moreover, the precondition for not needing slot sharing is
>     having
>     >>>>>> resource
>     >>>>>>> requirements properly specified for all operators. This is not
>     >> always
>     >>>>>>> possible, and usually requires tremendous efforts. One of the
>     >>>> benefits
>     >>>>>> for
>     >>>>>>> SSG-based requirements is that it allows the user to freely
>     >> decide
>     >>>> the
>     >>>>>>> granularity, and thus the effort they want to invest. I would
>     consider SSG
>     >> in
>     >>>>>>> fine-grained resource management as a group of operators
>     that the
>     >>>> user
>     >>>>>>> would like to specify the total resource for. There can be
>     only
>     >> one
>     >>>>> group
>     >>>>>>> in the job, 2~3 groups dividing the job into a few major
>     parts,
>     >> or as
>     >>>>>> many
>     >>>>>>> groups as the number of tasks/operators, depending on how
>     >>>> fine-grained
>     >>>>>> the
>     >>>>>>> user is able to specify the resources.
>     >>>>>>>
>     >>>>>>> Having to support SSGs might be a constraint. But given
>     that all
>     >> the
>     >>>>>>> current scheduler implementations already support SSGs, I
>     tend to
>     >>>> think
>     >>>>>>> of that as an acceptable price for the above-discussed
>     usability and
>     >>>>>>> flexibility.
>     >>>>>>>
>     >>>>>>> @Chesnay
>     >>>>>>>
>     >>>>>>> Will declaring them on slot sharing groups not also waste
>     >> resources
>     >>>> if
>     >>>>>> the
>     >>>>>>>> parallelism of operators within that group are different?
>     >>>>>>>>
>     >>>>>>> Yes. It's a trade-off between usability and resource
>     >> utilization. To
>     >>>>>> avoid
>     >>>>>>> such wasting, the user can define more groups, so that
>     each group
>     >>>>>> contains
>     >>>>>>> fewer operators and the chance of having operators with
>     different
>     >>>>>>> parallelism will be reduced. The price is to have more
>     resource
>     >>>>>>> requirements to specify.
>     >>>>>>>
>     >>>>>>> It also seems like quite a hassle for users having to
>     >> recalculate the
>     >>>>>>>> resource requirements if they change the slot sharing.
>     >>>>>>>> I'd think that it's not really workable for users that create
>     >> a set
>     >>>>> of
>     >>>>>>>> re-usable operators which are mixed and matched in their
>     >>>>> applications;
>     >>>>>>>> managing the resources requirements in such a setting
>     would be
>     >> a
>     >>>>>>>> nightmare, and in the end would require operator-level
>     >> requirements
>     >>>>> any
>     >>>>>>>> way.
>     >>>>>>>> In that sense, I'm not even sure whether it really increases
>     >>>>> usability.
>     >>>>>>>     - As mentioned in my reply to Till's comment, there's no
>     >> reason to
>     >>>>> put
>     >>>>>>>     multiple operators whose individual resource
>     requirements are
>     >>>>> already
>     >>>>>>> known
>     >>>>>>>     into the same group in fine-grained resource management.
>     >>>>>>>     - Even if an operator implementation is reused for multiple
>     >>>>> applications,
>     >>>>>>>     it does not guarantee the same resource requirements.
>     During
>     >> our
>     >>>>> years
>     >>>>>>> of
>     >>>>>>>     practices in Alibaba, with per-operator requirements
>     >> specified for
>     >>>>>>> Blink's
>     >>>>>>>     fine-grained resource management, very few users
>     (including
>     >> our
>     >>>>>>> specialists
>     >>>>>>>     who are dedicated to supporting Blink users) are as
>     >> experienced as
>     >>>>> to
>     >>>>>>>     accurately predict/estimate the operator resource
>     >> requirements.
>     >>>> Most
>     >>>>>>> people
>     >>>>>>>     rely on the execution-time metrics (throughput, delay, cpu
>     >> load,
>     >>>>>> memory
>     >>>>>>>     usage, GC pressure, etc.) to improve the specification.
>     >>>>>>>
>     >>>>>>> To sum up:
>     >>>>>>> If the user is capable of providing proper resource
>     requirements
>     >> for
>     >>>>>> every
>     >>>>>>> operator, that's definitely a good thing and we would not
>     need to
>     >>>> rely
>     >>>>> on
>     >>>>>>> the SSGs. However, that shouldn't be a *must* for the
>     >> fine-grained
>     >>>>>> resource
>     >>>>>>> management to work. For those users who are capable and do not
>     >> like
>     >>>>>> having
>     >>>>>>> to set each operator to a separate SSG, I would be ok to have
>     >> both
>     >>>>>>> SSG-based and operator-based runtime interfaces and to only
>     >> fallback
>     >>>> to
>     >>>>>> the
>     >>>>>>> SSG requirements when the operator requirements are not
>     >> specified.
>     >>>>>> However,
>     >>>>>>> as the first step, I think we should prioritise the use cases
>     >> where
>     >>>>> users
>     >>>>>>> are not that experienced.
>     >>>>>>>
>     >>>>>>> Thank you~
>     >>>>>>>
>     >>>>>>> Xintong Song
>     >>>>>>>
>     >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
>     >> chesnay@apache.org>
>     >>>>>>> wrote:
>     >>>>>>>
>     >>>>>>>> Will declaring them on slot sharing groups not also waste
>     >> resources
>     >>>>> if
>     >>>>>>>> the parallelism of operators within that group are different?
>     >>>>>>>>
>     >>>>>>>> It also seems like quite a hassle for users having to
>     >> recalculate
>     >>>> the
>     >>>>>>>> resource requirements if they change the slot sharing.
>     >>>>>>>> I'd think that it's not really workable for users that create
>     >> a set
>     >>>>> of
>     >>>>>>>> re-usable operators which are mixed and matched in their
>     >>>>> applications;
>     >>>>>>>> managing the resources requirements in such a setting
>     would be
>     >> a
>     >>>>>>>> nightmare, and in the end would require operator-level
>     >> requirements
>     >>>>> any
>     >>>>>>>> way.
>     >>>>>>>> In that sense, I'm not even sure whether it really increases
>     >>>>> usability.
>     >>>>>>>> My main worry is that if we wire the runtime to work
>     on SSGs
>     >>>> it's
>     >>>>>>>> gonna be difficult to implement more fine-grained approaches,
>     >> which
>     >>>>>>>> would not be the case if, for the runtime, they are always
>     >> defined
>     >>>> on
>     >>>>>> an
>     >>>>>>>> operator-level.
>     >>>>>>>>
>     >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
>     >>>>>>>>> Thanks for drafting this FLIP and starting this discussion
>     >>>> Yangze.
>     >>>>>>>>> I like that defining resource requirements on a slot sharing
>     >>>> group
>     >>>>>>> makes
>     >>>>>>>>> the overall setup easier and improves usability of resource
>     >>>>>>> requirements.
>     >>>>>>>>> What I do not like about it is that it changes slot sharing
>     >>>> groups
>     >>>>>> from
>     >>>>>>>>> being a scheduling hint to something which needs to be
>     >> supported
>     >>>> in
>     >>>>>>> order
>     >>>>>>>>> to support fine grained resource requirements. So far, the
>     >> idea
>     >>>> of
>     >>>>>> slot
>     >>>>>>>>> sharing groups was that it tells the system that a set of
>     >>>> operators
>     >>>>>> can
>     >>>>>>>> be
>     >>>>>>>>> deployed in the same slot. But the system still had the
>     >> freedom
>     >>>> to
>     >>>>>> say
>     >>>>>>>> that
>     >>>>>>>>> it would rather place these tasks in different slots if it
>     >>>> wanted.
>     >>>>> If
>     >>>>>>> we
>     >>>>>>>>> now specify resource requirements on a per slot sharing
>     >> group,
>     >>>> then
>     >>>>>> the
>     >>>>>>>>> only option for a scheduler which does not support slot
>     >> sharing
>     >>>>>> groups
>     >>>>>>> is
>     >>>>>>>>> to say that every operator in this slot sharing group
>     needs a
>     >>>> slot
>     >>>>>> with
>     >>>>>>>> the
>     >>>>>>>>> same resources as the whole group.
>     >>>>>>>>>
>     >>>>>>>>> So for example, if we have a job consisting of two operators
>     >> op_1
>     >>>>> and
>     >>>>>>> op_2
>     >>>>>>>>> where each op needs 100 MB of memory, we would then say that
>     >> the
>     >>>>> slot
>     >>>>>>>>> sharing group needs 200 MB of memory to run. If we have a
>     >> cluster
>     >>>>>> with
>     >>>>>>> 2
>     >>>>>>>>> TMs with one slot of 100 MB each, then the system cannot run
>     >> this
>     >>>>>> job.
>     >>>>>>> If
>     >>>>>>>>> the resources were specified on an operator level, then the
>     >>>> system
>     >>>>>>> could
>     >>>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to
>     >> TM_2.
>     >>>>>>>>> Originally, one of the primary goals of slot sharing groups
>     >> was
>     >>>> to
>     >>>>>> make
>     >>>>>>>> it
>     >>>>>>>>> easier for the user to reason about how many slots a job
>     >> needs
>     >>>>>>>> independent
>     >>>>>>>>> of the actual number of operators in the job. Interestingly,
>     >> if
>     >>>> all
>     >>>>>>>>> operators have their resources properly specified, then slot
>     >>>>> sharing
>     >>>>>> is
>     >>>>>>>> no
>     >>>>>>>>> longer needed because Flink could slice off the
>     appropriately
>     >>>> sized
>     >>>>>>> slots
>     >>>>>>>>> for every Task individually. What matters is whether the
>     >> whole
>     >>>>>> cluster
>     >>>>>>>> has
>     >>>>>>>>> enough resources to run all tasks or not.
>     >>>>>>>>>
>     >>>>>>>>> Cheers,
>     >>>>>>>>> Till
>     >>>>>>>>>
>     >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
>     >> karmagyz@gmail.com>
>     >>>>>> wrote:
>     >>>>>>>>>> Hi, there,
>     >>>>>>>>>>
>     >>>>>>>>>> We would like to start a discussion thread on "FLIP-156:
>     >> Runtime
>     >>>>>>>>>> Interfaces for Fine-Grained Resource Requirements"[1],
>     >> where we
>     >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime interfaces
>     >> for
>     >>>>>>>>>> specifying fine-grained resource requirements.
>     >>>>>>>>>>
>     >>>>>>>>>> In this FLIP:
>     >>>>>>>>>> - Expound the user story of fine-grained resource
>     >> management.
>     >>>>>>>>>> - Propose runtime interfaces for specifying SSG-based
>     >> resource
>     >>>>>>>>>> requirements.
>     >>>>>>>>>> - Discuss the pros and cons of the three potential
>     >> granularities
>     >>>>> for
>     >>>>>>>>>> specifying the resource requirements (op, task and slot
>     >> sharing
>     >>>>>> group)
>     >>>>>>>>>> and explain why we choose the slot sharing group.
>     >>>>>>>>>>
>     >>>>>>>>>> Please find more details in the FLIP wiki document [1].
>     >> Looking
>     >>>>>>>>>> forward to your feedback.
>     >>>>>>>>>>
>     >>>>>>>>>> [1]
>     >>>>>>>>>>
>     >>
>     https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
>     >>>>>>>>>> Best,
>     >>>>>>>>>> Yangze Guo
>     >>>>>>>>>>
>     >>>>>>>>
>

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Till Rohrmann <tr...@apache.org>.

If I understand you correctly, Chesnay, you want to decouple the
resource requirement specification from the slot sharing group assignment.
Hence, by default all operators would be in the same slot sharing group.
If no operator has a resource specification, then the system
would allocate a default slot for the group. If at least one operator has a
specification, then the system would sum up all the specified resources and
allocate a slot of this size. This effectively means that all unspecified
operators will implicitly have a zero resource requirement. Did I understand
your idea correctly?

I am wondering whether this wouldn't lead to a surprising behaviour for the
user. If the user specifies the resource requirements for a single
operator, then he probably will assume that the other operators will get
the default share of resources and not nothing.
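
To make the concern concrete, here is a minimal sketch of the slot sizing
rule as I read it. This is plain Python and not Flink code; the function
name and the 1024 MB default slot size are made up for illustration.

```python
from typing import Optional, Sequence

# Hypothetical default slot size in MB, used only for this illustration.
DEFAULT_SLOT_MB = 1024


def slot_size_mb(operator_mb: Sequence[Optional[int]],
                 default_slot_mb: int = DEFAULT_SLOT_MB) -> int:
    """Size of the slot for one sharing group under the rule discussed above.

    Each entry is an operator's specified memory in MB, or None if the
    operator has no resource specification.
    """
    specified = [mb for mb in operator_mb if mb is not None]
    if not specified:
        # No operator carries a specification: fall back to a default slot.
        return default_slot_mb
    # At least one specification: the slot is the sum of the specified
    # values, so unspecified operators implicitly contribute zero.
    return sum(specified)


# Three operators, none specified: the group gets a default slot.
print(slot_size_mb([None, None, None]))
# One operator specified with 200 MB: the other two add nothing.
print(slot_size_mb([200, None, None]))
```

In the second call the whole group gets a 200 MB slot, which is smaller than
the default slot the same three operators would have shared without any
specification. That is the behaviour I would expect to surprise users.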

Cheers,
Till

On Thu, Jan 21, 2021 at 3:25 AM Chesnay Schepler <ch...@apache.org> wrote:

> Is there even a functional difference between specifying the
> requirements for an SSG vs specifying the same requirements on a single
> operator within that group (ideally a colocation group to avoid this
> whole hint business)?
>
> Wouldn't we get the best of both worlds in the latter case?
>
> Users can take shortcuts to define shared requirements,
> but refine them further as needed on a per-operator basis,
> without changing semantics of slotsharing groups
> nor the runtime being locked into SSG-based requirements.
>
> (And before anyone argues what happens if slotsharing groups change or
> whatnot, that's a plain API issue that we could surely solve. (A plain
> iteration over slotsharing groups and therein contained operators would
> suffice)).
>
> On 1/20/2021 6:48 PM, Till Rohrmann wrote:
> > Maybe a different minor idea: Would it be possible to treat the SSG
> > resource requirements as a hint for the runtime similar to how slot
> sharing
> > groups are designed at the moment? Meaning that we don't give the
> guarantee
> > that Flink will always deploy this set of tasks together no matter what
> > comes. If, for example, the runtime can derive by some means the resource
> > requirements for each task based on the requirements for the SSG, this
> > could be possible. One easy strategy would be to give every task the same
> > resources as the whole slot sharing group. Another one could be
> > distributing the resources equally among the tasks. This does not even
> have
> > to be implemented but we would give ourselves the freedom to change
> > scheduling if need should arise.
> >
> > Cheers,
> > Till
> >
> > On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <ka...@gmail.com> wrote:
> >
> >> Thanks for the responses, Till and Xintong.
> >>
> >> I second Xintong's comment that SSG-based runtime interface will give
> >> us the flexibility to achieve op/task-based approach. That's one of
> >> the most important reasons for our design choice.
> >>
> >> Some cents regarding the default operator resource:
> >> - It might be good for the scenario of DataStream jobs.
> >>     ** For light-weight operators, the accumulative configuration error
> >> will not be significant. Then, the resource of a task used is
> >> proportional to the number of operators it contains.
> >>     ** For heavy operators like join and window or operators using the
> >> external resources, user will turn to the fine-grained resource
> >> configuration.
> >> - It can increase the stability for the standalone cluster where task
> >> executors registered are heterogeneous(with different default slot
> >> resources).
> >> - It might not be good for SQL users. The operators that SQL will be
> >> translated to are a black box to the user. We also do not guarantee
> >> the cross-version consistency of the transformation so far.
> >>
> >> I think it can be treated as a follow-up work when the fine-grained
> >> resource management is end-to-end ready.
> >>
> >> Best,
> >> Yangze Guo
> >>
> >>
> >> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song <to...@gmail.com>
> >> wrote:
> >>> Thanks for the feedback, Till.
> >>>
> >>> ## I feel that what you proposed (operator-based + default value) might
> >> be
> >>> subsumed by the SSG-based approach.
> >>> Thinking of op_1 -> op_2, there are the following 4 cases, categorized
> by
> >>> whether the resource requirements are known to the users.
> >>>
> >>>     1. *Both known.* As previously mentioned, there's no reason to put
> >>>     multiple operators whose individual resource requirements are
> already
> >> known
> >>>     into the same group in fine-grained resource management. And if
> op_1
> >> and
> >>>     op_2 are in different groups, there should be no problem switching
> >> data
> >>>     exchange mode from pipelined to blocking. This is equivalent to
> >> specifying
> >>>     operator resource requirements in your proposal.
> >>>     2. *op_1 known, op_2 unknown.* Similar to 1), except that op_2 is
> in a
> >>>     SSG whose resource is not specified thus would have the default
> slot
> >>>     resource. This is equivalent to having default operator resources
> in
> >> your
> >>>     proposal.
> >>>     3. *Both unknown*. The user can either set op_1 and op_2 to the
> same
> >> SSG
> >>>     or separate SSGs.
> >>>        - If op_1 and op_2 are in the same SSG, it will be equivalent to
> >> the
> >>>        coarse-grained resource management, where op_1 and op_2 share a
> >> default
> >>>        size slot no matter which data exchange mode is used.
> >>>        - If op_1 and op_2 are in different SSGs, then each of them will
> >> use
> >>>        a default size slot. This is equivalent to setting them with
> >> default
> >>>        operator resources in your proposal.
> >>>     4. *Total (pipeline) or max (blocking) of op_1 and op_2 is known.*
> >>>        - It is possible that the user learns the total / max resource
> >>>        requirement from executing and monitoring the job, while not
> >>> being aware of
> >>>        individual operator requirements.
> >>>        - I believe this is the case your proposal does not cover. And
> TBH,
> >>>        this is probably how most users learn the resource requirements,
> >>> according
> >>>        to my experiences.
> >>>        - In this case, the user might need to specify different
> resources
> >> if
> >>>        he wants to switch the execution mode, which should not be worse
> >> than not
> >>>        being able to use fine-grained resource management.
> >>>
> >>>
> >>> ## An additional idea inspired by your proposal.
> >>> We may provide multiple options for deciding resources for SSGs whose
> >>> requirement is not specified, if needed.
> >>>
> >>>     - Default slot resource (current design)
> >>>     - Default operator resource times number of operators (equivalent
> to
> >>>     your proposal)
> >>>
> >>>
> >>> ## Exposing internal runtime strategies
> >>> Theoretically, yes. Tying to the SSGs, the resource requirements might
> be
> >>> affected if how SSGs are internally handled changes in future.
> >> Practically,
> >>> I do not concretely see at the moment what kind of changes we may want
> in
> >>> future that might conflict with this FLIP proposal, as the question of
> >>> switching data exchange mode answered above. I'd suggest to not give up
> >> the
> >>> user friendliness we may gain now for the future problems that may or
> may
> >>> not exist.
> >>>
> >>> Moreover, the SSG-based approach has the flexibility to achieve the
> >>> equivalent behavior as the operator-based approach, if we set each
> >> operator
> >>> (or task) to a separate SSG. We can even provide a shortcut option to
> >>> automatically do that for users, if needed.
> >>>
> >>>
> >>> Thank you~
> >>>
> >>> Xintong Song
> >>>
> >>>
> >>>
> >>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann <tr...@apache.org>
> >> wrote:
> >>>> Thanks for the responses Xintong and Stephan,
> >>>>
> >>>> I agree that being able to define the resource requirements for a
> >> group of
> >>>> operators is more user friendly. However, my concern is that we are
> >>>> exposing thereby internal runtime strategies which might limit our
> >>>> flexibility to execute a given job. Moreover, the semantics of
> >> configuring
> >>>> resource requirements for SSGs could break if switching from streaming
> >> to
> >>>> batch execution. If one defines the resource requirements for op_1 ->
> >> op_2
> >>>> which run in pipelined mode when using the streaming execution, then
> >> how do
> >>>> we interpret these requirements when op_1 -> op_2 are executed with a
> >>>> blocking data exchange in batch execution mode? Consequently, I am
> >> still
> >>>> leaning towards Stephan's proposal to set the resource requirements
> per
> >>>> operator.
> >>>>
> >>>> Maybe the following proposal makes the configuration easier: If the
> >> user
> >>>> wants to use fine-grained resource requirements, then she needs to
> >> specify
> >>>> the default size which is used for operators which have no explicit
> >>>> resource annotation. If this holds true, then every operator would
> >> have a
> >>>> resource requirement and the system can try to execute the operators
> >> in the
> >>>> best possible manner w/o being constrained by how the user set the SSG
> >>>> requirements.
> >>>>
> >>>> Cheers,
> >>>> Till
> >>>>
> >>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song <to...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Thanks for the feedback, Stephan.
> >>>>>
> >>>>> Actually, your proposal has also come to my mind at some point. And I
> >>>> have
> >>>>> some concerns about it.
> >>>>>
> >>>>>
> >>>>> 1. It does not give users the same control as the SSG-based approach.
> >>>>>
> >>>>>
> >>>>> While both approaches do not require specifying for each operator,
> >>>>> SSG-based approach supports the semantic that "some operators
> >> together
> >>>> use
> >>>>> this much resource" while the operator-based approach doesn't.
> >>>>>
> >>>>>
> >>>>> Think of a long pipeline with m operators (o_1, o_2, ..., o_m), and
> >> at
> >>>> some
> >>>>> point there's an agg o_n (1 < n < m) which significantly reduces the
> >> data
> >>>>> amount. One can separate the pipeline into 2 groups SSG_1 (o_1, ...,
> >> o_n)
> >>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much higher
> >> parallelisms
> >>>>> for operators in SSG_1 than for operators in SSG_2 won't lead to too
> >> much
> >>>>> wasting of resources. If the two SSGs end up needing different
> >> resources,
> >>>>> with the SSG-based approach one can directly specify resources for
> >> the
> >>>> two
> >>>>> groups. However, with the operator-based approach, the user will
> >> have to
> >>>>> specify resources for each operator in one of the two groups, and
> >> tune
> >>>> the
> >>>>> default slot resource via configurations to fit the other group.
> >>>>>
> >>>>>
> >>>>> 2. It increases the chance of breaking operator chains.
> >>>>>
> >>>>>
> >>>>> Setting chainable operators into different slot sharing groups will
> >>>>> prevent them from being chained. In the current implementation,
> >>>> downstream
> >>>>> operators, if SSG not explicitly specified, will be set to the same
> >> group
> >>>>> as the chainable upstream operators (unless multiple upstream
> >> operators
> >>>> in
> >>>>> different groups), to reduce the chance of breaking chains.
> >>>>>
> >>>>>
> >>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4, deciding
> >> SSGs
> >>>>> based on whether resource is specified we will easily get groups like
> >>>> (o_1,
> >>>>> o_3) & (o_2, o_4), where none of the operators can be chained. This
> >> is
> >>>> also
> >>>>> possible for the SSG-based approach, but I believe the chance is much
> >>>>> smaller because there's no strong reason for users to specify the
> >> groups
> >>>>> with alternate operators like that. We are more likely to get groups
> >> like
> >>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks only between o_2 and
> >> o_3.
> >>>>>
> >>>>> 3. It complicates the system by having two different mechanisms for
> >>>> sharing
> >>>>> managed memory in a slot.
> >>>>>
> >>>>>
> >>>>> - In FLIP-141, we introduced the intra-slot managed memory sharing
> >>>>> mechanism, where managed memory is first distributed according to the
> >>>>> consumer type, then further distributed across operators of that
> >> consumer
> >>>>> type.
> >>>>>
> >>>>> - With the operator-based approach, managed memory size specified
> >> for an
> >>>>> operator should account for all the consumer types of that operator.
> >> That
> >>>>> means the managed memory is first distributed across operators, then
> >>>>> distributed to different consumer types of each operator.
> >>>>>
> >>>>>
> >>>>> Unfortunately, the different order of the two calculation steps can
> >> lead
> >>>> to
> >>>>> different results. To be specific, the semantic of the configuration
> >>>> option
> >>>>> `consumer-weights` changed (within a slot vs. within an operator).
> >>>>>
> >>>>>
> >>>>>
> >>>>> To sum up things:
> >>>>>
> >>>>> While (3) might be a bit more implementation related, I think (1)
> >> and (2)
> >>>>> somehow suggest that, the price for the proposed approach to avoid
> >>>>> specifying resource for every operator is that it's not as
> >> independent
> >>>> from
> >>>>> operator chaining and slot sharing as the operator-based approach
> >>>> discussed
> >>>>> in the FLIP.
> >>>>>
> >>>>>
> >>>>> Thank you~
> >>>>>
> >>>>> Xintong Song
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen <se...@apache.org>
> >> wrote:
> >>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
> >>>>>>
> >>>>>> I want to say, first of all, that this is super well written. And
> >> the
> >>>>>> points that the FLIP makes about how to expose the configuration to
> >>>> users
> >>>>>> is exactly the right thing to figure out first.
> >>>>>> So good job here!
> >>>>>>
> >>>>>> About how to let users specify the resource profiles. If I can sum
> >> the
> >>>>> FLIP
> >>>>>> and previous discussion up in my own words, the problem is the
> >>>> following:
> >>>>>> Operator-level specification is the simplest and cleanest approach,
> >>>>> because
> >>>>>>> it avoids mixing operator configuration (resource) and
> >> scheduling. No
> >>>>>>> matter what other parameters change (chaining, slot sharing,
> >>>> switching
> >>>>>>> pipelined and blocking shuffles), the resource profiles stay the
> >>>> same.
> >>>>>>> But it would require that a user specifies resources on all
> >>>> operators,
> >>>>>>> which makes it hard to use. That's why the FLIP suggests going
> >> with
> >>>>>>> specifying resources on a Sharing-Group.
> >>>>>>
> >>>>>> I think both thoughts are important, so can we find a solution
> >> where
> >>>> the
> >>>>>> Resource Profiles are specified on an Operator, but we still avoid
> >> that
> >>>>> we
> >>>>>> need to specify a resource profile on every operator?
> >>>>>>
> >>>>>> What do you think about something like the following:
> >>>>>>    - Resource Profiles are specified on an operator level.
> >>>>>>    - Not all operators need profiles
> >>>>>>    - All Operators without a Resource Profile end up in the
> >> default
> >>>> slot
> >>>>>> sharing group with a default profile (will get a default slot).
> >>>>>>    - All Operators with a Resource Profile will go into another slot
> >>>>> sharing
> >>>>>> group (the resource-specified-group).
> >>>>>>    - Users can define different slot sharing groups for operators
> >> like
> >>>>> they
> >>>>>> do now, with the exception that you cannot mix operators that have
> >> a
> >>>>>> resource profile and operators that have no resource profile.
> >>>>>>    - The default case where no operator has a resource profile is
> >> just a
> >>>>>> special case of this model
> >>>>>>    - The chaining logic sums up the profiles per operator, like it
> >> does
> >>>>> now,
> >>>>>> and the scheduler sums up the profiles of the tasks that it
> >> schedules
> >>>>>> together.
> >>>>>>
> >>>>>>
> >>>>>> There is another question about reactive scaling raised in the
> >> FLIP. I
> >>>>> need
> >>>>>> to think a bit about that. That is indeed a bit more tricky once we
> >>>> have
> >>>>>> slots of different sizes.
> >>>>>> It is not clear then which of the different slot requests the
> >>>>>> ResourceManager should fulfill when new resources (TMs) show up,
> >> or how
> >>>>> the
> >>>>>> JobManager redistributes the slots resources when resources (TMs)
> >>>>> disappear
> >>>>>> This question is pretty orthogonal, though, to the "how to specify
> >> the
> >>>>>> resources".
> >>>>>>
> >>>>>>
> >>>>>> Best,
> >>>>>> Stephan
> >>>>>>
> >>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song <tonysong820@gmail.com>
> >>>>> wrote:
> >>>>>>> Thanks for drafting the FLIP and driving the discussion, Yangze.
> >>>>>>> And Thanks for the feedback, Till and Chesnay.
> >>>>>>>
> >>>>>>> @Till,
> >>>>>>>
> >>>>>>> I agree that specifying requirements for SSGs means that SSGs
> >> need to
> >>>>> be
> >>>>>>> supported in fine-grained resource management, otherwise each
> >>>> operator
> >>>>>>> might use as many resources as the whole group. However, I cannot
> >>>> think
> >>>>>> of
> >>>>>>> a strong reason for not supporting SSGs in fine-grained resource
> >>>>>>> management.
> >>>>>>>
> >>>>>>>
> >>>>>>>> Interestingly, if all operators have their resources properly
> >>>>>> specified,
> >>>>>>>> then slot sharing is no longer needed because Flink could
> >> slice off
> >>>>> the
> >>>>>>>> appropriately sized slots for every Task individually.
> >>>>>>>>
> >>>>>>>> So for example, if we have a job consisting of two operators op_1 and op_2
> >>>>>>>> where each op needs 100 MB of memory, we would then say that
> >> the
> >>>> slot
> >>>>>>>> sharing group needs 200 MB of memory to run. If we have a
> >> cluster
> >>>>> with
> >>>>>> 2
> >>>>>>>> TMs with one slot of 100 MB each, then the system cannot run
> >> this
> >>>>> job.
> >>>>>> If
> >>>>>>>> the resources were specified on an operator level, then the
> >> system
> >>>>>> could
> >>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to
> >> TM_2.
> >>>>>>>
> >>>>>>> Couldn't agree more that if all operators' requirements are
> >> properly
> >>>>>>> specified, slot sharing should be no longer needed. I think this
> >>>>> exactly
> >>>>>>> disproves the example. If we already know op_1 and op_2 each
> >> needs
> >>>> 100
> >>>>> MB
> >>>>>>> of memory, why would we put them in the same group? If they are
> >> in
> >>>>>> separate
> >>>>>>> groups, with the proposed approach the system can freely deploy
> >> them
> >>>> to
> >>>>>>> either a 200 MB TM or two 100 MB TMs.
> >>>>>>>
> >>>>>>> Moreover, the precondition for not needing slot sharing is having
> >>>>>> resource
> >>>>>>> requirements properly specified for all operators. This is not
> >> always
> >>>>>>> possible, and usually requires tremendous efforts. One of the
> >>>> benefits
> >>>>>> for
> >>>>>>> SSG-based requirements is that it allows the user to freely
> >> decide
> >>>> the
> >>>>>>> granularity, thus efforts they want to pay. I would consider SSG
> >> in
> >>>>>>> fine-grained resource management as a group of operators that the
> >>>> user
> >>>>>>> would like to specify the total resource for. There can be only
> >> one
> >>>>> group
> >>>>>>> in the job, 2~3 groups dividing the job into a few major parts,
> >> or as
> >>>>>> many
> >>>>>>> groups as the number of tasks/operators, depending on how
> >>>> fine-grained
> >>>>>> the
> >>>>>>> user is able to specify the resources.
> >>>>>>>
> >>>>>>> Having to support SSGs might be a constraint. But given that all
> >> the
> >>>>>>> current scheduler implementations already support SSGs, I tend to
> >>>> think
> >>>>>>> that as an acceptable price for the above discussed usability and
> >>>>>>> flexibility.
> >>>>>>>
> >>>>>>> @Chesnay
> >>>>>>>
> >>>>>>>> Will declaring them on slot sharing groups not also waste resources if the
> >>>>>>>> parallelism of operators within that group are different?
> >>>>>>>>
> >>>>>>> Yes. It's a trade-off between usability and resource
> >> utilization. To
> >>>>>> avoid
> >>>>>>> such wasting, the user can define more groups, so that each group
> >>>>>> contains
> >>>>>>> fewer operators and the chance of having operators with different
> >>>>>>> parallelism will be reduced. The price is to have more resource
> >>>>>>> requirements to specify.
> >>>>>>>
> >>>>>>>> It also seems like quite a hassle for users having to recalculate the
> >>>>>>>> resource requirements if they change the slot sharing.
> >>>>>>>> I'd think that it's not really workable for users that create
> >> a set
> >>>>> of
> >>>>>>>> re-usable operators which are mixed and matched in their
> >>>>> applications;
> >>>>>>>> managing the resources requirements in such a setting would be
> >> a
> >>>>>>>> nightmare, and in the end would require operator-level
> >> requirements
> >>>>> any
> >>>>>>>> way.
> >>>>>>>> In that sense, I'm not even sure whether it really increases
> >>>>> usability.
> >>>>>>>     - As mentioned in my reply to Till's comment, there's no
> >> reason to
> >>>>> put
> >>>>>>>     multiple operators whose individual resource requirements are
> >>>>> already
> >>>>>>> known
> >>>>>>>     into the same group in fine-grained resource management.
> >>>>>>>     - Even if an operator implementation is reused for multiple
> >>>>> applications,
> >>>>>>>     it does not guarantee the same resource requirements. During
> >> our
> >>>>> years
> >>>>>>> of
> >>>>>>>     practices in Alibaba, with per-operator requirements
> >> specified for
> >>>>>>> Blink's
> >>>>>>>     fine-grained resource management, very few users (including
> >> our
> >>>>>>> specialists
> >>>>>>>     who are dedicated to supporting Blink users) are as
> >> experienced as
> >>>>> to
> >>>>>>>     accurately predict/estimate the operator resource
> >> requirements.
> >>>> Most
> >>>>>>> people
> >>>>>>>     rely on the execution-time metrics (throughput, delay, cpu
> >> load,
> >>>>>> memory
> >>>>>>>     usage, GC pressure, etc.) to improve the specification.
> >>>>>>>
> >>>>>>> To sum up:
> >>>>>>> If the user is capable of providing proper resource requirements
> >> for
> >>>>>> every
> >>>>>>> operator, that's definitely a good thing and we would not need to
> >>>> rely
> >>>>> on
> >>>>>>> the SSGs. However, that shouldn't be a *must* for the
> >> fine-grained
> >>>>>> resource
> >>>>>>> management to work. For those users who are capable and do not
> >> like
> >>>>>> having
> >>>>>>> to set each operator to a separate SSG, I would be ok to have
> >> both
> >>>>>>> SSG-based and operator-based runtime interfaces and to only
> >> fallback
> >>>> to
> >>>>>> the
> >>>>>>> SSG requirements when the operator requirements are not
> >> specified.
> >>>>>> However,
> >>>>>>> as the first step, I think we should prioritise the use cases
> >> where
> >>>>> users
> >>>>>>> are not that experienced.
> >>>>>>>
> >>>>>>> Thank you~
> >>>>>>>
> >>>>>>> Xintong Song
> >>>>>>>
> >>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <
> >> chesnay@apache.org>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Will declaring them on slot sharing groups not also waste
> >> resources
> >>>>> if
> >>>>>>>> the parallelism of operators within that group are different?
> >>>>>>>>
> >>>>>>>> It also seems like quite a hassle for users having to
> >> recalculate
> >>>> the
> >>>>>>>> resource requirements if they change the slot sharing.
> >>>>>>>> I'd think that it's not really workable for users that create
> >> a set
> >>>>> of
> >>>>>>>> re-usable operators which are mixed and matched in their
> >>>>> applications;
> >>>>>>>> managing the resources requirements in such a setting would be
> >> a
> >>>>>>>> nightmare, and in the end would require operator-level
> >> requirements
> >>>>> any
> >>>>>>>> way.
> >>>>>>>> In that sense, I'm not even sure whether it really increases
> >>>>> usability.
> >>>>>>>> My main worry is that if we wire the runtime to work on SSGs
> >>>> it's
> >>>>>>>> gonna be difficult to implement more fine-grained approaches,
> >> which
> >>>>>>>> would not be the case if, for the runtime, they are always
> >> defined
> >>>> on
> >>>>>> an
> >>>>>>>> operator-level.
> >>>>>>>>
> >>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> >>>>>>>>> Thanks for drafting this FLIP and starting this discussion
> >>>> Yangze.
> >>>>>>>>> I like that defining resource requirements on a slot sharing
> >>>> group
> >>>>>>> makes
> >>>>>>>>> the overall setup easier and improves usability of resource
> >>>>>>> requirements.
> >>>>>>>>> What I do not like about it is that it changes slot sharing
> >>>> groups
> >>>>>> from
> >>>>>>>>> being a scheduling hint to something which needs to be
> >> supported
> >>>> in
> >>>>>>> order
> >>>>>>>>> to support fine grained resource requirements. So far, the
> >> idea
> >>>> of
> >>>>>> slot
> >>>>>>>>> sharing groups was that it tells the system that a set of
> >>>> operators
> >>>>>> can
> >>>>>>>> be
> >>>>>>>>> deployed in the same slot. But the system still had the
> >> freedom
> >>>> to
> >>>>>> say
> >>>>>>>> that
> >>>>>>>>> it would rather place these tasks in different slots if it
> >>>> wanted.
> >>>>> If
> >>>>>>> we
> >>>>>>>>> now specify resource requirements on a per slot sharing
> >> group,
> >>>> then
> >>>>>> the
> >>>>>>>>> only option for a scheduler which does not support slot
> >> sharing
> >>>>>> groups
> >>>>>>> is
> >>>>>>>>> to say that every operator in this slot sharing group needs a
> >>>> slot
> >>>>>> with
> >>>>>>>> the
> >>>>>>>>> same resources as the whole group.
> >>>>>>>>>
> >>>>>>>>> So for example, if we have a job consisting of two operators op_1 and op_2
> >>>>>>>>> where each op needs 100 MB of memory, we would then say that
> >> the
> >>>>> slot
> >>>>>>>>> sharing group needs 200 MB of memory to run. If we have a
> >> cluster
> >>>>>> with
> >>>>>>> 2
> >>>>>>>>> TMs with one slot of 100 MB each, then the system cannot run
> >> this
> >>>>>> job.
> >>>>>>> If
> >>>>>>>>> the resources were specified on an operator level, then the
> >>>> system
> >>>>>>> could
> >>>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to
> >> TM_2.
> >>>>>>>>> Originally, one of the primary goals of slot sharing groups
> >> was
> >>>> to
> >>>>>> make
> >>>>>>>> it
> >>>>>>>>> easier for the user to reason about how many slots a job
> >> needs
> >>>>>>>> independent
> >>>>>>>>> of the actual number of operators in the job. Interestingly,
> >> if
> >>>> all
> >>>>>>>>> operators have their resources properly specified, then slot
> >>>>> sharing
> >>>>>> is
> >>>>>>>> no
> >>>>>>>>> longer needed because Flink could slice off the appropriately
> >>>> sized
> >>>>>>> slots
> >>>>>>>>> for every Task individually. What matters is whether the
> >> whole
> >>>>>> cluster
> >>>>>>>> has
> >>>>>>>>> enough resources to run all tasks or not.
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>> Till
> >>>>>>>>>
> >>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
> >> karmagyz@gmail.com>
> >>>>>> wrote:
> >>>>>>>>>> Hi, there,
> >>>>>>>>>>
> >>>>>>>>>> We would like to start a discussion thread on "FLIP-156:
> >> Runtime
> >>>>>>>>>> Interfaces for Fine-Grained Resource Requirements"[1],
> >> where we
> >>>>>>>>>> propose Slot Sharing Group (SSG) based runtime interfaces
> >> for
> >>>>>>>>>> specifying fine-grained resource requirements.
> >>>>>>>>>>
> >>>>>>>>>> In this FLIP:
> >>>>>>>>>> - Expound the user story of fine-grained resource
> >> management.
> >>>>>>>>>> - Propose runtime interfaces for specifying SSG-based
> >> resource
> >>>>>>>>>> requirements.
> >>>>>>>>>> - Discuss the pros and cons of the three potential
> >> granularities
> >>>>> for
> >>>>>>>>>> specifying the resource requirements (op, task and slot
> >> sharing
> >>>>>> group)
> >>>>>>>>>> and explain why we choose the slot sharing group.
> >>>>>>>>>>
> >>>>>>>>>> Please find more details in the FLIP wiki document [1].
> >> Looking
> >>>>>>>>>> forward to your feedback.
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> >>>>>>>>>> Best,
> >>>>>>>>>> Yangze Guo
> >>>>>>>>>>
> >>>>>>>>
>
>

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Chesnay Schepler <ch...@apache.org>.

Is there even a functional difference between specifying the 
requirements for an SSG vs specifying the same requirements on a single 
operator within that group (ideally a colocation group to avoid this 
whole hint business)?

Wouldn't we get the best of both worlds in the latter case?

Users can take shortcuts to define shared requirements,
but refine them further as needed on a per-operator basis,
without changing the semantics of slot sharing groups
and without locking the runtime into SSG-based requirements.

(And before anyone asks what happens if slot sharing groups change or 
whatnot: that's a plain API issue that we could surely solve. A plain 
iteration over the slot sharing groups and the operators they contain 
would suffice.)
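
As a rough illustration of the iteration mentioned above, here is a minimal
Python sketch (not Flink code). It assumes requirements are declared on
operators only and derives a per-group total by walking the groups and their
operators. The names `Resources` and `group_requirements` are hypothetical,
and treating a group with no declared operator as "use the default slot" is
an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource spec for illustration; not a Flink class."""
    cpu_cores: float = 0.0
    memory_mb: int = 0

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.cpu_cores + other.cpu_cores,
                         self.memory_mb + other.memory_mb)


def group_requirements(
    groups: Dict[str, List[str]],
    operator_resources: Dict[str, Resources],
) -> Dict[str, Optional[Resources]]:
    """Sum the declared operator requirements of each slot sharing group.

    A group in which no operator declares a requirement maps to None,
    which this sketch interprets as "fall back to the default slot".
    """
    result: Dict[str, Optional[Resources]] = {}
    for group, operators in groups.items():
        declared = [operator_resources[op]
                    for op in operators if op in operator_resources]
        result[group] = sum(declared, Resources()) if declared else None
    return result


groups = {"ssg_1": ["source", "map", "agg"], "ssg_2": ["sink"]}

# Shortcut: a single operator carries the requirement for its whole group.
shortcut = group_requirements(groups, {"agg": Resources(1.0, 200)})

# Refinement: a second operator in the same group declares its own share.
refined = group_requirements(
    groups, {"agg": Resources(1.0, 200), "source": Resources(0.5, 100)})
```

In this sketch, moving an operator to another group only changes the `groups`
mapping; the per-operator declarations stay untouched.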

On 1/20/2021 6:48 PM, Till Rohrmann wrote:
> Maybe a different minor idea: Would it be possible to treat the SSG
> resource requirements as a hint for the runtime similar to how slot sharing
> groups are designed at the moment? Meaning that we don't give the guarantee
> that Flink will always deploy this set of tasks together no matter what
> comes. If, for example, the runtime can derive by some means the resource
> requirements for each task based on the requirements for the SSG, this
> could be possible. One easy strategy would be to give every task the same
> resources as the whole slot sharing group. Another one could be
> distributing the resources equally among the tasks. This does not even have
> to be implemented but we would give ourselves the freedom to change
> scheduling should the need arise.
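
The two derivation strategies named in the paragraph above can be sketched
as follows. This is a simplified Python illustration under stated
assumptions: only memory (in MB) is modelled, the function names
`same_as_group` and `split_equally` are hypothetical, and handing the
remainder of an uneven split to the first tasks is a choice made for this
example only.

```python
from typing import Dict, List


def same_as_group(group_memory_mb: int, tasks: List[str]) -> Dict[str, int]:
    """Strategy 1: every task gets the same resources as the whole group."""
    return {task: group_memory_mb for task in tasks}


def split_equally(group_memory_mb: int, tasks: List[str]) -> Dict[str, int]:
    """Strategy 2: distribute the group's resources equally among its tasks.

    When the amount does not divide evenly, the first tasks receive one
    extra MB each so that the shares still add up to the group total.
    """
    share, remainder = divmod(group_memory_mb, len(tasks))
    return {task: share + (1 if index < remainder else 0)
            for index, task in enumerate(tasks)}


# The example from this thread: op_1 and op_2 in one group that needs 200 MB.
conservative = same_as_group(200, ["op_1", "op_2"])
even = split_equally(200, ["op_1", "op_2"])
```

The first strategy is always safe but over-reserves; the second preserves
the group total and would let the two tasks from the example fit on two
TMs with one 100 MB slot each.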
>
> Cheers,
> Till
>
> On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <ka...@gmail.com> wrote:
>
>> Thanks for the responses, Till and Xintong.
>>
>> I second Xintong's comment that the SSG-based runtime interface will give
>> us the flexibility to achieve an op/task-based approach. That's one of
>> the most important reasons for our design choice.
>>
>> My two cents regarding the default operator resource:
>> - It might be good for DataStream jobs.
>>     ** For lightweight operators, the accumulated configuration error
>> will not be significant. The resources a task uses are then
>> proportional to the number of operators it contains.
>>     ** For heavy operators like join and window, or operators using
>> external resources, users will turn to the fine-grained resource
>> configuration.
>> - It can increase stability for standalone clusters where the registered
>> task executors are heterogeneous (with different default slot
>> resources).
>> - It might not be good for SQL users. The operators that a SQL query is
>> translated into are a black box to the user. We also do not guarantee
>> cross-version consistency of that translation so far.
>>
>> I think it can be treated as a follow-up work when the fine-grained
>> resource management is end-to-end ready.
>>
>> Best,
>> Yangze Guo
>>
>>
>> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song <to...@gmail.com>
>> wrote:
>>> Thanks for the feedback, Till.
>>>
>>> ## I feel that what you proposed (operator-based + default value) might
>> be
>>> subsumed by the SSG-based approach.
>>> Thinking of op_1 -> op_2, there are the following 4 cases, categorized by
>>> whether the resource requirements are known to the users.
>>>
>>>     1. *Both known.* As previously mentioned, there's no reason to put
>>>     multiple operators whose individual resource requirements are already
>> known
>>>     into the same group in fine-grained resource management. And if op_1
>> and
>>>     op_2 are in different groups, there should be no problem switching
>> data
>>>     exchange mode from pipelined to blocking. This is equivalent to
>> specifying
>>>     operator resource requirements in your proposal.
>>>     2. *op_1 known, op_2 unknown.* Similar to 1), except that op_2 is in a
>>>     SSG whose resource is not specified thus would have the default slot
>>>     resource. This is equivalent to having default operator resources in
>> your
>>>     proposal.
>>>     3. *Both unknown*. The user can either set op_1 and op_2 to the same
>> SSG
>>>     or separate SSGs.
>>>        - If op_1 and op_2 are in the same SSG, it will be equivalent to
>> the
>>>        coarse-grained resource management, where op_1 and op_2 share a
>> default
>>>        size slot no matter which data exchange mode is used.
>>>        - If op_1 and op_2 are in different SSGs, then each of them will
>> use
>>>        a default size slot. This is equivalent to setting them with
>> default
>>>        operator resources in your proposal.
>>>     4. *Total (pipeline) or max (blocking) of op_1 and op_2 is known.*
>>>        - It is possible that the user learns the total / max resource
>>>        requirement from executing and monitoring the job, while not
>>> being aware of
>>>        individual operator requirements.
>>>        - I believe this is the case your proposal does not cover. And TBH,
>>>        this is probably how most users learn the resource requirements,
>>>        in my experience.
>>>        - In this case, the user might need to specify different resources
>>>        if they want to switch the execution mode, which should not be worse
>>>        than not being able to use fine-grained resource management.
>>>
>>>
>>> ## An additional idea inspired by your proposal.
>>> We may provide multiple options for deciding resources for SSGs whose
>>> requirement is not specified, if needed.
>>>
>>>     - Default slot resource (current design)
>>>     - Default operator resource times number of operators (equivalent to
>>>     your proposal)
>>>
>>>
>>> ## Exposing internal runtime strategies
>>> Theoretically, yes. Being tied to the SSGs, the resource requirements might
>>> be affected if the way SSGs are handled internally changes in the future.
>>> Practically, I do not see at the moment what kind of future changes might
>>> conflict with this FLIP proposal; the question of switching the data
>>> exchange mode is answered above. I'd suggest not giving up the user
>>> friendliness we may gain now for future problems that may or may
>>> not exist.
>>>
>>> Moreover, the SSG-based approach has the flexibility to achieve the
>>> equivalent behavior as the operator-based approach, if we set each
>> operator
>>> (or task) to a separate SSG. We can even provide a shortcut option to
>>> automatically do that for users, if needed.
>>>
>>>
>>> Thank you~
>>>
>>> Xintong Song
>>>
>>>
>>>
>>> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann <tr...@apache.org>
>> wrote:
>>>> Thanks for the responses Xintong and Stephan,
>>>>
>>>> I agree that being able to define the resource requirements for a
>> group of
>>>> operators is more user friendly. However, my concern is that we are
>>>> exposing thereby internal runtime strategies which might limit our
>>>> flexibility to execute a given job. Moreover, the semantics of
>> configuring
>>>> resource requirements for SSGs could break if switching from streaming
>> to
>>>> batch execution. If one defines the resource requirements for op_1 ->
>> op_2
>>>> which run in pipelined mode when using the streaming execution, then
>> how do
>>>> we interpret these requirements when op_1 -> op_2 are executed with a
>>>> blocking data exchange in batch execution mode? Consequently, I am
>> still
>>>> leaning towards Stephan's proposal to set the resource requirements per
>>>> operator.
>>>>
>>>> Maybe the following proposal makes the configuration easier: If the
>> user
>>>> wants to use fine-grained resource requirements, then she needs to
>> specify
>>>> the default size which is used for operators which have no explicit
>>>> resource annotation. If this holds true, then every operator would
>> have a
>>>> resource requirement and the system can try to execute the operators
>> in the
>>>> best possible manner w/o being constrained by how the user set the SSG
>>>> requirements.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Tue, Jan 19, 2021 at 9:09 AM Xintong Song <to...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks for the feedback, Stephan.
>>>>>
>>>>> Actually, your proposal has also come to my mind at some point. And I
>>>> have
>>>>> some concerns about it.
>>>>>
>>>>>
>>>>> 1. It does not give users the same control as the SSG-based approach.
>>>>>
>>>>>
>>>>> While both approaches do not require specifying for each operator,
>>>>> SSG-based approach supports the semantic that "some operators
>> together
>>>> use
>>>>> this much resource" while the operator-based approach doesn't.
>>>>>
>>>>>
>>>>> Think of a long pipeline with m operators (o_1, o_2, ..., o_m), and
>> at
>>>> some
>>>>> point there's an agg o_n (1 < n < m) which significantly reduces the
>> data
>>>>> amount. One can separate the pipeline into 2 groups SSG_1 (o_1, ...,
>> o_n)
>>>>> and SSG_2 (o_n+1, ... o_m), so that configuring much higher
>> parallelisms
>>>>> for operators in SSG_1 than for operators in SSG_2 won't lead to too
>> much
>>>>> wasting of resources. If the two SSGs end up needing different
>> resources,
>>>>> with the SSG-based approach one can directly specify resources for
>> the
>>>> two
>>>>> groups. However, with the operator-based approach, the user will
>> have to
>>>>> specify resources for each operator in one of the two groups, and
>> tune
>>>> the
>>>>> default slot resource via configurations to fit the other group.
>>>>>
>>>>>
>>>>> 2. It increases the chance of breaking operator chains.
>>>>>
>>>>>
>>>>> Setting chainable operators into different slot sharing groups will
>>>>> prevent them from being chained. In the current implementation,
>>>> downstream
>>>>> operators, if SSG not explicitly specified, will be set to the same
>> group
>>>>> as the chainable upstream operators (unless multiple upstream
>> operators
>>>> in
>>>>> different groups), to reduce the chance of breaking chains.
>>>>>
>>>>>
>>>>> Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4, deciding
>> SSGs
>>>>> based on whether resource is specified we will easily get groups like
>>>> (o_1,
>>>>> o_3) & (o_2, o_4), where none of the operators can be chained. This
>> is
>>>> also
>>>>> possible for the SSG-based approach, but I believe the chance is much
>>>>> smaller because there's no strong reason for users to specify the
>> groups
>>>>> with alternate operators like that. We are more likely to get groups
>> like
>>>>> (o_1, o_2) & (o_3, o_4), where the chain breaks only between o_2 and
>> o_3.
>>>>>
>>>>> 3. It complicates the system by having two different mechanisms for
>>>> sharing
>>>>> managed memory in a slot.
>>>>>
>>>>>
>>>>> - In FLIP-141, we introduced the intra-slot managed memory sharing
>>>>> mechanism, where managed memory is first distributed according to the
>>>>> consumer type, then further distributed across operators of that
>> consumer
>>>>> type.
>>>>>
>>>>> - With the operator-based approach, managed memory size specified
>> for an
>>>>> operator should account for all the consumer types of that operator.
>> That
>>>>> means the managed memory is first distributed across operators, then
>>>>> distributed to different consumer types of each operator.
>>>>>
>>>>>
>>>>> Unfortunately, the different order of the two calculation steps can
>> lead
>>>> to
>>>>> different results. To be specific, the semantic of the configuration
>>>> option
>>>>> `consumer-weights` changed (within a slot vs. within an operator).
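
To make the order-of-calculation point above concrete, here is a small
worked example in Python. It is a simplified model, not Flink's actual
implementation: the function names are hypothetical, the weights (DATAPROC
70, PYTHON 30) and memory sizes are picked for illustration, and splitting a
consumer type's share evenly across the operators that use it is an
assumption of this sketch.

```python
from typing import Dict, List

Weights = Dict[str, int]
Usage = Dict[str, List[str]]          # operator -> consumer types it uses
Split = Dict[str, Dict[str, float]]   # operator -> consumer type -> MB


def slot_first(slot_mb: float, weights: Weights, usage: Usage) -> Split:
    """Distribute by consumer type within the slot, then across operators."""
    present = {c for consumers in usage.values() for c in consumers}
    total_weight = sum(weights[c] for c in present)
    result: Split = {op: {} for op in usage}
    for consumer in present:
        users = [op for op, consumers in usage.items() if consumer in consumers]
        type_share = slot_mb * weights[consumer] / total_weight
        for op in users:
            result[op][consumer] = type_share / len(users)
    return result


def operator_first(op_mb: Dict[str, float], weights: Weights,
                   usage: Usage) -> Split:
    """Distribute across operators first, then by consumer type within each."""
    result: Split = {}
    for op, consumers in usage.items():
        total_weight = sum(weights[c] for c in consumers)
        result[op] = {c: op_mb[op] * weights[c] / total_weight
                      for c in consumers}
    return result


weights = {"DATAPROC": 70, "PYTHON": 30}
usage = {"op_a": ["DATAPROC", "PYTHON"], "op_b": ["DATAPROC"]}

by_slot = slot_first(200, weights, usage)
by_operator = operator_first({"op_a": 100, "op_b": 100}, weights, usage)
```

With the same 200 MB in total, the PYTHON consumer receives 60 MB when the
weights apply within the slot but only 30 MB when they apply within op_a,
so the same `consumer-weights` setting yields different results.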
>>>>>
>>>>>
>>>>>
>>>>> To sum up things:
>>>>>
>>>>> While (3) might be a bit more implementation related, I think (1)
>> and (2)
>>>>> somehow suggest that, the price for the proposed approach to avoid
>>>>> specifying resource for every operator is that it's not as
>> independent
>>>> from
>>>>> operator chaining and slot sharing as the operator-based approach
>>>> discussed
>>>>> in the FLIP.
>>>>>
>>>>>
>>>>> Thank you~
>>>>>
>>>>> Xintong Song
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen <se...@apache.org>
>> wrote:
>>>>>> Thanks a lot, Yangze and Xintong for this FLIP.
>>>>>>
>>>>>> I want to say, first of all, that this is super well written. And
>> the
>>>>>> points that the FLIP makes about how to expose the configuration to
>>>> users
>>>>>> is exactly the right thing to figure out first.
>>>>>> So good job here!
>>>>>>
>>>>>> About how to let users specify the resource profiles. If I can sum
>> the
>>>>> FLIP
>>>>>> and previous discussion up in my own words, the problem is the
>>>> following:
>>>>>> Operator-level specification is the simplest and cleanest approach,
>>>>> because
>>>>>>> it avoids mixing operator configuration (resource) and
>> scheduling. No
>>>>>>> matter what other parameters change (chaining, slot sharing,
>>>> switching
>>>>>>> pipelined and blocking shuffles), the resource profiles stay the
>>>> same.
>>>>>>> But it would require that a user specifies resources on all
>>>> operators,
>>>>>>> which makes it hard to use. That's why the FLIP suggests going
>> with
>>>>>>> specifying resources on a Sharing-Group.
>>>>>>
>>>>>> I think both thoughts are important, so can we find a solution
>> where
>>>> the
>>>>>> Resource Profiles are specified on an Operator, but we still avoid
>> that
>>>>> we
>>>>>> need to specify a resource profile on every operator?
>>>>>>
>>>>>> What do you think about something like the following:
>>>>>>    - Resource Profiles are specified on an operator level.
>>>>>>    - Not all operators need profiles
>>>>>>    - All Operators without a Resource Profile end up in the
>> default
>>>> slot
>>>>>> sharing group with a default profile (will get a default slot).
>>>>>>    - All Operators with a Resource Profile will go into another slot
>>>>> sharing
>>>>>> group (the resource-specified-group).
>>>>>>    - Users can define different slot sharing groups for operators
>> like
>>>>> they
>>>>>> do now, with the exception that you cannot mix operators that have
>> a
>>>>>> resource profile and operators that have no resource profile.
>>>>>>    - The default case where no operator has a resource profile is
>> just a
>>>>>> special case of this model
>>>>>>    - The chaining logic sums up the profiles per operator, like it
>> does
>>>>> now,
>>>>>> and the scheduler sums up the profiles of the tasks that it
>> schedules
>>>>>> together.
>>>>>>
>>>>>>
>>>>>> There is another question about reactive scaling raised in the
>> FLIP. I
>>>>> need
>>>>>> to think a bit about that. That is indeed a bit more tricky once we
>>>> have
>>>>>> slots of different sizes.
>>>>>> It is not clear then which of the different slot requests the
>>>>>> ResourceManager should fulfill when new resources (TMs) show up,
>> or how
>>>>> the
>>>>>> JobManager redistributes the slots resources when resources (TMs)
>>>>> disappear
>>>>>> This question is pretty orthogonal, though, to the "how to specify
>> the
>>>>>> resources".
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Stephan
>>>>>>
>>>>>> On Fri, Jan 8, 2021 at 5:14 AM Xintong Song <tonysong820@gmail.com>
>>>>> wrote:
>>>>>>> Thanks for drafting the FLIP and driving the discussion, Yangze.
>>>>>>> And Thanks for the feedback, Till and Chesnay.
>>>>>>>
>>>>>>> @Till,
>>>>>>>
>>>>>>> I agree that specifying requirements for SSGs means that SSGs
>> need to
>>>>> be
>>>>>>> supported in fine-grained resource management, otherwise each
>>>> operator
>>>>>>> might use as many resources as the whole group. However, I cannot
>>>> think
>>>>>> of
>>>>>>> a strong reason for not supporting SSGs in fine-grained resource
>>>>>>> management.
>>>>>>>
>>>>>>>
>>>>>>>> Interestingly, if all operators have their resources properly
>>>>>> specified,
>>>>>>>> then slot sharing is no longer needed because Flink could
>> slice off
>>>>> the
>>>>>>>> appropriately sized slots for every Task individually.
>>>>>>>>
>>>>>>>> So for example, if we have a job consisting of two operators op_1 and op_2
>>>>>>>> where each op needs 100 MB of memory, we would then say that
>> the
>>>> slot
>>>>>>>> sharing group needs 200 MB of memory to run. If we have a
>> cluster
>>>>> with
>>>>>> 2
>>>>>>>> TMs with one slot of 100 MB each, then the system cannot run
>> this
>>>>> job.
>>>>>> If
>>>>>>>> the resources were specified on an operator level, then the
>> system
>>>>>> could
>>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to
>> TM_2.
>>>>>>>
>>>>>>> Couldn't agree more that if all operators' requirements are
>> properly
>>>>>>> specified, slot sharing should be no longer needed. I think this
>>>>> exactly
>>>>>>> disproves the example. If we already know op_1 and op_2 each
>> needs
>>>> 100
>>>>> MB
>>>>>>> of memory, why would we put them in the same group? If they are
>> in
>>>>>> separate
>>>>>>> groups, with the proposed approach the system can freely deploy
>> them
>>>> to
>>>>>>> either a 200 MB TM or two 100 MB TMs.
>>>>>>>
>>>>>>> Moreover, the precondition for not needing slot sharing is having resource
>>>>>>> requirements properly specified for all operators. This is not always
>>>>>>> possible, and usually requires tremendous efforts. One of the benefits for
>>>>>>> SSG-based requirements is that it allows the user to freely decide the
>>>>>>> granularity, thus efforts they want to pay. I would consider SSG in
>>>>>>> fine-grained resource management as a group of operators that the user
>>>>>>> would like to specify the total resource for. There can be only one group
>>>>>>> in the job, 2~3 groups dividing the job into a few major parts, or as many
>>>>>>> groups as the number of tasks/operators, depending on how fine-grained the
>>>>>>> user is able to specify the resources.
>>>>>>>
>>>>>>> Having to support SSGs might be a constraint. But given that all the
>>>>>>> current scheduler implementations already support SSGs, I tend to think
>>>>>>> that as an acceptable price for the above discussed usability and
>>>>>>> flexibility.
>>>>>>>
>>>>>>> @Chesnay
>>>>>>>
>>>>>>>> Will declaring them on slot sharing groups not also waste resources if the
>>>>>>>> parallelism of operators within that group are different?
>>>>>>>>
>>>>>>> Yes. It's a trade-off between usability and resource utilization. To avoid
>>>>>>> such wasting, the user can define more groups, so that each group contains
>>>>>>> less operators and the chance of having operators with different
>>>>>>> parallelism will be reduced. The price is to have more resource
>>>>>>> requirements to specify.
>>>>>>>
>>>>>>>> It also seems like quite a hassle for users having to recalculate the
>>>>>>>> resource requirements if they change the slot sharing.
>>>>>>>> I'd think that it's not really workable for users that create a set of
>>>>>>>> re-usable operators which are mixed and matched in their applications;
>>>>>>>> managing the resources requirements in such a setting would be a
>>>>>>>> nightmare, and in the end would require operator-level requirements any
>>>>>>>> way.
>>>>>>>> In that sense, I'm not even sure whether it really increases usability.
>>>>>>>
>>>>>>>     - As mentioned in my reply to Till's comment, there's no reason to put
>>>>>>>     multiple operators whose individual resource requirements are already
>>>>>>>     known into the same group in fine-grained resource management.
>>>>>>>     - Even an operator implementation is reused for multiple applications,
>>>>>>>     it does not guarantee the same resource requirements. During our years
>>>>>>>     of practices in Alibaba, with per-operator requirements specified for
>>>>>>>     Blink's fine-grained resource management, very few users (including our
>>>>>>>     specialists who are dedicated to supporting Blink users) are as
>>>>>>>     experienced as to accurately predict/estimate the operator resource
>>>>>>>     requirements. Most people rely on the execution-time metrics
>>>>>>>     (throughput, delay, cpu load, memory usage, GC pressure, etc.) to
>>>>>>>     improve the specification.
>>>>>>>
>>>>>>> To sum up:
>>>>>>> If the user is capable of providing proper resource requirements for every
>>>>>>> operator, that's definitely a good thing and we would not need to rely on
>>>>>>> the SSGs. However, that shouldn't be a *must* for the fine-grained resource
>>>>>>> management to work. For those users who are capable and do not like having
>>>>>>> to set each operator to a separate SSG, I would be ok to have both
>>>>>>> SSG-based and operator-based runtime interfaces and to only fallback to the
>>>>>>> SSG requirements when the operator requirements are not specified. However,
>>>>>>> as the first step, I think we should prioritise the use cases where users
>>>>>>> are not that experienced.
>>>>>>>
>>>>>>> Thank you~
>>>>>>>
>>>>>>> Xintong Song
>>>>>>>
>>>>>>> On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <chesnay@apache.org> wrote:
>>>>>>>
>>>>>>>> Will declaring them on slot sharing groups not also waste resources if
>>>>>>>> the parallelism of operators within that group are different?
>>>>>>>>
>>>>>>>> It also seems like quite a hassle for users having to recalculate the
>>>>>>>> resource requirements if they change the slot sharing.
>>>>>>>> I'd think that it's not really workable for users that create a set of
>>>>>>>> re-usable operators which are mixed and matched in their applications;
>>>>>>>> managing the resources requirements in such a setting would be a
>>>>>>>> nightmare, and in the end would require operator-level requirements any
>>>>>>>> way.
>>>>>>>> In that sense, I'm not even sure whether it really increases usability.
>>>>>>>>
>>>>>>>> My main worry is that if we wire the runtime to work on SSGs it's
>>>>>>>> gonna be difficult to implement more fine-grained approaches, which
>>>>>>>> would not be the case if, for the runtime, they are always defined on an
>>>>>>>> operator-level.
>>>>>>>>
>>>>>>>> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
>>>>>>>>> Thanks for drafting this FLIP and starting this discussion Yangze.
>>>>>>>>>
>>>>>>>>> I like that defining resource requirements on a slot sharing group makes
>>>>>>>>> the overall setup easier and improves usability of resource requirements.
>>>>>>>>>
>>>>>>>>> What I do not like about it is that it changes slot sharing groups from
>>>>>>>>> being a scheduling hint to something which needs to be supported in order
>>>>>>>>> to support fine grained resource requirements. So far, the idea of slot
>>>>>>>>> sharing groups was that it tells the system that a set of operators can be
>>>>>>>>> deployed in the same slot. But the system still had the freedom to say that
>>>>>>>>> it would rather place these tasks in different slots if it wanted. If we
>>>>>>>>> now specify resource requirements on a per slot sharing group, then the
>>>>>>>>> only option for a scheduler which does not support slot sharing groups is
>>>>>>>>> to say that every operator in this slot sharing group needs a slot with the
>>>>>>>>> same resources as the whole group.
>>>>>>>>>
>>>>>>>>> So for example, if we have a job consisting of two operator op_1 and op_2
>>>>>>>>> where each op needs 100 MB of memory, we would then say that the slot
>>>>>>>>> sharing group needs 200 MB of memory to run. If we have a cluster with 2
>>>>>>>>> TMs with one slot of 100 MB each, then the system cannot run this job. If
>>>>>>>>> the resources were specified on an operator level, then the system could
>>>>>>>>> still make the decision to deploy op_1 to TM_1 and op_2 to TM_2.
>>>>>>>>>
>>>>>>>>> Originally, one of the primary goals of slot sharing groups was to make it
>>>>>>>>> easier for the user to reason about how many slots a job needs independent
>>>>>>>>> of the actual number of operators in the job. Interestingly, if all
>>>>>>>>> operators have their resources properly specified, then slot sharing is no
>>>>>>>>> longer needed because Flink could slice off the appropriately sized slots
>>>>>>>>> for every Task individually. What matters is whether the whole cluster has
>>>>>>>>> enough resources to run all tasks or not.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Till
>>>>>>>>>
>>>>>>>>> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <karmagyz@gmail.com> wrote:
>>>>>>>>>> Hi, there,
>>>>>>>>>>
>>>>>>>>>> We would like to start a discussion thread on "FLIP-156: Runtime
>>>>>>>>>> Interfaces for Fine-Grained Resource Requirements"[1], where we
>>>>>>>>>> propose Slot Sharing Group (SSG) based runtime interfaces for
>>>>>>>>>> specifying fine-grained resource requirements.
>>>>>>>>>>
>>>>>>>>>> In this FLIP:
>>>>>>>>>> - Expound the user story of fine-grained resource management.
>>>>>>>>>> - Propose runtime interfaces for specifying SSG-based resource
>>>>>>>>>> requirements.
>>>>>>>>>> - Discuss the pros and cons of the three potential granularities for
>>>>>>>>>> specifying the resource requirements (op, task and slot sharing group)
>>>>>>>>>> and explain why we choose the slot sharing group.
>>>>>>>>>>
>>>>>>>>>> Please find more details in the FLIP wiki document [1]. Looking
>>>>>>>>>> forward to your feedback.
>>>>>>>>>>
>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Yangze Guo
>>>>>>>>>>
>>>>>>>>

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Till Rohrmann <tr...@apache.org>.

Maybe a different minor idea: Would it be possible to treat the SSG
resource requirements as a hint for the runtime similar to how slot sharing
groups are designed at the moment? Meaning that we don't give the guarantee
that Flink will always deploy this set of tasks together no matter what
comes. If, for example, the runtime can derive by some means the resource
requirements for each task based on the requirements for the SSG, this
could be possible. One easy strategy would be to give every task the same
resources as the whole slot sharing group. Another one could be
distributing the resources equally among the tasks. This does not even have
to be implemented, but we would give ourselves the freedom to change
scheduling if the need should arise.
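
To make the two strategies above concrete, here is a minimal sketch in
Python. It is only an illustration under stated assumptions: the Resources
class and both function names are hypothetical and are not part of Flink or
of the FLIP. The numbers reuse the 200 MB group with op_1 and op_2 from
earlier in the thread.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource spec; this is not Flink's ResourceProfile."""
    cpu_cores: float
    memory_mb: int


def same_as_group(group: Resources, tasks: List[str]) -> Dict[str, Resources]:
    """Strategy 1: every task is given the full group requirement."""
    return {task: group for task in tasks}


def split_equally(group: Resources, tasks: List[str]) -> Dict[str, Resources]:
    """Strategy 2: divide the group requirement evenly among its tasks."""
    n = len(tasks)
    if n == 0:
        return {}
    # Memory is floored to whole MB; any remainder is dropped in this sketch.
    share = Resources(group.cpu_cores / n, group.memory_mb // n)
    return {task: share for task in tasks}


ssg = Resources(cpu_cores=1.0, memory_mb=200)
tasks = ["op_1", "op_2"]
print(same_as_group(ssg, tasks))
print(split_equally(ssg, tasks))
```

With the equal split, op_1 and op_2 each end up with 100 MB, so they would
fit into two separate 100 MB slots; with the first strategy each task would
ask for the full 200 MB, which is safe but over-provisions.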

Cheers,
Till

On Wed, Jan 20, 2021 at 7:04 AM Yangze Guo <ka...@gmail.com> wrote:

> Thanks for the responses, Till and Xintong.
>
> I second Xintong's comment that SSG-based runtime interface will give
> us the flexibility to achieve op/task-based approach. That's one of
> the most important reasons for our design choice.
>
> Some cents regarding the default operator resource:
> - It might be good for the scenario of DataStream jobs.
>    ** For light-weight operators, the accumulative configuration error
> will not be significant. Then, the resource used by a task is
> proportional to the number of operators it contains.
>    ** For heavy operators like join and window or operators using the
> external resources, user will turn to the fine-grained resource
> configuration.
> - It can increase the stability for the standalone cluster where task
> executors registered are heterogeneous (with different default slot
> resources).
> - It might not be good for SQL users. The operators that SQL will be
> translated to are a black box to the user. We also do not guarantee
> cross-version consistency of the transformation so far.
>
> I think it can be treated as a follow-up work when the fine-grained
> resource management is end-to-end ready.
>
> Best,
> Yangze Guo
>
>
> On Wed, Jan 20, 2021 at 11:16 AM Xintong Song <to...@gmail.com>
> wrote:
> >
> > Thanks for the feedback, Till.
> >
> > ## I feel that what you proposed (operator-based + default value) might be
> > subsumed by the SSG-based approach.
> > Thinking of op_1 -> op_2, there are the following 4 cases, categorized by
> > whether the resource requirements are known to the users.
> >
> >    1. *Both known.* As previously mentioned, there's no reason to put
> >    multiple operators whose individual resource requirements are already
> >    known into the same group in fine-grained resource management. And if
> >    op_1 and op_2 are in different groups, there should be no problem
> >    switching data exchange mode from pipelined to blocking. This is
> >    equivalent to specifying operator resource requirements in your proposal.
> >    2. *op_1 known, op_2 unknown.* Similar to 1), except that op_2 is in a
> >    SSG whose resource is not specified thus would have the default slot
> >    resource. This is equivalent to having default operator resources in
> >    your proposal.
> >    3. *Both unknown*. The user can either set op_1 and op_2 to the same SSG
> >    or separate SSGs.
> >       - If op_1 and op_2 are in the same SSG, it will be equivalent to the
> >       coarse-grained resource management, where op_1 and op_2 share a
> >       default size slot no matter which data exchange mode is used.
> >       - If op_1 and op_2 are in different SSGs, then each of them will use
> >       a default size slot. This is equivalent to setting them with default
> >       operator resources in your proposal.
> >    4. *Total (pipeline) or max (blocking) of op_1 and op_2 is known.*
> >       - It is possible that the user learns the total / max resource
> >       requirement from executing and monitoring the job, while not being
> >       aware of individual operator requirements.
> >       - I believe this is the case your proposal does not cover. And TBH,
> >       this is probably how most users learn the resource requirements,
> >       according to my experiences.
> >       - In this case, the user might need to specify different resources if
> >       he wants to switch the execution mode, which should not be worse than
> >       not being able to use fine-grained resource management.
> >
> >
> > ## An additional idea inspired by your proposal.
> > We may provide multiple options for deciding resources for SSGs whose
> > requirement is not specified, if needed.
> >
> >    - Default slot resource (current design)
> >    - Default operator resource times number of operators (equivalent to
> >    your proposal)
> >
> >
> > ## Exposing internal runtime strategies
> > Theoretically, yes. Tying to the SSGs, the resource requirements might be
> > affected if how SSGs are internally handled changes in future. Practically,
> > I do not concretely see at the moment what kind of changes we may want in
> > future that might conflict with this FLIP proposal, as the question of
> > switching data exchange mode answered above. I'd suggest to not give up the
> > user friendliness we may gain now for the future problems that may or may
> > not exist.
> >
> > Moreover, the SSG-based approach has the flexibility to achieve the
> > equivalent behavior as the operator-based approach, if we set each operator
> > (or task) to a separate SSG. We can even provide a shortcut option to
> > automatically do that for users, if needed.
> >
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann <tr...@apache.org> wrote:
> >
> > > Thanks for the responses Xintong and Stephan,
> > >
> > > I agree that being able to define the resource requirements for a group of
> > > operators is more user friendly. However, my concern is that we are
> > > exposing thereby internal runtime strategies which might limit our
> > > flexibility to execute a given job. Moreover, the semantics of configuring
> > > resource requirements for SSGs could break if switching from streaming to
> > > batch execution. If one defines the resource requirements for op_1 -> op_2
> > > which run in pipelined mode when using the streaming execution, then how do
> > > we interpret these requirements when op_1 -> op_2 are executed with a
> > > blocking data exchange in batch execution mode? Consequently, I am still
> > > leaning towards Stephan's proposal to set the resource requirements per
> > > operator.
> > >
> > > Maybe the following proposal makes the configuration easier: If the user
> > > wants to use fine-grained resource requirements, then she needs to specify
> > > the default size which is used for operators which have no explicit
> > > resource annotation. If this holds true, then every operator would have a
> > > resource requirement and the system can try to execute the operators in the
> > > best possible manner w/o being constrained by how the user set the SSG
> > > requirements.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Jan 19, 2021 at 9:09 AM Xintong Song <to...@gmail.com>
> > > wrote:
> > >
> > > > Thanks for the feedback, Stephan.
> > > >
> > > > Actually, your proposal has also come to my mind at some point. And I have
> > > > some concerns about it.
> > > >
> > > >
> > > > 1. It does not give users the same control as the SSG-based approach.
> > > >
> > > >
> > > > While both approaches do not require specifying for each operator,
> > > > SSG-based approach supports the semantic that "some operators together use
> > > > this much resource" while the operator-based approach doesn't.
> > > >
> > > >
> > > > Think of a long pipeline with m operators (o_1, o_2, ..., o_m), and at some
> > > > point there's an agg o_n (1 < n < m) which significantly reduces the data
> > > > amount. One can separate the pipeline into 2 groups SSG_1 (o_1, ..., o_n)
> > > > and SSG_2 (o_n+1, ... o_m), so that configuring much higher parallelisms
> > > > for operators in SSG_1 than for operators in SSG_2 won't lead to too much
> > > > wasting of resources. If the two SSGs end up needing different resources,
> > > > with the SSG-based approach one can directly specify resources for the two
> > > > groups. However, with the operator-based approach, the user will have to
> > > > specify resources for each operator in one of the two groups, and tune the
> > > > default slot resource via configurations to fit the other group.
> > > >
> > > >
> > > > 2. It increases the chance of breaking operator chains.
> > > >
> > > >
> > > > Setting chainable operators into different slot sharing groups will
> > > > prevent them from being chained. In the current implementation, downstream
> > > > operators, if SSG not explicitly specified, will be set to the same group
> > > > as the chainable upstream operators (unless multiple upstream operators in
> > > > different groups), to reduce the chance of breaking chains.
> > > >
> > > >
> > > > Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4, deciding SSGs
> > > > based on whether resource is specified we will easily get groups like (o_1,
> > > > o_3) & (o_2, o_4), where none of the operators can be chained. This is also
> > > > possible for the SSG-based approach, but I believe the chance is much
> > > > smaller because there's no strong reason for users to specify the groups
> > > > with alternate operators like that. We are more likely to get groups like
> > > > (o_1, o_2) & (o_3, o_4), where the chain breaks only between o_2 and o_3.
> > > >
> > > >
> > > > 3. It complicates the system by having two different mechanisms for sharing
> > > > managed memory in a slot.
> > > >
> > > >
> > > > - In FLIP-141, we introduced the intra-slot managed memory sharing
> > > > mechanism, where managed memory is first distributed according to the
> > > > consumer type, then further distributed across operators of that consumer
> > > > type.
> > > >
> > > > - With the operator-based approach, managed memory size specified for an
> > > > operator should account for all the consumer types of that operator. That
> > > > means the managed memory is first distributed across operators, then
> > > > distributed to different consumer types of each operator.
> > > >
> > > >
> > > > Unfortunately, the different order of the two calculation steps can lead to
> > > > different results. To be specific, the semantic of the configuration option
> > > > `consumer-weights` changed (within a slot vs. within an operator).
> > > >
> > > >
> > > >
> > > > To sum up things:
> > > >
> > > > While (3) might be a bit more implementation related, I think (1) and (2)
> > > > somehow suggest that, the price for the proposed approach to avoid
> > > > specifying resource for every operator is that it's not as independent from
> > > > operator chaining and slot sharing as the operator-based approach discussed
> > > > in the FLIP.
> > > >
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen <se...@apache.org> wrote:
> > > >
> > > > > Thanks a lot, Yangze and Xintong for this FLIP.
> > > > >
> > > > > I want to say, first of all, that this is super well written. And the
> > > > > points that the FLIP makes about how to expose the configuration to users
> > > > > is exactly the right thing to figure out first.
> > > > > So good job here!
> > > > >
> > > > > About how to let users specify the resource profiles. If I can sum the FLIP
> > > > > and previous discussion up in my own words, the problem is the following:
> > > > >
> > > > > > Operator-level specification is the simplest and cleanest approach, because
> > > > > > it avoids mixing operator configuration (resource) and scheduling. No
> > > > > > matter what other parameters change (chaining, slot sharing, switching
> > > > > > pipelined and blocking shuffles), the resource profiles stay the same.
> > > > > > But it would require that a user specifies resources on all operators,
> > > > > > which makes it hard to use. That's why the FLIP suggests going with
> > > > > > specifying resources on a Sharing-Group.
> > > > >
> > > > >
> > > > > I think both thoughts are important, so can we find a solution where the
> > > > > Resource Profiles are specified on an Operator, but we still avoid that we
> > > > > need to specify a resource profile on every operator?
> > > > >
> > > > > What do you think about something like the following:
> > > > >   - Resource Profiles are specified on an operator level.
> > > > >   - Not all operators need profiles
> > > > >   - All Operators without a Resource Profile end up in the default slot
> > > > > sharing group with a default profile (will get a default slot).
> > > > >   - All Operators with a Resource Profile will go into another slot sharing
> > > > > group (the resource-specified-group).
> > > > >   - Users can define different slot sharing groups for operators like they
> > > > > do now, with the exception that you cannot mix operators that have a
> > > > > resource profile and operators that have no resource profile.
> > > > >   - The default case where no operator has a resource profile is just a
> > > > > special case of this model
> > > > >   - The chaining logic sums up the profiles per operator, like it does now,
> > > > > and the scheduler sums up the profiles of the tasks that it schedules
> > > > > together.
> > > > >
> > > > >
> > > > > There is another question about reactive scaling raised in the FLIP. I need
> > > > > to think a bit about that. That is indeed a bit more tricky once we have
> > > > > slots of different sizes.
> > > > > It is not clear then which of the different slot requests the
> > > > > ResourceManager should fulfill when new resources (TMs) show up, or how the
> > > > > JobManager redistributes the slots resources when resources (TMs) disappear.
> > > > > This question is pretty orthogonal, though, to the "how to specify the
> > > > > resources".
> > > > >
> > > > >
> > > > > Best,
> > > > > Stephan
> > > > >
> > > > > On Fri, Jan 8, 2021 at 5:14 AM Xintong Song <tonysong820@gmail.com> wrote:
> > > > >
> > > > > > Thanks for drafting the FLIP and driving the discussion, Yangze.
> > > > > > And Thanks for the feedback, Till and Chesnay.
> > > > > >
> > > > > > @Till,
> > > > > >
> > > > > > I agree that specifying requirements for SSGs means that SSGs need to be
> > > > > > supported in fine-grained resource management, otherwise each operator
> > > > > > might use as many resources as the whole group. However, I cannot think of
> > > > > > a strong reason for not supporting SSGs in fine-grained resource
> > > > > > management.
> > > > > >
> > > > > >
> > > > > > > Interestingly, if all operators have their resources properly specified,
> > > > > > > then slot sharing is no longer needed because Flink could slice off the
> > > > > > > appropriately sized slots for every Task individually.
> > > > > > >
> > > > > >
> > > > > > > So for example, if we have a job consisting of two operator op_1 and op_2
> > > > > > > where each op needs 100 MB of memory, we would then say that the slot
> > > > > > > sharing group needs 200 MB of memory to run. If we have a cluster with 2
> > > > > > > TMs with one slot of 100 MB each, then the system cannot run this job. If
> > > > > > > the resources were specified on an operator level, then the system could
> > > > > > > still make the decision to deploy op_1 to TM_1 and op_2 to TM_2.
> > > > > >
> > > > > >
> > > > > > Couldn't agree more that if all operators' requirements are properly
> > > > > > specified, slot sharing should be no longer needed. I think this exactly
> > > > > > disproves the example. If we already know op_1 and op_2 each needs 100 MB
> > > > > > of memory, why would we put them in the same group? If they are in separate
> > > > > > groups, with the proposed approach the system can freely deploy them to
> > > > > > either a 200 MB TM or two 100 MB TMs.
> > > > > >
> > > > > > Moreover, the precondition for not needing slot sharing is having resource
> > > > > > requirements properly specified for all operators. This is not always
> > > > > > possible, and usually requires tremendous efforts. One of the benefits for
> > > > > > SSG-based requirements is that it allows the user to freely decide the
> > > > > > granularity, thus efforts they want to pay. I would consider SSG in
> > > > > > fine-grained resource management as a group of operators that the user
> > > > > > would like to specify the total resource for. There can be only one group
> > > > > > in the job, 2~3 groups dividing the job into a few major parts, or as many
> > > > > > groups as the number of tasks/operators, depending on how fine-grained the
> > > > > > user is able to specify the resources.
> > > > > >
> > > > > > Having to support SSGs might be a constraint. But given that all the
> > > > > > current scheduler implementations already support SSGs, I tend to think
> > > > > > that as an acceptable price for the above discussed usability and
> > > > > > flexibility.
> > > > > >
> > > > > > @Chesnay
> > > > > >
> > > > > > > Will declaring them on slot sharing groups not also waste resources if the
> > > > > > > parallelism of operators within that group are different?
> > > > > > >
> > > > > > Yes. It's a trade-off between usability and resource utilization. To avoid
> > > > > > such wasting, the user can define more groups, so that each group contains
> > > > > > less operators and the chance of having operators with different
> > > > > > parallelism will be reduced. The price is to have more resource
> > > > > > requirements to specify.
> > > > > >
> > > > > > > It also seems like quite a hassle for users having to recalculate the
> > > > > > > resource requirements if they change the slot sharing.
> > > > > > > I'd think that it's not really workable for users that create a set of
> > > > > > > re-usable operators which are mixed and matched in their applications;
> > > > > > > managing the resources requirements in such a setting would be a
> > > > > > > nightmare, and in the end would require operator-level requirements any
> > > > > > > way.
> > > > > > > In that sense, I'm not even sure whether it really increases usability.
> > > > > > >
> > > > > >
> > > > > >    - As mentioned in my reply to Till's comment, there's no reason to put
> > > > > >    multiple operators whose individual resource requirements are already
> > > > > >    known into the same group in fine-grained resource management.
> > > > > >    - Even an operator implementation is reused for multiple applications,
> > > > > >    it does not guarantee the same resource requirements. During our years
> > > > > >    of practices in Alibaba, with per-operator requirements specified for
> > > > > >    Blink's fine-grained resource management, very few users (including our
> > > > > >    specialists who are dedicated to supporting Blink users) are as
> > > > > >    experienced as to accurately predict/estimate the operator resource
> > > > > >    requirements. Most people rely on the execution-time metrics
> > > > > >    (throughput, delay, cpu load, memory usage, GC pressure, etc.) to
> > > > > >    improve the specification.
> > > > > >
> > > > > > To sum up:
> > > > > > If the user is capable of providing proper resource requirements for every
> > > > > > operator, that's definitely a good thing and we would not need to rely on
> > > > > > the SSGs. However, that shouldn't be a *must* for the fine-grained resource
> > > > > > management to work. For those users who are capable and do not like having
> > > > > > to set each operator to a separate SSG, I would be ok to have both
> > > > > > SSG-based and operator-based runtime interfaces and to only fallback to the
> > > > > > SSG requirements when the operator requirements are not specified. However,
> > > > > > as the first step, I think we should prioritise the use cases where users
> > > > > > are not that experienced.
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > > On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <chesnay@apache.org> wrote:
> > > > > >
> > > > > > > Will declaring them on slot sharing groups not also waste resources if
> > > > > > > the parallelism of operators within that group are different?
> > > > > > >
> > > > > > > It also seems like quite a hassle for users having to recalculate the
> > > > > > > resource requirements if they change the slot sharing.
> > > > > > > I'd think that it's not really workable for users that create a set of
> > > > > > > re-usable operators which are mixed and matched in their applications;
> > > > > > > managing the resources requirements in such a setting would be a
> > > > > > > nightmare, and in the end would require operator-level requirements any
> > > > > > > way.
> > > > > > > In that sense, I'm not even sure whether it really increases usability.
> > > > > > >
> > > > > > > My main worry is that if we wire the runtime to work on SSGs it's
> > > > > > > gonna be difficult to implement more fine-grained approaches, which
> > > > > > > would not be the case if, for the runtime, they are always defined on an
> > > > > > > operator-level.
> > > > > > >
> > > > > > > On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > > > > > > > Thanks for drafting this FLIP and starting this discussion Yangze.
> > > > > > > >
> > > > > > > > I like that defining resource requirements on a slot sharing group makes
> > > > > > > > the overall setup easier and improves usability of resource requirements.
> > > > > > > >
> > > > > > > > What I do not like about it is that it changes slot sharing groups from
> > > > > > > > being a scheduling hint to something which needs to be supported in order
> > > > > > > > to support fine grained resource requirements. So far, the idea of slot
> > > > > > > > sharing groups was that it tells the system that a set of operators can be
> > > > > > > > deployed in the same slot. But the system still had the freedom to say that
> > > > > > > > it would rather place these tasks in different slots if it wanted. If we
> > > > > > > > now specify resource requirements on a per slot sharing group, then the
> > > > > > > > only option for a scheduler which does not support slot sharing groups is
> > > > > > > > to say that every operator in this slot sharing group needs a slot with the
> > > > > > > > same resources as the whole group.
> > > > > > > >
> > > > > > > > So for example, if we have a job consisting of two operator op_1 and op_2
> > > > > > > > where each op needs 100 MB of memory, we would then say that the slot
> > > > > > > > sharing group needs 200 MB of memory to run. If we have a cluster with 2
> > > > > > > > TMs with one slot of 100 MB each, then the system cannot run this job. If
> > > > > > > > the resources were specified on an operator level, then the system could
> > > > > > > > still make the decision to deploy op_1 to TM_1 and op_2 to TM_2.
> > > > > > > >
> > > > > > > > Originally, one of the primary goals of slot sharing groups was to make it
> > > > > > > > easier for the user to reason about how many slots a job needs independent
> > > > > > > > of the actual number of operators in the job. Interestingly, if all
> > > > > > > > operators have their resources properly specified, then slot sharing is no
> > > > > > > > longer needed because Flink could slice off the appropriately sized slots
> > > > > > > > for every Task individually. What matters is whether the whole cluster has
> > > > > > > > enough resources to run all tasks or not.
> > > > > > > >
> > > > > > > > Cheers,
> > > > > > > > Till
> > > > > > > >
> > > > > > > > On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <
> karmagyz@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > >> Hi, there,
> > > > > > > >>
> > > > > > > >> We would like to start a discussion thread on "FLIP-156:
> Runtime
> > > > > > > >> Interfaces for Fine-Grained Resource Requirements"[1],
> where we
> > > > > > > >> propose Slot Sharing Group (SSG) based runtime interfaces
> for
> > > > > > > >> specifying fine-grained resource requirements.
> > > > > > > >>
> > > > > > > >> In this FLIP:
> > > > > > > >> - Expound the user story of fine-grained resource
> management.
> > > > > > > >> - Propose runtime interfaces for specifying SSG-based
> resource
> > > > > > > >> requirements.
> > > > > > > >> - Discuss the pros and cons of the three potential
> granularities
> > > > for
> > > > > > > >> specifying the resource requirements (op, task and slot
> sharing
> > > > > group)
> > > > > > > >> and explain why we choose the slot sharing group.
> > > > > > > >>
> > > > > > > >> Please find more details in the FLIP wiki document [1].
> Looking
> > > > > > > >> forward to your feedback.
> > > > > > > >>
> > > > > > > >> [1]
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > > > > > >>
> > > > > > > >> Best,
> > > > > > > >> Yangze Guo
> > > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Yangze Guo <ka...@gmail.com>.

Thanks for the responses, Till and Xintong.

I second Xintong's comment that the SSG-based runtime interface will give
us the flexibility to achieve an op/task-based approach. That's one of
the most important reasons for our design choice.

My two cents regarding the default operator resource:
- It might be good for the scenario of DataStream jobs.
   ** For light-weight operators, the accumulated configuration error
will not be significant, and the resource used by a task is
proportional to the number of operators it contains.
   ** For heavy operators like join and window, or operators using
external resources, users will turn to the fine-grained resource
configuration.
- It can increase the stability of standalone clusters where the
registered task executors are heterogeneous (with different default slot
resources).
- It might not be good for SQL users. The operators that a SQL query is
translated into are a black box to the user. We also do not guarantee
cross-version consistency of that translation so far.

I think it can be treated as follow-up work once fine-grained
resource management is end-to-end ready.

Best,
Yangze Guo


On Wed, Jan 20, 2021 at 11:16 AM Xintong Song <to...@gmail.com> wrote:
>
> Thanks for the feedback, Till.
>
> ## I feel that what you proposed (operator-based + default value) might be
> subsumed by the SSG-based approach.
> Thinking of op_1 -> op_2, there are the following 4 cases, categorized by
> whether the resource requirements are known to the users.
>
>    1. *Both known.* As previously mentioned, there's no reason to put
>    multiple operators whose individual resource requirements are already known
>    into the same group in fine-grained resource management. And if op_1 and
>    op_2 are in different groups, there should be no problem switching data
>    exchange mode from pipelined to blocking. This is equivalent to specifying
>    operator resource requirements in your proposal.
>    2. *op_1 known, op_2 unknown.* Similar to 1), except that op_2 is in a
>    SSG whose resource is not specified thus would have the default slot
>    resource. This is equivalent to having default operator resources in your
>    proposal.
>    3. *Both unknown*. The user can either set op_1 and op_2 to the same SSG
>    or separate SSGs.
>       - If op_1 and op_2 are in the same SSG, it will be equivalent to the
>       coarse-grained resource management, where op_1 and op_2 share a default
>       size slot no matter which data exchange mode is used.
>       - If op_1 and op_2 are in different SSGs, then each of them will use
>       a default size slot. This is equivalent to setting them with default
>       operator resources in your proposal.
>    4. *Total (pipelined) or max (blocking) of op_1 and op_2 is known.*
>       - It is possible that the user learns the total / max resource
>       requirement from executing and monitoring the job, while not being
>       aware of individual operator requirements.
>       - I believe this is the case your proposal does not cover. And TBH,
>       this is probably how most users learn the resource requirements,
>       according to my experience.
>       - In this case, the user might need to specify different resources if
>       they want to switch the execution mode, which should not be worse than
>       not being able to use fine-grained resource management.
>
>
> ## An additional idea inspired by your proposal.
> We may provide multiple options for deciding resources for SSGs whose
> requirement is not specified, if needed.
>
>    - Default slot resource (current design)
>    - Default operator resource times number of operators (equivalent to
>    your proposal)
>
>
> ## Exposing internal runtime strategies
> Theoretically, yes. Tying to the SSGs, the resource requirements might be
> affected if how SSGs are internally handled changes in future. Practically,
> I do not concretely see at the moment what kind of changes we may want in
> future that might conflict with this FLIP proposal, as the question of
> switching data exchange mode answered above. I'd suggest to not give up the
> user friendliness we may gain now for the future problems that may or may
> not exist.
>
> Moreover, the SSG-based approach has the flexibility to achieve the
> equivalent behavior as the operator-based approach, if we set each operator
> (or task) to a separate SSG. We can even provide a shortcut option to
> automatically do that for users, if needed.
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann <tr...@apache.org> wrote:
>
> > Thanks for the responses Xintong and Stephan,
> >
> > I agree that being able to define the resource requirements for a group of
> > operators is more user friendly. However, my concern is that we are
> > exposing thereby internal runtime strategies which might limit our
> > flexibility to execute a given job. Moreover, the semantics of configuring
> > resource requirements for SSGs could break if switching from streaming to
> > batch execution. If one defines the resource requirements for op_1 -> op_2
> > which run in pipelined mode when using the streaming execution, then how do
> > we interpret these requirements when op_1 -> op_2 are executed with a
> > blocking data exchange in batch execution mode? Consequently, I am still
> > leaning towards Stephan's proposal to set the resource requirements per
> > operator.
> >
> > Maybe the following proposal makes the configuration easier: If the user
> > wants to use fine-grained resource requirements, then she needs to specify
> > the default size which is used for operators which have no explicit
> > resource annotation. If this holds true, then every operator would have a
> > resource requirement and the system can try to execute the operators in the
> > best possible manner w/o being constrained by how the user set the SSG
> > requirements.
> >
> > Cheers,
> > Till
> >
> > On Tue, Jan 19, 2021 at 9:09 AM Xintong Song <to...@gmail.com>
> > wrote:
> >
> > > Thanks for the feedback, Stephan.
> > >
> > > Actually, your proposal has also come to my mind at some point. And I
> > have
> > > some concerns about it.
> > >
> > >
> > > 1. It does not give users the same control as the SSG-based approach.
> > >
> > >
> > > While both approaches do not require specifying for each operator,
> > > SSG-based approach supports the semantic that "some operators together
> > use
> > > this much resource" while the operator-based approach doesn't.
> > >
> > >
> > > Think of a long pipeline with m operators (o_1, o_2, ..., o_m), and at
> > some
> > > point there's an agg o_n (1 < n < m) which significantly reduces the data
> > > amount. One can separate the pipeline into 2 groups SSG_1 (o_1, ..., o_n)
> > > and SSG_2 (o_n+1, ... o_m), so that configuring much higher parallelisms
> > > for operators in SSG_1 than for operators in SSG_2 won't lead to too much
> > > wasting of resources. If the two SSGs end up needing different resources,
> > > with the SSG-based approach one can directly specify resources for the
> > two
> > > groups. However, with the operator-based approach, the user will have to
> > > specify resources for each operator in one of the two groups, and tune
> > the
> > > default slot resource via configurations to fit the other group.
> > >
> > >
> > > 2. It increases the chance of breaking operator chains.
> > >
> > >
> > > Setting chainable operators into different slot sharing groups will
> > > prevent them from being chained. In the current implementation,
> > downstream
> > > operators, if SSG not explicitly specified, will be set to the same group
> > > as the chainable upstream operators (unless multiple upstream operators
> > in
> > > different groups), to reduce the chance of breaking chains.
> > >
> > >
> > > Thinking of chainable operators o_1 -> o_2 -> o_3 -> o_4, deciding SSGs
> > > based on whether resource is specified we will easily get groups like
> > (o_1,
> > > o_3) & (o_2, o_4), where none of the operators can be chained. This is
> > also
> > > possible for the SSG-based approach, but I believe the chance is much
> > > smaller because there's no strong reason for users to specify the groups
> > > with alternate operators like that. We are more likely to get groups like
> > > (o_1, o_2) & (o_3, o_4), where the chain breaks only between o_2 and o_3.
> > >
> > >
> > > 3. It complicates the system by having two different mechanisms for
> > sharing
> > > managed memory in  a slot.
> > >
> > >
> > > - In FLIP-141, we introduced the intra-slot managed memory sharing
> > > mechanism, where managed memory is first distributed according to the
> > > consumer type, then further distributed across operators of that consumer
> > > type.
> > >
> > > - With the operator-based approach, managed memory size specified for an
> > > operator should account for all the consumer types of that operator. That
> > > means the managed memory is first distributed across operators, then
> > > distributed to different consumer types of each operator.
> > >
> > >
> > > Unfortunately, the different order of the two calculation steps can lead
> > to
> > > different results. To be specific, the semantic of the configuration
> > option
> > > `consumer-weights` changed (within a slot vs. within an operator).
> > >
> > >
> > >
> > > To sum up things:
> > >
> > > While (3) might be a bit more implementation related, I think (1) and (2)
> > > somehow suggest that, the price for the proposed approach to avoid
> > > specifying resource for every operator is that it's not as independent
> > from
> > > operator chaining and slot sharing as the operator-based approach
> > discussed
> > > in the FLIP.
> > >
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen <se...@apache.org> wrote:
> > >
> > > > Thanks a lot, Yangze and Xintong for this FLIP.
> > > >
> > > > I want to say, first of all, that this is super well written. And the
> > > > points that the FLIP makes about how to expose the configuration to
> > users
> > > > is exactly the right thing to figure out first.
> > > > So good job here!
> > > >
> > > > About how to let users specify the resource profiles. If I can sum the
> > > FLIP
> > > > and previous discussion up in my own words, the problem is the
> > following:
> > > >
> > > > Operator-level specification is the simplest and cleanest approach,
> > > because
> > > > > it avoids mixing operator configuration (resource) and scheduling. No
> > > > > matter what other parameters change (chaining, slot sharing,
> > switching
> > > > > pipelined and blocking shuffles), the resource profiles stay the
> > same.
> > > > > But it would require that a user specifies resources on all
> > operators,
> > > > > which makes it hard to use. That's why the FLIP suggests going with
> > > > > specifying resources on a Sharing-Group.
> > > >
> > > >
> > > > I think both thoughts are important, so can we find a solution where
> > the
> > > > Resource Profiles are specified on an Operator, but we still avoid that
> > > we
> > > > need to specify a resource profile on every operator?
> > > >
> > > > What do you think about something like the following:
> > > >   - Resource Profiles are specified on an operator level.
> > > >   - Not all operators need profiles
> > > >   - All Operators without a Resource Profile ended up in the default
> > slot
> > > > sharing group with a default profile (will get a default slot).
> > > >   - All Operators with a Resource Profile will go into another slot
> > > sharing
> > > > group (the resource-specified-group).
> > > >   - Users can define different slot sharing groups for operators like
> > > they
> > > > do now, with the exception that you cannot mix operators that have a
> > > > resource profile and operators that have no resource profile.
> > > >   - The default case where no operator has a resource profile is just a
> > > > special case of this model
> > > >   - The chaining logic sums up the profiles per operator, like it does
> > > now,
> > > > and the scheduler sums up the profiles of the tasks that it schedules
> > > > together.
> > > >
> > > >
> > > > There is another question about reactive scaling raised in the FLIP. I
> > > need
> > > > to think a bit about that. That is indeed a bit more tricky once we
> > have
> > > > slots of different sizes.
> > > > It is not clear then which of the different slot requests the
> > > > ResourceManager should fulfill when new resources (TMs) show up, or how
> > > the
> > > > JobManager redistributes the slots resources when resources (TMs)
> > > disappear
> > > > This question is pretty orthogonal, though, to the "how to specify the
> > > > resources".
> > > >
> > > >
> > > > Best,
> > > > Stephan
> > > >
> > > > On Fri, Jan 8, 2021 at 5:14 AM Xintong Song <to...@gmail.com>
> > > wrote:
> > > >
> > > > > Thanks for drafting the FLIP and driving the discussion, Yangze.
> > > > > And Thanks for the feedback, Till and Chesnay.
> > > > >
> > > > > @Till,
> > > > >
> > > > > I agree that specifying requirements for SSGs means that SSGs need to
> > > be
> > > > > supported in fine-grained resource management, otherwise each
> > operator
> > > > > might use as many resources as the whole group. However, I cannot
> > think
> > > > of
> > > > > a strong reason for not supporting SSGs in fine-grained resource
> > > > > management.
> > > > >
> > > > >
> > > > > > Interestingly, if all operators have their resources properly
> > > > specified,
> > > > > > then slot sharing is no longer needed because Flink could slice off
> > > the
> > > > > > appropriately sized slots for every Task individually.
> > > > > >
> > > > >
> > > > > So for example, if we have a job consisting of two operator op_1 and
> > > op_2
> > > > > > where each op needs 100 MB of memory, we would then say that the
> > slot
> > > > > > sharing group needs 200 MB of memory to run. If we have a cluster
> > > with
> > > > 2
> > > > > > TMs with one slot of 100 MB each, then the system cannot run this
> > > job.
> > > > If
> > > > > > the resources were specified on an operator level, then the system
> > > > could
> > > > > > still make the decision to deploy op_1 to TM_1 and op_2 to TM_2.
> > > > >
> > > > >
> > > > > Couldn't agree more that if all operators' requirements are properly
> > > > > specified, slot sharing should be no longer needed. I think this
> > > exactly
> > > > > disproves the example. If we already know op_1 and op_2 each needs
> > 100
> > > MB
> > > > > of memory, why would we put them in the same group? If they are in
> > > > separate
> > > > > groups, with the proposed approach the system can freely deploy them
> > to
> > > > > either a 200 MB TM or two 100 MB TMs.
> > > > >
> > > > > Moreover, the precondition for not needing slot sharing is having
> > > > resource
> > > > > requirements properly specified for all operators. This is not always
> > > > > possible, and usually requires tremendous efforts. One of the
> > benefits
> > > > for
> > > > > SSG-based requirements is that it allows the user to freely decide
> > the
> > > > > granularity, thus efforts they want to pay. I would consider SSG in
> > > > > fine-grained resource management as a group of operators that the
> > user
> > > > > would like to specify the total resource for. There can be only one
> > > group
> > > > > in the job, 2~3 groups dividing the job into a few major parts, or as
> > > > many
> > > > > groups as the number of tasks/operators, depending on how
> > fine-grained
> > > > the
> > > > > user is able to specify the resources.
> > > > >
> > > > > Having to support SSGs might be a constraint. But given that all the
> > > > > current scheduler implementations already support SSGs, I tend to
> > think
> > > > > that as an acceptable price for the above discussed usability and
> > > > > flexibility.
> > > > >
> > > > > @Chesnay
> > > > >
> > > > > Will declaring them on slot sharing groups not also waste resources
> > if
> > > > the
> > > > > > parallelism of operators within that group are different?
> > > > > >
> > > > > Yes. It's a trade-off between usability and resource utilization. To
> > > > avoid
> > > > > such wasting, the user can define more groups, so that each group
> > > > contains
> > > > > less operators and the chance of having operators with different
> > > > > parallelism will be reduced. The price is to have more resource
> > > > > requirements to specify.
> > > > >
> > > > > It also seems like quite a hassle for users having to recalculate the
> > > > > > resource requirements if they change the slot sharing.
> > > > > > I'd think that it's not really workable for users that create a set
> > > of
> > > > > > re-usable operators which are mixed and matched in their
> > > applications;
> > > > > > managing the resources requirements in such a setting would be a
> > > > > > nightmare, and in the end would require operator-level requirements
> > > any
> > > > > > way.
> > > > > > In that sense, I'm not even sure whether it really increases
> > > usability.
> > > > > >
> > > > >
> > > > >    - As mentioned in my reply to Till's comment, there's no reason to
> > > put
> > > > >    multiple operators whose individual resource requirements are
> > > already
> > > > > known
> > > > >    into the same group in fine-grained resource management.
> > > > >    - Even an operator implementation is reused for multiple
> > > applications,
> > > > >    it does not guarantee the same resource requirements. During our
> > > years
> > > > > of
> > > > >    practices in Alibaba, with per-operator requirements specified for
> > > > > Blink's
> > > > >    fine-grained resource management, very few users (including our
> > > > > specialists
> > > > >    who are dedicated to supporting Blink users) are as experienced as
> > > to
> > > > >    accurately predict/estimate the operator resource requirements.
> > Most
> > > > > people
> > > > >    rely on the execution-time metrics (throughput, delay, cpu load,
> > > > memory
> > > > >    usage, GC pressure, etc.) to improve the specification.
> > > > >
> > > > > To sum up:
> > > > > If the user is capable of providing proper resource requirements for
> > > > every
> > > > > operator, that's definitely a good thing and we would not need to
> > rely
> > > on
> > > > > the SSGs. However, that shouldn't be a *must* for the fine-grained
> > > > resource
> > > > > management to work. For those users who are capable and do not like
> > > > having
> > > > > to set each operator to a separate SSG, I would be ok to have both
> > > > > SSG-based and operator-based runtime interfaces and to only fallback
> > to
> > > > the
> > > > > SSG requirements when the operator requirements are not specified.
> > > > However,
> > > > > as the first step, I think we should prioritise the use cases where
> > > users
> > > > > are not that experienced.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > > On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <ch...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Will declaring them on slot sharing groups not also waste resources
> > > if
> > > > > > the parallelism of operators within that group are different?
> > > > > >
> > > > > > It also seems like quite a hassle for users having to recalculate
> > the
> > > > > > resource requirements if they change the slot sharing.
> > > > > > I'd think that it's not really workable for users that create a set
> > > of
> > > > > > re-usable operators which are mixed and matched in their
> > > applications;
> > > > > > managing the resources requirements in such a setting would be a
> > > > > > nightmare, and in the end would require operator-level requirements
> > > any
> > > > > > way.
> > > > > > In that sense, I'm not even sure whether it really increases
> > > usability.
> > > > > >
> > > > > > > My main worry is that if we wire the runtime to work on SSGs, it's
> > > > > > > gonna be difficult to implement more fine-grained approaches, which
> > > > > > > would not be the case if, for the runtime, they are always defined on
> > > > > > > an operator-level.
> > > > > >
> > > > > > On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > > > > > > Thanks for drafting this FLIP and starting this discussion
> > Yangze.
> > > > > > >
> > > > > > > I like that defining resource requirements on a slot sharing
> > group
> > > > > makes
> > > > > > > the overall setup easier and improves usability of resource
> > > > > requirements.
> > > > > > >
> > > > > > > What I do not like about it is that it changes slot sharing
> > groups
> > > > from
> > > > > > > being a scheduling hint to something which needs to be supported
> > in
> > > > > order
> > > > > > > to support fine grained resource requirements. So far, the idea
> > of
> > > > slot
> > > > > > > sharing groups was that it tells the system that a set of
> > operators
> > > > can
> > > > > > be
> > > > > > > deployed in the same slot. But the system still had the freedom
> > to
> > > > say
> > > > > > that
> > > > > > > it would rather place these tasks in different slots if it
> > wanted.
> > > If
> > > > > we
> > > > > > > now specify resource requirements on a per slot sharing group,
> > then
> > > > the
> > > > > > > only option for a scheduler which does not support slot sharing
> > > > groups
> > > > > is
> > > > > > > to say that every operator in this slot sharing group needs a
> > slot
> > > > with
> > > > > > the
> > > > > > > same resources as the whole group.
> > > > > > >
> > > > > > > So for example, if we have a job consisting of two operator op_1
> > > and
> > > > > op_2
> > > > > > > where each op needs 100 MB of memory, we would then say that the
> > > slot
> > > > > > > sharing group needs 200 MB of memory to run. If we have a cluster
> > > > with
> > > > > 2
> > > > > > > TMs with one slot of 100 MB each, then the system cannot run this
> > > > job.
> > > > > If
> > > > > > > the resources were specified on an operator level, then the
> > system
> > > > > could
> > > > > > > still make the decision to deploy op_1 to TM_1 and op_2 to TM_2.
> > > > > > >
> > > > > > > Originally, one of the primary goals of slot sharing groups was
> > to
> > > > make
> > > > > > it
> > > > > > > easier for the user to reason about how many slots a job needs
> > > > > > independent
> > > > > > > of the actual number of operators in the job. Interestingly, if
> > all
> > > > > > > operators have their resources properly specified, then slot
> > > sharing
> > > > is
> > > > > > no
> > > > > > > longer needed because Flink could slice off the appropriately
> > sized
> > > > > slots
> > > > > > > for every Task individually. What matters is whether the whole
> > > > cluster
> > > > > > has
> > > > > > > enough resources to run all tasks or not.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <ka...@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > >> Hi, there,
> > > > > > >>
> > > > > > >> We would like to start a discussion thread on "FLIP-156: Runtime
> > > > > > >> Interfaces for Fine-Grained Resource Requirements"[1], where we
> > > > > > >> propose Slot Sharing Group (SSG) based runtime interfaces for
> > > > > > >> specifying fine-grained resource requirements.
> > > > > > >>
> > > > > > >> In this FLIP:
> > > > > > >> - Expound the user story of fine-grained resource management.
> > > > > > >> - Propose runtime interfaces for specifying SSG-based resource
> > > > > > >> requirements.
> > > > > > >> - Discuss the pros and cons of the three potential granularities
> > > for
> > > > > > >> specifying the resource requirements (op, task and slot sharing
> > > > group)
> > > > > > >> and explain why we choose the slot sharing group.
> > > > > > >>
> > > > > > >> Please find more details in the FLIP wiki document [1]. Looking
> > > > > > >> forward to your feedback.
> > > > > > >>
> > > > > > >> [1]
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> Yangze Guo
> > > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Xintong Song <to...@gmail.com>.

Thanks for the feedback, Till.

## I feel that what you proposed (operator-based + default value) might be
subsumed by the SSG-based approach.
Thinking of op_1 -> op_2, there are the following 4 cases, categorized by
whether the resource requirements are known to the users.

   1. *Both known.* As previously mentioned, there's no reason to put
   multiple operators whose individual resource requirements are already known
   into the same group in fine-grained resource management. And if op_1 and
   op_2 are in different groups, there should be no problem switching data
   exchange mode from pipelined to blocking. This is equivalent to specifying
   operator resource requirements in your proposal.
   2. *op_1 known, op_2 unknown.* Similar to 1), except that op_2 is in an
   SSG whose resource is not specified and thus gets the default slot
   resource. This is equivalent to having default operator resources in your
   proposal.
   3. *Both unknown*. The user can either set op_1 and op_2 to the same SSG
   or separate SSGs.
      - If op_1 and op_2 are in the same SSG, it will be equivalent to the
      coarse-grained resource management, where op_1 and op_2 share a default
      size slot no matter which data exchange mode is used.
      - If op_1 and op_2 are in different SSGs, then each of them will use
      a default size slot. This is equivalent to setting them with default
      operator resources in your proposal.
   4. *Total (pipelined) or max (blocking) of op_1 and op_2 is known.*
      - It is possible that the user learns the total / max resource
      requirement from executing and monitoring the job, while not being
      aware of individual operator requirements.
      - I believe this is the case your proposal does not cover. And TBH,
      this is probably how most users learn the resource requirements,
      according to my experience.
      - In this case, the user might need to specify different resources if
      they want to switch the execution mode, which should not be worse than
      not being able to use fine-grained resource management.
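
To make case 4 concrete, here is a minimal sketch in plain Java. It is not
Flink code: SsgRequirementSketch, ExchangeMode and requiredMb are
hypothetical names, and the 100 MB figures are borrowed from the op_1 / op_2
example earlier in this thread. It only illustrates the assumption stated
above, that a group's requirement is the total of its operators under
pipelined exchange and the max under blocking exchange.

```java
/** Hypothetical helper, not part of Flink: memory needed by one slot sharing group. */
final class SsgRequirementSketch {

    enum ExchangeMode { PIPELINED, BLOCKING }

    /**
     * PIPELINED: all operators of the group run at the same time, so their needs add up.
     * BLOCKING: operators run one after another, so the largest one decides.
     */
    static long requiredMb(ExchangeMode mode, long... operatorMb) {
        long total = 0;
        long max = 0;
        for (long mb : operatorMb) {
            total += mb;
            max = Math.max(max, mb);
        }
        return mode == ExchangeMode.PIPELINED ? total : max;
    }

    public static void main(String[] args) {
        // op_1 and op_2 with 100 MB each (illustrative values).
        System.out.println(requiredMb(ExchangeMode.PIPELINED, 100L, 100L));
        System.out.println(requiredMb(ExchangeMode.BLOCKING, 100L, 100L));
    }
}
```

With two 100 MB operators this gives 200 MB for pipelined and 100 MB for
blocking execution, which is why the user may want to adjust the SSG
resource when switching the execution mode.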


## An additional idea inspired by your proposal.
We may provide multiple options for deciding resources for SSGs whose
requirement is not specified, if needed.

   - Default slot resource (current design)
   - Default operator resource times number of operators (equivalent to
   your proposal)
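
The two options could be sketched as follows. This is plain Java under
stated assumptions, not an existing Flink interface: the class
DefaultSsgResourceSketch, the Fallback enum, the resolveMb method and all
numbers are made up for illustration.

```java
/** Hypothetical sketch, not part of Flink: resource of an SSG with or without a spec. */
final class DefaultSsgResourceSketch {

    /** How to size an SSG whose requirement the user did not specify. */
    enum Fallback {
        DEFAULT_SLOT,                 // current design in the FLIP
        DEFAULT_OPERATOR_TIMES_COUNT  // equivalent to the operator-based default
    }

    /**
     * @param specifiedMb user-specified SSG memory in MB, or null if not specified
     * @param numOperators number of operators in the SSG
     */
    static long resolveMb(
            Long specifiedMb,
            int numOperators,
            Fallback fallback,
            long defaultSlotMb,
            long defaultOperatorMb) {
        if (specifiedMb != null) {
            return specifiedMb; // an explicit requirement always wins
        }
        if (fallback == Fallback.DEFAULT_SLOT) {
            return defaultSlotMb;
        }
        return defaultOperatorMb * numOperators;
    }

    public static void main(String[] args) {
        // Illustrative values: 1024 MB default slot, 128 MB default operator, 3 operators.
        System.out.println(resolveMb(null, 3, Fallback.DEFAULT_SLOT, 1024L, 128L));
        System.out.println(resolveMb(null, 3, Fallback.DEFAULT_OPERATOR_TIMES_COUNT, 1024L, 128L));
    }
}
```

Either way, an SSG with an explicit requirement is unaffected; the option
only changes what an unspecified SSG falls back to.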


## Exposing internal runtime strategies
Theoretically, yes. Because they are tied to SSGs, the resource requirements
might be affected if the way SSGs are handled internally changes in the
future. Practically, I do not see at the moment what kind of future changes
might conflict with this FLIP proposal, given that the question of
switching data exchange mode is answered above. I'd suggest not giving up
the user friendliness we may gain now for future problems that may or may
not exist.

Moreover, the SSG-based approach has the flexibility to achieve behavior
equivalent to the operator-based approach, if we set each operator
(or task) to a separate SSG. We can even provide a shortcut option to
automatically do that for users, if needed.
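
Such a shortcut might look roughly like the sketch below. It is only an
illustration in plain Java: OneGroupPerOperatorSketch, assignSeparateGroups
and the "ssg-" naming scheme are hypothetical and not an existing Flink
option.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not part of Flink: put every operator into its own SSG. */
final class OneGroupPerOperatorSketch {

    /** Returns operator id -> slot sharing group name, one distinct group per operator. */
    static Map<String, String> assignSeparateGroups(List<String> operatorIds) {
        Map<String, String> groupOf = new LinkedHashMap<>();
        for (String id : operatorIds) {
            // The group name is derived from the operator id, so no two operators share a group.
            groupOf.put(id, "ssg-" + id);
        }
        return groupOf;
    }

    public static void main(String[] args) {
        System.out.println(assignSeparateGroups(List.of("op_1", "op_2")));
    }
}
```

Once every operator has its own group, a per-SSG requirement is a
per-operator requirement.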


Thank you~

Xintong Song



On Tue, Jan 19, 2021 at 11:48 PM Till Rohrmann <tr...@apache.org> wrote:

> Thanks for the responses Xintong and Stephan,
>
> I agree that being able to define the resource requirements for a group of
> operators is more user friendly. However, my concern is that we are
> exposing thereby internal runtime strategies which might limit our
> flexibility to execute a given job. Moreover, the semantics of configuring
> resource requirements for SSGs could break if switching from streaming to
> batch execution. If one defines the resource requirements for op_1 -> op_2
> which run in pipelined mode when using the streaming execution, then how do
> we interpret these requirements when op_1 -> op_2 are executed with a
> blocking data exchange in batch execution mode? Consequently, I am still
> leaning towards Stephan's proposal to set the resource requirements per
> operator.
>
> Maybe the following proposal makes the configuration easier: If the user
> wants to use fine-grained resource requirements, then she needs to specify
> the default size which is used for operators which have no explicit
> resource annotation. If this holds true, then every operator would have a
> resource requirement and the system can try to execute the operators in the
> best possible manner w/o being constrained by how the user set the SSG
> requirements.
>
> Cheers,
> Till
>

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Till Rohrmann <tr...@apache.org>.

Thanks for the responses Xintong and Stephan,

I agree that being able to define the resource requirements for a group of
operators is more user friendly. However, my concern is that we are thereby
exposing internal runtime strategies, which might limit our flexibility to
execute a given job. Moreover, the semantics of configuring
resource requirements for SSGs could break if switching from streaming to
batch execution. If one defines the resource requirements for op_1 -> op_2
which run in pipelined mode when using the streaming execution, then how do
we interpret these requirements when op_1 -> op_2 are executed with a
blocking data exchange in batch execution mode? Consequently, I am still
leaning towards Stephan's proposal to set the resource requirements per
operator.
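
The ambiguity can be sketched with a small calculation. The snippet below is
an illustration under stated assumptions, not a description of how Flink's
scheduler behaves: it assumes that operators connected by a pipelined
exchange run at the same time, that operators separated by a blocking
exchange do not, and it reuses the 100 MB per operator figure from the
earlier example in this thread. The function name and the "regions" argument
are hypothetical.

```python
from typing import Dict, List


def peak_concurrent_memory_mb(
    per_operator_mb: Dict[str, int], regions: List[List[str]]
) -> int:
    """Return the largest memory demand of any set of co-running operators.

    `regions` lists groups of operators assumed to run concurrently.
    """
    return max(sum(per_operator_mb[op] for op in region) for region in regions)


requirements = {"op_1": 100, "op_2": 100}

# Streaming: op_1 -> op_2 is pipelined, so both run at the same time.
streaming_peak = peak_concurrent_memory_mb(requirements, [["op_1", "op_2"]])

# Batch: a blocking exchange separates them, so they run one after another.
batch_peak = peak_concurrent_memory_mb(requirements, [["op_1"], ["op_2"]])
```

With per-operator requirements the two numbers can be derived from the same
specification, whereas a single number attached to the group does not say
which of the two situations it was meant for.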

Maybe the following proposal makes the configuration easier: If the user
wants to use fine-grained resource requirements, then she needs to specify
the default size which is used for operators which have no explicit
resource annotation. If this holds true, then every operator would have a
resource requirement and the system can try to execute the operators in the
best possible manner w/o being constrained by how the user set the SSG
requirements.
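
A minimal sketch of this default-size idea, in plain Python rather than
Flink code. The function name, the use of a single memory figure in MB, and
the example operator names are assumptions made only for illustration.

```python
from typing import Dict, List


def resolve_requirements(
    operators: List[str], explicit_mb: Dict[str, int], default_mb: int
) -> Dict[str, int]:
    """Give every operator a requirement: its explicit one, else the default."""
    return {op: explicit_mb.get(op, default_mb) for op in operators}


resolved = resolve_requirements(
    operators=["source", "agg", "sink"],
    explicit_mb={"agg": 300},  # only the heavy operator is annotated
    default_mb=100,            # user-provided default for everything else
)
```

After this step every operator carries a requirement, so the system is free
to group or separate operators without consulting any SSG-level number.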

Cheers,
Till

On Tue, Jan 19, 2021 at 9:09 AM Xintong Song <to...@gmail.com> wrote:

> Thanks for the feedback, Stephan.
>
> Actually, your proposal has also come to my mind at some point. And I have
> some concerns about it.
>
>
> 1. It does not give users the same control as the SSG-based approach.
>
>
> While neither approach requires specifying resources for each operator,
> the SSG-based approach supports the semantics that "some operators together
> use this much resource" while the operator-based approach doesn't.
>
>
> Think of a long pipeline with m operators (o_1, o_2, ..., o_m), and at some
> point there's an agg o_n (1 < n < m) which significantly reduces the data
> amount. One can separate the pipeline into 2 groups SSG_1 (o_1, ..., o_n)
> and SSG_2 (o_n+1, ..., o_m), so that configuring much higher parallelism
> for operators in SSG_1 than for operators in SSG_2 won't waste too many
> resources. If the two SSGs end up needing different resources,
> with the SSG-based approach one can directly specify resources for the two
> groups. However, with the operator-based approach, the user will have to
> specify resources for each operator in one of the two groups, and tune the
> default slot resource via configurations to fit the other group.
>
>
> 2. It increases the chance of breaking operator chains.
>
>
> Putting chainable operators into different slot sharing groups will
> prevent them from being chained. In the current implementation, downstream
> operators whose SSG is not explicitly specified are set to the same group
> as their chainable upstream operators (unless there are multiple upstream
> operators in different groups), to reduce the chance of breaking chains.
>
>
> Think of chainable operators o_1 -> o_2 -> o_3 -> o_4. Deciding SSGs
> based on whether a resource is specified, we can easily get groups like (o_1,
> o_3) & (o_2, o_4), where none of the operators can be chained. This is also
> possible for the SSG-based approach, but I believe the chance is much
> smaller because there's no strong reason for users to specify the groups
> with alternate operators like that. We are more likely to get groups like
> (o_1, o_2) & (o_3, o_4), where the chain breaks only between o_2 and o_3.
>
>
> 3. It complicates the system by having two different mechanisms for sharing
> managed memory in a slot.
>
>
> - In FLIP-141, we introduced the intra-slot managed memory sharing
> mechanism, where managed memory is first distributed according to the
> consumer type, then further distributed across operators of that consumer
> type.
>
> - With the operator-based approach, managed memory size specified for an
> operator should account for all the consumer types of that operator. That
> means the managed memory is first distributed across operators, then
> distributed to different consumer types of each operator.
>
>
> Unfortunately, the different order of the two calculation steps can lead to
> different results. To be specific, the semantics of the configuration option
> `consumer-weights` would change (within a slot vs. within an operator).
>
>
>
> To sum things up:
>
> While (3) might be a bit more implementation related, I think (1) and (2)
> suggest that the price the proposed approach pays for avoiding
> specifying resources for every operator is that it's not as independent from
> operator chaining and slot sharing as the operator-based approach discussed
> in the FLIP.
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen <se...@apache.org> wrote:
>
> > Thanks a lot, Yangze and Xintong for this FLIP.
> >
> > I want to say, first of all, that this is super well written. And the
> > points that the FLIP makes about how to expose the configuration to users
> > is exactly the right thing to figure out first.
> > So good job here!
> >
> > About how to let users specify the resource profiles. If I can sum the
> > FLIP and previous discussion up in my own words, the problem is the
> > following:
> >
> > > Operator-level specification is the simplest and cleanest approach,
> > > because it avoids mixing operator configuration (resource) and
> > > scheduling. No matter what other parameters change (chaining, slot
> > > sharing, switching pipelined and blocking shuffles), the resource
> > > profiles stay the same. But it would require that a user specifies
> > > resources on all operators, which makes it hard to use. That's why the
> > > FLIP suggests going with specifying resources on a Sharing-Group.
> >
> >
> > I think both thoughts are important, so can we find a solution where the
> > Resource Profiles are specified on an Operator, but we still avoid that
> > we need to specify a resource profile on every operator?
> >
> > What do you think about something like the following:
> >   - Resource Profiles are specified on an operator level.
> >   - Not all operators need profiles.
> >   - All Operators without a Resource Profile end up in the default slot
> >     sharing group with a default profile (will get a default slot).
> >   - All Operators with a Resource Profile will go into another slot
> >     sharing group (the resource-specified-group).
> >   - Users can define different slot sharing groups for operators like
> >     they do now, with the exception that you cannot mix operators that
> >     have a resource profile and operators that have no resource profile.
> >   - The default case where no operator has a resource profile is just a
> >     special case of this model.
> >   - The chaining logic sums up the profiles per operator, like it does
> >     now, and the scheduler sums up the profiles of the tasks that it
> >     schedules together.
> >
> >
> > There is another question about reactive scaling raised in the FLIP. I
> > need to think a bit about that. That is indeed a bit more tricky once we
> > have slots of different sizes.
> > It is not clear then which of the different slot requests the
> > ResourceManager should fulfill when new resources (TMs) show up, or how
> > the JobManager redistributes the slot resources when resources (TMs)
> > disappear.
> > This question is pretty orthogonal, though, to the "how to specify the
> > resources".
> >
> >
> > Best,
> > Stephan
> >
> > On Fri, Jan 8, 2021 at 5:14 AM Xintong Song <to...@gmail.com> wrote:
> >
> > > Thanks for drafting the FLIP and driving the discussion, Yangze.
> > > And Thanks for the feedback, Till and Chesnay.
> > >
> > > @Till,
> > >
> > > I agree that specifying requirements for SSGs means that SSGs need to
> > > be supported in fine-grained resource management, otherwise each
> > > operator might use as many resources as the whole group. However, I
> > > cannot think of a strong reason for not supporting SSGs in fine-grained
> > > resource management.
> > >
> > >
> > > > Interestingly, if all operators have their resources properly
> > > > specified, then slot sharing is no longer needed because Flink could
> > > > slice off the appropriately sized slots for every Task individually.
> > > >
> > > > So for example, if we have a job consisting of two operators op_1 and
> > > > op_2 where each op needs 100 MB of memory, we would then say that the
> > > > slot sharing group needs 200 MB of memory to run. If we have a cluster
> > > > with 2 TMs with one slot of 100 MB each, then the system cannot run
> > > > this job. If the resources were specified on an operator level, then
> > > > the system could still make the decision to deploy op_1 to TM_1 and
> > > > op_2 to TM_2.
> > >
> > >
> > > Couldn't agree more that if all operators' requirements are properly
> > > specified, slot sharing should no longer be needed. I think this exactly
> > > disproves the example. If we already know op_1 and op_2 each need 100 MB
> > > of memory, why would we put them in the same group? If they are in
> > > separate groups, with the proposed approach the system can freely deploy
> > > them to either a 200 MB TM or two 100 MB TMs.
> > >
> > > Moreover, the precondition for not needing slot sharing is having
> > > resource requirements properly specified for all operators. This is not
> > > always possible, and usually requires tremendous effort. One of the
> > > benefits of SSG-based requirements is that they allow the user to freely
> > > decide the granularity, and thus the effort they want to invest. I would
> > > consider an SSG in fine-grained resource management as a group of
> > > operators that the user would like to specify the total resource for.
> > > There can be only one group in the job, 2~3 groups dividing the job into
> > > a few major parts, or as many groups as the number of tasks/operators,
> > > depending on how fine-grained the user is able to specify the resources.
> > >
> > > Having to support SSGs might be a constraint. But given that all the
> > > current scheduler implementations already support SSGs, I tend to think
> > > of that as an acceptable price for the usability and flexibility
> > > discussed above.
> > >
> > > @Chesnay
> > >
> > > > Will declaring them on slot sharing groups not also waste resources
> > > > if the parallelism of operators within that group are different?
> > > >
> > > Yes. It's a trade-off between usability and resource utilization. To
> > > avoid such waste, the user can define more groups, so that each group
> > > contains fewer operators and the chance of having operators with
> > > different parallelism is reduced. The price is having more resource
> > > requirements to specify.
> > >
> > > > It also seems like quite a hassle for users having to recalculate the
> > > > resource requirements if they change the slot sharing.
> > > > I'd think that it's not really workable for users that create a set of
> > > > re-usable operators which are mixed and matched in their applications;
> > > > managing the resources requirements in such a setting would be a
> > > > nightmare, and in the end would require operator-level requirements
> > > > anyway.
> > > > In that sense, I'm not even sure whether it really increases usability.
> > > >
> > >
> > >    - As mentioned in my reply to Till's comment, there's no reason to
> > >    put multiple operators whose individual resource requirements are
> > >    already known into the same group in fine-grained resource management.
> > >    - Even if an operator implementation is reused for multiple
> > >    applications, it does not guarantee the same resource requirements.
> > >    During our years of practice at Alibaba, with per-operator
> > >    requirements specified for Blink's fine-grained resource management,
> > >    very few users (including our specialists who are dedicated to
> > >    supporting Blink users) are experienced enough to accurately
> > >    predict/estimate the operator resource requirements. Most people rely
> > >    on the execution-time metrics (throughput, delay, cpu load, memory
> > >    usage, GC pressure, etc.) to improve the specification.
> > >
> > > To sum up:
> > > If the user is capable of providing proper resource requirements for
> > > every operator, that's definitely a good thing and we would not need to
> > > rely on the SSGs. However, that shouldn't be a *must* for the
> > > fine-grained resource management to work. For those users who are
> > > capable and do not like having to set each operator to a separate SSG, I
> > > would be ok to have both SSG-based and operator-based runtime interfaces
> > > and to only fall back to the SSG requirements when the operator
> > > requirements are not specified. However, as the first step, I think we
> > > should prioritise the use cases where users are not that experienced.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > > On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <ch...@apache.org> wrote:
> > >
> > > > Will declaring them on slot sharing groups not also waste resources
> > > > if the parallelism of operators within that group are different?
> > > >
> > > > It also seems like quite a hassle for users having to recalculate the
> > > > resource requirements if they change the slot sharing.
> > > > I'd think that it's not really workable for users that create a set of
> > > > re-usable operators which are mixed and matched in their applications;
> > > > managing the resources requirements in such a setting would be a
> > > > nightmare, and in the end would require operator-level requirements
> > > > anyway.
> > > > In that sense, I'm not even sure whether it really increases usability.
> > > >
> > > > My main worry is that if we wire the runtime to work on SSGs it's
> > > > gonna be difficult to implement more fine-grained approaches, which
> > > > would not be the case if, for the runtime, they are always defined on
> > > > an operator-level.
> > > >
> > > > On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > > > > Thanks for drafting this FLIP and starting this discussion Yangze.
> > > > >
> > > > > I like that defining resource requirements on a slot sharing group
> > > > > makes the overall setup easier and improves usability of resource
> > > > > requirements.
> > > > >
> > > > > What I do not like about it is that it changes slot sharing groups
> > > > > from being a scheduling hint to something which needs to be
> > > > > supported in order to support fine grained resource requirements. So
> > > > > far, the idea of slot sharing groups was that it tells the system
> > > > > that a set of operators can be deployed in the same slot. But the
> > > > > system still had the freedom to say that it would rather place these
> > > > > tasks in different slots if it wanted. If we now specify resource
> > > > > requirements per slot sharing group, then the only option for a
> > > > > scheduler which does not support slot sharing groups is to say that
> > > > > every operator in this slot sharing group needs a slot with the same
> > > > > resources as the whole group.
> > > > >
> > > > > So for example, if we have a job consisting of two operators op_1
> > > > > and op_2 where each op needs 100 MB of memory, we would then say
> > > > > that the slot sharing group needs 200 MB of memory to run. If we
> > > > > have a cluster with 2 TMs with one slot of 100 MB each, then the
> > > > > system cannot run this job. If the resources were specified on an
> > > > > operator level, then the system could still make the decision to
> > > > > deploy op_1 to TM_1 and op_2 to TM_2.
> > > > >
> > > > > Originally, one of the primary goals of slot sharing groups was to
> > > > > make it easier for the user to reason about how many slots a job
> > > > > needs independent of the actual number of operators in the job.
> > > > > Interestingly, if all operators have their resources properly
> > > > > specified, then slot sharing is no longer needed because Flink could
> > > > > slice off the appropriately sized slots for every Task individually.
> > > > > What matters is whether the whole cluster has enough resources to
> > > > > run all tasks or not.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <ka...@gmail.com>
> > wrote:
> > > > >
> > > > >> Hi, there,
> > > > >>
> > > > >> We would like to start a discussion thread on "FLIP-156: Runtime
> > > > >> Interfaces for Fine-Grained Resource Requirements"[1], where we
> > > > >> propose Slot Sharing Group (SSG) based runtime interfaces for
> > > > >> specifying fine-grained resource requirements.
> > > > >>
> > > > >> In this FLIP:
> > > > >> - Expound the user story of fine-grained resource management.
> > > > >> - Propose runtime interfaces for specifying SSG-based resource
> > > > >> requirements.
> > > > >> - Discuss the pros and cons of the three potential granularities
> for
> > > > >> specifying the resource requirements (op, task and slot sharing
> > group)
> > > > >> and explain why we choose the slot sharing group.
> > > > >>
> > > > >> Please find more details in the FLIP wiki document [1]. Looking
> > > > >> forward to your feedback.
> > > > >>
> > > > >> [1]
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> > > > >>
> > > > >> Best,
> > > > >> Yangze Guo
> > > > >>
> > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Xintong Song <to...@gmail.com>.

Thanks for the feedback, Stephan.

Actually, your proposal also crossed my mind at some point, and I have
some concerns about it.


1. It does not give users the same control as the SSG-based approach.


While neither approach requires specifying resources for every operator,
the SSG-based approach supports the semantics that "some operators together
use this much resource", while the operator-based approach doesn't.


Think of a long pipeline with m operators (o_1, o_2, ..., o_m), where at some
point there's an aggregation o_n (1 < n < m) that significantly reduces the
data volume. One can separate the pipeline into 2 groups, SSG_1 (o_1, ..., o_n)
and SSG_2 (o_(n+1), ..., o_m), so that configuring much higher parallelisms
for operators in SSG_1 than for operators in SSG_2 won't waste too many
resources. If the two SSGs end up needing different resources,
with the SSG-based approach one can directly specify resources for the two
groups. However, with the operator-based approach, the user will have to
specify resources for each operator in one of the two groups, and tune the
default slot resource via configurations to fit the other group.
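
The waste that the split avoids can be put into rough numbers. The sketch
below is only an illustration under two assumptions: a slot sharing group
needs as many slots as the highest parallelism among its operators, and each
slot of a group is sized to hold one subtask of every operator in the group.
The operator counts, parallelisms, and memory figures are made up.

```python
def group_cost(ops):
    """ops: list of (parallelism, mb_per_subtask) for one slot sharing group.

    Returns (reserved_mb, used_mb) under the assumptions stated above.
    """
    slots = max(p for p, _ in ops)          # one slot per subtask index
    slot_mb = sum(mb for _, mb in ops)      # slot sized for one subtask of each op
    reserved = slots * slot_mb
    used = sum(p * mb for p, mb in ops)     # what the subtasks actually need
    return reserved, used


# Hypothetical pipeline: two heavy operators up to the aggregation,
# two light operators after it.
before_agg = [(100, 50), (100, 50)]
after_agg = [(10, 20), (10, 20)]

one_group = group_cost(before_agg + after_agg)
split = tuple(a + b for a, b in zip(group_cost(before_agg), group_cost(after_agg)))

print("single SSG (reserved, used):", one_group)
print("two SSGs   (reserved, used):", split)
```

With these made-up numbers a single group reserves noticeably more memory
than its subtasks use, while the two-group split reserves exactly what is
used.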


2. It increases the chance of breaking operator chains.


Putting chainable operators into different slot sharing groups will
prevent them from being chained. In the current implementation, downstream
operators, if their SSG is not explicitly specified, will be set to the same
group as the chainable upstream operators (unless there are multiple upstream
operators in different groups), to reduce the chance of breaking chains.


Think of chainable operators o_1 -> o_2 -> o_3 -> o_4. If we decide SSGs
based on whether resources are specified, we can easily get groups like (o_1,
o_3) & (o_2, o_4), where none of the operators can be chained. This is also
possible with the SSG-based approach, but I believe the chance is much
smaller because there's no strong reason for users to specify the groups
with alternating operators like that. We are more likely to get groups like
(o_1, o_2) & (o_3, o_4), where the chain breaks only between o_2 and o_3.
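
To make the comparison concrete, here is a small sketch that counts chain
breaks for a linear pipeline. It only models the rule stated above (adjacent
operators in different groups cannot be chained); the group labels "A" and
"B" are placeholders.

```python
def chain_breaks(groups):
    """groups[i] is the slot sharing group of the i-th operator in a linear,
    otherwise chainable pipeline. Counts adjacent pairs that cannot be
    chained because they sit in different groups."""
    return sum(1 for a, b in zip(groups, groups[1:]) if a != b)


alternating = ["A", "B", "A", "B"]  # (o_1, o_3) & (o_2, o_4)
contiguous = ["A", "A", "B", "B"]   # (o_1, o_2) & (o_3, o_4)

print(chain_breaks(alternating))  # 3: every edge is a break
print(chain_breaks(contiguous))   # 1: only the edge o_2 -> o_3
```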


3. It complicates the system by having two different mechanisms for sharing
managed memory in a slot.


- In FLIP-141, we introduced the intra-slot managed memory sharing
mechanism, where managed memory is first distributed according to the
consumer type, then further distributed across operators of that consumer
type.

- With the operator-based approach, managed memory size specified for an
operator should account for all the consumer types of that operator. That
means the managed memory is first distributed across operators, then
distributed to different consumer types of each operator.


Unfortunately, the different order of the two calculation steps can lead to
different results. To be specific, the semantics of the configuration option
`consumer-weights` would change (within a slot vs. within an operator).
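
A toy calculation shows how the order of the two steps matters. This is a
simplified model, not the actual Flink logic: the consumer type names and
weights are hypothetical, and memory of one consumer type is assumed to be
split evenly across the operators that use it.

```python
WEIGHTS = {"STATE": 70, "PYTHON": 30}  # hypothetical consumer types and weights

# Which consumer types each operator uses (made-up job).
OPS = {"op_a": ["STATE"], "op_b": ["STATE", "PYTHON"]}


def split_by_weight(total_mb, consumer_types):
    """Divide total_mb across the given consumer types by their weights."""
    weight_sum = sum(WEIGHTS[c] for c in consumer_types)
    return {c: total_mb * WEIGHTS[c] / weight_sum for c in consumer_types}


def slot_first(slot_mb):
    """Split the slot by consumer type first, then evenly across the
    operators that use each type (even split is a simplification)."""
    used = sorted({c for cs in OPS.values() for c in cs})
    result = {}
    for c, mb in split_by_weight(slot_mb, used).items():
        users = [op for op, cs in OPS.items() if c in cs]
        for op in users:
            result[(op, c)] = mb / len(users)
    return result


def operator_first(op_mb):
    """Give each operator its own budget first, then split that budget
    across the operator's consumer types by weight."""
    return {(op, c): mb
            for op, cs in OPS.items()
            for c, mb in split_by_weight(op_mb[op], cs).items()}


print(slot_first(100))
print(operator_first({"op_a": 50, "op_b": 50}))
```

In this toy setup the same 100 MB in total gives op_b's PYTHON consumer
30 MB when the weights apply within the slot, but only 15 MB when they apply
within the operator.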



To sum things up:

While (3) might be a bit more implementation related, I think (1) and (2)
suggest that the price the proposed approach pays for not having to specify
resources for every operator is that it's not as independent of operator
chaining and slot sharing as the operator-based approach discussed in the
FLIP.


Thank you~

Xintong Song



On Tue, Jan 19, 2021 at 4:29 AM Stephan Ewen <se...@apache.org> wrote:

> Thanks a lot, Yangze and Xintong for this FLIP.
>
> I want to say, first of all, that this is super well written. And the
> points that the FLIP makes about how to expose the configuration to users
> are exactly the right thing to figure out first.
> So good job here!
>
> About how to let users specify the resource profiles. If I can sum the FLIP
> and previous discussion up in my own words, the problem is the following:
>
> Operator-level specification is the simplest and cleanest approach, because
> it avoids mixing operator configuration (resource) and scheduling. No
> matter what other parameters change (chaining, slot sharing, switching
> pipelined and blocking shuffles), the resource profiles stay the same.
> But it would require that a user specifies resources on all operators,
> which makes it hard to use. That's why the FLIP suggests going with
> specifying resources on a Sharing-Group.
>
>
> I think both thoughts are important, so can we find a solution where the
> Resource Profiles are specified on an Operator, but we still avoid that we
> need to specify a resource profile on every operator?
>
> What do you think about something like the following:
>   - Resource Profiles are specified on an operator level.
>   - Not all operators need profiles
>   - All Operators without a Resource Profile end up in the default slot
> sharing group with a default profile (will get a default slot).
>   - All Operators with a Resource Profile will go into another slot sharing
> group (the resource-specified-group).
>   - Users can define different slot sharing groups for operators like they
> do now, with the exception that you cannot mix operators that have a
> resource profile and operators that have no resource profile.
>   - The default case where no operator has a resource profile is just a
> special case of this model
>   - The chaining logic sums up the profiles per operator, like it does now,
> and the scheduler sums up the profiles of the tasks that it schedules
> together.
>
>
> There is another question about reactive scaling raised in the FLIP. I need
> to think a bit about that. That is indeed a bit more tricky once we have
> slots of different sizes.
> It is then not clear which of the different slot requests the
> ResourceManager should fulfill when new resources (TMs) show up, or how the
> JobManager redistributes the slot resources when resources (TMs) disappear.
> This question is pretty orthogonal, though, to the "how to specify the
> resources" question.
>
>
> Best,
> Stephan

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Stephan Ewen <se...@apache.org>.

Thanks a lot, Yangze and Xintong for this FLIP.

I want to say, first of all, that this is super well written. And the
points that the FLIP makes about how to expose the configuration to users
are exactly the right thing to figure out first.
So good job here!

About how to let users specify the resource profiles. If I can sum the FLIP
and previous discussion up in my own words, the problem is the following:

Operator-level specification is the simplest and cleanest approach, because
it avoids mixing operator configuration (resource) and scheduling. No
matter what other parameters change (chaining, slot sharing, switching
pipelined and blocking shuffles), the resource profiles stay the same.
But it would require that a user specifies resources on all operators,
which makes it hard to use. That's why the FLIP suggests going with
specifying resources on a Sharing-Group.


I think both thoughts are important, so can we find a solution where the
Resource Profiles are specified on an Operator, but we still avoid that we
need to specify a resource profile on every operator?

What do you think about something like the following:
  - Resource Profiles are specified on an operator level.
  - Not all operators need profiles.
  - All Operators without a Resource Profile end up in the default slot
sharing group with a default profile (will get a default slot).
  - All Operators with a Resource Profile will go into another slot sharing
group (the resource-specified-group).
  - Users can define different slot sharing groups for operators like they
do now, with the exception that you cannot mix operators that have a
resource profile and operators that have no resource profile.
  - The default case where no operator has a resource profile is just a
special case of this model
  - The chaining logic sums up the profiles per operator, like it does now,
and the scheduler sums up the profiles of the tasks that it schedules
together.
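
The grouping rules in the list above could be sketched roughly as follows.
This is a toy model for discussion, not a proposed interface: the function
name, the group names, and the use of a single MB number as a "profile" are
all assumptions.

```python
from collections import defaultdict

DEFAULT_GROUP = "default"                # hypothetical group names
SPECIFIED_GROUP = "resource-specified"


def assign_groups(operators):
    """operators: name -> (user_group or None, profile_mb or None).

    Returns group -> summed profile in MB, or None for groups whose
    operators have no profile (they would get a default slot).
    """
    members = defaultdict(list)
    for name, (group, mb) in operators.items():
        if group is None:
            # No explicit group: pick one based on whether a profile exists.
            group = DEFAULT_GROUP if mb is None else SPECIFIED_GROUP
        members[group].append(mb)

    totals = {}
    for group, profiles in members.items():
        specified = [mb is not None for mb in profiles]
        if any(specified) and not all(specified):
            raise ValueError(
                f"group {group!r} mixes operators with and without profiles")
        totals[group] = sum(profiles) if all(specified) else None
    return totals


job = {
    "source": (None, None),
    "map": (None, None),
    "agg": (None, 200),
    "sink": (None, 100),
}
print(assign_groups(job))
```

A job where no operator has a profile simply yields a single default group,
which matches the "special case" remark in the list.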


There is another question about reactive scaling raised in the FLIP. I need
to think a bit about that. That is indeed a bit more tricky once we have
slots of different sizes.
It is then not clear which of the different slot requests the
ResourceManager should fulfill when new resources (TMs) show up, or how the
JobManager redistributes the slot resources when resources (TMs) disappear.
This question is pretty orthogonal, though, to the "how to specify the
resources" question.


Best,
Stephan

On Fri, Jan 8, 2021 at 5:14 AM Xintong Song <to...@gmail.com> wrote:

> Thanks for drafting the FLIP and driving the discussion, Yangze.
> And Thanks for the feedback, Till and Chesnay.
>
> @Till,
>
> I agree that specifying requirements for SSGs means that SSGs need to be
> supported in fine-grained resource management, otherwise each operator
> might use as many resources as the whole group. However, I cannot think of
> a strong reason for not supporting SSGs in fine-grained resource
> management.
>
>
> > Interestingly, if all operators have their resources properly specified,
> > then slot sharing is no longer needed because Flink could slice off the
> > appropriately sized slots for every Task individually.
>
> > So for example, if we have a job consisting of two operator op_1 and op_2
> > where each op needs 100 MB of memory, we would then say that the slot
> > sharing group needs 200 MB of memory to run. If we have a cluster with 2
> > TMs with one slot of 100 MB each, then the system cannot run this job. If
> > the resources were specified on an operator level, then the system could
> > still make the decision to deploy op_1 to TM_1 and op_2 to TM_2.
>
>
> Couldn't agree more that if all operators' requirements are properly
> specified, slot sharing should be no longer needed. I think this exactly
> disproves the example. If we already know op_1 and op_2 each needs 100 MB
> of memory, why would we put them in the same group? If they are in separate
> groups, with the proposed approach the system can freely deploy them to
> either a 200 MB TM or two 100 MB TMs.
>
> Moreover, the precondition for not needing slot sharing is having resource
> requirements properly specified for all operators. This is not always
> possible, and usually requires tremendous efforts. One of the benefits for
> SSG-based requirements is that it allows the user to freely decide the
> granularity, thus efforts they want to pay. I would consider SSG in
> fine-grained resource management as a group of operators that the user
> would like to specify the total resource for. There can be only one group
> in the job, 2~3 groups dividing the job into a few major parts, or as many
> groups as the number of tasks/operators, depending on how fine-grained the
> user is able to specify the resources.
>
> Having to support SSGs might be a constraint. But given that all the
> current scheduler implementations already support SSGs, I tend to think
> that as an acceptable price for the above discussed usability and
> flexibility.
>
> @Chesnay
>
> > Will declaring them on slot sharing groups not also waste resources if the
> > parallelism of operators within that group are different?
>
> Yes. It's a trade-off between usability and resource utilization. To avoid
> such wasting, the user can define more groups, so that each group contains
> less operators and the chance of having operators with different
> parallelism will be reduced. The price is to have more resource
> requirements to specify.
>
> > It also seems like quite a hassle for users having to recalculate the
> > resource requirements if they change the slot sharing.
> > I'd think that it's not really workable for users that create a set of
> > re-usable operators which are mixed and matched in their applications;
> > managing the resources requirements in such a setting would be a
> > nightmare, and in the end would require operator-level requirements any
> > way.
> > In that sense, I'm not even sure whether it really increases usability.
> >
>
>    - As mentioned in my reply to Till's comment, there's no reason to put
>    multiple operators whose individual resource requirements are already
>    known into the same group in fine-grained resource management.
>    - Even if an operator implementation is reused for multiple applications,
>    it does not guarantee the same resource requirements. During our years
>    of practice at Alibaba, with per-operator requirements specified for
>    Blink's fine-grained resource management, very few users (including our
>    specialists who are dedicated to supporting Blink users) are experienced
>    enough to accurately predict/estimate the operator resource requirements.
>    Most people rely on the execution-time metrics (throughput, delay, cpu
>    load, memory usage, GC pressure, etc.) to improve the specification.
>
> To sum up:
> If the user is capable of providing proper resource requirements for every
> operator, that's definitely a good thing and we would not need to rely on
> the SSGs. However, that shouldn't be a *must* for the fine-grained resource
> management to work. For those users who are capable and do not like having
> to set each operator to a separate SSG, I would be ok to have both
> SSG-based and operator-based runtime interfaces and to only fallback to the
> SSG requirements when the operator requirements are not specified. However,
> as the first step, I think we should prioritise the use cases where users
> are not that experienced.
>
> Thank you~
>
> Xintong Song

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Xintong Song <to...@gmail.com>.

Thanks for drafting the FLIP and driving the discussion, Yangze.
And thanks for the feedback, Till and Chesnay.

@Till,

I agree that specifying requirements for SSGs means that SSGs need to be
supported in fine-grained resource management, otherwise each operator
might use as many resources as the whole group. However, I cannot think of
a strong reason for not supporting SSGs in fine-grained resource management.


> Interestingly, if all operators have their resources properly specified,
> then slot sharing is no longer needed because Flink could slice off the
> appropriately sized slots for every Task individually.

> So for example, if we have a job consisting of two operator op_1 and op_2
> where each op needs 100 MB of memory, we would then say that the slot
> sharing group needs 200 MB of memory to run. If we have a cluster with 2
> TMs with one slot of 100 MB each, then the system cannot run this job. If
> the resources were specified on an operator level, then the system could
> still make the decision to deploy op_1 to TM_1 and op_2 to TM_2.


Couldn't agree more that if all operators' requirements are properly
specified, slot sharing should no longer be needed. I think this exactly
disproves the example. If we already know that op_1 and op_2 each need
100 MB of memory, why would we put them in the same group? If they are in
separate groups, with the proposed approach the system can freely deploy
them to either a 200 MB TM or two 100 MB TMs.
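
To make the example concrete, here is a minimal sketch of the placement
check. It is not Flink code: `can_deploy` is a hypothetical helper, only
memory is modelled, and it assumes each slot request is carved out of a
single TM's free memory.

```python
def can_deploy(slot_requests_mb, tm_free_mb):
    """Return True if every slot request fits into some TaskManager.

    Assumption: a slot cannot span TaskManagers, and a TaskManager can
    host several slots as long as its free memory is not exceeded.
    """
    requests = sorted(slot_requests_mb, reverse=True)
    free = list(tm_free_mb)

    def place(i):
        # Backtracking search: try each TM for request i, undo on failure.
        if i == len(requests):
            return True
        for t in range(len(free)):
            if free[t] >= requests[i]:
                free[t] -= requests[i]
                if place(i + 1):
                    return True
                free[t] += requests[i]
        return False

    return place(0)


# op_1 and op_2 in one group: a single 200 MB slot request.
one_group = [200]
# op_1 and op_2 in separate groups: two 100 MB slot requests.
two_groups = [100, 100]

print(can_deploy(one_group, [100, 100]))   # False
print(can_deploy(two_groups, [100, 100]))  # True
print(can_deploy(two_groups, [200]))       # True
```

With separate groups the same job fits both cluster layouts, which is the
point made above.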

Moreover, the precondition for not needing slot sharing is having resource
requirements properly specified for all operators. This is not always
possible, and usually requires tremendous effort. One of the benefits of
SSG-based requirements is that they allow the user to freely decide the
granularity, and thus the effort they want to invest. I would consider an
SSG in fine-grained resource management as a group of operators for which
the user would like to specify the total resources. There can be only one
group in the job, 2~3 groups dividing the job into a few major parts, or as
many groups as the number of tasks/operators, depending on how fine-grained
the user is able to specify the resources.

Having to support SSGs might be a constraint. But given that all the
current scheduler implementations already support SSGs, I tend to think of
that as an acceptable price for the usability and flexibility discussed
above.

@Chesnay

> Will declaring them on slot sharing groups not also waste resources if the
> parallelism of operators within that group are different?
>
Yes. It's a trade-off between usability and resource utilization. To avoid
such waste, the user can define more groups, so that each group contains
fewer operators and the chance of having operators with different
parallelism is reduced. The price is having more resource
requirements to specify.
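
As a rough illustration of this trade-off, the sketch below estimates the
memory reserved but not needed when operators with different parallelism
share a group. It is not Flink code. It assumes, purely for the arithmetic,
that per-subtask needs are known, that a group occupies as many slots as its
highest parallelism, and that every slot is sized for one subtask of each
operator in the group. `group_waste_mb` is a hypothetical name.

```python
def group_waste_mb(operators):
    """operators: list of (memory_mb_per_subtask, parallelism) tuples."""
    slots = max(parallelism for _, parallelism in operators)
    slot_size = sum(mem for mem, _ in operators)
    allocated = slots * slot_size
    used = sum(mem * parallelism for mem, parallelism in operators)
    return allocated - used


op_a = (100, 4)  # 100 MB per subtask, parallelism 4
op_b = (100, 2)  # 100 MB per subtask, parallelism 2

# One shared group: 4 slots of 200 MB, but only 600 MB is actually needed.
print(group_waste_mb([op_a, op_b]))                     # 200
# Two groups: no waste, but two requirements to specify.
print(group_waste_mb([op_a]) + group_waste_mb([op_b]))  # 0
```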

> It also seems like quite a hassle for users having to recalculate the
> resource requirements if they change the slot sharing.
> I'd think that it's not really workable for users that create a set of
> re-usable operators which are mixed and matched in their applications;
> managing the resources requirements in such a setting would be a
> nightmare, and in the end would require operator-level requirements any
> way.
> In that sense, I'm not even sure whether it really increases usability.
>

   - As mentioned in my reply to Till's comment, there's no reason to put
   multiple operators whose individual resource requirements are already known
   into the same group in fine-grained resource management.
   - Even if an operator implementation is reused for multiple applications,
   that does not guarantee the same resource requirements. In our years of
   practice at Alibaba, with per-operator requirements specified for Blink's
   fine-grained resource management, very few users (including our specialists
   who are dedicated to supporting Blink users) are experienced enough to
   accurately predict/estimate the operator resource requirements. Most people
   rely on the execution-time metrics (throughput, delay, CPU load, memory
   usage, GC pressure, etc.) to improve the specification.

To sum up:
If the user is capable of providing proper resource requirements for every
operator, that's definitely a good thing and we would not need to rely on
the SSGs. However, that shouldn't be a *must* for the fine-grained resource
management to work. For those users who are capable and do not like having
to put each operator into a separate SSG, I would be ok with having both
SSG-based and operator-based runtime interfaces and only falling back to the
SSG requirements when the operator requirements are not specified. However,
as the first step, I think we should prioritise the use cases where users
are not that experienced.
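
One possible reading of that fallback, as a sketch only: the FLIP does not
define these semantics, and `slot_requirement_mb` and its parameters are
hypothetical. Operator-level values win when every operator in the group has
one; otherwise the group-level value is used.

```python
def slot_requirement_mb(group_spec_mb, operator_specs_mb):
    """Resolve the memory requirement for one slot of a group.

    group_spec_mb: requirement declared on the group, or None.
    operator_specs_mb: one entry per operator in the group, each an int
        or None when the user did not specify it.
    Assumption: partially specified operators are ignored and the group
    value is used instead.
    """
    if operator_specs_mb and all(s is not None for s in operator_specs_mb):
        return sum(operator_specs_mb)
    return group_spec_mb


print(slot_requirement_mb(300, [100, 100]))   # 200, operator specs complete
print(slot_requirement_mb(300, [100, None]))  # 300, falls back to the group
```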

Thank you~

Xintong Song

On Thu, Jan 7, 2021 at 9:55 PM Chesnay Schepler <ch...@apache.org> wrote:

> Will declaring them on slot sharing groups not also waste resources if
> the parallelism of operators within that group are different?
>
> It also seems like quite a hassle for users having to recalculate the
> resource requirements if they change the slot sharing.
> I'd think that it's not really workable for users that create a set of
> re-usable operators which are mixed and matched in their applications;
> managing the resources requirements in such a setting would be a
> nightmare, and in the end would require operator-level requirements any
> way.
> In that sense, I'm not even sure whether it really increases usability.
>
> My main worry is that it if we wire the runtime to work on SSGs it's
> gonna be difficult to implement more fine-grained approaches, which
> would not be the case if, for the runtime, they are always defined on an
> operator-level.
>
> On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> > Thanks for drafting this FLIP and starting this discussion Yangze.
> >
> > I like that defining resource requirements on a slot sharing group makes
> > the overall setup easier and improves usability of resource requirements.
> >
> > What I do not like about it is that it changes slot sharing groups from
> > being a scheduling hint to something which needs to be supported in order
> > to support fine grained resource requirements. So far, the idea of slot
> > sharing groups was that it tells the system that a set of operators can be
> > deployed in the same slot. But the system still had the freedom to say that
> > it would rather place these tasks in different slots if it wanted. If we
> > now specify resource requirements on a per slot sharing group, then the
> > only option for a scheduler which does not support slot sharing groups is
> > to say that every operator in this slot sharing group needs a slot with the
> > same resources as the whole group.
> >
> > So for example, if we have a job consisting of two operator op_1 and op_2
> > where each op needs 100 MB of memory, we would then say that the slot
> > sharing group needs 200 MB of memory to run. If we have a cluster with 2
> > TMs with one slot of 100 MB each, then the system cannot run this job. If
> > the resources were specified on an operator level, then the system could
> > still make the decision to deploy op_1 to TM_1 and op_2 to TM_2.
> >
> > Originally, one of the primary goals of slot sharing groups was to make it
> > easier for the user to reason about how many slots a job needs independent
> > of the actual number of operators in the job. Interestingly, if all
> > operators have their resources properly specified, then slot sharing is no
> > longer needed because Flink could slice off the appropriately sized slots
> > for every Task individually. What matters is whether the whole cluster has
> > enough resources to run all tasks or not.
> >
> > Cheers,
> > Till
> >
> > On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <ka...@gmail.com> wrote:
> >
> >> Hi, there,
> >>
> >> We would like to start a discussion thread on "FLIP-156: Runtime
> >> Interfaces for Fine-Grained Resource Requirements"[1], where we
> >> propose Slot Sharing Group (SSG) based runtime interfaces for
> >> specifying fine-grained resource requirements.
> >>
> >> In this FLIP:
> >> - Expound the user story of fine-grained resource management.
> >> - Propose runtime interfaces for specifying SSG-based resource
> >> requirements.
> >> - Discuss the pros and cons of the three potential granularities for
> >> specifying the resource requirements (op, task and slot sharing group)
> >> and explain why we choose the slot sharing group.
> >>
> >> Please find more details in the FLIP wiki document [1]. Looking
> >> forward to your feedback.
> >>
> >> [1]
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
> >>
> >> Best,
> >> Yangze Guo
> >>
>
>

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Chesnay Schepler <ch...@apache.org>.

Will declaring them on slot sharing groups not also waste resources if
the parallelism of operators within that group is different?

It also seems like quite a hassle for users having to recalculate the 
resource requirements if they change the slot sharing.
I'd think that it's not really workable for users that create a set of 
re-usable operators which are mixed and matched in their applications;
managing the resource requirements in such a setting would be a
nightmare, and in the end would require operator-level requirements anyway.
In that sense, I'm not even sure whether it really increases usability.

My main worry is that if we wire the runtime to work on SSGs it's
gonna be difficult to implement more fine-grained approaches, which
would not be the case if, for the runtime, they are always defined on an
operator level.

On 1/7/2021 2:42 PM, Till Rohrmann wrote:
> Thanks for drafting this FLIP and starting this discussion Yangze.
>
> I like that defining resource requirements on a slot sharing group makes
> the overall setup easier and improves usability of resource requirements.
>
> What I do not like about it is that it changes slot sharing groups from
> being a scheduling hint to something which needs to be supported in order
> to support fine grained resource requirements. So far, the idea of slot
> sharing groups was that it tells the system that a set of operators can be
> deployed in the same slot. But the system still had the freedom to say that
> it would rather place these tasks in different slots if it wanted. If we
> now specify resource requirements on a per slot sharing group, then the
> only option for a scheduler which does not support slot sharing groups is
> to say that every operator in this slot sharing group needs a slot with the
> same resources as the whole group.
>
> So for example, if we have a job consisting of two operator op_1 and op_2
> where each op needs 100 MB of memory, we would then say that the slot
> sharing group needs 200 MB of memory to run. If we have a cluster with 2
> TMs with one slot of 100 MB each, then the system cannot run this job. If
> the resources were specified on an operator level, then the system could
> still make the decision to deploy op_1 to TM_1 and op_2 to TM_2.
>
> Originally, one of the primary goals of slot sharing groups was to make it
> easier for the user to reason about how many slots a job needs independent
> of the actual number of operators in the job. Interestingly, if all
> operators have their resources properly specified, then slot sharing is no
> longer needed because Flink could slice off the appropriately sized slots
> for every Task individually. What matters is whether the whole cluster has
> enough resources to run all tasks or not.
>
> Cheers,
> Till
>
> On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <ka...@gmail.com> wrote:
>
>> Hi, there,
>>
>> We would like to start a discussion thread on "FLIP-156: Runtime
>> Interfaces for Fine-Grained Resource Requirements"[1], where we
>> propose Slot Sharing Group (SSG) based runtime interfaces for
>> specifying fine-grained resource requirements.
>>
>> In this FLIP:
>> - Expound the user story of fine-grained resource management.
>> - Propose runtime interfaces for specifying SSG-based resource
>> requirements.
>> - Discuss the pros and cons of the three potential granularities for
>> specifying the resource requirements (op, task and slot sharing group)
>> and explain why we choose the slot sharing group.
>>
>> Please find more details in the FLIP wiki document [1]. Looking
>> forward to your feedback.
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
>>
>> Best,
>> Yangze Guo
>>

Re: [DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

Posted by Till Rohrmann <tr...@apache.org>.

Thanks for drafting this FLIP and starting this discussion Yangze.

I like that defining resource requirements on a slot sharing group makes
the overall setup easier and improves usability of resource requirements.

What I do not like about it is that it changes slot sharing groups from
being a scheduling hint to something which needs to be supported in order
to support fine-grained resource requirements. So far, the idea of slot
sharing groups was that it tells the system that a set of operators can be
deployed in the same slot. But the system still had the freedom to say that
it would rather place these tasks in different slots if it wanted. If we
now specify resource requirements per slot sharing group, then the
only option for a scheduler which does not support slot sharing groups is
to say that every operator in this slot sharing group needs a slot with the
same resources as the whole group.

So for example, if we have a job consisting of two operators, op_1 and op_2,
where each op needs 100 MB of memory, we would then say that the slot
sharing group needs 200 MB of memory to run. If we have a cluster with 2
TMs with one slot of 100 MB each, then the system cannot run this job. If
the resources were specified on an operator level, then the system could
still make the decision to deploy op_1 to TM_1 and op_2 to TM_2.

Originally, one of the primary goals of slot sharing groups was to make it
easier for the user to reason about how many slots a job needs independent
of the actual number of operators in the job. Interestingly, if all
operators have their resources properly specified, then slot sharing is no
longer needed because Flink could slice off the appropriately sized slots
for every Task individually. What matters is whether the whole cluster has
enough resources to run all tasks or not.
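
The slot-counting argument can be written down as a small sketch. It is not
Flink code and `slots_needed` is a hypothetical helper. It relies on the
usual rule that a slot sharing group occupies as many slots as the highest
parallelism among its members.

```python
def slots_needed(parallelism_by_group):
    """parallelism_by_group: dict mapping group name -> list of parallelisms."""
    return sum(max(ps) for ps in parallelism_by_group.values())


# Three operators with parallelism 4, 4 and 2.
# All in one group: the count follows the highest parallelism only.
print(slots_needed({"default": [4, 4, 2]}))           # 4
# One group per operator: the count grows with the number of operators.
print(slots_needed({"a": [4], "b": [4], "c": [2]}))   # 10
```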

Cheers,
Till

On Thu, Jan 7, 2021 at 4:08 AM Yangze Guo <ka...@gmail.com> wrote:

> Hi, there,
>
> We would like to start a discussion thread on "FLIP-156: Runtime
> Interfaces for Fine-Grained Resource Requirements"[1], where we
> propose Slot Sharing Group (SSG) based runtime interfaces for
> specifying fine-grained resource requirements.
>
> In this FLIP:
> - Expound the user story of fine-grained resource management.
> - Propose runtime interfaces for specifying SSG-based resource
> requirements.
> - Discuss the pros and cons of the three potential granularities for
> specifying the resource requirements (op, task and slot sharing group)
> and explain why we choose the slot sharing group.
>
> Please find more details in the FLIP wiki document [1]. Looking
> forward to your feedback.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-156%3A+Runtime+Interfaces+for+Fine-Grained+Resource+Requirements
>
> Best,
> Yangze Guo
>